M2M Biostatistics and Epidemiology LOs

=Measures of disease in populations=
 * Learn definitions and uses of measures of rates and prevalence.
 * Learn basic concepts of population stratification and sampling.
 * Learn basic concepts of association.
 * Learn the meaning of the following terms:
 * **Case fatality** = (number of people who die of a disease) / (total number of people with that disease)
 * also = (Mortality) / (Incidence)
 * Survival rate = 1 – Case Fatality
 * Measures disease prognosis rather than a population’s risk of dying from the disease
 * **Cumulative Incidence** = (number of new cases of a disease in a specified period of time) / (number of people initially at risk)
 * a.k.a. attack rate, risk of disease, probability of getting disease
 * Proportion of a group that develops a disease over a defined period of time
 * Use incidence to get an idea of what’s happening with the evolution of new diseases.
 * e.g. Incidence of heart disease in men aged 50-60 every year is 1/1000
 * **Incidence rate** = (# of new cases of a disease in specified time period) / (total amount of person-time at risk)
 * a.k.a. incidence density, hazard rate
 * Recognizes fact that disease risk is not static throughout a lifetime. End up with rates that look a lot lower, but accounts for the time of risk.
 * Rarely used in literature.
 * "person-time" = sum of observation periods at risk for all people in group (person-years)
 * **Mortality rate** = (number of people dying from a disease in a specified time period) / (average number of people alive during that period of time)
 * Mortality = Incidence * Case fatality (holds when incidence and case fatality are stable over time)
 * “Risk of dying from a specific disease”
 * Denominator is all people, regardless of whether they have the disease or not.
 * e.g. Mortality Rate for heart disease in CO is 1/1000 per year. That is, each person in CO has a 1/1000 risk of dying from heart disease in the next year.
 * **Neonatal Mortality**: (Annual number of deaths in first 28 days of life) / (Annual number of live births)
 * **Prevalence**
 * (# of existing cases of a disease in specified time period) / (total number in group or population)
 * New AND Existing diseases. Provides snapshot of existing situation.
 * Important for planning and allocation of resources
 * **Point prevalence** = (number of persons with a disease) / (total number in a group)
 * Snapshot, prevalence at a single point in time
 * Ex. How many people in the room currently have a cold? - 6 people, out of a room of 160. So, point prevalence is 6/160.
 * **Period prevalence** = prevalence measured over a specific time interval
 * Ex. Over the last 12 months, how many people in the room have had a cold? - 140 people, out of a room of 160. So, period prevalence is 140/160.
 * **Proportionate mortality** = (number of deaths due to a disease in a specified time period) / (total number of deaths during that time period)
 * e.g. Out of 100 deaths among women this year, how many of those will be due to breast cancer? (3.9%, so out of 100 deaths, 3.9 will be from breast cancer)
 * But, be careful when using this, as it can be misleading if other mortality rates for a population are high or low. Can’t use proportionate mortality to predict disease risk: Ex. Proportional Mortality of Hodgkin’s Disease was found to be higher in teachers than in the general population, but that was simply because there were more deaths in the general population from other things, such as motor vehicle accidents, cancer, etc.
 * **Years of Potential Life Lost**
 * (average # of years lived for a population) – (age at which an individual died), summed over all deaths in the group
 * Takes into account the age that people died. Measures premature mortality.
 * Can be a public policy tool to decide where to direct resources to minimize YPLL
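
The formulas in this section can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and apart from the cold-in-the-room example from the notes, all the numbers are hypothetical.

```python
def case_fatality(deaths_from_disease, people_with_disease):
    """Proportion of people with the disease who die of it."""
    return deaths_from_disease / people_with_disease


def cumulative_incidence(new_cases, initially_at_risk):
    """New cases over a period divided by people initially at risk."""
    return new_cases / initially_at_risk


def incidence_rate(new_cases, person_time_at_risk):
    """New cases divided by total person-time at risk (e.g. person-years)."""
    return new_cases / person_time_at_risk


def prevalence(existing_cases, group_size):
    """Existing cases divided by total number in the group."""
    return existing_cases / group_size


def proportionate_mortality(deaths_from_disease, total_deaths):
    """Share of all deaths that are due to one disease."""
    return deaths_from_disease / total_deaths


# Example from the notes: 6 of 160 people in the room currently have a cold.
print(prevalence(6, 160))  # 0.0375

# Hypothetical cohort: 1,000 people at risk, 10 new cases in a year, 2 of those die.
ci = cumulative_incidence(10, 1000)
cf = case_fatality(2, 10)
print("mortality = incidence * case fatality:", ci * cf)

# Hypothetical person-time: three people observed for 2, 5 and 3.5 years, 1 new case.
person_years = sum([2.0, 5.0, 3.5])
print("incidence rate per person-year:", incidence_rate(1, person_years))
```

Note how the incidence rate uses person-time in the denominator, whereas cumulative incidence uses a head count of people initially at risk.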

=Study designs in the medical literature=
 * Explain correlational/ecological studies, cross-sectional studies, case-control (retrospective) studies, cohort (prospective) studies, interventional studies.
 * Learn the meaning of the following terms:
 * **Case-control study**
 * Compare sample of cases with sample of controls.
 * Susceptible to sampling and measurement bias.
 * Have two groups: one with individuals who have the disease, and one with individuals who don’t have the disease. Then ascertain their past exposure.
 * Can’t measure risk directly, because the investigator fixes the number of cases and controls, so the proportion with disease in the study doesn’t reflect the population (case-control studies determine association, not risk). Estimate the relative risk using the Odds Ratio.
 * Advantages: quick, inexpensive, easy, only practical approach for rare diseases
 * Disadvantages: recall bias, no direct measure of change in characteristics after disease has developed (e.g. change in diet).
 * Ex. An investigator examines a group of women with heart attacks (the “cases”) and compares them with a group of healthy women (the “controls”), asking about hormone use.
 * Ex. A study comparing patients with manic-depressive illness with patients with schizophrenia who are in a mental hospital, to determine which group more often showed behavior problems before the onset of illness, as recorded in their school records.
 * **Correlational/ecological study**
 * Compare group characteristics (not individual characteristics).
 * Can measure relative risk or attributable risk.
 * Not all that valid usually because groups have individuals with unique characteristics that might confound the data. So, the individuals with the exposure aren’t necessarily the ones with the outcome. This is called **Ecological Fallacy**.
 * Ex. Look at different countries and their diets and look at death rates from cancer. –This is weak, since death rates from cancer could be due to many other factors other than diet: tobacco smoking, environmental exposures, etc.
 * **Cross-sectional study**
 * Examine the relationship between diseases and other characteristics or variables as they exist in a defined population at a particular time.
 * Studies of individuals
 * Usually done through surveys
 * Don’t study new disease, just study disease that is prevalent.
 * Limitations on cause and effect inferences. Difficult to know which came first, the exposure or the outcome.
 * Can measure relative risk or attributable risk
 * Ex. An investigator examines a group of women once, observing the prevalence of a history of heart attacks in hormone users and non-users.
 * Ex. A telephone survey to be conducted next summer to determine whether peptic ulcer is then more frequent in unemployed or in employed adults.
 * **Ecological fallacy**
 * See Correlational/ecological study above
 * **Prospective/Cohort study**
 * Classify them at a period of time when none of them have a disease (Exposed vs. Not Exposed), and then follow them over time to see who gets the disease. (Measure risk factors now, then follow them forward for a period of time.)
 * Can directly measure risk and relative risk or attributable risk.
 * Can demonstrate time sequence between exposure and disease.
 * Less susceptible to bias in sampling and reporting than case-control studies (exposure is recorded before the outcome is known)
 * Doesn’t work well for rare diseases (since you would have to follow 100s of thousands of people).
 * Doesn’t work well for diseases with long latencies (since you would have to follow the individuals for that time, which might outlive the investigators).
 * Just because they are called “Prospective” doesn’t mean you can’t have a Retrospective Cohort Study. In a retrospective cohort, the cohort is defined and characterized in the past, based on data already recorded, and followed up toward the present to some cutoff time.
 * Ex. An investigator examines a cohort of women yearly for several years, observing the incidence of heart attacks in hormone users and non-users.
 * Ex. A follow-up study to determine whether 7th grade girls whose parents give permission for them to attend sex ed lectures will have a lower rate of teen pregnancy in the next 5 years than girls whose parents refuse.
 * **Randomized controlled trial**
 * Steps:
 * 1. Define hypothesis
 * 2. Select study subjects: Strict inclusion and exclusion criteria.
 * 3. Randomly allocate subjects to intervention groups - Have “allocation concealment” (blinding the investigator until the absolute last minute).
 * 4. Follow and ascertain all relevant outcomes (benefits AND harms)
 * Double-blind whenever possible (Prevents investigator bias)
 * Strongest study design of all. Our best approach to proving cause. Maximizes internal validity- can confidently assign the result that happened to the intervention. Assures random distribution of confounders.
 * In general, lengthy and very expensive. Also, blinding can be difficult.
 * Ex. An investigator randomly assigns women to receive hormone or identical placebo, then follows both groups for several years to observe the incidence of heart attacks
 * **Retrospective study**
 * See Case Control above
 * **Quasi-Experimental Study**
 * Researcher can’t assign interventions randomly
 * Have additional sources of bias (especially selection bias)
 * Ex. Have Group A that you do an intervention to, Have Group B (that you try to make look like Group A) that you don’t do an intervention to.

=Observational studies=
 * Explain situations in which each study type is most suitable.
 * Explain the various measures of association for each study type.
 * Explain the limitations of each study type as it is used in the medical literature
 * Learn the meaning of the following terms:
 * **Attributable risk**
 * See M2M 2x2 Tables for an illustration
 * (rate of disease among those exposed to factor) - (rate of disease among those not exposed to factor)
 * Risk difference between exposed and non-exposed. Has units.
 * Tells how much of the disease is due to a factor.
 * If there is no difference, attributable risk is 0.
 * e.g. Cancer incidence rate per 100,000 is 700 for smokers and 200 for non-smokers. Attributable risk is (700/100,000) – (200/100,000) = 500/100,000. Means that 500 of every 100,000 smokers develop cancer because of smoking (excess cases among the exposed).
 * **Number needed to treat (NNT)** = 1/Attributable Risk
 * You’d have to treat NNT people to gain one additional outcome
 * e.g. # of women that we need to screen with mammography in order to prevent one death from breast cancer is 700
 * **Bias** (sampling, measurement)
 * A systematic deviation of study measurement, results or inferences from the truth (anything that drags away from truth)
 * **Sampling/Selection Bias**
 * Happens when samples are taken in a non-random way.
 * Always an issue in case-control studies.
 * People entering a trial are special (they are willing to be in the trial)
 * Results because of (usually) pre-existing factors in study subjects that influence their outcome independent of exposure.
 * Selecting hospital only cases could cause bias, since mainly the very sick and very poor are the ones who come to the hospital
 * Also, entry criteria can’t be so strict that the study population no longer looks like the general population.
 * Ex. People who have unhealthy behaviors are less likely to take part in a survey about health (less likely to agree to be controls in case-control study). So, might have less smoking, say, in the control group and therefore we would erroneously think that smoking might somehow cause a disease.
 * **Measurement Bias**
 * Leads to Information (Misclassification) Bias
 * e.g. If you ask mothers of babies with birth defects about their folate intake, they might lie out of guilt.
 * **Confounding**
 * There may be associations between a risk factor and a disease, but it may just be that there is a relationship between that factor and a confounder, and between that confounder and the disease.
 * e.g. There is a strong association between having gray hair and sudden death from heart disease. Confounder = age.
 * To be a confounder, a factor must be associated with the outcome of interest and with the exposure of interest, BUT not be a result of the exposure.
 * If the pooled result falls outside of the range of stratum-specific results (each group’s result individually), then there is confounding.
 * Get rid of confounding by matching groups for a certain confounder OR randomized assignment: flip a coin to assign individuals to groups (equally likely to have the confounder in each group) OR through multivariate adjustment (statistical adjustments).
 * **Odds ratio**
 * See M2M 2x2 Tables for an illustration
 * [(# of diseased and exposed) x (# of non-diseased and non-exposed)] / [(# of non-diseased and exposed) x (# of diseased and non-exposed)]
 * Estimator of relative risk that you get from a case-control study (since we can’t really compute rates of disease in case-control studies because we have pre-selected the numbers of cases and controls).
 * **Relative risk**
 * See M2M 2x2 Tables for an illustration
 * (rate of disease among those exposed to factor) / (rate of disease among those not exposed to factor)
 * Ratio of two rates. Relative risk is unitless #.
 * Tells likelihood of getting disease if you have been exposed to a factor.
 * If there is no difference, relative risk is 1.
 * Similar to hazard ratio, except hazard ratio takes into account time that has passed. Odds ratios are relative risk estimates (approximations) that come out of case-control studies.
 * Ex. Cancer incidence rate per 100,000 is 700 for smokers and 200 for non-smokers. Relative risk is (700/100,000) / (200/100,000) = 3.5. Means you are 3.5 times as likely to get cancer if you are a smoker.
 * **Hazard Ratio**
 * (Risk of outcome for group A in study) / (Risk of outcome for group B in study)
 * Similar to relative risk or odds ratio, per unit time. Best guess of what the association is in certain studies.
 * e.g. Hazard Ratio of .74 on one drug, means you are 26% less likely to have the outcome with that drug than if you are in control group.
 * e.g. Two groups, one receives drug and one receives placebo. A hazard ratio for the drug group of .95 means that the drug group has a 5% decreased risk of having stroke recurrence compared to the placebo group.
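
As a rough illustration of how attributable risk, relative risk, odds ratio and NNT come out of a 2x2 table, here is a minimal Python sketch. The cell labels `a`, `b`, `c`, `d` and the function name are assumptions for illustration; the counts reuse the smoker example from the notes (700 vs. 200 cancers per 100,000).

```python
def two_by_two_measures(a, b, c, d):
    """Measures of association from a 2x2 table.

    a = exposed and diseased        b = exposed and not diseased
    c = unexposed and diseased      d = unexposed and not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    attributable_risk = risk_exposed - risk_unexposed
    return {
        "attributable_risk": attributable_risk,
        "relative_risk": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        # 1 / attributable risk, as defined in the notes
        "number_needed": 1 / attributable_risk,
    }


# Smoker example: 700 cancers per 100,000 smokers, 200 per 100,000 non-smokers.
measures = two_by_two_measures(a=700, b=99_300, c=200, d=99_800)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Because cancer is rare in both groups here, the odds ratio comes out close to the relative risk of 3.5, which is why the odds ratio can stand in for relative risk in case-control studies of rare diseases.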

=Interventional studies=
 * Explain methods used in randomized controlled trials.
 * Explain situations in which quasi-experimental methods are used.
 * Learn the meaning of the following terms:
 * **Blinding**
 * Neither subject nor experimenter knows who is in which group.
 * Can help to eliminate bias.
 * **Intent to treat analysis**
 * Must leave people in the group to which they were assigned, even if they cross over to another group (e.g. stop taking the drug)
 * This may underestimate the effect, but is considered necessary to protect the integrity of the study.
 * **Quasi-experimental study**
 * See above under Study Designs in the Medical Literature
 * **Randomized controlled trial**
 * See above under Study Designs in the Medical Literature
 * **Meta-Analysis**
 * Put a lot of studies together and average the results. Leads to Comparative Effectiveness Studies.

=Screening for disease=
 * Describe appropriate situations for disease screening.
 * Explain how screening differs from diagnosis.
 * Explain screening parameters.
 * Learn the meaning of the following terms:
 * **False positive**
 * Test says that you have the disease, but you actually don’t
 * **False negative**
 * Test says that you don’t have the disease, but you actually do
 * **Lead time bias**
 * People screened for a disease live “longer” with a disease, even though those who have not been screened could have had the disease and lived the same amount of time, they just didn’t know they had the disease.
 * **Length time bias**
 * In a screened group, we are more likely to find individuals with an indolent, more slowly-evolving disease. This is due to the fact that there is variation across people in how long a disease grows in an indolent phase (some diseases act faster than others). So, those detected by screening would be more likely to have slower-progressing disease regardless of whether it is detected by screening or not.
 * **Predictive values (positive and negative)**
 * See M2M 2x2 Tables for an illustration
 * If you have a test result, what is the likelihood that you have the disease (or don’t have the disease) contingent on the test result
 * **Positive**
 * (true positives) / (all positives)
 * Predicts how often individuals with positive tests actually have the disease. Can change based on the true prevalence of the disease in the group you are testing. (As the test is applied to populations with lower disease prevalence, positive predictive value decreases. Conversely, as disease prevalence increases, positive predictive value increases.)- see pg. 2 of 10.28.09 handout
 * **Negative**
 * (true negatives) / (all negatives)
 * Predicts how often individuals with negative tests are actually disease-free. Can change based on the true prevalence of the disease in the group you are testing. (As the test is applied to populations with lower disease prevalence, negative predictive value increases. Conversely, as disease prevalence increases, negative predictive value decreases.)- see pg. 2 of 10.28.09 handout
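
The dependence of predictive values on prevalence can be seen numerically. The sketch below is illustrative only: it assumes a hypothetical test with 90% sensitivity and 90% specificity (both terms are defined under Clinical decision making), and the function name is made up.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # true positives / all positives
    npv = true_neg / (true_neg + false_neg)   # true negatives / all negatives
    return ppv, npv


# Same hypothetical test applied to a low- and a high-prevalence population.
for prevalence in (0.01, 0.20):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the same test, the lower-prevalence population gets a lower PPV and a higher NPV, matching the statements above.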

=Clinical decision making=
 * Explain measures used in clinical decision-making.
 * Explain measures of health outcomes.
 * Learn the meaning of the following terms:
 * **Decision analysis**
 * Used to assess the comparative outcomes of two or more procedures. Based on probability of outcomes based on the literature or estimates/assumptions.
 * Can be used to compare benefits and harms in deciding net benefit of an intervention.
 * Can be used to compare costs and cost effectiveness of different interventions.
 * **Sensitivity**
 * See M2M 2x2 Tables for an illustration
 * (true positives) / (total with disease)
 * Describes how often a screening test detects a disease when it is indeed present. (i.e. What proportion of those who truly have the disease are detected by the screening test?)
 * A fixed characteristic of the test- the prevalence of the disease in the population being tested doesn’t matter.
 * **Specificity**
 * See M2M 2x2 Tables for an illustration
 * (true negatives) / (total without disease)
 * Describes how often a screening test detects the absence of the disease when it is indeed absent.
 * A fixed characteristic of the test- the prevalence of the disease in the population being tested doesn’t matter.
 * **Likelihood ratio**
 * **Positive** Likelihood Ratio:
 * (Sensitivity) / (1 – Specificity)
 * **Negative** Likelihood Ratio:
 * (1 – Sensitivity) / (Specificity)
 * The likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that the same result would be expected in a patient without the target disorder (measure of overall likelihood that the patient has the condition).
 * Almost always use Positive Likelihood Ratios
 * Used to assess how good a diagnostic test is
 * If Likelihood Ratio is greater than 1, the post-test probability will be higher than the pre-test probability (will help you make the diagnosis). If you start out with at least a 30% pre-test probability, you can rule in the disease.
 * If Likelihood Ratio is less than 1, the post-test probability will be lower than the pre-test probability (will help you rule out a diagnosis)
 * **Pre-test likelihood/odds/probability**
 * The chance that a patient has a disease, given the presentation but before test is run (Ex. A 60 y.o. man with chest pain is more likely to have a heart attack than a 20 y.o. woman with chest pain).
 * (Probability) / (1 – Probability) = Odds
 * (Odds) / (Odds + 1) = Probability
 * **Post-test likelihood/odds/probability**
 * The chance or certainty that a patient has a disease given the prior pre-test probability and the test result
 * Post-test odds = Pre-test odds x Likelihood Ratio
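
The chain from pre-test probability to post-test probability can be sketched in Python as follows. The function names and the test characteristics (90% sensitivity, 80% specificity, 30% pre-test probability) are hypothetical values chosen for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR)."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def probability_to_odds(probability):
    return probability / (1 - probability)


def odds_to_probability(odds):
    return odds / (odds + 1)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test odds = pre-test odds x likelihood ratio, converted back to a probability."""
    post_test_odds = probability_to_odds(pre_test_probability) * likelihood_ratio
    return odds_to_probability(post_test_odds)


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"after a positive test: {post_test_probability(0.30, lr_pos):.3f}")
print(f"after a negative test: {post_test_probability(0.30, lr_neg):.3f}")
```

A likelihood ratio above 1 pushes the post-test probability above the 30% starting point, and one below 1 pushes it below.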

=Descriptive statistics=
 * Explain the concept of statistical uncertainty.
 * Explain statistical measures in common use in the medical literature.
 * Learn the meaning of the following terms:
 * **Correlation coefficient**
 * Correlation measures the strength of the linear association between two continuous variables.
 * Can range from –1 (negative correlation) to +1 (positive correlation). If something has no correlation, r = 0.
 * **Mean**
 * (SUM (x)) / (n)
 * The arithmetic average (sum of observations divided by number of observations). “Central Tendency”
 * **Median**
 * The middle observation in a data set arranged lowest to highest
 * **Normal distribution**
 * A symmetric, bell-shaped distribution in which the mean is equal to the median. About 68% of all observations fall within 1 standard deviation of the mean. About 95% of all observations fall within 2 standard deviations of the mean.
 * **Range**
 * Highest value minus lowest value
 * **Regression**
 * Statistical technique for measuring the relationship between variables
 * Linear regression most common
 * **Sampling**
 * Selection of observations to gain some knowledge of a statistical population
 * **Sample size** (n)
 * **Variance**
 * (SUM ((x-mean)^2)) / (n – 1)
 * Standardized measure of the sum of the differences between each value and the mean value. Measure of variability in a set of data (scatter). Average of the squared differences from the mean.
 * The greater the variability in the outcome variable among the subjects, the more likely it is that the values in the groups will overlap, and the more difficult it will be to demonstrate an overall difference between them.
 * **Standard deviation**
 * Square root of the variance.
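
A small sketch of these descriptive statistics, using Python's standard `statistics` module next to a hand-written version of the variance formula above. The data set is made up for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)                 # (SUM (x)) / (n)
median = statistics.median(data)             # middle of the sorted values
value_range = max(data) - min(data)          # highest minus lowest

# (SUM ((x - mean)^2)) / (n - 1)
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
std_dev = math.sqrt(variance)                # square root of the variance

print("mean:", mean)
print("median:", median)
print("range:", value_range)
print("variance:", round(variance, 4))
print("standard deviation:", round(std_dev, 4))

# The library functions use the same n - 1 (sample) definitions.
print(math.isclose(variance, statistics.variance(data)))
print(math.isclose(std_dev, statistics.stdev(data)))
```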

=Statistical inference=
 * Explain hypothesis testing.
 * The **Null Hypothesis**, against which you test your data, is an important component of statistical analysis.
 * The assumption that there is no difference and no association between both groups in a study. (Always start with a null hypothesis, then use statistical data to decide whether to reject the null hypothesis or not).
 * Note that you never //accept// the null or alternative hypotheses - you //reject// or //fail to reject// the null hypothesis.
 * **Explain confidence intervals.**
 * A 95% confidence interval indicates that you are 95% confident that the true population value (e.g. the population mean) lies within the upper and lower bounds.
 * Increasing the strength of the confidence (e.g. 95% --> 99%) will widen the upper and lower bounds, whereas decreasing the strength of the confidence (e.g. 95% --> 90%) will narrow the interval.
 * If the value of the null hypothesis lies outside the CI, then you reject the null hypothesis (which means there is a statistically significant difference between the groups)
 * If the value of the null hypothesis is contained within the confidence interval, then you fail to reject the null hypothesis (which means there is no statistically significant difference between the two groups)
 * Comparing two groups with a ratio, fail to reject the null hypothesis if the CI contains the value 1
 * Comparing two groups with a subtraction, fail to reject the null hypothesis if the CI contains the value 0
 * Usually associated with a p-value, which should be greater than 0.05 if the CI contains the null value.
 * Learn the meaning of the following terms:
 * **Causality**
 * Can only be proven by eliminating bias in sampling & measurement, confounding, and chance. Eliminate alternatives through research.
 * Exposure must precede the outcome and must cause a lot of disease (should be strong, i.e. high relative risk, usually 2 or above).
 * Association should be consistent, and should be strongest when you expect it to be
 * Association should be biologically plausible, including supportive data from other sources
 * According to Koch-Henle, cause must be necessary (in order to have the disease, you must have been exposed to the factor, i.e. no one could have the disease if they weren’t exposed) and sufficient (all you need to have the disease is the exposure, i.e. everyone who is exposed gets the disease)
 * Note that this does not apply to many risk factors, such as smoking/lung cancer - There are people who do not smoke who get lung cancer (Not necessary in all cases), and people who smoke who do not get it (Not sufficient in all cases). Huntington's Disease, however, does fit - those with a gene for HD will get the disease, those without it won't.
 * **Chi-square statistic**
 * SUM of [ (observed – expected)^2 / (expected) ] over all cells of the table
 * Used to compare categorical data
 * e.g. Determine whether St. Mary’s Hospital and Lakeview Medical Center have different proportions of patients with previous MI undergoing cardiac surgery. This is categorical: they either had an MI or they didn’t, and were at one hospital or the other.
 * **Confidence interval**
 * Range of values within which you are x% confident that the true value lies.
 * Likelihood that the true population parameter is within a certain interval.
 * e.g. “95% CI, 0.36 to 0.64” means that they are 95% confident that the true value lies between 0.36 and 0.64.
 * **P value**
 * See M2M 2x2 Tables for an illustration
 * The probability of seeing an association at least as strong as the one observed if chance alone were at work (i.e. if the null hypothesis were true).
 * 0.05 is commonly used as alpha, the "cutoff" p value below which we will reject the null hypothesis.
 * Since there is always a non-zero p value, there’s always a chance that chance alone could have caused the association, so there’s always a chance that we could wrongfully reject the null.
 * **Alpha error/Type 1 error**
 * Rejecting a null hypothesis when it is actually true.
 * **Power**
 * See M2M 2x2 Tables for an illustration
 * 1 – (beta)
 * Ability of study to see a true association if it is there. Reflects confidence in making inference without error.
 * Ex. If a study has a power of 80%, there is a 20% chance of a type 2 error (missing a true association that is really there).
 * Ex. If a study has a power of 50%, it has a 50-50 chance of detecting the association.
 * **Beta error/Type 2 error**
 * Not rejecting a null hypothesis when it is actually false.
 * Want to have studies with beta of 0.2 or less (want studies to have at least 80% power)
 * **t test**
 * (difference) / (Standard Error of difference)
 * Tests for a significant difference between two groups in the mean of a continuous variable.
 * Factors in # of subjects, size of difference, and variance.
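
The chi-square and t statistics described above can be computed by hand, as in this minimal Python sketch. The hospital counts and group values are hypothetical. The standard error in `t_statistic` uses the unequal-variance form, sqrt(s1^2/n1 + s2^2/n2), which is an assumption, since the notes do not say which form is intended.

```python
import math
import statistics


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if hospital and MI history were unrelated
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


def t_statistic(group_a, group_b):
    """(difference in means) / (standard error of the difference)."""
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return difference / standard_error


# Hypothetical counts: rows = hospital, columns = (previous MI, no previous MI)
table = [[30, 70],   # St. Mary's
         [45, 55]]   # Lakeview
print("chi-square:", round(chi_square_statistic(table), 2))  # 4.8

# Hypothetical continuous measurements in two groups
group_a = [5.1, 4.9, 5.6, 5.8, 5.0]
group_b = [4.2, 4.8, 4.4, 4.9, 4.1]
print("t:", t_statistic(group_a, group_b))
```

For a 2x2 table (1 degree of freedom), a chi-square value above about 3.84 corresponds to p < 0.05, so these made-up counts would lead to rejecting the null hypothesis at alpha = 0.05.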

=Variability and bias=
 * Review strengths and weaknesses of various study designs.
 * Review threats to validity from each type of study.
 * Validity, reliability
 * The "truth" of the study. The degree to which a measurement or study reaches a correct conclusion.
 * **Internal Validity**
 * The extent to which the results of an investigation reflect the true situation of the study population.
 * Improved by minimization of bias
 * **External Validity**
 * The extent to which the results of an investigation are applicable to other populations.
 * "Generalizability"

=Population health=
 * Understand current major disease trends and the metrics used to describe them.
 * Understand how epidemiologic parameters can be used to estimate the fraction of disease that is preventable in the population.
 * Learn the meaning of the following terms:
 * **Population attributable risk**
 * ((total incidence of disease) – (incidence of disease in unexposed)) / (total incidence of disease)
 * Proportion of cases of a disease that is due to a given factor. Proportion of disease that could be prevented by eliminating that risk factor (given that the relationship between the factor and the disease is indeed causal).
 * “Preventative Factor”
 * e.g. 80% of lung cancer is due to smoking means that if we didn’t have smoking, 80/100 lung cancer cases wouldn’t occur.
 * e.g. Studies have shown that mutations in a particular cancer gene account for 4% of all colon cancer. 4% is the population attributable risk %.
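
The population attributable risk formula above is a one-liner in Python. In this sketch the function name and the incidence figures are hypothetical, chosen so that the result matches the 80% smoking example.

```python
def population_attributable_risk(total_incidence, incidence_unexposed):
    """(total incidence - incidence in unexposed) / total incidence."""
    return (total_incidence - incidence_unexposed) / total_incidence


# Hypothetical: 100 cases per 100,000 overall, 20 per 100,000 among the unexposed.
par = population_attributable_risk(total_incidence=100, incidence_unexposed=20)
print(f"{par:.0%} of cases could be prevented by removing the exposure")
```

This interpretation only holds if the relationship between the factor and the disease is causal, as noted above.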