Understanding Biostatistics

The 2x2 Table

Most of biostatistics can be understood in the context of the 2x2 table. Understanding it is crucial: the table, plus a general definition of the quantity you need to calculate, will usually let you derive the formula. The 2x2 table generally has the "truth" along the top, in columns: whether or not the person has the disease, or whether or not they improved with a treatment. Along the side, in rows, is the exposure status or test result. This defines four categories of people. For the example of a screening test, there are people who have the disease and tested positive (true positives, box A), who don't have the disease but tested positive (false positives, analogous to a type I error, box B), who have the disease but tested negative (false negatives, analogous to a type II error, box C), and who don't have the disease and tested negative (true negatives, box D). Marginals, the totals for a row or column, are found by adding across a row or down a column, and the total number of people observed is found by adding together either the two row marginals or the two column marginals.
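The layout and marginals described above can be sketched in a few lines of Python. This is only an illustration: the class name TwoByTwo and the counts used are made up for the example, not taken from any real study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical container for a 2x2 table (disease in columns, test/exposure in rows)."""
    a: int  # disease +, test/exposure +  (true positives)
    b: int  # disease -, test/exposure +  (false positives)
    c: int  # disease +, test/exposure -  (false negatives)
    d: int  # disease -, test/exposure -  (true negatives)

    def row_marginals(self):
        # Add across each row: (positive row total, negative row total)
        return (self.a + self.b, self.c + self.d)

    def column_marginals(self):
        # Add down each column: (disease + total, disease - total)
        return (self.a + self.c, self.b + self.d)

    def total(self):
        # Summing either pair of marginals gives the same grand total
        return sum(self.row_marginals())


# Illustrative counts only
table = TwoByTwo(a=40, b=10, c=20, d=30)
print(table.row_marginals(), table.column_marginals(), table.total())
```

Whichever pair of marginals is summed, the grand total is the same, which is a quick check that the table was filled in correctly.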

Example Problem

We can incorporate the concepts discussed above by considering this problem: 1% of a population of 10,000 people have disease X. If the sensitivity of a screening test for disease X is 95% and the specificity is 80%, what is the positive predictive value?
The easiest way to work out the problem is to draw out a 2x2 table. The total population is 10,000 and 1% have the disease, so the + disease column marginal is 100, and the - disease column marginal is 9,900 (everyone else). The sensitivity is 95%, meaning we will detect 95 out of 100 of the people with the disease, so 95 goes in box A, and the 5 we missed (false negatives) go in box C. Specificity is 80%, meaning that we will correctly get a negative result in 80% of the 9,900 without disease - alternatively, we will have 20% false positives. This means that 7,920 people go in box D, true negatives, and 1,980 go in box B, false positives.
We have now filled in the 2x2 table and can calculate the PPV. Remember, PPV is the probability that someone who tests positive actually has the disease. That is the number of people in box A, true positives, over the total number of people who tested positive, the marginal for the positive test row: 95/(1,980+95) = 95/2,075 ≈ 4.6%. This means that fewer than 5% of people with a positive test result actually have the disease. Surprised? These numbers are still very good for a screening test - it detects 19/20 people with the disease, and the false positives can hopefully be weeded out with (more expensive) diagnostic tests and procedures. Other factors, such as the cost of the test and the progression and potential treatments of the disease, would determine whether the test is useful overall as a screening test.
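The same steps can be checked with a short Python sketch. The function name fill_table is hypothetical, and rounding to whole people is an assumption made so the cells come out as counts.

```python
def fill_table(population, prevalence, sensitivity, specificity):
    """Fill boxes A-D from prevalence, sensitivity and specificity (illustrative helper)."""
    diseased = round(population * prevalence)  # + disease column marginal
    healthy = population - diseased            # - disease column marginal
    a = round(diseased * sensitivity)          # true positives
    c = diseased - a                           # false negatives
    d = round(healthy * specificity)           # true negatives
    b = healthy - d                            # false positives
    return a, b, c, d


a, b, c, d = fill_table(10_000, 0.01, 0.95, 0.80)
ppv = a / (a + b)  # true positives over everyone who tested positive

print(a, b, c, d)          # 95 1980 5 7920
print(f"PPV = {ppv:.1%}")  # PPV = 4.6%
```

Changing the prevalence argument while holding sensitivity and specificity fixed is an easy way to see how strongly PPV depends on how common the disease is.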

For Risk Factors and Treatments

  • Relative Risk = [A/(A+B)]/[C/(C+D)] = [A•(C+D)]/[C•(A+B)]
    • Rate of disease among the exposed / rate of disease among the unexposed
    • Shows to what extent exposure to a factor increases or decreases the risk of a disease.
  • Odds Ratio = (A/C)/(B/D) = AD/BC
    • Used in case control studies as an estimate for relative risk, because the "prevalence" of a disease in a case-control study is set by the researchers
    • Odds Ratio is also important for understanding how bias will affect study results (especially of case-control studies).
    • Figure out which cells the bias would increase or decrease and plug them into this formula to figure out whether the result would increase or decrease the odds ratio found.
      • For example, a higher rate of patients remembering an exposure in the case vs control groups would shift patients from cell C to cell A, making the odds ratio higher than it should be. (This is an example of recall bias)
  • Attributable Risk/Benefit = A/(A+B) – C/(C+D)
    • Rate of disease among the exposed – rate of disease among the unexposed
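As a rough sketch, the three formulas above translate directly into Python. The function names and the counts are illustrative assumptions; the second odds ratio call mimics the recall-bias example by moving 5 cases from cell C to cell A.

```python
def relative_risk(a, b, c, d):
    # Rate of disease among the exposed / rate among the unexposed
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    # (A/C)/(B/D), which simplifies to AD/BC
    return (a * d) / (b * c)


def attributable_risk(a, b, c, d):
    # Rate of disease among the exposed minus rate among the unexposed
    return a / (a + b) - c / (c + d)


# Made-up counts: A=30, B=70, C=10, D=90
a, b, c, d = 30, 70, 10, 90
print(round(relative_risk(a, b, c, d), 2))
print(round(odds_ratio(a, b, c, d), 2))
print(round(attributable_risk(a, b, c, d), 2))

# Recall bias: 5 cases move from cell C (unexposed) to cell A (exposed)
unbiased = odds_ratio(a, b, c, d)
biased = odds_ratio(a + 5, b, c - 5, d)
print(biased > unbiased)
```

Because A sits in the numerator of AD/BC and C in the denominator, any shift from C to A pushes the odds ratio upward, which is the direction of error described for recall bias.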

For Screening Tests

  • Sensitivity = A/(A+C)
    • How often the test detects a disease when it is present
  • Specificity = D/(D+B)
    • How often a test detects the absence of a disease when it is absent
  • Positive Predictive Value = A/(A+B)
    • How often individuals w/ a positive test truly have the disease
    • Higher prevalence elevates PPV, lower prevalence lowers PPV
  • Negative Predictive Value = D/(D+C)
    • How often individuals w/ a negative test truly don't have the disease
    • Lower prevalence elevates NPV, higher prevalence lowers NPV
  • Positive Likelihood Ratio
    • Sensitivity/(1 – Specificity), that is, sensitivity divided by the false positive rate, B/(B+D)
    • A measure of the overall accuracy of a diagnostic test (higher number = better test)
    • PLR can be multiplied by the pre-test odds (not probability) to determine the post-test odds - that is, the odds that the patient has the disease given a positive test result. Odds = probability/(1 – probability).
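The screening formulas above can be sketched in Python using the counts from the example problem (A=95, B=1,980, C=5, D=7,920). The function names are hypothetical. The sketch also illustrates the odds conversion: multiplying the pre-test odds by the PLR and converting back to a probability gives the PPV.

```python
def screening_metrics(a, b, c, d):
    """Compute the screening-test measures from boxes A-D (illustrative helper)."""
    sensitivity = a / (a + c)
    specificity = d / (d + b)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (d + c),
        "plr": sensitivity / (1 - specificity),
    }


def post_test_probability(pre_test_probability, likelihood_ratio):
    # Convert probability to odds, apply the likelihood ratio, convert back
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


m = screening_metrics(95, 1980, 5, 7920)
post = post_test_probability(0.01, m["plr"])  # pre-test probability = 1% prevalence

print(round(m["plr"], 2))                 # 4.75
print(round(post, 3), round(m["ppv"], 3)) # both 0.046
```

When the pre-test probability equals the prevalence in the table, the post-test probability after a positive result matches the PPV, which is a useful consistency check on both calculations.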

For Errors in Statistical Studies

