False Positive Paradox Calculator

Demonstrate the false positive paradox (base rate fallacy) with visual breakdowns, PPV-prevalence curves, retest analysis, and strategies for resolving the paradox.

⚠ The Paradox Is Active! With 0.1% prevalence, a positive test result is more likely to be WRONG (98.1% false) than correct (1.9% true). Despite 99% sensitivity and 95% specificity, most positive results are false positives.
PPV (Chance Test+ Is Correct): 1.94% (PARADOX: Less than 50%!)
False Discovery Rate: 98.06% (49,950 false positives out of 50,940)
True Positives: 990 (out of 1,000 with condition)
False Positives: 49,950 (out of 999,000 healthy)
FP : TP Ratio: 50.5 : 1 (More false than true positives!)
PPV After Retest: 28.18% (If positive retested with same test)

Visual Breakdown of Positive Results

True Positives: 990 (1.9%)
False Positives: 49,950 (98.1%)

PPV vs. Prevalence

Prevalence | PPV | Paradox?
0.01% | 0.2% | Yes
0.05% | 1.0% | Yes
0.1% | 1.9% | Yes
0.5% | 9.0% | Yes
1% | 16.7% | Yes
2% | 28.8% | Yes
5% | 51.0% | No
10% | 68.7% | No
20% | 83.2% | No
50% | 95.2% | No

Resolving the Paradox

Strategy | Value | Effect
Current PPV | 1.94% | Paradoxical: most positives are false
Retest if positive | 28.18% | PPV from the first test becomes the prior for the second test, giving a higher PPV
Specificity needed for PPV ≥ 50% | 99.90% | At current prevalence of 0.1%
Odds Ratio | 1,881.0 | Overall association strength

About the False Positive Paradox Calculator

The false positive paradox occurs when a test with excellent sensitivity and specificity produces more false positives than true positives, simply because the condition being tested for is rare. A test with 99% sensitivity and 99% specificity applied to a population with 0.1% prevalence yields about 10 false positives for every true positive. Most positive results are wrong.

This calculator demonstrates the paradox visually, showing the stark imbalance between true and false positives. It computes the Positive Predictive Value (PPV) at your specified prevalence, sweeps across prevalence levels to show exactly when the paradox kicks in, and models two resolution strategies: retesting positive results and computing the specificity needed to escape the paradox.

Understanding this paradox is critical for medical professionals, policy makers designing screening programs, data scientists building classifiers, and anyone interpreting the results of any binary test. The visual bar comparing true vs. false positives makes the paradox immediately intuitive.

When This Page Helps

The false positive paradox is one of the most important statistical concepts for public health, law, criminal justice, and data science — yet it's consistently misunderstood. This calculator makes the unintuitive result tangible by showing concrete numbers, visual proportions, and the trajectory across prevalence levels.

The retest analysis and specificity threshold features go beyond demonstration to show practical solutions. For policymakers evaluating screening programs, the PPV-prevalence curve shows the prevalence at which true positives begin to outnumber false positives, a key input when judging whether mass screening is worthwhile or counterproductive.
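The prevalence sweep can be sketched in a few lines of Python. This is an illustration, not the calculator's actual code: the function names `ppv` and `tipping_prevalence` are made up here, and the tipping point is simply the prevalence at which PPV equals 50% (the page's paradox criterion), which says nothing about cost.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def tipping_prevalence(sensitivity: float, specificity: float) -> float:
    """Prevalence at which true and false positives balance (PPV = 50%).

    Solves sensitivity * p = (1 - specificity) * (1 - p) for p.
    """
    false_positive_rate = 1.0 - specificity
    return false_positive_rate / (sensitivity + false_positive_rate)


if __name__ == "__main__":
    sens, spec = 0.99, 0.95  # the page's default test characteristics
    for prev in (0.0001, 0.001, 0.01, 0.05, 0.10, 0.50):
        value = ppv(prev, sens, spec)
        flag = "paradox" if value < 0.5 else "ok"
        print(f"{prev:>7.2%}  {value:>6.1%}  {flag}")
    print(f"tipping point: {tipping_prevalence(sens, spec):.2%}")
```

For the default inputs the tipping point falls a little under 5% prevalence, which matches the switch from "Yes" to "No" between the 2% and 5% rows of the PPV vs. Prevalence table.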

How to Use the Inputs

  1. Enter the condition prevalence (how common the condition is in the tested population).
  2. Enter the test sensitivity (the share of people with the condition who test positive) and specificity (the share of people without the condition who test negative).
  3. Adjust population size to see concrete numbers.
  4. Use presets for common paradox scenarios: rare disease, drug testing, breathalyzer, lie detector.
  5. Check the paradox warning banner — it appears when PPV drops below 50%.
  6. Review the PPV vs. Prevalence table to see the tipping point.
  7. Examine resolution strategies: retesting, required specificity, and odds ratios.
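Step 7 mentions odds ratios without defining them. One reading that reproduces the calculator's 1,881.0 for the default inputs is the diagnostic odds ratio: the odds of a positive result among people with the condition divided by the odds of a positive result among people without it. The sketch below assumes that interpretation; the function name is hypothetical.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of testing positive with the condition vs. without it.

    Assumption: this is what the page's "Odds Ratio" row reports.
    Note that the result does not depend on prevalence.
    """
    odds_positive_if_affected = sensitivity / (1.0 - sensitivity)
    odds_positive_if_healthy = (1.0 - specificity) / specificity
    return odds_positive_if_affected / odds_positive_if_healthy


if __name__ == "__main__":
    # 99% sensitivity, 95% specificity: (0.99 / 0.01) / (0.05 / 0.95) = 99 * 19
    print(f"{diagnostic_odds_ratio(0.99, 0.95):,.1f}")
```

Because prevalence does not enter the formula, a very large odds ratio can coexist with a very low PPV, which is the paradox in a single number.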
Formula used
PPV = (Sensitivity × Prevalence) / (Sensitivity × Prevalence + (1 − Specificity) × (1 − Prevalence))
Paradox condition: PPV < 50% when (1 − Specificity) × (1 − Prevalence) > Sensitivity × Prevalence
Retest PPV: uses PPV from first test as new prior probability
Specificity needed for PPV ≥ 50%: Specificity ≥ 1 − (Sensitivity × Prevalence) / (1 − Prevalence)
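The formulas above translate directly into Python. The following is a minimal sketch with illustrative function names, not the calculator's own implementation. The retest function additionally assumes that the two test results are conditionally independent, which a repeat of the same test may not satisfy in practice.

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def retest_ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """PPV after a second positive result, using the first PPV as the new prior.

    Assumes the second result is independent of the first given true status.
    """
    first = ppv(prevalence, sensitivity, specificity)
    return ppv(first, sensitivity, specificity)


def min_specificity_for_half(prevalence: float, sensitivity: float) -> float:
    """Smallest specificity at which PPV reaches 50%."""
    return 1.0 - (sensitivity * prevalence) / (1.0 - prevalence)


if __name__ == "__main__":
    p, sens, spec = 0.001, 0.99, 0.95
    print(f"PPV:              {ppv(p, sens, spec):.2%}")                 # 1.94%
    print(f"PPV after retest: {retest_ppv(p, sens, spec):.2%}")          # 28.18%
    print(f"Specificity needed: {min_specificity_for_half(p, sens):.2%}")  # 99.90%
```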

Example Calculation

Result: PPV = 1.94%, FP:TP ratio ≈ 50:1, PPV after retest = 28.2%

With 99% sensitivity, 95% specificity, and 0.1% prevalence in 1,000,000 people: 1,000 truly affected, 999,000 healthy. The test finds 990 true positives but also flags 49,950 false positives. Of 50,940 total positive results, only 1.94% are genuine. Even retesting all positives with the same test only raises PPV to about 28.2%. The paradox is in full effect.
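The same example can be reproduced with whole-person counts rather than probabilities. The sketch below is illustrative only: `ScreeningCounts` and `screen` are hypothetical names, and rounding to whole people is a simplification that happens to be exact for these inputs.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    true_positives: int
    false_positives: int
    false_negatives: int
    true_negatives: int

    @property
    def ppv(self) -> float:
        """Share of all positive results that are genuine."""
        return self.true_positives / (self.true_positives + self.false_positives)


def screen(population: int, prevalence: float,
           sensitivity: float, specificity: float) -> ScreeningCounts:
    """Split a population into the four outcome groups, rounded to whole people."""
    affected = round(population * prevalence)
    healthy = population - affected
    tp = round(affected * sensitivity)
    tn = round(healthy * specificity)
    return ScreeningCounts(tp, healthy - tn, affected - tp, tn)


if __name__ == "__main__":
    counts = screen(1_000_000, 0.001, 0.99, 0.95)
    print(counts)
    print(f"PPV: {counts.ppv:.2%}")
```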

Tips & Best Practices

  • When prevalence is below 1%, even excellent tests (>99% accurate) can have PPV under 50%.
  • A two-step screen-then-confirm strategy dramatically raises PPV.
  • Targeting testing at high-risk subgroups raises effective prevalence and PPV.
  • Report test results with PPV context, not just sensitivity/specificity.
  • The paradox affects AI content detection, spam filters, and fraud detection in the same way.
  • Bayesian reasoning with natural frequencies (concrete numbers) is far more intuitive than percentages.
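The screen-then-confirm tip can be illustrated with likelihood ratios: each positive result multiplies the prior odds by sensitivity / (1 − specificity). The sketch below is a rough illustration under two stated assumptions: results are conditionally independent given true status, and the confirmatory test's figures (99% sensitivity, 99.9% specificity) are invented for the example rather than taken from this page.

```python
from typing import Iterable, Tuple


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """How much a positive result multiplies the odds of having the condition."""
    return sensitivity / (1.0 - specificity)


def update(prior: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, scale by the likelihood ratio, convert back."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def screen_then_confirm(prevalence: float,
                        tests: Iterable[Tuple[float, float]]) -> float:
    """Probability of the condition after every test in `tests` comes back positive.

    `tests` holds (sensitivity, specificity) pairs.
    Assumes results are conditionally independent given true status.
    """
    probability = prevalence
    for sensitivity, specificity in tests:
        ratio = positive_likelihood_ratio(sensitivity, specificity)
        probability = update(probability, ratio)
    return probability


if __name__ == "__main__":
    screening = (0.99, 0.95)
    confirmatory = (0.99, 0.999)  # hypothetical confirmatory test
    print(f"screen only:         {screen_then_confirm(0.001, [screening]):.1%}")
    print(f"screen then confirm: {screen_then_confirm(0.001, [screening, confirmatory]):.1%}")
```

Passing a higher starting prevalence (for example 0.05 for a high-risk subgroup) shows the effect of targeted testing with the same function.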

Historical Examples of the Paradox

In 2003, the U.S. Postal Service screened 5,000 workers for anthrax exposure after the 2001 attacks. No workers were actually infected, but screening produced hundreds of false positives, each requiring costly follow-up. The base rate of actual exposure was effectively zero, guaranteeing that every positive was false. Similar problems plague mass drug testing in workplaces with low drug use rates.

The Prosecutor's Fallacy

The false positive paradox is closely related to the prosecutor's fallacy in criminal law. If a DNA test has a 1 in 1,000,000 false match rate and is run against a database of 10,000,000 people, about 10 innocent people will match. The prosecutor arguing "this test is 99.9999% accurate" commits the fallacy of ignoring the base rate of true perpetrators in the database. The correct question is: given a match, what's the probability of guilt?
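The database arithmetic can be made concrete with a short sketch. It rests on simplifying assumptions that are not in the text above: exactly one true source is in the database, the test always matches that person, everyone in the database is equally likely a priori, and the posterior is approximated using the expected number of false matches. The function names are illustrative.

```python
def expected_false_matches(database_size: int, false_match_rate: float,
                           true_sources: int = 1) -> float:
    """Expected number of innocent people who match by chance."""
    return (database_size - true_sources) * false_match_rate


def probability_match_is_source(database_size: int, false_match_rate: float,
                                true_sources: int = 1) -> float:
    """Approximate chance that a given matching person is the true source.

    Assumes the true source is in the database, always matches, and that
    no other evidence distinguishes one matching person from another.
    """
    innocent = expected_false_matches(database_size, false_match_rate, true_sources)
    return true_sources / (true_sources + innocent)


if __name__ == "__main__":
    size, rate = 10_000_000, 1e-6
    print(f"expected innocent matches: {expected_false_matches(size, rate):.1f}")
    print(f"P(source | match):         {probability_match_is_source(size, rate):.1%}")
```

Under these assumptions a match alone points to the true source only about 9% of the time, despite the one-in-a-million false match rate.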

Implications for AI Detection

With the rise of large language models, AI content detectors face the same paradox. If 5% of student essays are AI-generated and a detector has 90% sensitivity and 95% specificity, only about 49% of flagged essays are actually AI-written. This means roughly half of accused students are innocent — a serious ethical problem that mirrors the medical screening paradox in an educational context.

Frequently Asked Questions

  • Because a small percentage of a large number (healthy people) can be bigger than 99% of a small number (sick people). If 1,000 are sick and 999,000 are healthy, even 5% of the 999,000 healthy (49,950 false positives) dwarfs 99% of the 1,000 sick (990 true positives). The test's sensitivity and specificity apply to each group separately, but the groups are vastly unequal in size.