A/B Test Statistical Significance Calculator

Test whether your A/B test results are statistically significant. Enter the visitors and conversions for the control and the variant to get the Z-score and p-value.

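Sample result (control: 300 conversions from 10,000 visitors; variant: 350 conversions from 10,000 visitors)

  • Control conversion rate (A): 3.00% (300 / 10,000)
  • Variant conversion rate (B): 3.50% (350 / 10,000)
  • Relative lift: +16.67%, computed as (B − A) / A
  • Z-score: 1.9938 (|Z| > 1.96)
  • p-value: 0.0462, significant at the 95% level
  • Confidence (1 − p-value): 95.38%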
Planning notes, formulas, and examples

About the A/B Test Statistical Significance Calculator

After running an A/B test, you need to determine whether the observed difference between control and variant is statistically significant or could have occurred by chance. This calculator performs a two-proportion z-test, the standard method for comparing conversion rates between two groups.

Enter the number of visitors and conversions for both control (A) and variant (B). The calculator computes the z-score, p-value, and confidence level. A p-value below your significance threshold (typically 0.05) means the difference is statistically significant.

Statistical significance does not mean the result is practically important: a tiny lift can be statistically significant with enough traffic. Always consider both statistical significance and the magnitude of the effect.
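To see this effect concretely, here is a minimal Python sketch of the pooled two-proportion z-test applied to hypothetical traffic: 3.00% versus 3.03% conversion (a 1% relative lift) with 5 million visitors per arm. The function name `two_proportion_z_test` and the traffic numbers are illustrative assumptions, not part of the calculator.

```python
import math


def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical inputs: 3.00% vs 3.03% with 5 million visitors in each arm.
x_a, n_a = 150_000, 5_000_000
x_b, n_b = 151_500, 5_000_000
z, p = two_proportion_z_test(x_a, n_a, x_b, n_b)
lift = (x_b / n_b - x_a / n_a) / (x_a / n_a)
print(f"relative lift = {lift:.1%}, z = {z:.2f}, p = {p:.4f}")
```

With these assumed inputs, z comes out at roughly 2.77 and p is well under 0.01, even though a 0.03-percentage-point lift may not be worth shipping.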

When This Page Helps

Making product decisions based on random noise wastes resources and can degrade the user experience. This calculator gives objective statistical evidence on whether your A/B test result reflects a genuine difference, replacing gut feelings with mathematical rigor.

How to Use the Inputs

  1. Enter the number of visitors in the control group (A).
  2. Enter the number of conversions in group A.
  3. Enter the number of visitors in the variant group (B).
  4. Enter the number of conversions in group B.
  5. Review the z-score, p-value, and significance conclusion.
  6. A p-value below 0.05 indicates statistical significance at the 95% level.
Formula used
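
$$\hat{p} = \frac{x_A + x_B}{n_A + n_B}$$

$$Z = \frac{p_B - p_A}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}$$

$$\text{p-value} = 2\,\bigl(1 - \Phi(|Z|)\bigr)$$

where $x_A, x_B$ are conversions, $n_A, n_B$ are visitors, $p_A = x_A / n_A$, $p_B = x_B / n_B$, $\hat{p}$ is the pooled conversion rate, and $\Phi$ is the standard normal cumulative distribution function.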
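The formula can be sketched in Python with only the standard library. The function `ab_test` and its output keys are hypothetical names chosen for illustration; Φ is computed from `math.erf`. It is run here on the 300/10,000 versus 350/10,000 example used on this page.

```python
import math


def ab_test(x_a, n_a, x_b, n_b):
    """Two-proportion z-test using the pooled proportion, following the formula."""
    p_a = x_a / n_a
    p_b = x_b / n_b
    p_hat = (x_a + x_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF at |z|
    p_value = 2 * (1 - phi)  # two-tailed p-value
    return {
        "cr_a": p_a,
        "cr_b": p_b,
        "relative_lift": (p_b - p_a) / p_a,
        "z": z,
        "p_value": p_value,
        "confidence": 1 - p_value,
    }


result = ab_test(300, 10_000, 350, 10_000)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For these inputs the sketch gives z ≈ 1.994 and p ≈ 0.046, consistent with the sample result shown at the top of the page.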

Example Calculation

Result: p-value ≈ 0.046 (statistically significant at 95%)

Control: 300/10,000 = 3.00%. Variant: 350/10,000 = 3.50%. The pooled proportion is 650/20,000 = 3.25%. Z ≈ 1.99, p-value ≈ 0.046. Since 0.046 < 0.05, the result is statistically significant at the 95% confidence level. The variant's observed relative improvement is 16.7%, although significance alone does not pin down the true size of the lift.

Tips & Best Practices

  • Always reach your pre-calculated sample size before checking significance.
  • A p-value of 0.05 means that, if there were no real difference, a result at least this extreme would occur about 1 time in 20.
  • Check statistical significance AND practical significance (is the lift large enough to matter?).
  • Two-tailed tests (this calculator) detect both improvements and degradations.
  • Beware of multiple comparisons: testing 20 independent metrics at p < 0.05 when none truly changed produces about one false positive on average, with roughly a 64% chance of at least one.
  • If results are borderline (p = 0.04–0.06), consider extending the test for more data.
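The arithmetic behind the multiple-comparisons warning can be checked with a short sketch. It assumes the metrics are independent and that none of them truly changed, in which case the chance of at least one false positive is 1 − (1 − α)^m. It also shows the Bonferroni correction (α / m per test), one simple and conservative fix; the calculator itself is not described as applying any correction, and the helper names are illustrative.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests with no true effects."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / m


for m in (1, 5, 20):
    print(
        f"{m:>2} metrics: FWER = {familywise_error_rate(0.05, m):.1%}, "
        f"Bonferroni per-test threshold = {bonferroni_threshold(0.05, m):.4f}"
    )
```

At α = 0.05 the family-wise error rate is about 22.6% for 5 metrics and about 64.2% for 20, and the Bonferroni threshold for 20 metrics is 0.0025.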

Understanding Statistical Significance

Statistical significance answers one question: "Is the observed difference likely to be real, or could it be random noise?" A p-value below 0.05 means that, if the variant truly performed the same as the control, a difference at least as large as the one observed would appear less than 5% of the time. It is not the probability that the difference is real, and it does not mean the variant is 95% better; it means you have strong evidence that the variant differs from the control.

Common Significance Mistakes

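Peeking at results before reaching the planned sample size, and stopping as soon as p < 0.05, inflates the false positive rate dramatically. Running multiple tests simultaneously without correction leads to false discoveries. Using a one-tailed test when a two-tailed test is appropriate puts the entire 5% error budget in one direction, which makes a lift easier to declare (equivalent to a two-tailed test at 0.10) and leaves degradations undetected. Always pre-register your test plan before launching.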
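To illustrate why peeking is harmful, the sketch below simulates A/A tests in which both arms share the same true conversion rate, so every significant result is a false positive. The parameters (500 experiments, 10 looks of 200 visitors per arm, a 10% conversion rate, seed 42) and the function names are arbitrary assumptions. It compares analyzing once at the planned end against declaring a winner at the first look where p < 0.05.

```python
import math
import random


def z_test_p_value(x_a, n_a, x_b, n_b):
    """Two-tailed p-value of the pooled two-proportion z-test."""
    pooled = (x_a + x_b) / (n_a + n_b)
    if pooled in (0, 1):
        return 1.0  # no variation yet; treat as not significant
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (x_b / n_b - x_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(n_experiments=500, looks=10, visitors_per_look=200,
                     true_rate=0.10, alpha=0.05, seed=42):
    """A/A tests (no real difference): single final analysis vs checking every look."""
    rng = random.Random(seed)
    final_only = any_look = 0
    for _ in range(n_experiments):
        x_a = x_b = n = 0
        flagged = False
        for _ in range(looks):
            for _ in range(visitors_per_look):
                x_a += rng.random() < true_rate  # True counts as one conversion
                x_b += rng.random() < true_rate
            n += visitors_per_look
            p = z_test_p_value(x_a, n, x_b, n)
            if p < alpha:
                flagged = True  # a peeker would stop here and declare a winner
        any_look += flagged
        final_only += p < alpha  # p from the last look is the planned single analysis
    return final_only / n_experiments, any_look / n_experiments


final, peek = simulate_peeking()
print(f"false positive rate, single analysis: {final:.1%}")
print(f"false positive rate, stop at first significant look: {peek:.1%}")
```

The single-analysis rate should land near 5%, while the peeking rate is typically several times higher; exact figures depend on the seed and parameters. Because every experiment that is significant at the final look is also significant at some look, the peeking rate can never be lower.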

Beyond Significance: Effect Size and Confidence Intervals

Significance tells you whether there is a difference; effect size tells you how big it is. Report both. A confidence interval (e.g., "the variant is 10–30% better") gives more useful information than a binary "significant/not significant" declaration.
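One way to produce such an interval is the unpooled (Wald) interval for the difference p_B − p_A, sketched below. This choice of method is an assumption, not necessarily what the calculator reports, and the helper name `diff_confidence_interval` is illustrative. The relative range divides by the observed control rate, so it is only approximate. Because the z-test uses a pooled standard error and this interval does not, the two can disagree slightly near the significance boundary.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(x_a, n_a, x_b, n_b, confidence=0.95):
    """Unpooled (Wald) interval for p_B - p_A, also expressed relative to p_A."""
    p_a, p_b = x_a / n_a, x_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p_b - p_a
    low, high = diff - z_crit * se, diff + z_crit * se
    # Dividing by p_A treats the control rate as fixed, so this relative range is approximate.
    return (low, high), (low / p_a, high / p_a)


(abs_low, abs_high), (rel_low, rel_high) = diff_confidence_interval(300, 10_000, 350, 10_000)
print(f"absolute difference in conversion rate: {abs_low:+.2%} to {abs_high:+.2%}")
print(f"relative lift: {rel_low:+.1%} to {rel_high:+.1%}")
```

For the example data (300/10,000 vs 350/10,000), the 95% interval for the absolute difference runs from roughly +0.01 to +0.99 percentage points, or about +0.3% to +33% in relative terms: significant, yet wide.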


Frequently Asked Questions

  • The conventional threshold is p < 0.05 (95% confidence). It means that, if there were no real difference, a difference at least as large as the one observed would occur by chance less than 5% of the time. More conservative tests use p < 0.01 (99% confidence).
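These two thresholds correspond to two-tailed critical Z values that can be computed with Python's `statistics.NormalDist`. The sketch below (the helper name `critical_z` is illustrative) checks the example Z-score of 1.9938 against both: it clears the 95% threshold of about 1.960 but not the 99% threshold of about 2.576.

```python
from statistics import NormalDist


def critical_z(alpha):
    """Two-tailed critical |Z| for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z_observed = 1.9938  # Z-score from the example on this page
for alpha in (0.05, 0.01):
    z_crit = critical_z(alpha)
    verdict = "significant" if abs(z_observed) > z_crit else "not significant"
    print(f"alpha = {alpha}: critical |Z| = {z_crit:.3f} -> {verdict}")
```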