Statistical Significance Calculator

Test whether your A/B test results are statistically significant using a two-proportion Z-test. See p-value, confidence interval, and effect size.

Control (A)

Variant (B)

Test Result
Variant Wins
+16.00% relative lift
p-value = 0.0123
Control Rate
5.00%
500 / 10,000
Variant Rate
5.80%
580 / 10,000
Relative Lift
+16.00%
+0.80% absolute
p-value
0.0123
Significant at 95%

95% Confidence Interval for Difference

Lower: 0.17%, Upper: 1.43%

Test Statistics

Z-statistic: 2.5028
p-value (two-tailed): 0.0123
Pooled proportion: 5.40%
Standard error: 0.003196
Absolute difference: 0.80%
Relative lift: 16.00%
95% CI (lower): 0.17%
95% CI (upper): 1.43%
Planning notes, formulas, and examples

About the Statistical Significance Calculator

After running an A/B test, you need to determine whether the observed difference between your control and variant is real or simply due to random chance. Statistical significance testing answers this question by calculating the probability that the observed difference (or a larger one) would occur if there were actually no difference between the two versions.

The standard approach for comparing two conversion rates is the two-proportion Z-test. It computes a Z-statistic from the observed rates and sample sizes, then converts it to a p-value: the probability of seeing a result at least this extreme under the null hypothesis of no difference. A p-value below your significance threshold (typically 0.05) means the result is statistically significant and unlikely to be due to chance alone.

This calculator takes your A/B test results (visitors and conversions for each variant), performs the two-proportion Z-test, and reports the Z-statistic, p-value, confidence interval for the difference, and a clear verdict on significance. It helps you make confident decisions about whether to implement the winning variant.

When This Page Helps

Declaring A/B test winners without proper significance testing is a recipe for implementing changes that don't actually work. This calculator gives you the statistical rigor to separate real effects from noise. It produces the p-value, confidence interval, and relative lift with clear pass/fail verdicts, so you can make data-driven decisions with confidence.

How to Use the Inputs

  1. Enter the number of visitors (sample size) for the control group.
  2. Enter the number of conversions in the control group.
  3. Enter the number of visitors for the variant (treatment) group.
  4. Enter the number of conversions in the variant group.
  5. Set your significance threshold (default 5% / 95% confidence).
  6. Review the Z-statistic, p-value, confidence interval, and significance verdict.
Formula used
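Pooled proportion:

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Z-statistic:

$$Z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $\hat{p}_1 = x_1 / n_1$ is the control rate and $\hat{p}_2 = x_2 / n_2$ is the variant rate, so a positive Z means the variant converts better.

Confidence interval for the difference:

$$\text{CI} = (\hat{p}_2 - \hat{p}_1) \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$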
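The formulas above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the calculator's actual source: the function name `two_proportion_z_test` and the keys of the returned dictionary are assumptions made for this example, and the difference is taken as variant minus control.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-proportion Z-test for control (x1/n1) vs. variant (x2/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1  # variant minus control

    # Pooled proportion and standard error under the null hypothesis.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled

    # Two-tailed p-value from the standard normal distribution:
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # The confidence interval uses the unpooled standard error.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * se_unpooled

    return {
        "z": z,
        "p_value": p_value,
        "abs_diff": diff,
        "rel_lift": diff / p1,  # undefined if the control has zero conversions
        "ci": (diff - margin, diff + margin),
        "significant": p_value < alpha,
    }


result = two_proportion_z_test(500, 10_000, 580, 10_000)
print(f"Z = {result['z']:.4f}, p = {result['p_value']:.4f}")
print(f"CI = [{result['ci'][0]:.2%}, {result['ci'][1]:.2%}]")
```

The pooled standard error is used for the test statistic because the null hypothesis assumes both groups share one rate, while the interval uses the unpooled form because it estimates the difference without that assumption.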

Example Calculation

Result: p-value = 0.012, Significant at 95%

Control: 500/10,000 = 5.00%. Variant: 580/10,000 = 5.80%. The difference of 0.80 percentage points (16.0% relative lift) gives Z = 2.50 and p-value = 0.012. Since p < 0.05, this result is statistically significant at the 95% confidence level. The 95% CI for the difference is [0.17%, 1.43%]; because the interval excludes zero, it is consistent with the variant outperforming the control.
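As a check, the arithmetic for these inputs can be worked by hand from the formulas on this page. The steps below are a sketch with intermediate values rounded for display, and they agree with the Test Statistics panel (Z-statistic 2.5028, standard error 0.003196).

```latex
\begin{align*}
\hat{p}_1 &= \frac{500}{10{,}000} = 0.0500, \qquad
\hat{p}_2 = \frac{580}{10{,}000} = 0.0580 \\
\bar{p} &= \frac{500 + 580}{10{,}000 + 10{,}000} = 0.0540 \\
\mathrm{SE}_{\text{pooled}} &= \sqrt{0.0540 \times 0.9460 \times \left(\tfrac{1}{10{,}000} + \tfrac{1}{10{,}000}\right)} \approx 0.003196 \\
Z &= \frac{0.0580 - 0.0500}{0.003196} \approx 2.50 \\
\mathrm{SE}_{\text{unpooled}} &= \sqrt{\tfrac{0.0500 \times 0.9500}{10{,}000} + \tfrac{0.0580 \times 0.9420}{10{,}000}} \approx 0.003196 \\
\text{CI}_{95\%} &= 0.0080 \pm 1.96 \times 0.003196 \approx [0.0017,\ 0.0143]
\end{align*}
```

The pooled and unpooled standard errors nearly coincide here because the two rates are close and the groups are the same size.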

Tips & Best Practices

  • A p-value below 0.05 means the result is statistically significant at the 95% confidence level; it does not mean the result is 95% likely to be true.
  • Statistical significance doesn't mean practical significance; a tiny lift might not justify implementation costs.
  • Check the confidence interval width: a wide CI means the true effect could range from small to large.
  • Don't run the significance test repeatedly during the experiment; calculate once at the predetermined sample size.
  • For multiple comparisons (testing many variants), apply Bonferroni correction or control the false discovery rate.
  • Use the relative lift alongside the absolute difference to communicate results to stakeholders.
  • Consider Bayesian methods if you need to express results as probability of improvement rather than p-values.
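The multiple-comparison tip can be made concrete with a short sketch. The two functions below are illustrative implementations of the Bonferroni correction and the Benjamini-Hochberg procedure (one common way to control the false discovery rate). The function names and the list of p-values are hypothetical and not taken from this calculator.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject the null for each test whose p-value is below alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from four variants, each compared against the same control.
p_values = [0.0123, 0.020, 0.035, 0.20]
print(bonferroni(p_values))          # [True, False, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False]
```

Bonferroni guards against any false positive at all and is therefore stricter; Benjamini-Hochberg tolerates a controlled share of false discoveries and typically flags more variants.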

Interpreting P-Values Correctly

The p-value is the probability of observing results at least as extreme as yours if the null hypothesis (no difference) were true. It is NOT the probability that the null hypothesis is true. A p-value of 0.03 doesn't mean there's a 97% chance the variant is better; it means data this extreme would be unlikely (a 3% chance) if there were no real difference. This distinction is crucial for proper interpretation.

Common Pitfalls in Significance Testing

Peeking at results during the test and stopping early when significance is reached inflates false positive rates dramatically. Multiple comparison problems occur when testing many metrics without correction. Simpson's paradox can make overall results misleading when subgroups have different patterns. Always pre-register your hypothesis, primary metric, sample size, and analysis plan.
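The effect of peeking can be seen in a small simulation. The sketch below runs A/A experiments, where both arms share the same true conversion rate, so every "significant" result is a false positive. All parameters (300 experiments, 1,000 visitors per arm, 10 interim looks, a 10% base rate) and the function names are assumptions chosen only for illustration.

```python
import math
import random


def p_value_two_sided(x1, n1, x2, n2):
    """Two-tailed p-value of the pooled two-proportion Z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (x2 / n2 - x1 / n1) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa(n_experiments=300, n_per_arm=1000, looks=10, rate=0.10,
                alpha=0.05, seed=0):
    """Return (fixed-horizon, peeking) false positive rates for A/A tests."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    fixed_hits = peeking_hits = 0
    for _ in range(n_experiments):
        x1 = x2 = 0
        crossed = False
        for look in range(1, looks + 1):
            # Both arms have the same true rate, so any "win" is a false positive.
            x1 += sum(rng.random() < rate for _ in range(step))
            x2 += sum(rng.random() < rate for _ in range(step))
            n = look * step
            # A peeker would stop and declare a winner at the first such look.
            if p_value_two_sided(x1, n, x2, n) < alpha:
                crossed = True
        # After the loop, x1, x2, and n describe the full pre-planned sample.
        fixed_hits += p_value_two_sided(x1, n, x2, n) < alpha
        peeking_hits += crossed
    return fixed_hits / n_experiments, peeking_hits / n_experiments


fixed_rate, peeking_rate = simulate_aa()
print(f"single test at the planned sample size: {fixed_rate:.3f}")
print(f"stop at the first significant look:     {peeking_rate:.3f}")
```

Because the final look is one of the interim looks, the peeking rate can never fall below the fixed-horizon rate, and with repeated looks it is usually well above the nominal 5%.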

Beyond Significance: Effect Size and Practical Impact

Report effect sizes (relative lift, absolute difference) alongside p-values. A 20% relative lift that's significant is more actionable than a 2% lift that's also significant. Combine statistical results with business context: implementation cost, opportunity cost, and long-term strategic value should all factor into the decision of whether to ship the winning variant.

Frequently Asked Questions

  • Statistical significance means the observed result is unlikely to have occurred by random chance alone. In A/B testing, it means the conversion rate difference between control and variant is unlikely to be explained by noise alone. The p-value quantifies this: a p-value of 0.03 means there's only a 3% chance of seeing a difference at least this large if the variants were identical.