A/B Test Revenue Impact Calculator
Estimate the annualized revenue impact of an A/B test winner. Project how a conversion rate lift translates to additional dollars over time.
Test whether your A/B test results are statistically significant. Enter visitors and conversions for both control and variant to get the z-score and p-value.
After running an A/B test, you need to determine whether the observed difference between control and variant is statistically significant or could have occurred by chance. This calculator performs a two-proportion z-test, the standard method for comparing conversion rates between two groups.
Enter the number of visitors and conversions for both control (A) and variant (B). The calculator computes the z-score, p-value, and confidence level. A p-value below your significance threshold (typically 0.05) means the difference is statistically significant.
Statistical significance does not mean the result is practically important: a tiny lift can be statistically significant with enough traffic. Always consider both statistical significance and the magnitude of the effect.
Making product decisions based on random noise wastes resources and damages user experience. This calculator gives objective statistical evidence for whether your A/B test result reflects a real difference, replacing gut feelings with mathematical rigor.
p̂ = (x_A + x_B) / (n_A + n_B)
Z = (p_B − p_A) / √[p̂(1 − p̂)(1/n_A + 1/n_B)]
p-value = 2 × (1 − Φ(|Z|))
Result: p-value = 0.046 (statistically significant at 95%)
Control: 300/10,000 = 3.00%. Variant: 350/10,000 = 3.50%. The pooled proportion is 3.25%. Z = 1.99, p-value = 0.046. Since 0.046 < 0.05, the result is statistically significant at the 95% confidence level. The variant shows a 16.7% relative improvement.
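The worked example can be reproduced with a short script. This is a minimal sketch using only the Python standard library, not the calculator's own code; the function name `two_proportion_z_test` is an assumption made for illustration.

```python
import math

def two_proportion_z_test(x_a, n_a, x_b, n_b):
    """Two-tailed two-proportion z-test with a pooled standard error.

    x_a, x_b: conversions for control and variant
    n_a, n_b: visitors for control and variant
    (Hypothetical helper, named here for illustration only.)
    """
    p_a = x_a / n_a
    p_b = x_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # 2 * (1 - Phi(|z|)) is identical to erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_proportion_z_test(300, 10_000, 350, 10_000)
print(f"z = {z:.2f}, p-value = {p:.3f}")
```

Using `math.erfc` avoids any third-party dependency and stays accurate for large |Z|, where computing 1 − Φ(|Z|) directly would lose precision.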
Statistical significance answers one question: "Is the observed difference likely to be real, or could it be random noise?" A p-value below 0.05 means that, if there were truly no difference, a result at least this extreme would occur less than 5% of the time. This does not mean the variant is 95% better; it means you have strong evidence that it is different from the control.
Peeking at results before reaching the planned sample size and stopping as soon as the result looks significant inflates the false positive rate dramatically. Running multiple tests simultaneously without correction leads to false discoveries. Using a one-tailed test when a two-tailed test is appropriate halves the p-value for the same data, which makes significance easier to reach, and cannot detect degradations. Always pre-register your test plan before launching.
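The effect of peeking can be illustrated with a small A/A simulation. This is a rough sketch under stated assumptions: both arms share the same true conversion rate, so every "significant" result is a false positive. The function names, batch sizes, and number of looks are all hypothetical choices made for the example.

```python
import math
import random

def z_test_p_value(x_a, n_a, x_b, n_b):
    """Two-tailed p-value from the pooled two-proportion z-test."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (x_b / n_b - x_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_peeking(experiments=300, looks=10, batch=200,
                     rate=0.05, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, rate at the final look only)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(experiments):
        x_a = x_b = n = 0
        significant_at_any_look = False
        p = 1.0
        for _ in range(looks):
            # Both arms draw from the same true rate, so there is no real effect
            x_a += sum(rng.random() < rate for _ in range(batch))
            x_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = z_test_p_value(x_a, n, x_b, n)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        final_hits += p < alpha  # p now holds the final look's value
    return peeking_hits / experiments, final_hits / experiments

peek_rate, final_rate = simulate_peeking()
print(f"stop at first significant look: {peek_rate:.1%}")
print(f"test once at the planned size:  {final_rate:.1%}")
```

Because the final look is one of the looks, the peeking rate can never be lower than the single-test rate. In runs like this it is usually well above the nominal 5%.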
Significance tells you if there is a difference; effect size tells you how big it is. Report both. A confidence interval (e.g., "the variant is 10–30% better") gives more useful information than a binary "significant/not significant" declaration.
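One way to report effect size alongside significance is a confidence interval for the absolute difference in conversion rates. The sketch below assumes a normal-approximation (Wald) interval with an unpooled standard error, a common choice that is not necessarily what this calculator uses; `lift_confidence_interval` is a hypothetical name.

```python
import math
from statistics import NormalDist

def lift_confidence_interval(x_a, n_a, x_b, n_b, confidence=0.95):
    """Wald interval for p_B - p_A (absolute difference in conversion rate)."""
    p_a = x_a / n_a
    p_b = x_b / n_b
    diff = p_b - p_a
    # Unpooled standard error: an interval does not assume p_A equals p_B
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z_crit * se, diff + z_crit * se

low, high = lift_confidence_interval(300, 10_000, 350, 10_000)
print(f"absolute lift: {low * 100:.3f} to {high * 100:.3f} percentage points")
```

An interval whose lower bound sits barely above zero signals that the result is significant but the true lift could be very small. A bare p-value hides this.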
The conventional threshold is p < 0.05 (95% confidence). This means that, if there were truly no difference between control and variant, a difference at least as large as the one observed would occur less than 5% of the time by random chance. More conservative tests use p < 0.01 (99% confidence).
The z-score measures how many standard deviations the observed difference is from zero (no difference). Higher absolute z-scores indicate stronger evidence. |Z| > 1.96 corresponds to p < 0.05, and |Z| > 2.58 corresponds to p < 0.01.
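The critical values quoted above can be derived from the inverse of the standard normal CDF. A minimal sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `critical_z` is an assumption.

```python
from statistics import NormalDist

def critical_z(alpha):
    """Two-tailed critical |Z| for significance level alpha."""
    # Split alpha evenly across both tails of the standard normal
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: |Z| > {critical_z(alpha):.2f}")
```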
Yes. With very large sample sizes, even tiny differences (0.01% lift) can be statistically significant. Always ask whether the magnitude of the lift justifies the cost of implementation. Practical significance is as important as statistical significance.
A two-tailed test checks for any difference (better or worse). A one-tailed test only checks for improvement. Two-tailed is recommended because it catches degradations. This calculator uses a two-tailed test.
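The difference between the two approaches shows up when the same z-score is evaluated both ways. The following is an illustrative sketch; `tail_p_values` is a hypothetical helper, and the one-tailed version assumes the alternative hypothesis "variant is better than control".

```python
from statistics import NormalDist

def tail_p_values(z):
    """Return (one-tailed p for 'variant is better', two-tailed p)."""
    phi = NormalDist().cdf
    one_tailed = 1 - phi(z)             # only the upper tail counts
    two_tailed = 2 * (1 - phi(abs(z)))  # both tails count
    return one_tailed, two_tailed

for z in (1.80, -2.50):
    one, two = tail_p_values(z)
    print(f"z = {z:+.2f}: one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

At z = 1.80 the one-tailed test passes the 0.05 threshold while the two-tailed test does not. At z = −2.50 only the two-tailed test flags the degradation.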
Borderline. Technically, p must be less than 0.05 to reject the null hypothesis. In practice, a p-value of 0.05 suggests weak evidence. Consider running the test longer or using a larger sample to get a clearer signal.
A negative z-score means the variant performed worse than the control. The p-value still measures significance: a significant negative result means the variant genuinely hurt performance and should not be implemented.
Calculate the confidence interval for a conversion rate or proportion. Support for 90%, 95%, and 99% confidence levels with sample size inputs.