What p-value means my test is significant?

The conventional threshold is p < 0.05 (95% confidence). It means that if there were truly no difference between control and variant, a difference at least as large as the one observed would occur less than 5% of the time by random chance alone. More conservative tests use p < 0.01 (99% confidence).

The z-score measures how many standard errors the observed difference lies from zero (no difference). Higher absolute z-scores indicate stronger evidence. For a two-tailed test, |Z| > 1.96 corresponds to p < 0.05 and |Z| > 2.58 corresponds to p < 0.01.
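A minimal Python sketch of how a two-tailed p-value follows from a z-score, using only the standard library. The function name two_tailed_p is illustrative and not part of this calculator; it relies on the identity 2 × (1 − Φ(|z|)) = erfc(|z| / √2).

```python
import math

def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic z.

    Uses 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)), which avoids the
    precision loss of subtracting a number very close to 1 from 1.
    """
    return math.erfc(abs(z) / math.sqrt(2))

# The two critical values quoted above
for z in (1.96, 2.58):
    print(f"|Z| = {z}: p = {two_tailed_p(z):.4f}")
```

This prints p = 0.0500 for |Z| = 1.96 and p = 0.0099 for |Z| = 2.58.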

Can a test be significant but not meaningful?

Yes. With very large sample sizes, even tiny differences (0.01% lift) can be statistically significant. Always ask whether the magnitude of the lift justifies the cost of implementation. Practical significance is as important as statistical significance.
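To illustrate, here is a sketch that applies the pooled two-proportion z-test this calculator uses to hypothetical traffic figures (not real data): a 1% relative lift, 3.00% to 3.03%, tested once with 5 million visitors per arm and once with 10,000 per arm. The helper name z_test is an assumption for this example.

```python
import math

def z_test(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: 3.00% vs 3.03% conversion with 5 million visitors per arm
z_big, p_big = z_test(150_000, 5_000_000, 151_500, 5_000_000)
# Same rates with only 10,000 visitors per arm
z_small, p_small = z_test(300, 10_000, 303, 10_000)

print(f"5M per arm:     z = {z_big:.2f}, p = {p_big:.4f}")
print(f"10k per arm:    z = {z_small:.2f}, p = {p_small:.4f}")
```

The identical 1% lift is highly significant at 5 million visitors per arm (p well below 0.01) and nowhere near significant at 10,000. Whether a 0.03-point gain is worth shipping is a business question the p-value cannot answer.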

What is a two-tailed vs. one-tailed test?

A two-tailed test checks for a difference in either direction (better or worse). A one-tailed test checks only one pre-specified direction, usually improvement. Two-tailed is recommended because it also catches degradations. This calculator uses a two-tailed test.
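A small sketch, assuming a hypothetical z-score of 1.80, showing how the same statistic yields different p-values depending on the alternative hypothesis. The helper phi is the standard normal CDF built from math.erfc.

```python
import math

def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

z = 1.80  # hypothetical z-score (variant minus control)

p_two_tailed = 2 * (1 - phi(abs(z)))   # H1: any difference
p_one_tailed_better = 1 - phi(z)       # H1: variant > control
p_one_tailed_worse = phi(z)            # H1: variant < control

print(f"two-tailed:          {p_two_tailed:.4f}")
print(f"one-tailed (better): {p_one_tailed_better:.4f}")
print(f"one-tailed (worse):  {p_one_tailed_worse:.4f}")
```

Here the two-tailed p-value (about 0.072) misses the 0.05 threshold while the one-tailed "better" p-value (about 0.036) passes it, because the one-tailed value in the hypothesized direction is exactly half the two-tailed one.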

My p-value is exactly 0.05. Is that significant?

Borderline. Technically, p must be less than 0.05 to reject the null hypothesis. In practice, a p-value of 0.05 suggests weak evidence. Consider running the test longer or using a larger sample to get a clearer signal.

How do I interpret negative z-scores?

A negative z-score means the variant performed worse than the control. The two-tailed p-value measures significance the same way in both directions: a significant negative result is strong evidence that the variant hurt performance, and it should not be implemented.
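A sketch of the sign behavior, reusing the example figures from this page. The second call is a hypothetical case in which the variant converts at 3.00% against a 3.50% control; the helper z_test is illustrative.

```python
import math

def z_test(x_a: int, n_a: int, x_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-tailed p-value)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))

z_pos, p_pos = z_test(300, 10_000, 350, 10_000)  # variant better
z_neg, p_neg = z_test(350, 10_000, 300, 10_000)  # hypothetical: variant worse

print(f"variant better: z = {z_pos:+.4f}, p = {p_pos:.4f}")
print(f"variant worse:  z = {z_neg:+.4f}, p = {p_neg:.4f}")
```

Swapping the groups flips the sign of z but leaves the two-tailed p-value unchanged, so a degradation of the same size is exactly as significant as an improvement.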

A/B Test Statistical Significance Calculator

Test whether your A/B test results are statistically significant. Enter the visitors and conversions for the control and the variant to get the z-score and p-value.



About the A/B Test Statistical Significance Calculator

After running an A/B test, you need to determine whether the observed difference between control and variant is statistically significant or could have occurred by chance. This calculator performs a two-proportion z-test, a standard method for comparing conversion rates between two groups.

Enter the number of visitors and conversions for both control (A) and variant (B). The calculator computes the z-score, p-value, and confidence level. A p-value below your significance threshold (typically 0.05) means the difference is statistically significant.

Statistical significance does not mean the result is practically important — a tiny lift can be statistically significant with enough traffic. Always consider both statistical significance and the magnitude of the effect.

When This Page Helps

Making product decisions based on random noise wastes resources and can damage the user experience. This calculator gives objective statistical evidence of whether your A/B test result reflects a real difference rather than chance, so decisions rest on data instead of gut feeling.

How to Use the Inputs

Enter the number of visitors in the control group (A).
Enter the number of conversions in group A.
Enter the number of visitors in the variant group (B).
Enter the number of conversions in group B.
Review the z-score, p-value, and significance conclusion.
A p-value below 0.05 indicates statistical significance at the 95% level.

Formula used

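p_A = x_A / n_A,  p_B = x_B / n_B
p̂ = (x_A + x_B) / (n_A + n_B)
Z = (p_B − p_A) / √[p̂(1−p̂)(1/n_A + 1/n_B)]
p-value = 2 × (1 − Φ(|Z|))

where x is the number of conversions, n the number of visitors, p̂ the pooled conversion rate, and Φ the standard normal cumulative distribution function.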
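A minimal Python sketch of these formulas, using only the standard library. The function name ab_significance, its input checks, and the returned field names are assumptions for illustration, not this calculator's actual code.

```python
import math

def ab_significance(x_a: int, n_a: int, x_b: int, n_b: int,
                    alpha: float = 0.05) -> dict:
    """Two-proportion z-test following the formulas above.

    x_a, x_b are conversions; n_a, n_b are visitors. Returns the
    conversion rates, relative lift, z-score, two-tailed p-value,
    and whether p < alpha.
    """
    if min(n_a, n_b) <= 0 or not (0 <= x_a <= n_a and 0 <= x_b <= n_b):
        raise ValueError("need n > 0 and 0 <= conversions <= visitors")
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)              # pooled rate (p-hat)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1, so Z is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # = 2 * (1 - Phi(|Z|))
    return {
        "cr_a": p_a,
        "cr_b": p_b,
        "relative_lift": (p_b - p_a) / p_a if p_a > 0 else math.nan,
        "z": z,
        "p_value": p_value,
        "significant": p_value < alpha,
    }

result = ab_significance(300, 10_000, 350, 10_000)
for key, value in result.items():
    print(f"{key}: {value}")
```

With the example inputs (300/10,000 vs 350/10,000) this gives z ≈ 1.994 and p ≈ 0.046. Computing the p-value with math.erfc rather than 1 − Φ keeps precision for large |Z|.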

Example Calculation

Result: p-value ≈ 0.046 (statistically significant at 95%)

Control: 300/10,000 = 3.00%. Variant: 350/10,000 = 3.50%. The pooled proportion is 3.25%. Z ≈ 1.99, p-value ≈ 0.046. Since 0.046 < 0.05, the result is statistically significant at the 95% confidence level, though only barely. The variant shows an observed 16.7% relative lift.
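The arithmetic behind this example, step by step, using the formulas above (values rounded for display):

```latex
\hat{p} = \frac{300 + 350}{10000 + 10000} = 0.0325
\qquad
SE = \sqrt{0.0325 \times 0.9675 \times \left(\tfrac{1}{10000} + \tfrac{1}{10000}\right)}
   = \sqrt{6.28875 \times 10^{-6}} \approx 0.002508

Z = \frac{0.035 - 0.030}{0.002508} \approx 1.994
\qquad
p = 2\,\bigl(1 - \Phi(1.994)\bigr) \approx 0.046
```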

Tips & Best Practices

Always reach your pre-calculated sample size before checking significance.
A 0.05 threshold means that when there is no real difference, about 1 test in 20 will still come out significant by chance.
Check statistical significance AND practical significance (is the lift large enough to matter?).
Two-tailed tests (this calculator) detect both improvements and degradations.
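Beware of multiple comparisons: testing 20 independent metrics at p < 0.05 gives roughly a 64% chance of at least one false positive, even when nothing changed. Use a correction such as Bonferroni (divide the threshold by the number of tests).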
If results are borderline (p = 0.04–0.06), consider extending the test for more data.
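A short sketch of the multiple-comparisons arithmetic, under the simplifying assumption that the 20 metrics are independent and none truly changed. Bonferroni is shown as one common correction; others exist.

```python
alpha = 0.05
m = 20  # number of metrics tested (assumed independent)

# Chance of at least one false positive when every null hypothesis is true
fwer = 1 - (1 - alpha) ** m
expected_false_positives = m * alpha

# Bonferroni: test each metric at alpha / m instead of alpha
bonferroni_alpha = alpha / m
fwer_bonferroni = 1 - (1 - bonferroni_alpha) ** m

print(f"P(at least one false positive): {fwer:.1%}")
print(f"Expected false positives:       {expected_false_positives:.1f}")
print(f"Bonferroni per-test threshold:  {bonferroni_alpha}")
print(f"P(any false positive), Bonferroni: {fwer_bonferroni:.1%}")
```

Without correction the chance of at least one spurious "win" is about 64%, with one false positive expected on average; testing each metric at 0.0025 brings the overall rate back just under 5%.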

Understanding Statistical Significance

Statistical significance answers one question: "Could the observed difference plausibly be random noise?" A p-value below 0.05 means that, if there were truly no difference, a result at least this extreme would occur less than 5% of the time. It is not the probability that the difference is real, and it does not mean the variant is 95% better. It means the data are hard to explain as chance alone, which is strong evidence that the variant differs from the control.

Common Significance Mistakes

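Peeking at results before reaching the planned sample size, and stopping as soon as p dips below 0.05, inflates the false positive rate dramatically. Running many tests or metrics at once without a multiple-comparison correction leads to false discoveries. Using a one-tailed test when a two-tailed test is appropriate halves the p-value for effects in the tested direction, which makes false positives easier, and it cannot detect degradations. Always pre-register your test plan (metric, sample size, threshold) before launching.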
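A small A/A simulation sketch of the peeking problem. Both arms share the same true conversion rate, so every significant result is a false positive. The traffic size, 10% rate, 10 evenly spaced looks, and seed are arbitrary assumptions chosen to keep the run fast; the function names are hypothetical.

```python
import math
import random

def p_value(x_a: int, n_a: int, x_b: int, n_b: int) -> float:
    """Two-tailed p-value of the pooled two-proportion z-test."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (x_b / n_b - x_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_aa(n_sims=1000, n_per_arm=1000, looks=10,
                rate=0.10, alpha=0.05, seed=1):
    """Return (false positive rate with one look at the end,
    false positive rate when stopping at any of `looks` peeks)."""
    rng = random.Random(seed)
    batch = n_per_arm // looks
    fp_single = fp_peeking = 0
    for _ in range(n_sims):
        x_a = x_b = n = 0
        flagged_at_any_look = False
        for _ in range(looks):
            # Add one batch of visitors to each arm (same true rate)
            x_a += sum(rng.random() < rate for _ in range(batch))
            x_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if p_value(x_a, n, x_b, n) < alpha:
                flagged_at_any_look = True  # a peeker would stop here
        fp_peeking += flagged_at_any_look
        fp_single += p_value(x_a, n, x_b, n) < alpha
    return fp_single / n_sims, fp_peeking / n_sims

single_rate, peeking_rate = simulate_aa()
print(f"false positives, one look at the end: {single_rate:.1%}")
print(f"false positives, any of 10 peeks:     {peeking_rate:.1%}")
```

A single look at the planned end keeps the false positive rate near the nominal 5%, while declaring a winner at the first of 10 peeks that crosses 0.05 pushes it several times higher.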

Beyond Significance: Effect Size and Confidence Intervals

Significance tells you if there is a difference; effect size tells you how big it is. Report both. A confidence interval (e.g., "the variant is 10–30% better") gives more useful information than a binary "significant/not significant" declaration.
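A sketch of a 95% confidence interval for the difference in conversion rates, applied to this page's example data. It uses the simple Wald interval with an unpooled standard error, which is one common choice among several; the relative bounds just divide by the observed control rate, a rough rescaling that ignores uncertainty in that rate. The function name is hypothetical.

```python
import math

def diff_confidence_interval(x_a: int, n_a: int, x_b: int, n_b: int,
                             z_crit: float = 1.96):
    """Wald confidence interval for p_B - p_A (unpooled standard error).

    Returns (abs_low, abs_high, rel_low, rel_high); the relative bounds
    are the absolute bounds divided by the control rate p_A.
    """
    p_a, p_b = x_a / n_a, x_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    low, high = diff - z_crit * se, diff + z_crit * se
    return low, high, low / p_a, high / p_a

low, high, rel_low, rel_high = diff_confidence_interval(300, 10_000, 350, 10_000)
print(f"95% CI for the difference: {low:+.3%} to {high:+.3%}")
print(f"as relative lift:          {rel_low:+.1%} to {rel_high:+.1%}")
```

For the example, the interval runs from roughly +0.01 to +0.99 percentage points, or about +0.3% to +33% relative. The result is significant, yet the data are compatible with anything from a negligible lift to a large one, which a bare "significant" label hides.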

Sources & Methodology

Last updated: February 8, 2026
