A/B Test Sample Size Calculator

Calculate the required sample size for statistically significant A/B tests. Input baseline rate, minimum detectable effect, significance, and power.

Required Sample Size per Variant: 29,826 minimum users
Total Sample: 59,652 (both variants combined)
Est. Duration: 12 days (~1.7 weeks)
Detecting: 5.00% → 5.50% (δ = 0.50% absolute)

Test Parameters

Baseline rate: 5.00%
Target rate: 5.50%
Absolute diff: 0.50%
Relative MDE: 10%
Zα/2: 1.96
Zβ: 0.842
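
The two z-scores listed above follow from the significance level and the power. A minimal sketch using Python's standard library, assuming a two-sided test at a 5% significance level and 80% power (the function name `z_scores` is illustrative and not part of the calculator):

```python
from statistics import NormalDist

def z_scores(alpha=0.05, power=0.80):
    """Return (z_alpha_over_2, z_beta) for a two-sided test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # quantile matching the power
    return z_alpha, z_beta

z_alpha, z_beta = z_scores()
print(round(z_alpha, 2), round(z_beta, 3))  # 1.96 0.842
```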

MDE Sensitivity Analysis

MDE    Per Variant    Total      Duration
5%     119,303        238,606    48 days
8%     46,603         93,206     19 days
10%    29,826         59,652     12 days
15%    13,256         26,512     6 days
20%    7,457          14,914     3 days
25%    4,773          9,546      2 days
30%    3,314          6,628      2 days
50%    1,194          2,388      1 day
Planning notes, formulas, and examples

About the A/B Test Sample Size Calculator

Running an A/B test without enough samples leads to unreliable results — you might declare a winner that isn't actually better, or miss a real improvement because you stopped too early. Sample size calculation is the critical first step in experiment design, determining how many users you need in each variant to detect a meaningful difference with statistical confidence.

The required sample size depends on four key parameters: your baseline conversion rate (what the control currently achieves), the minimum detectable effect (the smallest improvement worth detecting), the significance level (typically 5%, controlling false positive risk), and statistical power (typically 80%, controlling false negative risk). Together, these determine whether your experiment can reliably detect the effect you care about.

This calculator uses the standard normal approximation for two-proportion tests to compute the required sample size per variant. It also estimates test duration based on your daily traffic and shows how different MDE levels affect the required sample size, helping you find the right balance between sensitivity and practical test duration.

When This Page Helps

Underpowered experiments are one of the biggest wastes in growth optimization. They lead to inconclusive results, false positives, and wasted development time on changes that weren't validated. This calculator ensures your experiments are properly sized before you start, gives you realistic test duration estimates, and helps you negotiate between statistical rigor and business timelines.

How to Use the Inputs

  1. Enter your current baseline conversion rate (e.g., 5% for a 5% purchase rate).
  2. Enter the minimum detectable effect — the smallest relative improvement worth detecting (e.g., 10% means detecting a lift from 5.0% to 5.5%).
  3. Set your significance level (default 5%) and statistical power (default 80%).
  4. Optionally enter daily traffic to estimate test duration.
  5. Review the required sample size per variant and total, plus the duration estimate.
Formula used
n = (Zα/2 + Zβ)² × 2p(1−p) ÷ δ²

Where:
  • Zα/2 = Z-score for the significance level (1.96 for a two-sided 5% level, i.e. 95% confidence)
  • Zβ = Z-score for power (0.84 for 80%)
  • p = pooled proportion, here taken as the baseline rate (the results above use p = 5.00%)
  • δ = absolute difference to detect (baseline × MDE%)

Total Sample = n × 2 (for two variants)
Test Duration = Total Sample ÷ Daily Traffic
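
The formula can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's actual source: p is set to the baseline rate, results are rounded up to whole users and whole days, and daily traffic is assumed to be 5,000 visitors. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Users per variant: n = (z_a/2 + z_b)^2 * 2p(1-p) / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = baseline * relative_mde                # absolute difference to detect
    variance_term = 2 * baseline * (1 - baseline)  # assumes p = baseline rate
    n = (z_alpha + z_beta) ** 2 * variance_term / delta ** 2
    return math.ceil(n)  # round up to whole users (an assumption)

n = sample_size_per_variant(baseline=0.05, relative_mde=0.10)
total = 2 * n                   # two variants
days = math.ceil(total / 5000)  # assumed 5,000 daily visitors
print(n, total, days)  # 29826 59652 12
```

Under these assumptions the output matches the per-variant, total, and duration figures shown in the results at the top of the page.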

Example Calculation

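Result: n ≈ 31,234 per variant (62,468 total)

With a 5% baseline conversion rate and a goal of detecting a 10% relative improvement (from 5.0% to 5.5%), at a 5% significance level (95% confidence) and 80% power, you need approximately 31,234 users per variant when p is taken as the average of the two rates (5.25%). Taking p as the 5% baseline instead gives 29,826 per variant, which is the figure in the calculator results and sensitivity table above. With 5,000 daily visitors split evenly, the test would run for about 12 to 13 days. Reducing the MDE to 5% would require roughly 120,000 per variant.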
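
The example figure (31,234) and the calculator output above (29,826) differ only in the value used for p. The sketch below, which is an interpretation rather than the page's documented method, computes the unrounded n for both choices; the function and variable names are illustrative.

```python
from statistics import NormalDist

def raw_sample_size(p, delta, alpha=0.05, power=0.80):
    """Unrounded n per variant for a given variance proportion p."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z ** 2 * 2 * p * (1 - p) / delta ** 2

p1, p2 = 0.05, 0.055  # baseline and target rates
delta = p2 - p1       # 0.5 percentage points

n_baseline = raw_sample_size(p1, delta)            # p = baseline rate
n_average = raw_sample_size((p1 + p2) / 2, delta)  # p = mean of both rates
print(f"p = baseline: {n_baseline:,.1f}")
print(f"p = average:  {n_average:,.1f}")
```

The first value rounds up to 29,826 and the second truncates to 31,234, so either figure is defensible as long as the convention for p is stated.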

Tips & Best Practices

  • Don't peek at results before reaching your calculated sample size — early peeking inflates false positive rates.
  • If the required sample size is too large, increase MDE (detect larger effects) or accept lower power.
  • Most A/B tests should run at least 1–2 full weeks to capture day-of-week effects.
  • Lower baseline rates require larger samples: at the same relative MDE, a test on a 1% conversion rate needs roughly 5× the traffic of a test on a 5% rate.
  • Use a 5% significance level (95% confidence) and 80% power as defaults unless you have specific reasons to change them.
  • For multiple variants (A/B/C/D tests), apply Bonferroni correction: divide significance by number of comparisons.
  • Consider using sequential testing methods if you must monitor results before the full sample is collected.
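
The Bonferroni tip can be folded into the sample size calculation by shrinking the significance level before looking up the z-score. A rough sketch, assuming each treatment is compared only against the control and that p is the baseline rate (the function name is hypothetical):

```python
import math
from statistics import NormalDist

def bonferroni_sample_size(baseline, relative_mde, comparisons,
                           alpha=0.05, power=0.80):
    """Per-variant n after dividing alpha across the planned comparisons."""
    adjusted_alpha = alpha / comparisons  # Bonferroni correction
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = baseline * relative_mde
    n = (z_alpha + z_beta) ** 2 * 2 * baseline * (1 - baseline) / delta ** 2
    return math.ceil(n)

# A/B test (1 comparison) versus A/B/C/D test (3 treatments vs. control).
for k in (1, 3):
    print(k, bonferroni_sample_size(0.05, 0.10, comparisons=k))
```

With three comparisons each variant needs noticeably more users than in the two-variant case, and the total grows further because there are four arms to fill.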

Understanding Sample Size Tradeoffs

Sample size is a tradeoff between sensitivity, speed, and confidence. Larger samples detect smaller effects but take longer. The relationship is inverse-quadratic: n scales with 1/δ², so halving the MDE from 10% to 5% roughly quadruples the required sample (29,826 versus 119,303 per variant in the sensitivity table above). This is why choosing the right MDE is crucial: don't over-specify sensitivity you don't need.
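
The scaling can be checked numerically. The sketch below assumes a 5% baseline, p equal to the baseline rate, and rounding up to whole users; it prints each sample size relative to the 10% MDE case.

```python
import math
from statistics import NormalDist

def sample_size(baseline, relative_mde, alpha=0.05, power=0.80):
    """Per-variant n from the normal-approximation formula, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    delta = baseline * relative_mde
    return math.ceil(z ** 2 * 2 * baseline * (1 - baseline) / delta ** 2)

reference = sample_size(0.05, 0.10)
for mde in (0.05, 0.10, 0.20, 0.50):
    n = sample_size(0.05, mde)
    print(f"MDE {mde:.0%}: {n:>7,} per variant, {n / reference:.2f}x the 10% case")
```

Halving the MDE multiplies n by about 4, and doubling it divides n by about 4.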

Practical Test Duration Planning

Beyond pure sample size, tests should run for complete weeks to capture day-of-week effects. A test that reaches sample size on a Thursday should still run through Sunday. Also account for novelty effects (early users react differently to changes) and external events (holidays, promotions) that can bias results.
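
One way to apply the full-weeks rule is to round the traffic-based duration up to the next multiple of seven days. A small sketch; the function name and the `min_weeks` parameter are illustrative assumptions, not calculator settings.

```python
import math

def planned_duration_days(total_sample, daily_traffic, min_weeks=1):
    """Days to run: enough for the sample, rounded up to whole weeks."""
    days_for_sample = math.ceil(total_sample / daily_traffic)
    weeks = max(min_weeks, math.ceil(days_for_sample / 7))
    return weeks * 7

print(planned_duration_days(59_652, 5_000))  # 14
```

Here the 12 days needed to reach 59,652 users at 5,000 visitors per day become a 14-day run, so both weekends are covered.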

Advanced Considerations

For metrics with high variance (like revenue per user), you'll need much larger samples than for binary metrics (like conversion rate). Consider variance reduction techniques like CUPED or stratified sampling to reduce required samples by 30–50%. For high-traffic sites, use multi-armed bandit methods to balance learning and earning during the experiment.
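
To make the CUPED idea concrete: each user's metric is adjusted by a pre-experiment covariate, which removes the part of the variance that the covariate explains while leaving the mean unchanged. The sketch below uses synthetic data invented purely for illustration, so the size of the reduction it prints says nothing about what a real metric would achieve.

```python
import random
from statistics import mean, variance

def cuped_adjust(metric, covariate):
    """Return metric values adjusted by a pre-experiment covariate."""
    x_bar, y_bar = mean(covariate), mean(metric)
    cov_xy = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(covariate, metric)) / (len(metric) - 1)
    theta = cov_xy / variance(covariate)  # regression slope of metric on covariate
    return [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]

# Synthetic data (made up): in-test revenue correlates with each
# user's pre-experiment revenue.
rng = random.Random(42)
pre = [rng.gauss(50, 15) for _ in range(2_000)]
post = [0.8 * x + rng.gauss(10, 8) for x in pre]

adjusted = cuped_adjust(post, pre)
reduction = 1 - variance(adjusted) / variance(post)
print(f"variance reduced by {reduction:.0%}")
```

Because required sample size is proportional to the metric's variance, a given fractional reduction in variance translates into about the same fractional reduction in required users. The gain depends on how strongly the covariate correlates with the metric.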

Frequently Asked Questions

  • The minimum detectable effect (MDE) is the smallest improvement your test is designed to detect. For example, a 10% relative MDE on a 5% baseline means the test is powered to detect a lift to 5.5% or higher. Smaller MDEs require larger samples. Choose an MDE based on the smallest improvement that would justify implementing the change.