What is minimum detectable effect (MDE)?

MDE is the smallest improvement your test is designed to detect. For example, a 10% relative MDE on a 5% baseline means you'd detect a lift to 5.5% or higher. Smaller MDEs require larger samples. Choose an MDE based on the smallest improvement that would justify implementing the change.

What happens if I don't collect enough samples?

An underpowered test has a high risk of missing real effects (false negatives), and any significant result it does produce is more likely to be noise or an overestimate of the true effect. You might conclude "no difference" when there actually is one, or worse, declare a winner based on statistical noise. This leads to implementing ineffective changes or abandoning effective ones.

What is statistical power?

Statistical power is the probability of correctly detecting a real effect of the specified size. At 80% power, you have an 80% chance of detecting a true difference equal to your MDE, and a higher chance if the true difference is larger. Higher power requires more samples. 80% is the standard default; some teams use 90% for high-stakes decisions.
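To make the link between sample size and power concrete, here is a minimal Python sketch that inverts the normal-approximation formula used on this page. The function name `approx_power` is illustrative, and the sketch assumes the variance term uses the baseline rate and ignores the negligible chance of a significant result in the wrong direction.

```python
from statistics import NormalDist


def approx_power(n_per_variant, baseline, relative_mde, alpha=0.05):
    """Approximate two-sided power under the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    delta = baseline * relative_mde  # absolute difference to detect
    # Standard error of the difference, using the baseline rate for both arms
    se = (2 * baseline * (1 - baseline) / n_per_variant) ** 0.5
    return NormalDist().cdf(delta / se - z_alpha)


for n in (10_000, 29_826, 40_000):
    print(n, round(approx_power(n, 0.05, 0.10), 3))
```

At 29,826 users per variant the sketch returns a power of about 0.80; at 10,000 it falls well below 0.5, which is what "underpowered" means in practice.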

Should I use one-sided or two-sided tests?

Two-sided tests are the standard because they detect both improvements and degradations. One-sided tests need fewer samples but only detect effects in one direction. Use two-sided unless you have a strong prior belief that the change can only improve (or only hurt) the metric. This calculator uses two-sided tests.
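As a rough illustration of how much smaller a one-sided test can be, the sketch below compares the (Zα + Zβ)² factor that sample size is proportional to. The helper name `z_multiplier` is hypothetical; everything else in the formula is assumed to stay the same.

```python
from statistics import NormalDist


def z_multiplier(alpha=0.05, power=0.80, two_sided=True):
    """(Z_alpha + Z_beta)^2, the factor the sample size is proportional to."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


ratio = z_multiplier(two_sided=False) / z_multiplier(two_sided=True)
print(f"one-sided needs about {ratio:.0%} of the two-sided sample")
```

At 5% significance and 80% power the saving is roughly a fifth of the sample, which is the price of being able to detect degradations as well as improvements.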

How do I handle multiple metrics?

If you're testing multiple metrics simultaneously, adjust for multiple comparisons using Bonferroni correction (divide the significance level by the number of metrics) or False Discovery Rate control. Without correction, testing 20 independent metrics at 5% significance gives about a 64% chance of at least one false positive even when the change has no real effect.
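The arithmetic behind that risk can be sketched in a few lines of Python. The function names are illustrative, and the calculation assumes the metrics are independent and that none of them has a real effect.

```python
def familywise_error_rate(num_metrics, alpha=0.05):
    """Chance of at least one false positive across independent true-null metrics."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(num_metrics, alpha=0.05):
    """Per-metric significance level after Bonferroni correction."""
    return alpha / num_metrics


uncorrected = familywise_error_rate(20)
corrected = familywise_error_rate(20, bonferroni_alpha(20))
print(round(uncorrected, 3))  # 0.642
print(round(corrected, 3))
```

With the corrected per-metric level, the chance of any false positive across all 20 metrics stays just under 5%.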

Can I stop a test early if results look significant?

Standard fixed-horizon tests should not be stopped early: checking repeatedly and stopping at the first significant result pushes the false positive rate well above the nominal significance level. If you need to monitor results continuously, use sequential testing methods (like group sequential designs or always-valid confidence intervals) that account for repeated analysis.
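A small simulation can show why peeking is a problem. The sketch below runs A/A tests (both variants share the same 5% rate, so every "significant" result is a false positive) and compares stopping at the first significant look with testing only once at the end. The run count, number of looks, and users per look are arbitrary illustrative choices, and the test used is a plain pooled two-proportion z-test.

```python
import random
from statistics import NormalDist


def z_significant(conv_a, conv_b, n, z_crit):
    """Pooled two-proportion z-test; True if |z| exceeds z_crit."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    return se > 0 and abs(conv_a - conv_b) / n / se > z_crit


def simulate_aa(runs=400, looks=5, users_per_look=1000, rate=0.05, seed=1):
    """Return (false positive rate with peeking, rate when testing only at the end)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # two-sided 5% significance
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            significant = z_significant(conv_a, conv_b, look * users_per_look, z_crit)
            flagged = flagged or significant  # would have stopped at this look
        peeking_hits += flagged
        final_hits += significant  # result at the planned sample size only
    return peeking_hits / runs, final_hits / runs


peeking, final_only = simulate_aa()
print(f"with peeking: {peeking:.1%}, final look only: {final_only:.1%}")
```

The final-look rate should land near the nominal 5%, while the peeking rate comes out noticeably higher even with only five looks.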

A/B Test Sample Size Calculator

Calculate the required sample size for statistically significant A/B tests. Input baseline rate, minimum detectable effect, significance, and power.

Baseline Conversion Rate

Minimum Detectable Effect (relative)

Significance Level (α)

Statistical Power (1−β)

Daily Traffic (both variants; optional, for duration estimate)

Required Sample Size per Variant: 29,826 (minimum users)

Total Sample: 59,652 (both variants combined)

Est. Duration: 12 days (~1.7 weeks)

Detecting: 5.00% → 5.50% (δ = 0.50% absolute)

Test Parameters

Baseline rate: 5.00%

Target rate: 5.50%

Absolute diff: 0.50%

Relative MDE: 10%

Zα/2: 1.96

Zβ: 0.842

MDE Sensitivity Analysis

MDE	Per Variant	Total	Duration
5%	119,303	238,606	48 days
8%	46,603	93,206	19 days
10%	29,826	59,652	12 days
15%	13,256	26,512	6 days
20%	7,457	14,914	3 days
25%	4,773	9,546	2 days
30%	3,314	6,628	2 days
50%	1,194	2,388	1 day

Planning notes, formulas, and examples

About the A/B Test Sample Size Calculator

Running an A/B test without enough samples leads to unreliable results — you might declare a winner that isn't actually better, or miss a real improvement because you stopped too early. Sample size calculation is the critical first step in experiment design, determining how many users you need in each variant to detect a meaningful difference with statistical confidence.

The required sample size depends on four key parameters: your baseline conversion rate (what the control currently achieves), the minimum detectable effect (the smallest improvement worth detecting), the significance level (typically 5%, controlling false positive risk), and statistical power (typically 80%, controlling false negative risk). Together, these determine whether your experiment can reliably detect the effect you care about.

This calculator uses the standard normal approximation for two-proportion tests to compute the required sample size per variant. It also estimates test duration based on your daily traffic and shows how different MDE levels affect the required sample size, helping you find the right balance between sensitivity and practical test duration.

When This Page Helps

Underpowered experiments are one of the biggest wastes in growth optimization. They lead to inconclusive results, false positives, and wasted development time on changes that weren't validated. This calculator ensures your experiments are properly sized before you start, gives you realistic test duration estimates, and helps you negotiate between statistical rigor and business timelines.

How to Use the Inputs

Enter your current baseline conversion rate (e.g., 5% if 5 of every 100 visitors purchase).
Enter the minimum detectable effect — the smallest relative improvement worth detecting (e.g., 10% means detecting a lift from 5.0% to 5.5%).
Set your significance level (default 5%) and statistical power (default 80%).
Optionally enter daily traffic to estimate test duration.
Review the required sample size per variant and total, plus the duration estimate.

Formula used

n = (Zα/2 + Zβ)² × 2p(1−p) ÷ δ²

Where:
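• Zα/2 = Z-score for the two-sided significance level (1.96 for α = 5%, i.e., 95% confidence)
• Zβ = Z-score for power (0.84 for 80%)
• p = conversion rate used in the variance term; the calculator results shown on this page (29,826 per variant for a 5% baseline and 10% MDE) correspond to p = the baseline rate, while using the pooled rate of baseline and target (5.25% here) gives a slightly larger n
• δ = absolute difference to detect (baseline × relative MDE, e.g., 5% × 10% = 0.5 percentage points)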

Total Sample = n × 2 (for two variants)
Test Duration = Total Sample ÷ Daily Traffic (rounded up to whole days)
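The formula above can be sketched in Python using only the standard library. The function names are illustrative, and the sketch assumes p is the baseline rate and that results are rounded up, which reproduces the 29,826 per variant and 12 days shown in the calculator output for 5,000 daily visitors.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """n = (Z_alpha/2 + Z_beta)^2 * 2p(1-p) / delta^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    delta = baseline * relative_mde                # absolute difference to detect
    p = baseline                                   # assumption: p is the baseline rate
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / delta ** 2
    return math.ceil(n)


def duration_days(n_per_variant, daily_traffic, variants=2):
    """Days needed to collect the total sample, rounded up to whole days."""
    return math.ceil(n_per_variant * variants / daily_traffic)


n = sample_size_per_variant(0.05, 0.10)
print(n, 2 * n, duration_days(n, 5000))  # 29826 59652 12
```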

Example Calculation

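Result: n ≈ 31,234 per variant (62,468 total)

With a 5% baseline conversion rate and a 10% relative improvement to detect (from 5.0% to 5.5%), at a 5% significance level (95% confidence) and 80% power, you need approximately 31,234 users per variant when p is taken as the pooled rate of the two variants (about 5.25%). With 5,000 daily visitors split evenly, the test would run for about 13 days. Plugging the 5% baseline in for p instead gives 29,826 per variant and 12 days, which are the figures in the calculator output and sensitivity table above. Reducing the MDE to 5% roughly quadruples the requirement, to about 120,000 per variant.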

Tips & Best Practices

Don't peek at results before reaching your calculated sample size — early peeking inflates false positive rates.
If the required sample size is too large, increase MDE (detect larger effects) or accept lower power.
Most A/B tests should run at least 1–2 full weeks to capture day-of-week effects.
Lower baseline rates require larger samples: at the same relative MDE, a test on a 1% conversion rate needs roughly 5× the traffic of a test on a 5% rate.
Use a 5% significance level (95% confidence) and 80% power as defaults unless you have specific reasons to change them.
For multiple variants (A/B/C/D tests), apply Bonferroni correction: divide the significance level by the number of comparisons.
Consider using sequential testing methods if you must monitor results before the full sample is collected.
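The baseline-rate tip can be checked with a short sketch. Because the Z terms cancel when comparing two baselines at the same significance and power, only the 2p(1−p) ÷ δ² part of the formula matters. The function name is illustrative, and the comparison assumes the same relative MDE for both baselines.

```python
def relative_sample_need(baseline, relative_mde):
    """Quantity proportional to n: 2p(1-p) / delta^2, with the Z terms dropped."""
    delta = baseline * relative_mde
    return 2 * baseline * (1 - baseline) / delta ** 2


ratio = relative_sample_need(0.01, 0.10) / relative_sample_need(0.05, 0.10)
print(round(ratio, 2))  # 5.21
```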

Understanding Sample Size Tradeoffs

Sample size is a tradeoff between sensitivity, speed, and confidence. Larger samples detect smaller effects but take longer. The required sample grows with the inverse square of the MDE: halving the MDE from 10% to 5% roughly quadruples the sample (29,826 vs. 119,303 per variant in the sensitivity table). This is why choosing the right MDE is crucial; don't over-specify sensitivity you don't need.
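To see the scaling directly, the following sketch recomputes a few rows of the sensitivity table and expresses each as a multiple of the 10% MDE case. It assumes a 5% baseline, 5% significance, 80% power, and p equal to the baseline rate; the function name is illustrative.

```python
import math
from statistics import NormalDist


def sample_size(baseline, relative_mde, alpha=0.05, power=0.80):
    """Per-variant n from (Z_alpha/2 + Z_beta)^2 * 2p(1-p) / delta^2, with p = baseline."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    delta = baseline * relative_mde
    return math.ceil(z ** 2 * 2 * baseline * (1 - baseline) / delta ** 2)


reference = sample_size(0.05, 0.10)
for mde in (0.05, 0.10, 0.20, 0.50):
    n = sample_size(0.05, mde)
    print(f"MDE {mde:.0%}: {n:>7,} per variant ({n / reference:.2f}x the 10% case)")
```

Doubling the MDE cuts the requirement to about a quarter, and halving it multiplies the requirement by about four.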

Practical Test Duration Planning

Beyond pure sample size, tests should run for complete weeks to capture day-of-week effects. A test that reaches sample size on a Thursday should still run through Sunday. Also account for novelty effects (early users react differently to changes) and external events (holidays, promotions) that can bias results.
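One simple way to apply the full-weeks rule is to round the raw duration up to the next multiple of seven days. The sketch below does that; the function name and the `min_weeks` parameter are illustrative, and it does not attempt to model novelty effects or external events.

```python
import math


def planned_duration_days(total_sample, daily_traffic, min_weeks=1):
    """Return (raw days to reach the sample, days rounded up to complete weeks)."""
    raw_days = math.ceil(total_sample / daily_traffic)
    weeks = max(min_weeks, math.ceil(raw_days / 7))
    return raw_days, weeks * 7


print(planned_duration_days(59_652, 5_000))  # (12, 14)
```

For the 59,652-user example at 5,000 daily visitors, the sample is reached in 12 days, but the planned run would be 14 days so that each weekday is covered twice.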

Advanced Considerations

For metrics with high variance (like revenue per user), you'll need much larger samples than for binary metrics (like conversion rate). Variance reduction techniques like CUPED or stratified sampling can cut required samples, often by 30–50%, though the gain depends on how strongly pre-experiment data correlates with the metric. For high-traffic sites, multi-armed bandit methods are an option for balancing learning and earning during the experiment.
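As a minimal sketch of the CUPED idea, the code below adjusts each user's in-experiment metric using a pre-experiment covariate and compares the variance before and after. The data is synthetic and the numbers (means, noise levels, 0.7 slope) are made up purely for illustration; real gains depend on how well the pre-period metric predicts the in-experiment one.

```python
import random
from statistics import fmean, variance


def cuped_adjust(y, x):
    """Return y adjusted by pre-experiment covariate x: y - theta * (x - mean(x))."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    theta = cov / variance(x)  # regression slope of y on x
    return [b - theta * (a - mx) for a, b in zip(x, y)]


rng = random.Random(0)
pre = [rng.gauss(50, 10) for _ in range(5000)]     # hypothetical pre-period revenue
post = [0.7 * p + rng.gauss(15, 8) for p in pre]   # hypothetical in-experiment revenue
adjusted = cuped_adjust(post, pre)
ratio = variance(adjusted) / variance(post)
print(f"variance ratio after adjustment: {ratio:.2f}")
```

Because required sample size scales with the metric's variance, a variance ratio of 0.6 would translate into needing roughly 60% of the original sample. The adjustment leaves the metric's mean unchanged, so the treatment effect estimate is not biased by it.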

Sources & Methodology

Last updated: February 8, 2026
