Learn how power affects your experiments.
Statistical power is the probability of detecting a true effect as a statistically significant (stat-sig) result, given that the effect actually exists. It's a critical concept on which experimentation and A/B testing are built. Let's start with the simplest visual exercise.
Question Imagine you would like to detect a 2% effect with 80% probability - what should you set your power to?
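To make the definition concrete, here is a minimal simulation sketch (not the exercise widget itself). It assumes a conversion-rate metric with a 10% baseline, a true 2% relative lift (10% → 10.2%), a two-sided two-proportion z-test at alpha = 0.05, and the standard sample-size approximation targeting 80% power. Roughly 80% of the simulated experiments should come back stat-sig.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Assumed scenario: 10% baseline, 2% relative lift, alpha = 0.05, target power = 80%
p_control, rel_lift, alpha, power = 0.10, 0.02, 0.05, 0.80
p_treatment = p_control * (1 + rel_lift)

# Standard two-proportion sample-size approximation (per group)
z_alpha, z_power = norm.ppf(1 - alpha / 2), norm.ppf(power)
var = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
n = int(np.ceil((z_alpha + z_power) ** 2 * var / (p_treatment - p_control) ** 2))

# Simulate many A/B tests in which the true effect really is +2% relative
n_experiments = 2_000
x_c = rng.binomial(n, p_control, size=n_experiments)    # control conversions
x_t = rng.binomial(n, p_treatment, size=n_experiments)  # treatment conversions

# Pooled two-proportion z-test for each simulated experiment
p_pool = (x_c + x_t) / (2 * n)
se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
z = (x_t / n - x_c / n) / se
stat_sig = np.abs(z) > z_alpha

print(f"samples per group: {n:,}")
print(f"fraction of experiments that were stat-sig: {stat_sig.mean():.2%}  (~ the 80% power we chose)")
```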
Imagine the unthinkable: you actually know the true relative effect of the thing you are about to measure in an experiment. And for fun, you run 10 independent A/B tests.
Question If you set your statistical power to 80%, how many of your A/B tests would end up stat-sig?
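One way to reason about this, assuming each test independently detects the effect with probability equal to the power (80%): the count of stat-sig results out of 10 follows a Binomial(10, 0.8) distribution.

```python
from scipy.stats import binom

n_tests, power = 10, 0.80

# Each test is an independent "detection" trial with success probability = power
print("expected stat-sig results:", binom.mean(n_tests, power))       # 8.0 on average
print("P(exactly 8 of 10):", round(binom.pmf(8, n_tests, power), 3))
print("P(8 or more of 10):", round(binom.sf(7, n_tests, power), 3))
```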
Sample size and MDE (Minimum Detectable Effect) are tightly linked: the smaller the effect you want to be able to detect, the more samples you need.
Scenario: 10% baseline, 80% power, 5% alpha
Question What is the relationship between sample size and MDE?
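As a sketch of how numbers like the ones in the chart come about, here is the standard two-proportion sample-size approximation for the scenario above (10% baseline, 80% power, 5% alpha), with MDE expressed as a relative lift. This is an assumed formula for illustration, not necessarily the page's exact calculator.

```python
import numpy as np
from scipy.stats import norm

def samples_per_group(baseline, rel_mde, alpha=0.05, power=0.80):
    """Standard two-proportion approximation for required samples per group."""
    p1, p2 = baseline, baseline * (1 + rel_mde)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return int(np.ceil(z ** 2 * var / (p2 - p1) ** 2))

# Smaller MDE -> required sample size grows roughly as 1 / MDE^2
for mde in (0.08, 0.04, 0.02, 0.01):
    print(f"MDE {mde:>4.0%}: {samples_per_group(0.10, mde):>12,} samples per group")
```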
Here's one of the most important rules in experiment design: halving your MDE requires 4x the sample size. Sample size scales with the inverse square of the MDE, so detecting smaller effects gets expensive fast.
Scenario: 10% baseline, 80% power, 5% alpha
Question Looking at the 4% MDE point on the chart, how many samples would a 2% MDE need?
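A back-of-the-envelope check of the rule, assuming the required sample size scales with the inverse square of the MDE (treating the variance term as roughly constant across these small lifts):

```latex
n \propto \frac{1}{\mathrm{MDE}^{2}}
\quad\Longrightarrow\quad
\frac{n_{2\%}}{n_{4\%}} = \left(\frac{4\%}{2\%}\right)^{2} = 4
```

So whatever sample size the chart shows at the 4% MDE point, a 2% MDE needs roughly four times that.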
Higher power requires a larger sample size - but how much more? Explore the curve below to find a good balance between detection rate and sample cost.
Scenario: 10% baseline, 2% MDE (relative lift, i.e. 10% → 10.2%)
Question What power level gives you good detection without excessive cost?
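For intuition about the shape of that curve, here is a small sketch using the same assumed two-proportion approximation as above (not the page's exact calculator), for a 10% baseline, a 2% relative MDE, and alpha = 0.05. Note how steeply the cost grows near the top of the power range.

```python
import numpy as np
from scipy.stats import norm

def samples_per_group(baseline, rel_mde, alpha, power):
    """Standard two-proportion approximation for required samples per group."""
    p1, p2 = baseline, baseline * (1 + rel_mde)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return int(np.ceil(z ** 2 * var / (p2 - p1) ** 2))

# The marginal cost of extra power grows steeply near the top of the curve
for power in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95):
    n = samples_per_group(0.10, 0.02, alpha=0.05, power=power)
    print(f"power {power:.0%}: {n:>10,} samples per group")
```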
Now try different power levels yourself. What happens at 50%? 95%? How does the spread of measured effects change?
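If you want to poke at this outside the widget, here is a rough simulation sketch under the same assumed scenario (10% baseline, true 2% relative lift, alpha = 0.05). Higher power means a larger sample, which tightens the spread of the measured effect around the true +2%.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p_c, rel_lift, alpha = 0.10, 0.02, 0.05
p_t = p_c * (1 + rel_lift)

def samples_per_group(power):
    # Standard two-proportion approximation for required samples per group
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = p_c * (1 - p_c) + p_t * (1 - p_t)
    return int(np.ceil(z ** 2 * var / (p_t - p_c) ** 2))

for power in (0.50, 0.80, 0.95):
    n = samples_per_group(power)
    # Measured relative lift across 2,000 simulated experiments at this sample size
    lift = (rng.binomial(n, p_t, 2_000) / n) / (rng.binomial(n, p_c, 2_000) / n) - 1
    lo, hi = np.percentile(lift, [2.5, 97.5])
    print(f"power {power:.0%} (n={n:,}/group): measured lift mostly in [{lo:+.1%}, {hi:+.1%}]")
```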