Working with a client recently, we started to talk about A/B tests and using ML to direct marketing campaigns. We suspected that their tests might have been underpowered - so perhaps they weren't detecting successful A/B campaign variants due to weak signal and small sample sizes.

I wrote a Monte Carlo simulator to try both random data models (generating random sequences where you know the true success rates) and an estimator formula. It is pretty sobering to try to reach a 90% power rate (so 9 out of 10 tests where a difference exists will detect it) - the required sample size is often much larger than you'd like.
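A minimal sketch of that kind of simulation, assuming a standard two-proportion z-test (the exact test and conversion rates here are illustrative, not the client's):

```python
# Monte Carlo power estimate for an A/B test on conversion rates.
# Assumes a two-sided, two-proportion z-test with a pooled standard error.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def estimate_power(p_a, p_b, n, alpha=0.05, trials=2000):
    """Fraction of simulated experiments that detect the real difference."""
    detections = 0
    for _ in range(trials):
        # Simulate n visitors per arm with known true rates
        a = rng.binomial(n, p_a) / n
        b = rng.binomial(n, p_b) / n
        pooled = (a + b) / 2
        se = np.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(a - b) / se > norm.ppf(1 - alpha / 2):
            detections += 1
    return detections / trials

# e.g. detecting a 2.0% -> 2.4% uplift takes tens of thousands of
# samples per arm before power gets anywhere near 90%
print(estimate_power(0.020, 0.024, n=50_000))
```

Sweeping `n` upwards until the estimate crosses 0.9 gives a rough required sample size; with small uplifts the numbers grow quickly.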

I liked . The section on hypothesis testing and sample sizes is great; it lists three ways to reduce variance, which can decrease the sample size you need for an effective test. Worth a read (and yes, it is from 2007 - timeless). Do you have a related resource to share?

Whilst working on a recent iteration of my Higher Performance Python course I discovered that part of why groupby operations are slow is that Pandas has to run a factorize step before applying the groupby. If you've already got a Categorical-encoded dataset then you can skip this step - so your groupbys go faster. That'll feature again in my next class.

You get these after e.g. calling astype('category') or by binning with a cut or qcut.
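A quick sketch of the idea, using synthetic data (the column names and sizes are made up for illustration):

```python
# Pre-encoding a groupby key as Categorical lets pandas reuse the
# existing integer codes instead of factorizing the strings each time.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["London", "Paris", "Berlin"], size=1_000_000),
    "sales": rng.random(1_000_000),
})

# object-dtype key: groupby has to factorize the strings first
slow = df.groupby("city")["sales"].mean()

# Categorical key: the codes already exist, so the groupby is faster
df["city"] = df["city"].astype("category")
fast = df.groupby("city", observed=True)["sales"].mean()

# Binning with cut (or qcut) also yields a Categorical column
df["band"] = pd.cut(df["sales"], bins=4)
```

Wrapping the two groupby calls in `%timeit` in a notebook shows the difference; the gain is largest when the same Categorical column is grouped on repeatedly.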

I'm moving to a new newsletter provider, hence the rather sudden change (I'll get my CSS in order in coming issues). It is still me (Ian Ozsvald), just sans CSS.

If you haven't seen this newsletter in a while, it may have hit your spam folder under my last provider. There's an unsubscribe link directly above this if you no longer want these emails.

And back to the usual service...