Working with a client recently, we started to talk about A/B tests and using ML to direct marketing campaigns. We suspected that their tests might have been underpowered - so perhaps they weren't detecting successful A/B campaign variants due to weak signal and small sample sizes.

I wrote a Monte Carlo simulator to try both random data models (generating random sequences where you know the true success rates) and an estimator formula. It is pretty sobering to try to reach a 90% power rate (so 9 out of 10 tests where a difference exists will detect it) - the required sample size is often much larger than you'd like.
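A minimal sketch of that kind of simulation, assuming a standard two-proportion z-test (the exact test and conversion rates here are illustrative, not the client's):

```python
# Monte Carlo power estimate for an A/B test on conversion rates.
# Assumes a two-sided, two-proportion z-test with a pooled standard error.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def estimate_power(p_a, p_b, n, alpha=0.05, trials=2000):
    """Fraction of simulated experiments that detect the real difference."""
    detections = 0
    for _ in range(trials):
        # Simulate n visitors per arm with known true rates
        a = rng.binomial(n, p_a) / n
        b = rng.binomial(n, p_b) / n
        pooled = (a + b) / 2
        se = np.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(a - b) / se > norm.ppf(1 - alpha / 2):
            detections += 1
    return detections / trials

# e.g. detecting a 2.0% -> 2.4% uplift takes tens of thousands of
# samples per arm before power gets anywhere near 90%
print(estimate_power(0.020, 0.024, n=50_000))
```

Sweeping `n` upwards until the estimate crosses 0.9 gives a rough required sample size; with small uplifts the numbers grow quickly.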

I liked . The section on hypothesis testing and sample sizes is great; it lists three ways to reduce variance, which can decrease the sample size you need for an effective test. Worth a read (and yes, it is from 2007 - timeless). Do you have a related resource to share?

Whilst working on a recent iteration of my Higher Performance Python course I discovered that part of why groupby operations are slow is that Pandas has to run a factorize step before applying the groupby. If you've already got a Categorical-encoded dataset then you can skip this step - so your groupbys go faster. That'll feature again in my next class.

You get these after e.g. calling astype('category') or by binning with a cut or qcut.
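A quick sketch of the idea, using synthetic data (the column names and sizes are made up for illustration):

```python
# Pre-encoding a groupby key as Categorical lets pandas reuse the
# existing integer codes instead of factorizing the strings each time.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "city": rng.choice(["London", "Paris", "Berlin"], size=1_000_000),
    "sales": rng.random(1_000_000),
})

# object-dtype key: groupby has to factorize the strings first
slow = df.groupby("city")["sales"].mean()

# Categorical key: the codes already exist, so the groupby is faster
df["city"] = df["city"].astype("category")
fast = df.groupby("city", observed=True)["sales"].mean()

# Binning with cut (or qcut) also yields a Categorical column
df["band"] = pd.cut(df["sales"], bins=4)
```

Wrapping the two groupby calls in `%timeit` in a notebook shows the difference; the gain is largest when the same Categorical column is grouped on repeatedly.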

I'm moving to a new newsletter provider, hence the rather sudden change (I'll get my CSS in order in coming issues). It is still me (Ian Ozsvald), just sans CSS.

If you haven't seen this newsletter in a while, it may have hit your spam folder under my last provider. There's an unsubscribe link directly above this if you no longer want these emails.

And back to the usual service...