This beginner-friendly book is a crash course for all the lessons you would learn in the first year of running A/B tests, spanning a full range of topics including cultural tenets, experiment design, statistics and instrumentation. Rich with examples from Microsoft and Amazon, it is a reference piece that I frequently revisit even after years of experimentation.
My rating: 5/5 stars
Table of contents
Truly a one-stop shop, the book offers a comprehensive view of A/B testing, including:
- experiment design and metric selection
- statistical analysis
- development of randomization code and tracking instrumentation (both client and server-side)
- cultural/organizational tenets
- building for rapid testing at scale
- common mistakes, tripwires and paradoxes
Pros and cons
Spoiler alert: I love this book. It gives a true 360-degree view of everything I wish I had known after running my first dozen experiments. Not only does it act as a statistical primer, but it also discusses the organizational tenets and leadership conversations required to foster a culture that can achieve experimentation at scale.
Uniquely accessible, it reads like a novel while breaking complex topics down into simple terms. The authors share their respective experiences at Microsoft, Google and LinkedIn through an abundance of real experiments, complete with screenshots of the variants. The book closes with references to hundreds of papers for advanced readers who wish to go deeper.
I'm hard-pressed to find a con, but I will note that if you are at a smaller organization without the luxury of large sample sizes, you will need to think carefully about whether A/B testing is feasible at your traffic levels. As previously mentioned, the authors come from large companies, so this issue is only mentioned in passing. It's also worth noting that controlled experiments are discussed in the sense of traditional randomized A/B tests rather than quasi-experiments (i.e., don't expect coverage of difference-in-differences or geo experiments).
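To make the sample-size concern concrete, here is a minimal Python sketch using the widely cited rule of thumb n ≈ 16σ²/δ² per variant, which gives roughly 80% power for a two-sided test at a significance level of 0.05. The baseline conversion rate, target lift, and daily traffic are hypothetical numbers chosen for illustration, and the helper names (`samples_per_variant`, `days_needed`) are my own, not taken from the book.

```python
import math


def samples_per_variant(variance: float, min_detectable_effect: float) -> int:
    """Rule-of-thumb sample size per variant: n ~= 16 * sigma^2 / delta^2.

    Gives roughly 80% power for a two-sided test at alpha = 0.05.
    `min_detectable_effect` is the smallest absolute change worth detecting.
    """
    return math.ceil(16 * variance / min_detectable_effect**2)


def days_needed(n_per_variant: int, daily_users: int, variants: int = 2) -> float:
    """Days of traffic required if every eligible user is enrolled in the test."""
    return n_per_variant * variants / daily_users


# Hypothetical inputs: 5% baseline conversion, aiming to detect a 5% relative lift.
baseline = 0.05
variance = baseline * (1 - baseline)  # variance of a Bernoulli (converted or not) metric
delta = 0.05 * baseline               # 5% relative lift = 0.0025 absolute

n = samples_per_variant(variance, delta)
print(n, round(days_needed(n, daily_users=1_000), 1))
```

Under these assumptions the answer is roughly 121,600 users per variant, or about 243 days of traffic for a two-variant test at a hypothetical 1,000 eligible users per day, which shows why feasibility deserves a closer look at smaller organizations.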
One final point (not a con, just to set expectations): the topics are more conceptual than applied, and the code and math stay at a fairly high level.
One of the biggest selling points of this book is that it is a fantastic introductory text for technical and non-technical readers alike, while also holding up as a reference for experts. As a tenured data scientist, I find myself periodically revisiting chapters as a refresher as well as recommending it to my product stakeholders.
Because the book covers so much ground, detailed code and step-by-step statistical implementations are left out. It builds conceptual understanding rather than acting as an implementation guide.
Content is accessible to readers without prior experience in A/B testing, statistics or engineering; there are no true prerequisites. A basic understanding of statistics would help with Chapters 2 and 17 but is not required. Overall, the topics are discussed independently, so beginners can skim sections without losing the general thread.
Most notable excerpts
- "Strategy and controlled experiments are synergistic. [David Collis] defines a lean strategy process, which guards against the extremes of both rigid planning and unrestrained experimentation." (Chapter 1: Strategy, Tactics, and Their Relationship to Experiments)
- "If an experiment impacts only a small subset of the population, it is important to analyze just the impacted subset; even a large effect on a small set of users could be diluted and not be detectable overall." (Chapter 3: Misinterpretation of the Statistical Results)
- [On the topic of testing volume] "If you have to kiss a lot of frogs to find a prince, find more frogs and kiss them faster and faster." (Chapter 4: Experimentation Maturity Models)
- [On the topic of layering A/B tests with manual analysis] "Using multiple methods to triangulate towards a more accurate measurement - establishing a hierarchy of evidence - can lead to more robust results." (Chapter 10: Putting It All Together)
- "Finer levels of granularity for randomization creates more units, so the variance of the mean of a metric is smaller and the experiment will have more statistical power to detect smaller changes." (Chapter 14: Choosing a Randomization Unit)
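The Chapter 3 and Chapter 14 excerpts both come down to the same arithmetic: the standard error of a mean shrinks with the square root of the number of units, and an effect averaged over unaffected users shrinks with them. The Python sketch below is my own simplified illustration, not code from the book. It assumes a normal approximation, independent units, and an equal standard deviation for every user; the function names and all numbers are hypothetical.

```python
import math


def z_statistic(delta: float, sigma: float, n_per_variant: int) -> float:
    """z-statistic for a difference in means between two equal-sized variants.

    Each variant's mean has variance sigma^2 / n, so the difference has
    standard error sqrt(2 * sigma^2 / n): more units means a smaller
    standard error and more power (assuming independent units).
    """
    standard_error = math.sqrt(2 * sigma**2 / n_per_variant)
    return delta / standard_error


def triggered_vs_overall(delta: float, sigma: float, n_total: int,
                         triggered_fraction: float) -> tuple[float, float]:
    """Compare analyzing only the impacted (triggered) users with analyzing everyone.

    Simplifying assumptions, for illustration only:
      * `n_total` is the number of users per variant;
      * only `triggered_fraction` of them are exposed to the change,
        and their metric shifts by `delta`;
      * all other users are unaffected;
      * the metric has the same standard deviation `sigma` for everyone.
    """
    n_triggered = round(n_total * triggered_fraction)
    z_triggered = z_statistic(delta, sigma, n_triggered)
    # Averaged over all users, the effect is diluted to triggered_fraction * delta.
    z_overall = z_statistic(triggered_fraction * delta, sigma, n_total)
    return z_triggered, z_overall


# Hypothetical numbers: 100,000 users per variant, 5% of them see the change.
z_t, z_o = triggered_vs_overall(delta=1.0, sigma=10.0, n_total=100_000,
                                triggered_fraction=0.05)
print(f"triggered only: z = {z_t:.2f}; all users: z = {z_o:.2f}")
# triggered only: z = 5.00; all users: z = 1.12
```

Under these assumptions the diluted z-statistic is smaller by a factor of sqrt(0.05) ≈ 0.22, so an effect that is clearly significant among impacted users falls below the usual 1.96 threshold when everyone is analyzed. The same 1/sqrt(n) behavior of the standard error underlies the randomization-unit excerpt, with the caveat that it assumes independent units, which need not hold when the analysis unit is finer than the randomization unit.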
As always, would love to hear your thoughts and feedback in the comments below.