[Book Review] Mostly Harmless Econometrics, Joshua D. Angrist and Jörn-Steffen Pischke

Densely written, mathematically complex, quirky and value-packed all in one - Mostly Harmless Econometrics is a masterclass on measuring causality.

by Mojan Benham

A year ago

Two-line summary

Authors Angrist and Pischke walk through the derivation of quasi-experimental techniques (including quantile regression, regression discontinuity and difference-in-differences) using a series of case studies. This book is not for the faint of heart: readers should expect to draw heavily on a graduate-level understanding of mathematical theory to make sense of the content.

My rating: 3/5 stars

Review

Topics covered

One of the first pages of Mostly Harmless Econometrics is an Organization of this Book section that broadly explains the sequence of chapters. If you are considering this read, I'd recommend skimming this page to get a better idea of the subject matter.

In short, the book starts with two preliminary chapters, continues with three core chapters, and concludes with extensions and standard error issues. The preliminaries introduce the motivation for causal inference as well as some of the practical questions that drive us to research these techniques. They explore the ideal experiment (a randomized trial) and how that framework can inspire quasi-experimental designs.

Most of the book is spent on the core content: regression, instrumental variables, the difference-in-differences method and the assumptions necessary for causal interpretation. The extensions toward the end of the book cover regression discontinuity and quantile regression. These techniques are typically taught through examples first, then formalized through theorems. The structure is less formal than a textbook; it is more accurately a sequence of case studies through which the topics are organically discussed.

Biggest pros

Perhaps the most delightful upside to Mostly Harmless Econometrics is its writing style. It is unexpectedly funny, thoroughly witty, playful, engaging and overall just beautifully written. In this respect, it stands unparalleled in a sea of boring math textbooks.

Additionally, readers will be thrilled to find that unlike a conventional textbook, the topics covered here are not just summaries of commonly understood techniques. There are fresh and groundbreaking ideas. Even familiar concepts are framed in a way that made me pause for a few "aha moments".

Much of this follows from the fact that the statistical methods presented are boiled down to their fundamental mechanics. Where most books would focus on the what and why, this focuses on the how - Chapter Three (Making Regression Make Sense) is the perfect example of this. If you're reading this book, you're already familiar with what linear regression accomplishes, the structure of the equation, the interpretation of the coefficients, etc. Instead, this chapter is centred around how we actually arrive at a regression function from sample data.
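To make the "how" concrete, here is a minimal sketch of my own (not taken from the book) of arriving at a regression line from sample data. It assumes simulated data with a made-up intercept of 2 and slope of 3, and uses the standard result that the bivariate slope is the sample covariance divided by the sample variance.

```python
import numpy as np

# Simulated data: the true intercept (2.0) and slope (3.0) are made-up values.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

# Bivariate regression from sample moments:
# slope = Cov(x, y) / Var(x), intercept = mean(y) - slope * mean(x).
slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
intercept = y.mean() - slope * x.mean()

# Cross-check against a generic least-squares solver.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(intercept, slope)
print(beta)
```

Both routes should agree up to floating-point error, which is the point: the regression coefficients are just functions of sample moments.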

Another selling point is the intentionality of the authors in their choice of use cases. The examples are not chosen out of convenience to confirm what each section posits. They tackle topical (and yet timeless) causal questions: how does class size impact quality of education? Are people who seek treatment at hospitals worse off than those who do not? Do employees of workplaces with unions earn higher wages than they would in the absence of a union? This relatability and relevance foster an intuitive grasp of how to select the appropriate technique for each scenario (when to use regression discontinuity instead of difference-in-differences, for example).

Potential obstacles for the reader

I reconsidered my original framing of the following points as "cons" since they are not shortcomings, but rather a risk of misalignment between the reader's comprehension level and the complexity of the content.

First and foremost, the authors use vocabulary that goes beyond the typical vernacular: attenuation, homoskedastic, contemporaneous and paucity, to name a few. The reader needs not only a strong mathematical background, but also an exceptional command of the English language.

I would place the mathematical prerequisite at the graduate level. It is not enough to simply be familiar with, for example, joint probability, expected values and moments of a distribution; instead, a deep intuitive understanding of each concept and how it interacts with the others is needed. As a result, even basic theorems can look complicated at face value. Consider the following description of the central limit theorem:

"Sample moments are asymptotically normally distributed (after subtracting the corresponding population moment and multiplying by the square root of the sample size). The asymptotic covariance matrix is given by the variance of the underlying random variable. In other words, in large enough samples, appropriately normalized sample moments are approximately normally distributed." (Chapter 3: Making Regression Make Sense)

This relatively simple premise - often taught in Stats101 courses - was barely recognizable to me at first pass.
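For anyone else who had to read that passage twice, a small simulation helped me. This is a sketch of my own, not from the book, and it assumes draws from an exponential distribution with mean 1 and variance 1 (my choice, picked because it is clearly non-normal). It takes the sample mean, subtracts the population mean, multiplies by the square root of the sample size, and checks that the result behaves like a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 5_000  # sample size and number of simulated samples (arbitrary)

# Exponential(scale=1) draws: population mean 1, population variance 1, skewed.
draws = rng.exponential(scale=1.0, size=(reps, n))

# Normalized sample moment: sqrt(n) * (sample mean - population mean).
z = np.sqrt(n) * (draws.mean(axis=1) - 1.0)

# If the CLT holds, z is roughly N(0, 1): mean near 0, standard deviation
# near 1, and about 95% of values inside +/- 1.96.
coverage = np.mean(np.abs(z) < 1.96)
print(z.mean(), z.std(), coverage)
```

The variance of the normalized mean matching the variance of the underlying variable is exactly what the quote's second sentence is saying.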

The authors also assume that the reader will be able to make logical leaps. They rely on math they treat as self-evident to fill in the gaps in the narration when points are not explicitly stated. You will need to look up word definitions, consult other statistics textbooks, cross-reference the source research material, re-read a formula and its annotation several times, etc. Simply put, expect to be an active participant in each derivation.

Most notable excerpts

"In other cases, we would like an answer sooner rather than later. Much of the research we do, therefore, attempts to exploit cheaper and more readily available sources of variation. We hope to find natural or quasi-experiments that mimic a randomized trial by changing the variable of interest while other factors are kept balanced." (Chapter 2: The Experimental Ideal)
"The link between [two-stage least squares] and instrumental variables warrants a bit more elaboration in the multi-instrument case. Assuming each instrument captures the same causal effect (a strong assumption that is relaxed below), we might want to combine these alternative IV estimates into a single more precise estimate." (Chapter 4: Instrumental Variables in Action)
"Unlike full covariate matching strategies, which are based on treatment-control comparisons conditional on covariate values where there is some overlap, the validity of [regression discontinuity] turns on our willingness to extrapolate across covariate values, at least in the neighborhood of the discontinuity." (Chapter 6: Regression Discontinuity Designs)
"Just as OLS fits a linear model to Y_i by minimizing expected squared error, quantile regression fits a linear model to Y_i using the asymmetric loss function, ρ_τ(u). If Q_τ(Y_i | X_i) is in fact linear, the quantile regression minimand will find it (just as if the CEF is linear, OLS will find it)." (Chapter 7: Quantile Regression)
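
The asymmetric loss in that last excerpt is easier to appreciate with a tiny experiment. The sketch below is my own illustration, not the authors' code. It assumes the usual "check" loss (weight τ on positive residuals, 1 − τ on negative ones), simulated normal data, and the simplest possible model: a single constant. Minimizing the average loss over that constant should recover the sample quantile.

```python
import numpy as np

def check_loss(u, tau):
    """Asymmetric "check" loss: tau * u when u >= 0, (tau - 1) * u when u < 0."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
y = rng.normal(size=2_000)  # simulated outcome (assumed for illustration)
tau = 0.9                   # quantile of interest (arbitrary choice)

# Brute-force search over candidate constants q for the minimizer of the
# average check loss of the residuals y - q.
grid = np.linspace(y.min(), y.max(), 2_001)
avg_loss = np.array([check_loss(y - q, tau).mean() for q in grid])
q_hat = grid[avg_loss.argmin()]

print(q_hat, np.quantile(y, tau))
```

With covariates, quantile regression does the same thing with a linear function of X_i in place of the constant, though in practice you would use a proper solver rather than a grid search.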

As always, would love to hear your thoughts and feedback in the comments below.

1 comment

Very impressed, just realized that I need to learn a lot more about mathematics,
Please keep me posted. Thanks

Max on May 20, 2024
