RCTs vs. intuition

The eternal struggle

Dec 15, 2020

"Chemistry Lab" by euthman, CC BY-SA 2.0

Angus Deaton, the Nobel-winning economist, has done a lot of great work lately on “deaths of despair” in America. Recently, he went on Julia Galef’s “Rationally Speaking” podcast and discussed this research. It’s a good interview, and I recommend the whole thing.

But Deaton is also a harsh critic of randomized controlled trials (RCTs) in development economics, and he discussed that with Galef as well. At the end of the podcast, they had this amusing exchange:

Julia Galef: Well, I don't know what the people you're complaining about are doing, but I imagine if you're testing a specific intervention -- like giving out anti-malarial bed nets -- the cases in different countries or different regions aren't going to be identical, but it's still pretty similar, what you're doing from one region to the other. You're giving out bed nets.
Angus Deaton: I don't agree, because all the side effects, which are the things we're talking about, are going to be different in each case. And also, just to take a case -- we know what reduces poverty, what makes people better off: it's school teachers, it's malaria pills, it's all these things.
Julia Galef: How do we know that, though?
Angus Deaton: Oh, come on.
Julia Galef: No, I'm sorry, that was not a rhetorical or a troll question.
Angus Deaton: Really? I don't know how you get out of bed in the morning. How do you know that when you stand up, you won't fall over? I mean, there's been no experiments on that. There's never been an experiment on aspirin. Have you ever taken an aspirin?
Julia Galef: So, sorry, you think that increasing the number of schoolteachers -- or paying them better, or some intervention on school teachers causes people to be better off -- that that claim is as obvious as gravity?
Angus Deaton: It's pretty obvious. But that's not the point I'm trying to make.

What the heck is Deaton talking about here??

First of all, it’s pretty bizarre to say that there’s never been an experiment on aspirin. If I go to PubMed and search for “aspirin randomized controlled trial”, I get 7,586 results. There are reportedly 700 to 1000 clinical trials conducted on aspirin every year. There were also experiments involved in the invention of aspirin; people knew that salicylic acid helped with headaches, but extracting and buffering the chemical were both non-trivial tasks.

OK, but do we really need those experiments to know that aspirin helps get rid of headaches? That’s Deaton’s intended point here — that there are some things you just know will work, because of common sense and accumulated wisdom, and you don’t need a fancy RCT to know they’ll work. Just do the things that reduce poverty — provide more school teachers and malaria pills, etc. — and don’t worry about testing to verify the obvious.

But what if it’s not obvious? We know that in general, education reduces poverty. But that doesn’t mean that specific educational interventions reduce poverty — or that they’re worth the cost, or that they’re better than alternatives.

For example, as Jason Kerwin pointed out on Twitter, Indonesia’s experiment with doubling teacher salaries didn’t improve student learning outcomes (and so probably didn’t help much with poverty either). A 2007 study in Tanzania found that “high primary enrolment rates in the past did not lead to the realisation of the associated developmental outcomes”.

And Nancy Cartwright, who is Deaton’s co-author on his most famous critique of RCTs, describes how an experiment to double the number of teachers per student in California failed to improve outcomes — despite having encouraging evidence from an RCT.

The point here isn’t that education doesn’t reduce poverty; there are plenty of other cases where it did. The point is that educational programs don’t always work. And empirical research is how you figure out which programs work and which don’t.

And no, RCTs don’t always give you the right answer (as the California example demonstrates). To really get a full picture of the evidence you need policy experiments, natural experiments, and so on. But you do need evidence! Simply falling back on our intuition and wisdom when making policy is not enough!

One vivid illustration of the inadequacy of intuition and wisdom is that different people’s wisdom leads them to very different conclusions. For example, Lant Pritchett, also a renowned development economist and also a harsh critic of RCTs, strongly criticized the awarding of the 2019 Econ Nobel to three development economists who used RCTs to study the effectiveness of antipoverty programs. In a memorable Facebook rant, he declared:

Poverty rates across countries are almost perfectly correlated with the "typical" (median) income/consumption in that country...If poverty programs are defined as those that improve poverty rates, conditional on the typical level of income in a country, they account for less than 1 percent of total variation in poverty...
A commitment to "study global poverty" would probably ask: "what accounts for the observed reductions (or lack thereof) in poverty across time and across countries?" and discover that variation in the size and efficacy of poverty programs had little or nothing to do with poverty reduction...
So a focus on applying a method to the study of the effectiveness of (mostly) NGO programs is a commitment to not study global poverty.

So to Pritchett, RCTs are next to useless, because we know what reduces poverty. It’s economic growth!

And to Deaton, RCTs are next to useless, because we know what reduces poverty. It’s schoolteachers and malaria pills!

Each of these guys believes that we know what reduces poverty, and yet their answers don’t agree. The very kinds of “NGO programs” Pritchett dismisses are the things Deaton says are the obvious solution.

It would be kind of fun to get these guys in a room and have them hash it out. But the point here is that even among people who are obviously very wise, and have obviously well-developed intuition, answers to big questions like poverty reduction can vary dramatically.

This is why we can’t replace empirical evidence with “Oh, come on”. Even the most erudite and brilliant practitioners get things wrong fairly frequently. Empirical research, at least if done properly, doesn’t rely on any one person’s intuition; it’s a group effort, with large numbers of people checking and rechecking each other’s work, and holding that work to quantifiable and rigorous standards. The collective intelligence of science is more powerful than the expertise of any sage.

This is of course true in medicine as well; the greatest doctors on Earth will swear up and down that they’ve seen this or that treatment work miracles on their patients, and then RCTs come along and find the cure was no better than a placebo. This pandemic has vividly and cruelly demonstrated the necessity of high-quality evidence when evaluating cures.

In any case, Deaton’s critiques of RCTs are good, but the answer is to supplement them with other kinds of careful empirical evidence — not to simply say “Oh, come on” and decide that we already know the answers.


23 Comments

cagnew

Dec 15, 2020 (liked by Noah Smith)

Hi Noah,

I have long enjoyed your blog. This is a topic near to my heart -- I wrote a thesis on the subject back in 2015.

I think you correctly summarize the upshot of Deaton and Nancy Cartwright's position. I would just like to clarify some terminology that may make the critique easier to understand. No one, including Deaton and Cartwright as far as I'm aware, thinks that RCTs are a bad tool. In fact, almost everyone thinks they are very good tools for one specific task: making causal inferences.

The critique is really focused on what you do with causal inferences -- sometimes known in the literature as 'evidence-for-use.'

To see the problem, it's helpful to think about what an ideal RCT tells you: An ideal RCT gives you extremely strong evidence that the intervention (whatever form it takes) is the cause of the effect in the model population. What an RCT, however well designed, can never tell you is whether the same intervention will have the same effect in some other population. In order to jump from the inference in the model population to some other target population, you need to extrapolate. (I would emphasize in passing that a target population is *always* distinct from the model population -- the same intervention may have different effects based solely on the time the intervention is administered!)

The difficulty is that an RCT -- by design -- does not explain *why* an intervention worked in the model population. All it tells you is that it did work. RCTs, to the extent we want them to do anything more than generate a true causal inference, have to be accompanied by some theory of mechanisms -- a theory that may only need to be intuitive, as I understand Deaton to suggest. And this theory must explain why the *reason* the intervention had its effect in the model population can be expected to obtain with respect to some other population. In other words, to turn a causal inference from an RCT into evidence for, e.g., a policy's efficacy in some other (later, more widespread, geographically distinct, whatever) setting, you have to discharge the burden of showing the reason the effect occurred will hold in the target population.

To tie this back to reality, it is helpful to think about medicine -- aspirin, to use your example. Consider an RCT showing aspirin is effective in population A. We have reason to think that aspirin will be effective in population B (say, all mankind more or less) because we know that the mechanism by which aspirin has its effect in population A will be unaffected by any differences in population B. People are the same in the relevant respects, across space and time. The causal pathway is essentially invariant. (Causal pathway is a term in medicine that development economists should pay greater attention to, in my view.) This assumption is more or less a fair one for the majority of medical interventions and the associated RCTs.

The same cannot be said for many RCTs in development economics. Consider deworming children. It may be that a mass deworming program has the effect of better educational outcomes (and associated human capital development) in model population A. But how do we extrapolate that result to other settings? We need to assume that the causal pathway, i.e. the mechanism by which the intervention has its effect -- the drug works (okay), school children have fewer parasitic infections (okay), so they have more energy (maybe), are able to attend school earlier (are you sure?) and are more attentive in class (maybe), leading to better educational outcomes -- also obtains in the target population, often in some very different social context. In other words, the causal inference is only useful when it is accompanied by all kinds of other evidence as to whether the causal inference can survive extrapolation. (Please don't take me to be saying deworming is bad. I'm in favour of it, but not because of its effects on human capital accumulation!)
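To make the extrapolation worry concrete, here is a toy simulation. Everything in it is made up for illustration: the function name `run_rct`, the outcome scale, and above all the assumption that the treatment helps only children who were infected to begin with. Under that assumed mechanism, the very same intervention yields a large measured effect where infection is common and a small one where it is rare, and the trial in population A says nothing about which case population B falls into.

```python
import random
import statistics


def run_rct(prevalence, n=20000, effect_if_infected=5.0, seed=0):
    """Simulate a two-arm RCT; return the treated-minus-control mean outcome.

    Hypothetical mechanism (an assumption, not an empirical claim): the
    treatment raises the outcome by `effect_if_infected` only for children
    who were infected, so the average effect scales with prevalence.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        infected = rng.random() < prevalence
        baseline = rng.gauss(50.0, 10.0)  # outcome without any treatment
        if rng.random() < 0.5:  # coin-flip assignment to the treatment arm
            treated.append(baseline + (effect_if_infected if infected else 0.0))
        else:
            control.append(baseline)
    return statistics.mean(treated) - statistics.mean(control)


# Model population A: infection is common. Target population B: it is rare.
effect_a = run_rct(prevalence=0.8)
effect_b = run_rct(prevalence=0.1)

print(f"Estimated effect in population A: {effect_a:.2f}")
print(f"Estimated effect in population B: {effect_b:.2f}")
```

Both trials are internally valid: each recovers roughly the true average effect in its own population (4.0 and 0.5 under these assumed numbers). What differs is the prevalence of the condition the mechanism runs through, which is exactly the kind of side information an RCT result does not carry with it.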

So what's the problem? Well, in some sense there isn't one. RCTs are great! But they have to be accompanied by careful empirical research, as you say. But ideal (or close to ideal) RCTs are extremely expensive and time-consuming. Furthermore, from a policy design perspective, placing RCTs on a pedestal may come at the cost of the other types of research necessary for good usable policies. Institutional demands for RCTs may also restrict funding for plausible evidence-based (but not RCT-based!) interventions. There is nothing wrong with experimenting.

In my view, extrapolation is the real challenge for RCTs in development economics. It is a problem that medicine doesn't really need to grapple with to the same extent -- but they do. And economists should too.


Rory Hester

Blue Collar Notes

Dec 15, 2020 (liked by Noah Smith)

I used to be an educational blogger for a few years. I am pretty familiar with educational research, and how poor it is. Even when it's good... people ignore the results.

I suggest looking at "Project Follow Through" https://en.wikipedia.org/wiki/Follow_Through_(project)

It was the largest, most rigorous education experiment ever conducted, and it clearly showed that Direct Instruction was the most effective teaching method for young children, yet here we are with project-based learning taking over our schools.

RCTs are only good if we actually design them well, and then actually pay attention to their results even if they give answers that we disagree with.
