Evidence and Evidence-Based Policy

Choices, Models and Morals » Lecture 7

Making Policy: Designing Interventions

Treatments and Policies

Public Policy and Economic Theory

Economics Without Economic Theory

The Situation in Medicine

Evidence-Based Medicine

the triumphs of modern medicine can easily lead us to overlook many of its ongoing problems. Even today, too much medical decision-making is based on poor evidence. There are still too many medical treatments that harm patients, some that are of little or no proven benefit, and others that are worthwhile but are not used enough. How can this be, when every year, studies into the effects of treatments generate a mountain of results? Sadly, the evidence is often unreliable and, moreover, much of the research that is done does not address the questions that patients need answered.

Part of the problem is that treatment effects are very seldom overwhelmingly obvious or dramatic. Instead, there will usually be uncertainties about how well new treatments work, or indeed whether they do more good than harm. So carefully designed fair tests – tests that set out to reduce biases and take into account the play of chance… – are necessary to identify treatment effects reliably. (Evans, Thornton, et al. 2011: xx)
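
To illustrate "the play of chance" mentioned in the quotation, here is a minimal simulation sketch; the trial size, outcome distribution, and the function name null_trial are all invented for illustration. Even when a treatment has no effect whatsoever, small trials frequently show an apparent difference between the groups, which is why a fair test must quantify how large a difference chance alone can produce.

import random

random.seed(1)

def null_trial(n_per_arm=20):
    """One hypothetical trial in which the treatment does nothing:
    both arms draw outcomes from the same distribution."""
    treated = [random.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    control = [random.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    return sum(treated) / n_per_arm - sum(control) / n_per_arm

diffs = [null_trial() for _ in range(10_000)]
share = sum(abs(d) > 0.5 for d in diffs) / len(diffs)
print(f"Null trials showing an apparent effect above 0.5 SD: {share:.1%}")

With twenty patients per arm, roughly a tenth of these "no effect" trials show an apparent effect of half a standard deviation or more, purely by chance.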

Evidence-Based Medicine in Practice

Mill’s Method of Difference

Applying Mill’s Method

Randomised Controlled Trials

RCTs and Policy Evaluation

Limitations of RCTs

What works may not be seen to work

Hurdles to Methodology

What works may not be permitted to be tested

What works here may not work there

The Case of California Class Sizes

there are a number of other reasons why a pilot may not be typical of the policy as it would ultimately be implemented. … as Hasluck … points out, ‘… the resources devoted to the pilot may exceed that (sic) available at national roll out. There may also be a greater level of commitment and a “pioneering spirit” amongst staff involved in delivery’ (Sanderson 2002: 12)

Consider the California class-size reduction programme. The plan was backed up by evidence that class-size reduction is effective for improving reading scores from a well-conducted RCT in Tennessee. Yet in California when class sizes were reduced across the state reading scores did not go up. … There’s a conventional explanation. … California rolled out the programme state-wide and over a short period creating a sudden need for new teachers and new classrooms. So large numbers of poorly qualified teachers were hired and not surprisingly the more poorly qualified teachers went to the more disadvantaged schools. Also classes were held in spaces not appropriate and other educational programmes commonly taken to be conducive to learning to read were curtailed for lack of space (Cartwright 2008: 131; see Reiss 2013: 205–6)

The Curse of Causal Agnosticism

RCTs, when implemented successfully, give us knowledge “cheaply” in the sense that they require no specific background knowledge in order to identify a causal effect from the data. But this comes at a cost: if our understanding of the causal structure being experimented on in the RCT is very limited, there are no good grounds for believing that the result will also hold in this or that population that differs from the test population.

In a sense an RCT is a “black-box” method of causal inference. A treatment is administered, an outcome observed, with no need for any understanding of what is going on in between and why a treatment produces its outcome. But if there is no knowledge of why a treatment produces a given outcome, the basis for drawing inferences beyond the immediate test population is very narrow. (Reiss 2013: 205)
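
Reiss's point can be made concrete with a toy simulation; the causal structure, the trait X, the function name average_effect, and all numbers below are invented for illustration, not drawn from any study. Randomisation lets the trial recover the average effect in its own population without any knowledge of why the treatment works; but if the effect in fact depends on an unmodelled trait X, the trial estimate is a poor guide to a population where X is distributed differently.

import random

random.seed(2)

def average_effect(p_x, n=100_000):
    """Simulate an RCT on a population where P(X = 1) = p_x and return
    the difference in mean outcomes between treated and control arms."""
    treated, control = [], []
    for _ in range(n):
        x = 1 if random.random() < p_x else 0
        effect = 2.0 if x == 1 else 0.0          # treatment helps only if X = 1
        noise = random.gauss(0.0, 1.0)
        if random.random() < 0.5:                # randomised assignment
            treated.append(effect + noise)
        else:
            control.append(noise)
    return sum(treated) / len(treated) - sum(control) / len(control)

print(f"Estimated effect in the trial population (X common): {average_effect(0.8):.2f}")
print(f"Average effect in a target population (X rare):      {average_effect(0.2):.2f}")

The trial faithfully reports an average effect of about 1.6 for its own population, yet the average effect in the second population is only about 0.4; without knowledge of the role of X, nothing in the trial itself warns us of this.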

The Problem of Reflexivity

Policy Evaluation and Inductive Risk

Causal Conclusions and Significance

Policy Implications and p-values

The Case of Dioxins

The deliberate choice of a level of statistical significance requires that one consider which kind of errors one is willing to tolerate. … In testing whether dioxins have a particular effect or not, an excess of false positives in such studies will mean that dioxins will appear to cause more harm to the animals than they actually do, leading to overregulation of the chemicals. An excess of false negatives will have the opposite result, causing dioxins to appear less harmful than they actually are, leading to underregulation of the chemicals. Thus, in general, false positives are likely to lead to stronger regulation than is warranted (or overregulation); false negatives are likely to lead to weaker regulation than is warranted (or underregulation). Overregulation presents excess costs to the industries that would bear the costs of regulations. Underregulation presents costs to public health and to other areas affected by damage to public health. Depending on how one values these effects, an evaluation that requires the consultation of non-epistemic values, different balances between false positives and false negatives will be preferable (Douglas 2000: 566–67)
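
A small simulation sketch makes Douglas's trade-off explicit; the effect size, the error model, and the function name error_rates are invented for illustration and are not from Douglas. Tightening the significance level reduces false positives about harmless chemicals but increases false negatives about genuinely harmful ones.

import random
from statistics import NormalDist

random.seed(3)

def error_rates(alpha, n_studies=50_000, true_effect_z=2.5):
    """Error rates of a two-sided z-test at level alpha, for hypothetical
    studies of a harmless chemical (no effect) and of a harmful one
    (a true effect worth 2.5 standard errors)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_pos = sum(abs(random.gauss(0.0, 1.0)) > crit for _ in range(n_studies))
    false_neg = sum(abs(random.gauss(true_effect_z, 1.0)) <= crit for _ in range(n_studies))
    return false_pos / n_studies, false_neg / n_studies

for alpha in (0.10, 0.05, 0.01):
    fp, fn = error_rates(alpha)
    print(f"alpha = {alpha:.2f}:  false positives {fp:.3f},  false negatives {fn:.3f}")

Which significance level is "best" cannot be read off the statistics alone: it depends on whether one weighs the costs of overregulation or of underregulation more heavily, which is exactly Douglas's point about non-epistemic values.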

But RCTs are better than nothing

Evidence: Measuring Economic Phenomena

Gathering Evidence

Consumer price inflation will be my main case but I will briefly discuss GDP and unemployment as comparisons. All three variables are regarded as observable by economists. But, as we will see, measuring them requires making a large number of substantial, and often contentious, assumptions. Making these assumptions requires real commitment on the part of the investigator as regards certain facts relevant to the measurement procedure, the measurement purpose as well as evaluative judgments. (Reiss 2013: 150)
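
One way to see where "substantial, and often contentious, assumptions" enter is to compute inflation from the same hypothetical price and quantity data under different index-number formulas: a Laspeyres index weights prices by the base-period basket, a Paasche index by the current-period basket, and a Fisher index takes the geometric mean of the two. The goods and figures below are invented; the choice between formulas is the kind of measurement decision at issue here.

# Hypothetical two-good economy: the measured inflation rate depends on
# which basket is used to weight the price changes.
goods = {
    #         p0,   q0,   p1,  q1   (base and current prices/quantities)
    "bread": (1.00, 100, 1.10, 90),
    "fuel":  (2.00,  50, 3.00, 30),   # price rose sharply, consumption fell
}

laspeyres = (sum(p1 * q0 for p0, q0, p1, q1 in goods.values())
             / sum(p0 * q0 for p0, q0, p1, q1 in goods.values()))
paasche = (sum(p1 * q1 for p0, q0, p1, q1 in goods.values())
           / sum(p0 * q1 for p0, q0, p1, q1 in goods.values()))
fisher = (laspeyres * paasche) ** 0.5

print(f"Laspeyres inflation: {laspeyres - 1:.1%}")  # base-period basket
print(f"Paasche inflation:   {paasche - 1:.1%}")    # current-period basket
print(f"Fisher inflation:    {fisher - 1:.1%}")     # geometric mean of the two

From the same invented data the formulas report inflation of roughly 30, 26 and 28 per cent respectively; the data alone do not dictate which figure is "the" inflation rate.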

The Theory-Dependence of Observation

Ways of Thinking and Evidence Reports

Consequences

Example: CPI

References

Boyd, Nora Mills and James Bogen (2021) ‘Theory and Observation in Science’, in Edward N Zalta, ed., The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/win2021/entries/science-theory-observation/.
Cartwright, Nancy (2007) ‘Are RCTs the Gold Standard?’, BioSocieties 2: 11–20. doi:10.1017/s1745855207005029.
Cartwright, Nancy (2008) ‘Evidence-Based Policy: What’s to Be Done about Relevance?: For the 2008 Oberlin Philosophy Colloquium’, Philosophical Studies 143: 127–36. doi:10.1007/s11098-008-9311-4.
Chemla, Gilles and Christopher A Hennessy (2019) ‘Controls, Belief Updating, and Bias in Medical RCTs’, Journal of Economic Theory 184: 104929. doi:10.1016/j.jet.2019.07.016.
Douglas, Heather (2000) ‘Inductive Risk and Values in Science’, Philosophy of Science 67: 559–79. doi:10.1086/392855.
Evans, Imogen, Hazel Thornton, Iain Chalmers, and Paul Glasziou (2011) Testing Treatments, 2nd edition. Pinter & Martin.
Fodor, Jerry (1984) ‘Observation Reconsidered’, Philosophy of Science 51: 23–43.
Hanson, Norwood Russell (1958) Patterns of Discovery. Cambridge University Press.
Hempel, Carl G (1965) ‘Science and Human Values’, in Aspects of Scientific Explanation and Other Essays in the Philosophy of Science: 81–96. The Free Press.
Howick, Jeremy, Iain Chalmers, Paul Glasziou, Trish Greenhalgh, Carl Heneghan, et al. (2011) The Oxford 2011 Levels of Evidence.
Kuhn, Thomas S (1962) The Structure of Scientific Revolutions. University of Chicago Press.
Lewis, David (1973) ‘Causation’, Journal of Philosophy 70: 556–67. doi:10.2307/2025310.
Mill, John Stuart (1874/1974) A System of Logic, Ratiocinative and Inductive, Books I–III, 8th edition, The Collected Works of John Stuart Mill, John M Robson, ed. University of Toronto Press; Routledge & Kegan Paul.
Petrosino, Anthony, Carolyn Turpin-Petrosino, and Sarah Guckenburg (2010) ‘Formal System Processing of Juveniles: Effects on Delinquency’, Campbell Systematic Reviews 6: 1–88.
Reiss, Julian (2013) Philosophy of Economics. Routledge.
Sanderson, Ian (2002) ‘Evaluation, Policy Learning and Evidence‐based Policy Making’, Public Administration 80: 1–22. doi:10.1111/1467-9299.00292.