Getting Inside the Black Box
Across these four main categories of experimental evaluation, there has been substantial effort to move beyond estimating the average treatment effect and to understand more about how impacts vary along a variety of dimensions. For example, how do treatment effects vary across subgroups of interest? What are the mediators of treatment effects? How do treatment effects vary with program implementation features or with the fidelity of implementation to program theory? Most efforts to move beyond estimating the average treatment effect involve data analytic strategies rather than evaluation design strategies. These analytic strategies have been advanced in order to expose what is inside the “black box.”
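As one concrete but hypothetical illustration of such an analytic strategy (it is not drawn from the book), the short Python sketch below simulates an experiment in which the true treatment effect differs by a baseline subgroup. The overall average treatment effect masks that variation, while subgroup contrasts reveal it. All variable names and effect sizes are assumed for illustration only.

```python
# A minimal, illustrative sketch (not the author's method): estimating an
# average treatment effect and subgroup contrasts from simulated
# experimental data. All names and effect sizes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Random assignment, as in an experimental evaluation.
treat = rng.integers(0, 2, size=n)       # 1 = treatment group, 0 = control
subgroup = rng.integers(0, 2, size=n)    # e.g., 1 = prior work experience

# Hypothetical outcome: the treatment effect is larger for the subgroup.
effect = 2.0 + 3.0 * subgroup            # assumed, for illustration only
y = 10.0 + effect * treat + rng.normal(0, 5, size=n)

# Average treatment effect: difference in mean outcomes by assignment.
ate = y[treat == 1].mean() - y[treat == 0].mean()
print(f"overall average treatment effect = {ate:.2f}")

# Subgroup-specific impacts expose one dimension of impact variation.
for g in (0, 1):
    mask = subgroup == g
    effect_g = y[mask & (treat == 1)].mean() - y[mask & (treat == 0)].mean()
    print(f"subgroup {g}: estimated impact = {effect_g:.2f}")
```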
As noted in Box 1.1, the black box refers to the program as implemented, which can be somewhat of a mystery in impact evaluations: We know what the impact was, but we have little idea what caused it. In order to expose what is inside the black box, impact evaluations often are paired with implementation evaluation. The latter provides the detail needed to understand the program’s operations. That detail is helpful descriptively: It allows the user of the evaluation to associate the impact with some details of the program from which it arose. The way I have described this is at an aggregate level: The program’s average impact represents what the program as a whole did or offered. Commonly, however, a program is not a single thing: It can vary by setting, by the population it serves, by design elements, by various implementation features, and also over time. The changing nature of interventions in practice demands that evaluation also account for that complexity.1
Within the field of program evaluation, the concept of impact variation has gained traction in recent years. The program’s average impact is one metric by which to judge the program’s worth, but that impact is likely to vary along multiple dimensions. For example, it can vary for distinct subgroups of participants. It might also vary depending on program design or implementation: Programs that offer X and Y might be more effective than those offering only X; programs where frontline staff have greater experience or where the program manager is an especially dynamic leader might be more effective than those without such staff or leadership. These observations about what makes up a program and how it is implemented have become increasingly important as potential drivers of impact.
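To make the “programs that offer X and Y” idea concrete, the hedged sketch below shows one common analytic approach: regressing the outcome on treatment interacted with a program feature, so that the interaction coefficient estimates how the impact differs where the feature is present. The feature name, data-generating process, and coefficients are all assumptions for this illustration, not results from any actual evaluation.

```python
# Hypothetical sketch: testing whether impacts differ by a program feature
# using a treatment-by-feature interaction in a regression.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 8_000
df = pd.DataFrame({
    "treat": rng.integers(0, 2, size=n),      # random assignment
    "offers_y": rng.integers(0, 2, size=n),   # e.g., site offers component Y (assumed)
})

# Assumed data-generating process: the impact is larger where component Y is offered.
df["outcome"] = (
    5.0 + 1.5 * df["treat"] + 2.0 * df["treat"] * df["offers_y"]
    + rng.normal(0, 4, size=n)
)

# The coefficient on treat:offers_y estimates how the impact varies with the feature.
model = smf.ols("outcome ~ treat * offers_y", data=df).fit()
print(model.params[["treat", "treat:offers_y"]])
```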
Accordingly, the field has expanded the way it thinks about impacts and has become increasingly interested in impact variation. Assessments of how impacts vary—what works, for whom, and under what circumstances—are currently an important topic within the field. While the field has expanded its toolkit of analytic strategies for understanding impact variation and addressing “what works” questions, this book will focus on design options for examining impact variation.2
1 In Peck (2015), I explicitly discuss “programmatic complexity” and “temporal complexity” as key factors that suggest specific evaluation approaches, both in design and analysis.
2 For a useful treatment of the relevant analytic strategies—including an applied illustration using the Moving to Opportunity (MTO) demonstration—I refer the reader to Chapter 7 in New Directions for Evaluation #152 (Peck, 2016).