Chapter 1 Introduction
The concepts of cause and effect are critical to the field of program evaluation. After all, establishing a causal connection between a program and its effects is at the core of what impact evaluations do. The field of program evaluation has its roots in the social work research of the settlement house movement and in the business sector’s efficiency movement, both at the turn of the 20th century. Evaluation as we know it today emerged from the Great Society era, when large-scale demonstrations tested new, sweeping interventions to improve many aspects of our social, political, and economic worlds. Specifically, it was the Elementary and Secondary Education Act of 1965 that first stipulated evaluation requirements (Hogan, 2007). Thereafter, a slew of scholarly journals launched and, to accompany them, academic programs to train people in evaluation methods. Since then, scholars, practitioners, and policymakers have become increasingly aware of the diversity of questions that program evaluation pursues. This awareness has been coupled with a broadening range of evaluation approaches to address not only whether programs work but also what works, for whom, and under what circumstances (e.g., Stern et al., 2012). Program evaluation as a profession is diverse, and scholars and practitioners can be found in a wide array of settings, from small, community-based nonprofits to the largest of federal agencies.
As program administrators and policymakers seek to establish, implement, and evolve their programs and public policies, measuring the effectiveness of those programs or policies is essential to justifying ongoing funding, enacting changes to improve them, or terminating them. In doing so, impact evaluations must isolate a program’s impact from the many other possible explanations for any observed difference in outcomes. Determining how much of the improvement in outcomes (that is, the “impact”) is due to the program requires estimating what would have happened in the program’s absence (the “counterfactual”). As of 2019, we are in the midst of an era of “evidence-based” policymaking, which implies that the results of evaluation research inform what we choose to implement, how we choose to improve, and whether we terminate certain public and nonprofit programs and policies.
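To make this counterfactual logic concrete, one common way to write it down, borrowing standard potential-outcomes notation rather than anything specific to this chapter, is sketched below.

```latex
% A sketch in standard potential-outcomes notation (an assumption of this
% illustration, not notation drawn from the book). Y_i(1) is person i's
% outcome with the program; Y_i(0) is the counterfactual outcome without it.
\Delta_i = Y_i(1) - Y_i(0)
% Only one of the two potential outcomes is ever observed for any one person,
% so impact evaluations estimate the average impact across the population:
\overline{\Delta} = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]
```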
Experimentally designed evaluations—those that randomize participants to treatment and control groups—offer a convincing means of establishing a causal connection between a program and its effects. Over roughly the last three decades, experimental evaluations have grown substantially in number and in the diversity of their applications. For example, Greenberg and Shroder’s 2004 Digest of Social Experiments counted 293 such evaluations since the beginning of their use to study public policy in the 1970s. The Randomized Social Experiments eJournal, which replaced the Digest beginning in 2007, has identified thousands of additional experiments since then.
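As a purely illustrative sketch, using made-up numbers rather than data from any study cited here, the logic of such a randomized evaluation can be simulated in a few lines: units are assigned to treatment or control by lottery, and the average impact is estimated as the difference in mean outcomes between the two groups.

```python
# Illustrative simulation of a randomized evaluation (hypothetical values,
# not results from any evaluation discussed in this chapter).
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1_000

# Each unit's outcome in the absence of the program (the counterfactual world).
baseline_outcome = rng.normal(loc=50, scale=10, size=n)

# Assume, purely for illustration, the program raises outcomes by 2 points on average.
true_impact = 2.0

# Random assignment: a coin flip places each unit in treatment or control.
treated = rng.integers(0, 2, size=n).astype(bool)

# Observed outcomes: treated units receive the program's effect.
observed = baseline_outcome + true_impact * treated

# Because assignment is random, the control group's mean outcome stands in for
# the counterfactual, and the difference in means estimates the average impact.
estimated_impact = observed[treated].mean() - observed[~treated].mean()
print(f"Estimated average impact: {estimated_impact:.2f}")
```

The point of the sketch is simply that random assignment makes the control group a credible stand-in for the counterfactual, so a single difference in means recovers the average effect.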
The past few decades have shown that experimental evaluations are feasible in a wide variety of settings. The field has become quite adept at executing experiments that answer questions about the average impacts of policies and programs. Over this same period, awareness has grown of the broad range of cause-and-effect questions that evaluation research examines, along with corresponding methodological innovation and creativity to meet increased demand from the field. That said, experimental evaluations have been subject to criticism for a variety of reasons (e.g., Bell & Peck, 2015).
The main criticism that compels this book is that experimental evaluations are not well suited to disaggregating program impacts in ways that connect to program implementation or practice. That is, experiments have earned a reputation as a relatively blunt tool, one in which the details of program implementation remain a “black box.” The complexity, implementation, and nuance of a program tend to be overlooked when an evaluation produces a single number (the “impact”) to represent the program’s effectiveness.