3.3.4 What Intervention, Program, or Policy Has the Best Effects?

As we've already noted, tightly controlled experimental designs are the gold standard when we are seeking evidence about whether a particular intervention – and not some alternative explanation – is the real cause of a particular outcome. Suppose, for example, we are employing an innovative new therapy for treating survivors of a very recent traumatic event such as a natural disaster or a crime. Our aim would be to alleviate their acute trauma symptoms or to prevent the development of posttraumatic stress disorder (PTSD).

If all we know is that their symptoms improve after our treatment, we cannot rule out plausible alternative explanations for that improvement. Maybe our treatment had little or nothing to do with it. Instead, perhaps most of the improvement can be attributed to the support they received from relatives or other service providers. Perhaps the mere passage of time helped. We can determine whether we can rule out the plausibility of such alternative explanations by randomly assigning survivors to an experimental group that receives our innovative new therapy versus a control group that receives routine treatment as usual. If our treatment group has a significantly better outcome on average than the control group, we can rule out contemporaneous events or the passage of time as plausible explanations, since both groups had an equal opportunity to have been affected by such extraneous factors.
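
To make this logic concrete, here is a minimal simulation sketch (in Python, assuming the numpy and scipy libraries; all numbers are invented for illustration and come from no actual trial). Everyone improves somewhat with the passage of time, and the new therapy adds a hypothetical further benefit; because assignment is random, the remaining between-group difference points to the treatment itself:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100  # survivors per group

# Everyone improves somewhat regardless of treatment (the "passage of time").
natural_improvement = rng.normal(loc=5.0, scale=3.0, size=2 * n)

# Random assignment: shuffle indices, first half treated, second half control.
indices = rng.permutation(2 * n)
treated, control = indices[:n], indices[n:]

# Hypothetical assumption: the new therapy adds a further 4-point improvement.
improvement = natural_improvement.copy()
improvement[treated] += 4.0

# Both groups shared the same extraneous influences, so a significant
# difference points to the treatment itself.
t_stat, p_value = stats.ttest_ind(improvement[treated], improvement[control])
print(f"treated mean: {improvement[treated].mean():.1f}, "
      f"control mean: {improvement[control].mean():.1f}, p = {p_value:.4f}")
```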

Suppose we did not randomly assign survivors to the two groups. Suppose instead we treated those survivors who were exhibiting the worst trauma symptoms in the immediate aftermath of the traumatic event and compared their outcomes to the outcomes of the survivors whom we did not treat. Even if the ones we treated had significantly better outcomes, our evidence would be more flawed than with random assignment. That's because the difference in outcome might have had more to do with differences between the two groups to begin with. Maybe our treatment group improved more simply because their immediate reaction to the trauma was so much more extreme that even without treatment their symptoms would have improved more than the less extreme symptoms of the other group.
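
The statistical phenomenon behind this concern is known as regression to the mean: cases selected because they scored at an extreme tend to score closer to the average when measured again, even with no intervention at all. A small hypothetical simulation (all values invented) illustrates it:

```python
import numpy as np

rng = np.random.default_rng(0)
true_severity = rng.normal(50, 10, size=1000)            # stable underlying severity
intake = true_severity + rng.normal(0, 8, size=1000)     # noisy intake measurement
followup = true_severity + rng.normal(0, 8, size=1000)   # noisy follow-up, NO treatment given

# Select the worst-scoring 10% at intake, as in the quasi-experiment above.
worst = intake > np.percentile(intake, 90)
print(f"worst cases at intake:   {intake[worst].mean():.1f}")
print(f"same cases at follow-up: {followup[worst].mean():.1f}  (improved without any treatment)")
```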

As another alternative to random assignment, suppose we simply compared the outcomes of the survivors we treated to the outcomes of the ones who declined our services. If the ones we treated had on average better outcomes, that result very plausibly could be due to the fact that the ones who declined our treatment had less motivation or fewer support resources than those who wanted to and were able to utilize our treatment.

In each of the previous two examples, the issue is whether the two groups being compared were really comparable. To the extent that doubt exists about their comparability, the research design is said to have a selectivity bias. Consequently, outcome evaluations that compare treatment groups that have not been assigned randomly are called quasi-experiments. Quasi-experiments have the features of experimental designs, but without the random assignment.

Not all quasi-experimental designs are equally vulnerable to selectivity biases. A design that compares treatment recipients to treatment decliners, for example, would be much more vulnerable to a selectivity bias than a design that provides the new treatment versus the routine treatment depending solely on whether the new treatment therapists have caseload openings at the time of referral of new clients. (The latter type of quasi-experimental design is called an overflow design.)

So far we have developed a pecking order of four types of designs for answering EIP questions about effectiveness. Experiments are at the top, followed by quasi-experiments with relatively low vulnerabilities to selectivity biases. Next come quasi-experiments whose selectivity bias vulnerability represents a severe and perhaps fatal flaw. At the bottom are designs that assess client change without using any control or comparison group whatsoever.

But our hierarchy is not yet complete. Various other types of studies are used to assess effectiveness. One alternative is called single-case designs. You may have seen similar labels, such as single-subject designs, single-system experiments, and so on. All these terms mean the same thing: a design in which a single client or group is assessed repeatedly at regular intervals before and after treatment commences. With enough repeated measurements in each phase, it can be possible to infer which explanation for any improvement in trauma symptoms is more plausible: treatment effects versus contemporaneous events or the passage of time. We examine this logic further later in this book. For now, it is enough to understand that when well executed, these designs can offer some useful, albeit tentative, evidence about whether an intervention really is the cause of a particular outcome. Therefore, these designs merit a sort of medium status on the evidentiary hierarchy for answering EIP questions about effectiveness.
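
As a rough illustration of that logic, here is a hypothetical AB single-case design with invented weekly symptom scores; a stable baseline followed by a sustained shift after treatment begins is harder to attribute to the mere passage of time than a single before-and-after comparison:

```python
# Hypothetical weekly trauma-symptom scores (higher = worse), invented for illustration.
baseline  = [22, 23, 21, 22, 24, 23, 22]   # phase A: repeated measures before treatment
treatment = [20, 18, 15, 13, 12, 11, 10]   # phase B: repeated measures after treatment begins

mean_a = sum(baseline) / len(baseline)
mean_b = sum(treatment) / len(treatment)
print(f"stable baseline mean: {mean_a:.1f}; treatment-phase mean: {mean_b:.1f}")
```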

Next on the hierarchy come correlational studies. Instead of manipulating logical arrangements to assess intervention effectiveness, correlational studies rely on statistical associations, which can yield preliminary, but not conclusive, evidence about intervention effects. For example, suppose we want to learn what, if any, types of interventions may be effective in preventing risky sexual behavior among high school students. Suppose we know that in some places the students receive sex education programs that emphasize abstinence only, while in other places the emphasis is on safe-sex practices. Suppose we also know that some settings provide faith-based programs, others provide secular programs, and still others provide no sex education. We could conduct a large-scale survey of many students in many different schools and towns, asking them about the type of sex education they have received and about the extent to which they engage in safe and unsafe sex. If we find that students who received the safe-sex approach are much less likely to engage in unsafe sex than students who received the abstinence-only approach, that would provide preliminary evidence of the superior effectiveness of the safe-sex approach.

Correlational studies typically also analyze data on a variety of other experiences and background characteristics and then use multivariate statistical procedures to see if differences in the variable of interest hold up when those other experiences and characteristics are held constant. In the sex education example, we might find that the real explanation for the differences in unsafe-sex practices is the students' socioeconomic status or religion. Perhaps students who come from more affluent families are both more likely to have received the safe-sex approach as well as less likely to engage in unsafe sex. In that case, if we hold socioeconomic status constant using multivariate statistical procedures, we might find no difference in unsafe-sex practices among students at a particular socioeconomic level regardless of what type of sex education they received.
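
The sketch below illustrates what "holding a variable constant" means in practice, using simulated data (all relationships are invented, and the snippet assumes the numpy and statsmodels libraries). Affluence is constructed to drive both the type of program received and the behavior, so the crude program coefficient looks protective while the adjusted coefficient, with affluence held constant, shrinks toward zero:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
affluent = rng.binomial(1, 0.5, n)  # 1 = more affluent family (invented)

# Affluent students are more likely to receive the safe-sex program...
program = rng.binomial(1, 0.2 + 0.5 * affluent)
# ...and less likely to engage in unsafe sex; the program itself does NOTHING here.
unsafe = rng.binomial(1, 0.4 - 0.25 * affluent)

# Crude model: the program looks protective (negative coefficient).
crude = sm.Logit(unsafe, sm.add_constant(program)).fit(disp=0)

# Adjusted model: with affluence held constant, the program coefficient ~ 0.
X = sm.add_constant(np.column_stack([program, affluent]))
adjusted = sm.Logit(unsafe, X).fit(disp=0)

print(f"crude program coefficient:    {crude.params[1]:+.3f}")
print(f"adjusted program coefficient: {adjusted.params[1]:+.3f}")
```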

Suppose we had found that students who received the abstinence-only sex education approach, or a faith-based approach, were much less likely to engage in unsafe sex. Had we held religion constant in our analysis, we might have found that students of a certain religion or those who are more religious are both more likely to have received the abstinence-only or faith-based approach and less likely to engage in unsafe sex. By holding religion or religiosity constant, we might have found no difference in unsafe-sex practices among students who did and did not receive the abstinence-only or a faith-based approach.

Although correlational studies are lower on the hierarchy than experiments and quasi-experiments (there is not complete agreement on the exact ordering; some might place them on a par with, slightly above, or slightly below single-case experiments), they derive value from studying larger samples of people under real-world conditions. Their main drawback is that correlation, alone, does not imply causality. As illustrated in the sex education example, some extraneous variable – other than the intervention variable of interest – might explain away a correlation between type of intervention and a desired outcome. All other methodological things – such as quality of measurement – being equal, studies that statistically control for many extraneous variables that seem particularly likely to provide alternative explanations for intervention-outcome correlations offer better evidence about possible intervention effects than studies that control for few or no such variables.

However, no matter how many extraneous variables are controlled for, there is always the chance of missing the one that really matters. Another limitation of correlational studies is the issue of time order. Suppose we find in a survey that the more contact youths have had with a volunteer mentor from a Big Brother/Big Sister program, the fewer antisocial behaviors they have engaged in. Conceivably, the differences in antisocial behaviors might explain differences in contact with mentors, instead of the other way around. That is, perhaps the less antisocial youths are to begin with, the more likely they are to spend time with a mentor, and the more motivated the mentor will be to spend time with them.
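
A brief hypothetical simulation (all values invented) shows why correlation alone cannot settle time order: here the data are generated so that low antisocial behavior leads to more mentor contact, yet the resulting correlation looks exactly like a mentoring effect:

```python
import numpy as np

rng = np.random.default_rng(7)
# Antisocial behavior is generated FIRST (counts of antisocial acts)...
antisocial = rng.poisson(3.0, size=500)
# ...and then drives mentor contact: less antisocial youths get more hours.
contact_hours = np.maximum(0, 20 - 2 * antisocial + rng.normal(0, 3, size=500))

r = np.corrcoef(contact_hours, antisocial)[0, 1]
print(f"correlation: {r:.2f}  (negative, though mentoring caused nothing here)")
```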

Thus, our ability to draw causal inferences about intervention effects depends not just on correlation, but also on time order and on eliminating plausible alternative explanations for differences in outcome. When experiments randomly assign an adequate number of participants to different treatment conditions, we can assume that the groups are comparable in terms of plausible alternative explanations and in terms of pretreatment differences on outcome variables. Moreover, most experiments administer pretests to detect and handle any remaining pretreatment differences. This is why experiments using random assignment rank higher on the hierarchy for assessing intervention effectiveness than correlational studies do.

At the bottom of the hierarchy are the following types of studies:

 Anecdotal case reports

 Pretest-posttest studies without control groups

 Qualitative descriptions of client experiences during or after treatment

 Surveys of clients asking what they think helped them

 Surveys of practitioners asking what they think is effective

Residing at the bottom of the hierarchy does not mean that these studies have no evidentiary value regarding the effectiveness of interventions. Each of these types of studies can have significant value. Although none of them meet the three criteria for inferring causality (i.e., establishing correlation and time order while eliminating plausible alternative explanations), they each offer some useful preliminary evidence that can inform practice decisions when higher levels of evidence are not available for a particular type of problem or practice context. Moreover, each can generate hypotheses about interventions that can then be tested in studies providing more control for alternative explanations.

Table 3.1 is an example of a research hierarchy, showing the various types of studies and their levels on the evidentiary hierarchy for answering EIP questions about effectiveness and prevention. Effectiveness hierarchies are the most commonly described evidentiary hierarchies in research, but an analogous list could be created for each of the other types of EIP questions.

TABLE 3.1 Evidentiary Hierarchy for EIP Questions about Effectiveness

Level  Type of study
1      Systematic reviews and meta-analyses
2      Multisite replications of randomized experiments
3      Randomized experiment
4      Quasi-experiments
5      Single-case experiments
6      Correlational studies
7      Pretest/posttest studies without control groups
8      Other: anecdotal case reports; qualitative descriptions of client experiences during or after treatment; surveys of clients about what they think helped them; surveys of practitioners about what they think is effective

Note: Best evidence at Level 1.

Notice that we have not yet discussed the types of studies residing in the top two levels of that table. You might also notice that Level 3 contains the single term randomized experiment. What distinguishes that level from the top two levels is the issue of replication. We can have more confidence about the results of an experiment if its results are replicated in other experiments conducted by other investigators at other sites. Thus, a single randomized experiment is below multisite replications of randomized experiments on the hierarchy. This hierarchy assumes that each type of study is well designed. If not well designed, then a particular study would merit a lower level on the hierarchy. For example, a randomized experiment with egregiously biased measurement would not deserve to be at Level 3 and perhaps would be so fatally flawed as to merit dropping to the lowest level. The same applies to a quasi-experiment with a severe vulnerability to a selectivity bias.

Typically, however, replications of experiments produce inconsistent results (as do replications of studies using other designs). Moreover, replications of studies that evaluate different interventions relevant to the same EIP question can accumulate and produce a bewildering array of disparate findings as to which intervention approach is the most effective. The studies at the top level of the hierarchy – systematic reviews (SR) and meta-analyses – attempt to synthesize and develop conclusions from the diverse studies and their disparate findings. Thyer (2004) described systematic reviews (SR) as follows:

In an SR, independent and unbiased researchers carefully search for every published and unpublished report available that deals with a particular answerable question. These reports are then critically analyzed, and – whether positive or negative, whether consistent or inconsistent – all results are assessed, as are factors such as sample size and representativeness, whether the outcome measures were valid, whether the interventions were based on replicable protocols or treatment manuals, what the magnitude of observed effects were, and so forth. (p. 173)

Although systematic reviews often will include and critically analyze every study they find, not just randomized experiments, they should give more weight to randomized experiments than to less controlled studies in developing their conclusions. Some systematic reviews, such as those registered with the Campbell or Cochrane collaborations, require researchers to meet strict standards regarding the methods used to find studies and the quality criteria that determine which studies are included in or excluded from the review.

A more statistically oriented type of systematic review is called meta-analysis. Meta-analyses often include only randomized experiments, but sometimes include quasi-experimental designs and other types of studies as well. The main focus of meta-analysis is to aggregate the statistical findings of different studies that assess the effectiveness of a particular intervention. A prime aim of meta-analysis is to calculate the average strength of an intervention's effect by aggregating the effect strength reported in each individual study. Meta-analyses also can assess the statistical significance of the aggregated results. When meta-analyses include studies that vary in methodological rigor, they also can assess whether the aggregated findings differ according to the quality of the methodology. The most powerful approach to a systematic review combines rigorous and transparent search methods, clear criteria for the inclusion and exclusion of studies, and statistical aggregation of the data.
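
As a rough sketch of that core computation, the snippet below pools invented effect sizes using fixed-effect, inverse-variance weighting, a standard way to average effects so that larger, more precise studies count more. Real meta-analyses add heterogeneity statistics, random-effects models, and moderator analyses by methodological quality:

```python
import numpy as np

# Hypothetical standardized mean differences (d) and variances, one per study.
effects   = np.array([0.45, 0.60, 0.30, 0.52, 0.38])
variances = np.array([0.020, 0.035, 0.015, 0.040, 0.025])

weights = 1.0 / variances  # more precise (usually larger) studies count more
pooled = np.sum(weights * effects) / np.sum(weights)
se = np.sqrt(1.0 / np.sum(weights))
print(f"pooled effect: {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * se:.2f} to {pooled + 1.96 * se:.2f})")
```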

Some meta-analyses will compare different interventions that address the same problem. For example, a meta-analysis might calculate the average strength of treatment effect across experiments that evaluate the effectiveness of exposure therapy in treating PTSD, then do the same for the effectiveness of eye movement desensitization and reprocessing (EMDR) in treating PTSD, and then compare the two results as a basis for considering which treatment has a stronger impact on PTSD.

You can find some excellent sources for unbiased systematic reviews and meta-analyses in Table 2.2 in Chapter 2. Later in this book, Chapter 8 examines how to critically appraise systematic reviews and meta-analyses. Critically appraising them is important because not all of them are unbiased or of equal quality. It is important to remember that to merit a high level on the evidentiary hierarchy, an experiment, systematic review, or meta-analysis needs to be conducted in an unbiased manner. In that connection, what we said earlier about Table 3.1 is very important, and thus merits repeating here:

This hierarchy assumes that each type of study is well designed. If not well designed, then a particular study would merit a lower level on the hierarchy.

For example, a randomized experiment with egregiously biased measurement would not deserve to be at Level 3 and perhaps would be so fatally flawed as to merit dropping to the lowest level. The same applies to a quasi-experiment with a severe vulnerability to a selectivity bias.
