Bad Science, Ben Goldacre
Randomisation
Let’s take this out of the theoretical, and look at some of the trials which homeopaths quote to support their practice. I’ve got a bog-standard review of trials for homeopathic arnica by Professor Edzard Ernst in front of me, which we can go through for examples. We should be absolutely clear that the inadequacies here are not unique, I do not imply malice, and I am not being mean. What we are doing is simply what medics and academics do when they appraise evidence.
So, Hildebrandt et al. (as they say in academia) looked at forty-two women taking homeopathic arnica for delayed-onset muscle soreness, and found it performed better than placebo. At first glance this seems to be a pretty plausible study, but if you look closer, you can see there was no ‘randomisation’ described. Randomisation is another basic concept in clinical trials. We randomly assign patients to the placebo sugar pill group or the homeopathy sugar pill group, because otherwise there is a risk that the doctor or homeopath—consciously or unconsciously—will put patients who they think might do well into the homeopathy group, and the no-hopers into the placebo group, thus rigging the results.
Randomisation is not a new idea. It was first proposed in the seventeenth century by John Baptista van Helmont, a Belgian radical who challenged the academics of his day to test their treatments like blood-letting and purging (based on ‘theory’) against his own, which he said were based more on clinical experience: ‘Let us take out of the hospitals, out of the Camps, or from elsewhere, two hundred, or five hundred poor People, that have Fevers, Pleurisies, etc. Let us divide them into half, let us cast lots, that one half of them may fall to my share, and the other to yours… We shall see how many funerals both of us shall have.’
It’s rare to find an experimenter so careless that they’ve not randomised the patients at all, even in the world of CAM. But it’s surprisingly common to find trials where the method of randomisation is inadequate: they look plausible at first glance, but on closer examination we can see that the experimenters have simply gone through a kind of theatre, as if they were randomising the patients, but still leaving room for them to influence, consciously or unconsciously, which group each patient goes into.
In some inept trials, in all areas of medicine, patients are ‘randomised’ into the treatment or placebo group by the order in which they are recruited onto the study—the first patient in gets the real treatment, the second gets the placebo, the third the real treatment, the fourth the placebo, and so on. This sounds fair enough, but in fact it’s a glaring hole that opens your trial up to possible systematic bias.
Let’s imagine there is a patient who the homeopath believes to be a no-hoper, a heart-sink patient who’ll never really get better, no matter what treatment he or she gets, and the next place available on the study is for someone going into the ‘homeopathy’ arm of the trial. It’s not inconceivable that the homeopath might just decide—again, consciously or unconsciously—that this particular patient ‘probably wouldn’t really be interested’ in the trial. But if, on the other hand, this no-hoper patient had come into clinic at a time when the next place on the trial was for the placebo group, the recruiting clinician might feel a lot more optimistic about signing them up.
The same goes for all the other inadequate methods of randomisation: by last digit of date of birth, by date seen in clinic, and so on. There are even studies which claim to randomise patients by tossing a coin, but forgive me (and the entire evidence-based medicine community) for worrying that tossing a coin leaves itself just a little bit too open to manipulation. Best of three, and all that. Sorry, I meant best of five. Oh, I didn’t really see that one, it fell on the floor.
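For readers who like to see the arithmetic, here is a rough simulation of the problem described above. It is a toy model, not taken from any real trial: the function name, the 'prognosis' scores, the cut-off of -0.5 below which a recruiter quietly turns a patient away, and the noise levels are all assumptions invented for illustration. The 'homeopathy' pill in the model does nothing at all, so any difference between the groups comes purely from who was let into which arm.

```python
import random


def simulated_effect(n_patients, predictable, seed=0):
    """Return mean outcome in the 'homeopathy' arm minus the placebo arm.

    The treatment has no effect in this model. All numbers are illustrative
    assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    arms = {"homeopathy": [], "placebo": []}
    next_slot = "homeopathy"  # alternation: the recruiter can see what comes next
    enrolled = 0
    while enrolled < n_patients:
        # Each patient arrives with a prognosis that the recruiter can guess at.
        prognosis = rng.gauss(0.0, 1.0)
        if predictable:
            arm = next_slot
            # Assumed behaviour: a 'no-hoper' heading for the homeopathy arm
            # is judged 'probably not interested' and never enrolled.
            if arm == "homeopathy" and prognosis < -0.5:
                continue
            next_slot = "placebo" if arm == "homeopathy" else "homeopathy"
        else:
            # Concealed allocation: the arm is unknown when recruiting,
            # so every patient is enrolled and then assigned at random.
            arm = rng.choice(["homeopathy", "placebo"])
        # Outcome is prognosis plus noise; the pill adds nothing.
        arms[arm].append(prognosis + rng.gauss(0.0, 1.0))
        enrolled += 1
    mean_h = sum(arms["homeopathy"]) / len(arms["homeopathy"])
    mean_p = sum(arms["placebo"]) / len(arms["placebo"])
    return mean_h - mean_p


if __name__ == "__main__":
    print("alternation:", simulated_effect(20000, predictable=True))
    print("concealed:  ", simulated_effect(20000, predictable=False))
```

Under these assumptions the alternation trial reports a clearly positive 'effect' for an inert pill, while the concealed version hovers around zero. Nobody in the model has to cheat deliberately; it is enough that the recruiter can see the next slot.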
There are plenty of genuinely fair methods of randomisation, and although they require a bit of nous, they come at no extra financial cost. The classic is to make people call a special telephone number, where someone is sitting with a computerised randomisation programme (and the experimenter doesn’t even make that call until the patient is fully signed up and committed to the study). This is probably the most popular method amongst meticulous researchers, who are keen to ensure they are doing a ‘fair test’, simply because you’d have to be an out-and-out charlatan to mess it up, and you’d have to work pretty hard at the charlatanry too. We’ll get back to laughing at quacks in a minute, but right now you are learning about one of the most important ideas of modern intellectual history.
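The telephone arrangement can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a description of any real trial system: the class name `CentralRandomiser`, the `consent_signed` flag and the patient identifiers are hypothetical. The two properties it tries to capture are the ones that matter in the text: the allocation is only generated after the patient is committed to the study, and once given it is logged and cannot be quietly redrawn.

```python
import secrets


class CentralRandomiser:
    """Hypothetical stand-in for the person at the other end of the phone."""

    ARMS = ("treatment", "placebo")

    def __init__(self):
        # Permanent record of who got what: patient_id -> arm.
        self._log = {}

    def allocate(self, patient_id, consent_signed):
        # Refuse to reveal anything until the patient is fully signed up,
        # so the recruiter cannot let the allocation sway enrolment.
        if not consent_signed:
            raise ValueError("patient must be enrolled before allocation")
        # No 'best of three': a repeat call returns the original answer.
        if patient_id in self._log:
            return self._log[patient_id]
        # secrets.choice draws from the operating system's random source,
        # which the caller cannot predict or seed.
        arm = secrets.choice(self.ARMS)
        self._log[patient_id] = arm
        return arm


if __name__ == "__main__":
    office = CentralRandomiser()
    print(office.allocate("patient-001", consent_signed=True))
```

Simple draws like this do not guarantee equal group sizes; real trials often use block randomisation to keep the arms balanced, which this sketch leaves out for brevity.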
Does randomisation matter? As with blinding, people have studied the effect of randomisation in huge reviews of large numbers of trials, and found that the ones with dodgy methods of randomisation overestimate treatment effects by 41 per cent. In reality, the biggest problem with poor-quality trials is not that they’ve used an inadequate method of randomisation, it’s that they don’t tell you how they randomised the patients at all. This is a classic warning sign, and often means the trial has been performed badly. Again, I do not speak from prejudice: trials with unclear methods of randomisation overstate treatment effects by 30 per cent, almost as much as the trials with openly rubbish methods of randomisation.
In fact, as a general rule it’s always worth worrying when people don’t give you sufficient details about their methods and results. As it happens (I promise I’ll stop this soon), there have been two landmark studies on whether inadequate information in academic articles is associated with dodgy, overly flattering results, and yes, studies which don’t report their methods fully do overstate the benefits of the treatments, by around 25 per cent. Transparency and detail are everything in science. Hildebrandt et al., through no fault of their own, happened to be the peg for this discussion on randomisation (and I am grateful to them for it): they might well have randomised their patients. They might well have done so adequately. But they did not report on it.
Let’s go back to the eight studies in Ernst’s review article on homeopathic arnica—which we chose pretty arbitrarily—because they demonstrate a phenomenon which we see over and over again with CAM studies: most of the trials were hopelessly methodologically flawed, and showed positive results for homeopathy; whereas the couple of decent studies—the most ‘fair tests’—showed homeopathy to perform no better than placebo.*
So now you can see, I would hope, that when doctors say a piece of research is ‘unreliable’, that’s not necessarily a stitch-up; when academics deliberately exclude a poorly performed study that flatters homeopathy, or any other kind of paper, from a systematic review of the literature, it’s not through a personal or moral bias: it’s for the simple reason that if a study is no good, if it is not a ‘fair test’ of the treatments, then it might give unreliable results, and so it should be regarded with great caution.
There is a moral and financial issue here too: randomising your patients properly doesn’t cost money. Blinding your patients to whether they had the active treatment or the placebo doesn’t cost money. Overall, doing research robustly and fairly does not necessarily require more money, it simply requires that you think before you start. The only people to blame for the flaws in these studies are the people who performed them. In some cases they will be people who turn their backs on the scientific method as a ‘flawed paradigm’; and yet it seems their great new paradigm is simply ‘unfair tests’.
These patterns are reflected throughout the alternative therapy literature. In general, the studies which are flawed tend to be the ones that favour homeopathy, or any other alternative therapy; and the well-performed studies, where every controllable source of bias and error is excluded, tend to show that the treatments are no better than placebo.
This phenomenon has been carefully studied, and there is an almost linear relationship between the methodological quality of a homeopathy trial and the result it gives. The worse the study—which is to say, the less it is a ‘fair test’—the more likely it is to find that homeopathy is better than placebo. Academics conventionally measure the quality of a study using standardised tools like the ‘Jadad score’, a seven-point tick list that includes things we’ve been talking about, like ‘Did they describe the method of randomisation?’ and ‘Was plenty of numerical information provided?’
This graph, from Ernst’s paper, shows what happens when you plot Jadad score against result in homeopathy trials. Towards the top left, you can see rubbish trials with huge design flaws which triumphantly find that homeopathy is much, much better than placebo. Towards the bottom right, you can see that as the Jadad score tends towards the top mark of 5, as the trials become more of a ‘fair test’, the line tends towards showing that homeopathy performs no better than placebo.
There is, however, a mystery in this graph: an oddity, and the makings of a whodunnit. That little dot on the right-hand edge of the graph, representing the ten best-quality trials, with the highest Jadad scores, stands clearly outside the trend of all the others. This is an anomalous finding: suddenly, only at that end of the graph, there are some good-quality trials bucking the trend and showing that homeopathy is better than placebo.
What’s going on there? I can tell you what I think: some of the papers making up that spot are a stitch-up. I don’t know which ones, how it happened, or who did it, in which of the ten papers, but that’s what I think. Academics often have to couch strong criticism in diplomatic language. Here is Professor Ernst, the man who made that graph, discussing the eyebrow-raising outlier. You might decode his Yes, Minister diplomacy, and conclude that he thinks there’s been a stitch-up too.
There may be several hypotheses to explain this phenomenon. Scientists who insist that homeopathic remedies are in every way identical to placebos might favour the following. The correlation provided by the four data points (Jadad score 1–4) roughly reflects the truth. Extrapolation of this correlation would lead them to expect that those trials with the least room for bias (Jadad score = 5) show homeopathic remedies are pure placebos. The fact, however, that the average result of the 10 trials scoring 5 points on the Jadad score contradicts this notion, is consistent with the hypothesis that some (by no means all) methodologically astute and highly convinced homeopaths have published results that look convincing but are, in fact, not credible.
But this is a curiosity and an aside. In the bigger picture it doesn’t matter, because overall, even including these suspicious studies, the ‘meta-analyses’ still show, overall, that homeopathy is no better than placebo.
Meta-analyses?