BLUNDERS
Some bad statistics are the products of flubs—elementary errors. While some of these mistakes might involve intentional efforts to deceive, they often reflect nothing more devious than innocent errors and confusion on the part of those presenting the numbers. For instance, after Alberta’s health minister told students at a high school that they lived in the “suicide capital of Canada,” a ministry spokesperson had to retract the claim and explain that the minister had spoken to a doctor and “misinterpreted what they talked about.” In fact, a health officer assured the press, the local “suicide rate is among the lowest in the region and has been on a steady decline since the mid-1990s.”1
Innumeracy—the mathematical equivalent of illiteracy—affects most of us to one degree or another.2 Oh, we may have a good grasp of the basics, such as simple arithmetic and percentages, but beyond those, things start to get a little fuzzy, and it is easy to become confused. This confusion can affect everyone—those who produce figures, the journalists who repeat them, and the audience that hears them. An error—say, misplacing a decimal point—may go unnoticed by the person doing the calculation. Members of the media may view their job as simply to repeat accurately what their sources say; they may tell themselves it isn’t their responsibility to check their sources’ arithmetic. Those of us in the audience may assume that the media and their sources are the ones who know about this stuff, and that what they say must be about right. And because we all have a tendency to assume that a number is a hard fact, everyone feels free to repeat the figure. Even if someone manages to correct the mistake in newspaper A, the blunder takes on a life of its own and continues to crop up on TV program B, Web site C, and blog D, which can lead still more people to repeat the error.
And yet it can be remarkably easy to spot basic blunders. In some cases, nothing more than a moment’s thought is enough to catch a mistake. In others, our statistical benchmarks can provide a rough and ready means for checking the plausibility of numbers.
C1. The Slippery Decimal Point
The decimal point is notoriously slippery. Move it just one place to the right and—wham!—you have ten times as many of whatever you were counting. Move it just one digit to the left and—boom!—only a tenth as many. For instance, the Associated Press reported that the final Harry Potter book sold at a magical clip on the day it was released, averaging “300,000 copies in sales per hour—more than 50,000 a minute.”3 Of course, the correct per-minute figure was only 5,000 copies, but this obvious mistake was overlooked not only by the reporter who wrote the sentence but also by the editors at AP and at the various papers that ran the story unchanged.
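For readers who like to check such things with a line or two of code, here is a minimal sketch in Python (the variable names are mine; the figures are the ones quoted above):

```python
# Convert an hourly sales figure to a per-minute figure.
copies_per_hour = 300_000
copies_per_minute = copies_per_hour / 60
print(f"{copies_per_minute:,.0f} copies per minute")  # 5,000 -- not 50,000
```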
Misplacing a decimal point is an easy mistake to make. Sometimes our sense of the world—our set of mental benchmarks—leads us to suspect that some number is improbably large (or small), but errors can be harder to spot when we don’t have a good sense of the correct number in the first place.
LOOK FOR: Numbers that seem surprisingly large or surprisingly small
EXAMPLE: HOW MANY MINUTES BETWEEN TEEN SUICIDES?
“Today, a young person, age 14–26, kills herself or himself every 13 minutes in the United States.” (Headline on a flyer advertising a book)
When I first read this headline, I wasn’t sure whether the statistic was accurate. Certainly, all teen suicide is tragic; whatever the frequency of these acts, it is too high. But could this really be happening every 13 minutes?
A bit of fiddling with my calculator showed me that there are 525,600 minutes in a year (365 days × 24 hours per day × 60 minutes per hour = 525,600). Divide that by 13 (the supposed number of minutes between young people’s suicides), and we get about 40,430 suicides per year. That sure seems like a lot. In fact, you may remember from our discussion of statistical benchmarks that the annual total number of suicides by people of all ages is only about 38,000. So right away we know something’s wrong.
In fact, government statistics tell us that there were only 4,010 suicides by young people age 15–24 in 2002 (the year the headline appeared).4 That works out to one suicide every 131 minutes, not every 13. Somebody must have dropped a decimal point during their calculations and, instead of producing a factoid, created what we might call a fictoid: a colorful but completely erroneous statistic. (Sharp-eyed readers may have noticed that, in the process, the age category 15–24 [fairly standard in government statistical reports] morphed into 14–26.)
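The whole plausibility test fits in a few lines of Python; this is a sketch using only the figures quoted above:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 -- a handy benchmark

# The flyer's claim: one suicide among young people every 13 minutes.
implied_total = MINUTES_PER_YEAR / 13
print(f"Implied annual total: {implied_total:,.0f}")
# Prints ~40,431 -- more than the ~38,000 annual suicides at ALL ages,
# so the claim cannot be right.

# Government figure: 4,010 suicides at ages 15-24 in 2002.
minutes_between = MINUTES_PER_YEAR / 4_010
print(f"One every {minutes_between:.0f} minutes")  # ~131, not 13
```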
You’ve probably seen other social problems described as occurring “every X minutes.” This is not a particularly useful way of thinking. In the first place, most of us have trouble translating these figures into useful totals, because we don’t have a good sense of how many minutes there are in a year. Knowing that there are roughly half a million (525,600) minutes in a year is potentially useful; it is a good number to add to our list of benchmarks. Thus, you might say to yourself, “Hmm. Every 13 minutes would be roughly half a million divided by 13, say, around 40,000. That seems like an awful lot of suicides by young people.”
Moreover, we should not compare minutes-between-events figures from one year to the next. For instance, below the headline quoted above, the flyer continued: “Thirty years ago the suicide rate in the same group was every 26 minutes. Why the epidemic increase?” The problem here is that the population rises each year, but the number of minutes per year doesn’t change. Even if young people continue to commit suicide at the same rate (about 9.9 suicides per 100,000 young people in 2002), as the number of young people increases, their number of suicides will also rise, and the number of minutes between those suicides will fall. While we intuitively assume that a declining number of minutes between events must mean that the problem is getting worse, that decline might simply reflect the growing population. The actual rate at which the problem is occurring might be unchanged, or even declining.
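A toy calculation makes the point. In the sketch below, the population figures are invented purely for illustration, while the rate is held fixed at the 2002 level quoted above; the interval between events shrinks anyway, even though the problem is not getting any worse:

```python
MINUTES_PER_YEAR = 525_600
RATE_PER_100K = 9.9  # held constant: the rate is not rising

# Hypothetical, growing youth populations:
for population in (20_000_000, 30_000_000, 40_000_000):
    events = population / 100_000 * RATE_PER_100K
    minutes_between = MINUTES_PER_YEAR / events
    print(f"population {population:>12,}: one event every "
          f"{minutes_between:.0f} minutes")
# At a constant rate, the interval falls from ~265 to ~177 to ~133 minutes.
```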
C2. Botched Translations
It is not uncommon for people to repeat a statistic they don’t actually understand. Then, when they try to explain what the number means, they get it wrong, and their innumeracy suddenly becomes visible. Or at least it becomes visible if someone understands the blunder and points it out.
LOOK FOR: Explanations that convert statistics into simpler language with surprising implications
EXAMPLE: MANGLING THE THREAT OF SECONDHAND SMOKE
In a press release, the British Heart Foundation’s director for Scotland was quoted as saying: “We know that regular exposure to second-hand smoke increases the chances of developing heart disease by around 25%. This means that, for every four non-smokers who work in a smoky environment like a pub, one of them will suffer disability and premature death from a heart condition because of second-hand smoke.”5
Well, no, that isn’t what it means at all. People often make this blunder when they equate a percentage increase (such as a 25 percent increase in the risk of heart disease) with an actual percentage (25 percent will get heart disease). We can make this clear with a simple example (the numbers that I am about to use are made up). Suppose that, for every 100 nonsmokers, 4 have heart disease; that means the risk of having heart disease is 4 per 100. Now let’s say that exposure to secondhand smoke increases a nonsmoker’s risk of heart disease by 25 percent. What’s 25 percent of 4? One. So, among nonsmokers exposed to secondhand smoke, the risk of heart disease is 5 per 100 (that is, the initial risk of 4 plus an additional 1, which is 25 percent of 4). The official quoted in the press release misunderstands what it means to speak of an increased risk and thinks that the risk of disease for nonsmokers exposed to secondhand smoke is 25 per 100. To use more formal language, the official is conflating relative and absolute risk.
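The same made-up numbers translate directly into a short sketch; the key is that a 25 percent increase multiplies the baseline risk rather than replacing it:

```python
# Relative vs. absolute risk, using the made-up numbers from the text.
baseline_risk = 4 / 100       # 4 in 100 nonsmokers have heart disease
relative_increase = 0.25      # exposure raises that risk by 25 percent

exposed_risk = baseline_risk * (1 + relative_increase)
print(f"Risk if exposed: {exposed_risk:.0%}")  # 5% -- not the official's 25%
```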
The error was repeated in a newspaper story that quoted the press release. It is worth noting that at no point did the reporter quoting this official flag the mistake (nor did an editor at the paper catch the error).6 Perhaps they understood that the official had mangled the statistic but decided that faithfully reproducing the quote was enough. Or perhaps, as one suspects is more likely, they didn’t notice that anything was wrong. We can’t count on the media to spot and correct every erroneous number.
Translating statistics into more easily understood terms can help us get a feel for what numbers mean, but it may also reveal that those doing the translation don’t understand what they’re saying.
C3. Misleading Graphs
The computer revolution has made it vastly easier for journalists not just to create graphs but to produce jazzy, eye-catching displays of data. Sometimes the results are informative (think about the weather maps—pioneered by USA Today—that show different-colored bands of temperature and give a wonderfully clear sense of the nation’s weather pattern).
But a snazzy graph is not necessarily a good graph. A graph is no better than the thinking that went into its design. And even the most familiar blunders—the errors that every guidebook on graph design warns against—are committed by people who really ought to know better.7
LOOK FOR: Graphs that are hard to decipher; graphs in which the image doesn’t seem to fit the data
EXAMPLE: SIZING UP METH CRYSTALS
The graph shown here appeared in a major news magazine.8 It depicts the results of a study of gay men in New York City that divided them into two groups: those who tested positive for HIV, and those who tested negative. The men were asked whether they had ever tried crystal meth. About 38 percent of the HIV-positive men said they had, roughly twice the percentage (18 percent) among HIV-negative men.
Although explaining these findings takes a lot less than a thousand words, Newsweek decided to present them graphically. The graph illustrates findings for each group using blobs, presumably representing meth crystals. But a glance tells us that the blob/crystal for the HIV-positive group is too large: it should be about twice the size of the HIV-negative group’s crystal, but it seems much larger than that.
[Figure: graph with blobs and figures that give a misleading impression of the data.]
We can guess what happened. Someone probably figured that the larger crystal needed to be twice as tall and twice as wide as its smaller counterpart. But of course that’s wrong: a figure twice as wide and twice as tall is four times, not two times, larger than the original. That’s a familiar error, one that appears in many graphs. And it gets worse: the graph tries to portray the crystals as three-dimensional. To the degree that this illusion is successful, the bigger crystal seems twice as wide, twice as tall, and twice as deep: eight times larger.
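If an artist wants a figure’s apparent size to track the data, each linear dimension should be scaled by the square root (for a flat figure) or the cube root (for a convincing 3-D figure) of the ratio between the two values. A hypothetical helper calculation, using the percentages from the study:

```python
ratio = 37.8 / 18   # the larger value is about 2.1 times the smaller

# Scale factor for each linear dimension of the bigger figure:
flat_scale = ratio ** (1 / 2)    # ~1.45x for a flat (2-D) figure
solid_scale = ratio ** (1 / 3)   # ~1.28x for a 3-D figure

# The naive approach -- scaling every dimension by the full ratio --
# makes the figure look ratio**2 (~4.4x) or ratio**3 (~9.3x) bigger.
print(f"2-D: {flat_scale:.2f}x, 3-D: {solid_scale:.2f}x per dimension")
```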
But what makes this graph really confusing is its use of different-sized fonts to display the findings. The figure “37.8%” is several times larger than “18%.” Adding to the problem is the decision to print the larger figure as three digits plus a decimal point, while its smaller counterpart has only two digits. The result is an image that manages to take a simple, easily understood comparison between two percentages and convey a wildly misleading impression.
We can suspect that the ease with which graphic artists can use computer software to manipulate the sizes of images and fonts contributed to this mangled image. Attractive graphs are preferable to ugly graphs, but only so long as artistic considerations don’t obscure the information the graph is supposed to convey.
C4. Careless Calculations
Many statistics are the result of strings of calculations. Numbers—sometimes from different sources—are added, multiplied, or otherwise manipulated until a new result emerges. Often the media report only that final figure, and we have no easy way of retracing the steps that led to it. Yet when statistics seem incredible, when we find ourselves wondering whether things can possibly be that bad, it can be worth trying to figure out how a number was brought into being. Sometimes we can discover that the numbers just don’t add up, that someone almost certainly made a mistake.
LOOK FOR: As with other sorts of blunders, numbers that seem surprisingly high or low; numbers that seem hard to produce (how could anyone calculate that?)
EXAMPLE: DO UNDERAGE DRINKERS CONSUME 18 PERCENT OF ALCOHOL?
A 2006 study published in a medical journal concluded that underage and problem drinkers accounted for more than a third of the money spent on alcohol in the United States.9 The researchers calculated that underage drinkers (those age 12–20) consume about 18 percent of all alcoholic drinks, more than 20 billion drinks per year. Right away, we notice that that’s a really big number. But does it make sense?
Our benchmarks tell us that each recent age cohort contains about 4 million people (that is, there are about 4 million 12-year-olds, 4 million 13-year-olds, and so on). So we can figure there are about 36 million young people age 12–20. If we divide 36 million into 20 billion, we get more than 550 drinks per person per year. That is, young people would have to average 46 drinks per month. That sure seems like a lot.
Of course, many underage people don’t drink at all. In fact, the researchers calculated that only 47.1 percent were drinkers. That would mean that there are only about 17 million underage drinkers (36 million × .471): in order for them to consume 20 billion drinks per year, those young drinkers would have to average around 1,175 drinks per year, nearly 100 drinks per month, or about one drink every eight hours.
But this figure contradicts the researchers’ own data. Their article claims that underage drinkers consume an average of only 35.2 drinks per month. Let’s see: if we use the researchers’ own figures, 17 million underage drinkers × 35.2 drinks per month comes to just under 600 million drinks per month; multiplied by 12 months, that is about 7.2 billion drinks by underage drinkers per year, not 20 billion. Somehow, somewhere, someone made a simple arithmetic error, one that nearly tripled the estimate of what underage drinkers consume. According to the researchers, Americans consume 111 billion drinks per year. If youths actually drink 7.2 billion of those, then underage drinkers account for about 6.5 percent, not 18 percent, of all the alcohol consumed.
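Here is the whole recomputation in one place, a sketch that uses only figures reported in the study plus the four-million-per-cohort benchmark from above:

```python
# Recompute underage drinking totals from the study's own figures.
COHORT_SIZE = 4_000_000              # benchmark: ~4 million people per age
young_people = COHORT_SIZE * 9       # ages 12-20 inclusive: 36 million

drinker_share = 0.471                # study: 47.1% are current drinkers
drinks_per_month = 35.2              # study: average among those drinkers

underage_drinkers = young_people * drinker_share           # ~17 million
annual_drinks = underage_drinkers * drinks_per_month * 12  # ~7.2 billion

TOTAL_US_DRINKS = 111_000_000_000    # study: all drinks consumed per year
share = annual_drinks / TOTAL_US_DRINKS
print(f"{annual_drinks / 1e9:.1f} billion drinks a year, "
      f"{share:.1%} of the total")   # ~7.2 billion and ~6.5%, not 18%
```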
The fact that we can’t make the researchers’ own figures add up to 20 billion drinks is not the end of the story.10 One could go on to question some of the study’s other assumptions. For example, although there are some young people who drink daily, we might suspect that drinking, and the frequency of drinking, increases with age, and that even a large proportion of youths who are “current drinkers” find their opportunities to drink limited mostly to weekends. One might suspect, then, that young drinkers average less than 35 drinks per month. Reducing the estimate by only 5 drinks per month would cut the estimated total of drinks consumed in a year by underage drinkers by another billion. The assumptions that analysts make, even when they don’t make calculation errors, shape the resulting figures.