Applied Univariate, Bivariate, and Multivariate Statistics Using Python, by Daniel J. Denis

1.3 Quantifying Error Rates in Decision-Making: Type I and Type II Errors


As discussed thus far, decision-making is risky business. Virtually all decisions are made with at least some degree of risk of being wrong. How that risk is distributed and calibrated, and the costs of making the wrong decision, are the components that must be considered before making the decision. For example, again with the coin, if we start out assuming the coin is fair (null hypothesis), then reject that hypothesis after obtaining a large number of heads out of 100 flips, though the decision is logical, reality itself may not agree with our decision. That is, the coin may, in reality, be fair. We may simply have observed a string of heads due to chance fluctuation. Now, how are we ever to know whether the coin is fair or not? That is a difficult question, since according to frequentist probabilists, we would literally need to flip the coin forever to get the true probability of heads. Since we cannot study an infinite population of coin flips, we are always restricted to betting based on the sample, and hoping our bet gets us a lucky outcome.
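The coin decision can be made concrete with a short Python sketch. This is an illustrative example, not code from the text: it assumes SciPy (version 1.7 or later) for `binomtest`, and the count of 62 heads in 100 flips is a hypothetical observation. An exact binomial test asks how surprising that outcome would be if the coin were truly fair.

```python
from scipy.stats import binomtest

# Hypothetical data: we observed 62 heads in 100 flips of a coin
# we initially assumed to be fair (null hypothesis: p = 0.5).
result = binomtest(k=62, n=100, p=0.5, alternative="two-sided")

# A small p-value says: IF the coin were fair, an outcome this
# extreme would be rare. It does not prove the coin is unfair --
# the coin may in reality be fair, and we simply got a lucky run.
print(f"p-value: {result.pvalue:.4f}")
```

At the conventional 0.05 level we would reject the null of fairness here, yet, as the paragraph above stresses, that rejection could still be wrong: the sample is all we ever get to bet on.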

What may be most surprising to those unfamiliar with statistical inference is that, quite remarkably, statistical inference in science operates on the same philosophical principles as games of chance in Vegas! Science is a gamble, and all decisions have error rates. Again, consider the idea of a potential treatment being advanced for COVID-19 in 2020, the year of the pandemic. Does the treatment work? We hope so, but if it does not, what are the risks of it not working? With every decision there are error rates, and error rates also imply potential opportunity costs. Good decisions are made with an awareness of the benefits of being correct and the costs of being wrong. Beyond that, we roll the proverbial dice and see what happens.

If we set up a null hypothesis, then reject it, we risk a false rejection of the null. That is, maybe in truth the null hypothesis should not have been rejected. This type of error, a false rejection of the null, is what is known as a type I error. The probability of making a type I error is typically set at whatever the level of significance is for the statistical test. Scientists usually like to limit the type I error rate, keeping it at a nominal level such as 0.05. This is the infamous p < 0.05 level. However, this is an arbitrarily set level, and there is absolutely no logic or reason to set it at 0.05 for every experiment you run. How the level is set should be governed by, you guessed it, your tolerance for risk of making a wrong decision. Why, then, is minimizing the type I error rate usually preferred? Consider the COVID-19 treatment. If the null hypothesis is that it does not work, and we reject that null hypothesis, we probably want a relatively small chance of being wrong. That is, you probably do not want to be taking medication that is promised to work when it does not, nor does the scientific community want to fill its publication space with presumed treatments that in actuality are not effective. Hence, we usually wish to keep type I error rates quite low. It was R.A. Fisher, pioneer of modern-day null hypothesis significance testing (NHST), who suggested 0.05 as a convenient level of significance. Scientists afterward adopted it as “gospel” without giving it further thought (Denis, 2004). As historians of statistics have argued, adopting “p < 0.05” was more of a social and historical phenomenon than a rational and scientific one.
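What it means for the type I error rate to equal the significance level can be seen in long-run terms with a small simulation. The sketch below is illustrative and assumes NumPy and SciPy are available; the seed and the number of simulated experiments are arbitrary choices. A truly fair coin is flipped 100 times per experiment, so every rejection is, by construction, a type I error.

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(42)
alpha = 0.05
n_flips, n_experiments = 100, 2000

false_rejections = 0
for _ in range(n_experiments):
    # The null hypothesis (p = 0.5) is TRUE here, so any
    # rejection below is a false rejection: a type I error.
    heads = rng.binomial(n_flips, 0.5)
    if binomtest(heads, n_flips, p=0.5).pvalue < alpha:
        false_rejections += 1

type_i_rate = false_rejections / n_experiments
# The long-run rate hovers near alpha (in fact slightly below it,
# because the exact binomial test is discrete, hence conservative).
print(f"empirical type I error rate: {type_i_rate:.3f}")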

However, error rates go both ways. Researchers often wish to minimize the risk of a type I error while ignoring the type II error rate. A type II error is failing to reject a false null hypothesis. For our COVID-19 example, this would essentially mean failing to detect that a treatment is effective when in fact it is, and could potentially save lives. If in reality the null hypothesis is false, yet through our statistical test we fail to detect its falsity, then we could potentially be missing out on a treatment that is effective. So-called “experimental treatments” for a disease (i.e., the “right to try”) are often well-attuned to the risk of making type II errors. That is, the risk of not acting, even on something that has a relatively small probability of working out, may be high, because if it does work out, then the benefits could be substantial.
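The type II error rate can be simulated the same way by making the null hypothesis false. In this illustrative sketch (again assuming NumPy and SciPy, with a hypothetical true heads probability of 0.60 standing in for "the treatment works"), we count how often the test fails to detect the effect at the 0.05 level.

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(1)
alpha = 0.05
n_flips, n_experiments = 100, 2000
true_p = 0.60  # the null (p = 0.5) is FALSE here

type_ii_errors = 0
for _ in range(n_experiments):
    heads = rng.binomial(n_flips, true_p)
    if binomtest(heads, n_flips, p=0.5).pvalue >= alpha:
        # We failed to reject a false null: a type II error.
        type_ii_errors += 1

beta = type_ii_errors / n_experiments
# Power = 1 - beta, the probability of detecting the true effect.
print(f"type II error rate: {beta:.3f}, power: {1 - beta:.3f}")
```

With only 100 flips and a true probability of 0.60, the test misses the effect a substantial fraction of the time, which is precisely the "missing out on an effective treatment" risk described above; increasing the number of flips (the sample size) drives the type II error rate down.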

Virtually all decisions involve a certain degree of risk. A classical hypothesis test involves two error rates. The first is a type I error, which is a false rejection of the null hypothesis. The probability of making a type I error is equal to the significance level set for the test. The second is a type II error, which is failing to reject a false null hypothesis.

