Applied Univariate, Bivariate, and Multivariate Statistics Using Python - Daniel J. Denis

1.1 How Statistical Inference Works

Armed with some examples from the COVID-19 pandemic, we can quite easily illustrate the process of statistical inference on a very practical level. The classical workhorse of statistical inference in most sciences is null hypothesis significance testing (NHST), which originated with R.A. Fisher in the early 1920s. Fisher is widely regarded as the “father of modern statistics.” Most of the classical techniques used today are due to the mathematical statistics developed in the late 1800s and early 1900s. Fisher “packaged” the technique of NHST for research workers in agriculture, biology, and other fields, as a way to grapple with uncertainty in evaluating hypotheses and data. Fisher’s contributions revolutionized how statistics are used to answer scientific questions (Denis, 2004).

Though NHST can be used in several different contexts, how it works is remarkably the same in each. A simple example will illustrate its logic. Suppose a treatment is discovered that purports to cure COVID-19 and an experiment is set up to evaluate whether it does or not. COVID-19 sufferers who agree to participate in the experiment are recruited and divided into two groups. One group will be the control group, while the other group will receive the novel treatment. Of the subjects recruited, half will be randomly assigned to the control group and the other half to the experimental group. This is an experimental design and constitutes the most rigorous means known to humankind for establishing the effectiveness of a treatment in science. Physicists, biologists, psychologists, and many others regularly use experimental designs in their work to evaluate potential treatment effects. You should too!
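
The random assignment described above can be sketched in a few lines of Python. This is only an illustration: the pool of 200 subjects, the function name `randomly_assign`, and the fixed seed are assumptions made for the example and do not come from the study described in the text.

```python
import random


def randomly_assign(subject_ids, seed=None):
    """Shuffle the subjects and split them into two halves.

    Returns (control, treatment). The name and signature are
    hypothetical, chosen only for this sketch.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(subject_ids)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical pool of 200 recruited subjects, identified by number.
subjects = list(range(1, 201))
control, treatment = randomly_assign(subjects, seed=42)

print(len(control), len(treatment))
```

Because chance alone decides who lands in which group, any pre-existing differences among subjects should balance out on average, which is what lets a later difference in survival be attributed to the treatment.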

Carrying on with our example, we set up what is known as a null hypothesis, which in our case will state that the number of individuals surviving in the control group will be the same as that in the experimental group 30 days after the start of the experiment. Key to this is understanding that the null hypothesis is about population parameters, not sample statistics. If the drug is not working, we would expect, under the most ideal of conditions, the same survival rates in each condition in the population. The null hypothesis in this case happens to specify a difference of zero; however, it should be noted that the null hypothesis does not always need to be about zero effect. The “null” in “null hypothesis” means it is the hypothesis to be nullified by the statistical test.

Having set up our null, we then hypothesize a statement contrary to the null, known as the alternative hypothesis. The alternative hypothesis is generally of two types. The first is the statistical alternative hypothesis, which is quite simply a statement of the complement to the null hypothesis. That is, it is a statement of “not the null.” Hence, if the null hypothesis is rejected, the statistical alternative hypothesis is automatically inferred. The second, the substantive or research alternative hypothesis, is taken up further below.

For our data, suppose that after 30 days, 50 people have survived in the experimental group and 20 in the control group. Under the null hypothesis, we would have expected these survival rates to be equal. However, we have observed a difference in our sample. Because these are merely sample data, we are not interested in this particular result for its own sake. Rather, we are interested in answering the following question:

What is the probability of observing a difference at least as large as the one we have observed in our sample if the true difference in the population is equal to 0?

The above is the key question that repeats itself in one form or another in virtually every evaluation of a null hypothesis. That is, state a value for a parameter, then evaluate the probability of the obtained sample result in light of the null hypothesis. You might see where the argument goes from here. If the probability of the sample result is relatively high under the null, then we have no reason to reject the null hypothesis in favor of the statistical alternative. However, if the probability of the sample result is low under the null, then we take this as evidence that the null hypothesis may be false. We do not know if it is false, but we reject it because of the implausibility of the data in light of it. A rejection of the null hypothesis does not necessarily mean the null is false. What it does mean is that we will act as though it is false, or potentially make scientific decisions based on its presumed falsity. Whether it is actually false usually remains unknown.
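
The key question can also be approached by simulation. The sketch below is illustrative only and rests on assumptions the text does not state: 100 subjects per group (hypothetical), a pooled survival rate used as the common rate under the null, and a conventional 0.05 cutoff for calling a probability "low". The function name `simulated_p_value` is likewise made up for this example.

```python
import random


def simulated_p_value(surv_treat, surv_ctrl, n_per_group, n_sims=2000, seed=0):
    """Estimate how often a difference at least as large as the observed
    one arises when both groups share the same survival rate (the null)."""
    rng = random.Random(seed)
    observed = abs(surv_treat - surv_ctrl)
    # Under the null there is one common survival rate; estimate it by pooling.
    pooled_rate = (surv_treat + surv_ctrl) / (2 * n_per_group)

    extreme = 0
    for _ in range(n_sims):
        # Draw the number of survivors in each group under the null model.
        t = sum(rng.random() < pooled_rate for _ in range(n_per_group))
        c = sum(rng.random() < pooled_rate for _ in range(n_per_group))
        if abs(t - c) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_sims + 1)


# ASSUMPTION: 100 subjects per group; the text gives only the survivor counts.
p_value = simulated_p_value(surv_treat=50, surv_ctrl=20, n_per_group=100)
alpha = 0.05  # ASSUMPTION: conventional cutoff, not specified in the text

print(f"estimated p-value: {p_value:.4f}")
print("reject the null" if p_value < alpha else "fail to reject the null")
```

A small estimated probability here says only that the data are implausible under the null model. It does not by itself say why the groups differ.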

For our example, if the number of people surviving in each group in our sample were exactly 50, then we would have no evidence to reject the null hypothesis. Why not? Because a sample result of 50 and 50 lines up exactly with what we would expect under the null model. However, if the numbers turned up as they did earlier, 50 vs. 20, and we found the probability of this result to be rather small under the null, then it could be taken as evidence to reject the null hypothesis and infer the alternative that the survival rates in the two groups are not the same. This is where the substantive or research alternative hypothesis comes in. Why were the survival rates found to be different? For our example, this is an easy one. If we did our experiment properly, it is hopefully due to the treatment. However, had we not performed a rigorous experimental design, then concluding the substantive or research hypothesis becomes much more difficult. That is, simply because you are able to reject a null hypothesis does not in itself lend credit to the substantive alternative hypothesis of your wishes and dreams. The substantive alternative hypothesis should drop out as a natural consequence of the rigorous approach and controls implemented for the experiment. If it does not, then drawing a substantive conclusion becomes much more difficult, if not impossible. This is one reason why drawing conclusions from correlational research can be exceedingly difficult. If you do not have a bullet-proof experimental design, then logically it becomes nearly impossible to know why the null was rejected. Even with a strong experimental design, such conclusions are difficult under the best of circumstances, so if you do not have this level of rigor, you are in hot water when it comes to drawing strong conclusions.

Many published research papers offer very little scientific support for their claims, resting them simply on the rejection of a null hypothesis. This is due to many researchers not understanding or appreciating what a rejection of the null means (and what it does not mean). As we will discuss later in the book, rejecting a null hypothesis is usually, by itself, no big deal at all.
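
The contrast between a 50 vs. 50 result and a 50 vs. 20 result can be checked with an exact calculation. The sketch below uses a two-sided Fisher's exact test, a technique swapped in for illustration that the text does not name, and it assumes a hypothetical 100 subjects per group because the group sizes are not given. The function name `fisher_exact_two_sided` is invented for this example.

```python
from math import comb


def fisher_exact_two_sided(surv_treat, surv_ctrl, n_treat, n_ctrl):
    """Two-sided Fisher's exact test for a 2x2 survival table.

    Sums the hypergeometric probabilities of every table that is no
    more probable than the observed one, holding the margins fixed.
    """
    total = n_treat + n_ctrl
    survivors = surv_treat + surv_ctrl
    denom = comb(total, n_treat)

    def prob(x):
        # Probability that x of the survivors fall in the treatment group.
        return comb(survivors, x) * comb(total - survivors, n_treat - x) / denom

    lo = max(0, survivors - n_ctrl)
    hi = min(survivors, n_treat)
    p_obs = prob(surv_treat)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# ASSUMPTION: 100 subjects per group (not stated in the text).
p_equal = fisher_exact_two_sided(50, 50, 100, 100)  # counts match the null exactly
p_diff = fisher_exact_two_sided(50, 20, 100, 100)   # counts observed in the example

print(f"50 vs. 50: p = {p_equal:.4f}")
print(f"50 vs. 20: p = {p_diff:.2e}")
```

Under these assumptions the 50 vs. 50 table is the single most probable outcome under the null, so its p-value is essentially 1, while the 50 vs. 20 table yields a very small one. Neither number speaks to why a difference arose; that still depends on the design of the study.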

The goal of scientific research on a statistical level is generally to learn about population parameters. Since populations are usually quite large, scientists typically study statistics based on samples and make inferences about the population from these samples. Null hypothesis significance testing (NHST) involves putting forth a null hypothesis and then evaluating the probability of the obtained sample evidence in light of that null. If the probability of such data occurring is relatively low under the null hypothesis, this provides evidence against the null and an inference toward the statistical alternative hypothesis. The substantive alternative hypothesis is the research reason for why the null was rejected and is typically known or hypothesized beforehand from the nature of the research design. If the research design is poor, it can prove exceedingly difficult or impossible to infer the correct research alternative. Experimental designs are usually preferred for this reason (and many others).
