Applied Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis - Page 30

2.1.2 Binomial Distributions


The binomial distribution is given by:

p(r) = \binom{n}{r} p^{r} (1 - p)^{n - r} = \frac{n!}{r!\,(n - r)!}\, p^{r} (1 - p)^{n - r}

where,

 p(r) is the probability of observing r occurrences out of n possible occurrences,

 p is the probability of a “success” on any given trial, and

 1 − p is the probability of a failure on any given trial, often simply referred to by “q” (i.e., q = 1 − p).
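The formula above can be evaluated directly. The book's examples use R; the following is a hypothetical Python equivalent using only the standard library, where binom_pmf is a helper name introduced here for illustration:

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials,
    each with success probability p: C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Probability of exactly 2 heads in 5 flips of a fair coin:
# C(5, 2) * 0.5^2 * 0.5^3 = 10 / 32 = 0.3125
print(binom_pmf(2, 5, 0.5))  # 0.3125
```

This matches the dbinom(2, size = 5, prob = 0.5) computation shown later in the section.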

The binomial setting provides an ideal context to demonstrate the essentials of hypothesis‐testing logic, as we will soon see. In a binomial setting, the following conditions must hold:

 The variable under study must be binary in nature. That is, the outcome of the experiment can fall into only one of two categories, and the outcome categories are mutually exclusive. For instance, the flipping of a coin has this characteristic, because the coin can come up either “heads” or “tails” and nothing else (yes, we are ruling out the possibility that it lands on its side, and I think it is safe to do so).

 The probability of a “success” on each trial remains constant (or stationary) from trial to trial. For example, if the probability of head is equal to 0.5 on our first flip, we assume it is also equal to 0.5 on the second, third, fourth flips, and so on.

 Each trial is independent of each other trial. That is, the fact that we get a head on our first flip of the coin in no way changes the probability of getting a head or tail on the next flip, and so on for the other flips (i.e., no outcome is ever “due” to occur, as the gambler sometimes believes).

We can easily demonstrate hypothesis testing in a binomial setting using R. For instance, let us return to the coin‐flipping experiment. Suppose you would like to know the probability of obtaining two heads on five flips of a fair coin, where each flip is assumed to have a probability of heads equal to 0.5. In R, we can compute this as follows:

> dbinom(2, size = 5, prob = 0.5)
[1] 0.3125

Figure 2.3 Fisher's overlay of normal density on empirical observations.

Source: Fisher (1925, 1934).

where dbinom calls the “density for the binomial,” “2” is the number of successes we are specifying, “size = 5” represents the number of trials we are taking, and “prob = 0.5” is the probability of success on any given trial, which, recall, is assumed constant from trial to trial.

Suppose instead of two heads, we were interested in the probability of obtaining five heads:

> dbinom(5, size = 5, prob = 0.5)
[1] 0.03125

Notice that the probability of obtaining five heads out of five flips on a fair coin is quite a bit less than that of obtaining two heads. We can continue to obtain the remaining probabilities and obtain the complete binomial distribution for this experiment:

Heads    0         1         2         3         4         5
Prob     0.03125   0.15625   0.3125    0.3125    0.15625   0.03125     (∑ = 1.0)

A plot of this binomial distribution is given in Figure 2.4.

Suppose that instead of wanting to know the probability of getting two heads out of five flips, we wanted to know the probability of getting two or more heads out of five flips. Because the events 2 heads, 3 heads, 4 heads, and 5 heads are mutually exclusive events, we can add their probabilities by the addition rule for mutually exclusive events, p(A or B) = p(A) + p(B): 0.3125 + 0.3125 + 0.15625 + 0.03125 = 0.8125. Hence, the probability of obtaining two or more heads on five flips of a fair coin is equal to 0.8125.
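This tail-probability computation can be sketched in Python (a stdlib-only equivalent of summing dbinom values in R; binom_pmf is a helper name used here for illustration):

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Complete binomial distribution for n = 5, p = 0.5
dist = {r: binom_pmf(r, 5, 0.5) for r in range(6)}

# P(2 or more heads): add the probabilities of the mutually
# exclusive outcomes r = 2, 3, 4, 5
p_two_or_more = sum(dist[r] for r in range(2, 6))
print(p_two_or_more)  # 0.8125
```

In R, the same quantity could be obtained by summing dbinom over 2:5 (or via the cumulative distribution function pbinom).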


Figure 2.4 Binomial distribution for the probability of the number of heads on a fair coin.

Binomial distributions are useful in a great variety of contexts in modeling a wide number of phenomena. But again, remember that the outcome of the variable must be binary, meaning it must have only two possibilities. If it has more than two possibilities or is continuous in nature, then the binomial distribution is not suitable. Binomial data will be featured further in our discussion of logistic regression in Chapter 10.

One can also appreciate the general logic of hypothesis testing through the binomial. If our null hypothesis is that the coin is fair, and we obtain five heads out of five flips, this result has only a 0.03125 probability of occurring. Hence, because the probability of this data is so low under the model that the coin is fair, we typically decide to reject the null hypothesis and infer the statistical alternative hypothesis that p(H) ≠ 0.5. Substantively, we might infer that the coin is not fair, though this substantive alternative also assumes it is the coin that is to “blame” for it coming up five times heads. If the flipper was responsible for biasing the coin, for instance, or a breeze suddenly came along that helped the result occur in this particular fashion, then inferring the substantive alternative hypothesis of “unfairness” may not be correct. Perhaps the nature of the coin is such that it is fair. Maybe the flipper or other factors (e.g., breeze) are what are ultimately responsible for the rejection of the null. This is one reason why rejecting null hypotheses is quite easy, but inferring the correct substantive alternative hypothesis (i.e., the hypothesis that explains why the null was rejected) is much more challenging (see Denis, 2001). As concluded by Denis, “Anyone can reject a null, to be sure. The real skill of the scientist is arriving at the true alternative.”
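The decision logic just described can be sketched as a short computation. This is a stdlib-only Python illustration, not the book's own code; the 0.05 significance level is the conventional choice, assumed here for the sketch:

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Under the null hypothesis that the coin is fair, p(H) = 0.5.
# Observing 5 heads in 5 flips: P(X = 5) here also equals P(X >= 5),
# since no more extreme outcome exists.
p_value = binom_pmf(5, 5, 0.5)
alpha = 0.05  # conventional significance level (an assumption of this sketch)

print(p_value)  # 0.03125
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Because 0.03125 < 0.05, the data are deemed sufficiently improbable under the null, and it is rejected, though, as the text stresses, rejecting the null says nothing about which substantive alternative explains the result.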

The binomial distribution is also well‐suited for comparing proportions. For details on how to run this simple test in R, see Crawley (2013, p. 365). One can also use binom.test in R to test simple binomial hypotheses, or the prop.test for testing null hypotheses about proportions. A useful test that employs binomial distributions is the sign test (see Siegel and Castellan, 1988, pp. 80–87 for details). For a demonstration of the sign test in R, see Denis (2020).
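For readers curious what binom.test computes, a simplified two-sided exact binomial test can be sketched in stdlib Python. This is an illustrative reimplementation under the common convention of summing all outcomes no more likely than the observed one (R's binom.test uses this convention with a small numerical tolerance, omitted here for simplicity; exact_binom_test is a hypothetical helper name):

```python
from math import comb

def binom_pmf(r, n, p):
    """Probability of exactly r successes in n trials with success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def exact_binom_test(successes, n, p=0.5):
    """Two-sided exact binomial test p-value: sum the probabilities of
    all outcomes whose probability does not exceed that of the
    observed outcome (a simplified version of R's binom.test rule)."""
    observed = binom_pmf(successes, n, p)
    return sum(binom_pmf(r, n, p) for r in range(n + 1)
               if binom_pmf(r, n, p) <= observed)

# Five heads in five flips of a supposedly fair coin: the two tails
# r = 0 and r = 5 each contribute 0.03125
print(exact_binom_test(5, 5))  # 0.0625
```

The two-sided p-value of 0.0625 is double the one-tailed 0.03125, since the distribution is symmetric when p = 0.5.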

