Read the book: Applied Biostatistics for the Health Sciences - Richard J. Rossi - Page 70
THE BINOMIAL CONDITIONS
The binomial distribution can be used to model the number of successes in n trials when
1 each trial of the experiment results in one of the two outcomes, denoted by S for success and F for failure,
2 the trials will be repeated n times under identical conditions and each trial is independent of the others,
3 the probability of a success is the same on each of the n trials,
4 the random variable of interest, say X, is the number of successes in the n trials.
A random variable satisfying the above conditions is called a binomial random variable. Note that a binomial random variable X simply counts the number of successes that occurred in n trials. The probability distribution for a binomial random variable X is given by the mathematical expression
$$p(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n,$$
where p(x) is the probability that X is equal to the value x. In this formula
$\frac{n!}{x!\,(n-x)!}$ is the number of ways for there to be x successes in n trials,
$n! = n(n-1)(n-2)\cdots 3\cdot 2\cdot 1$ and $0! = 1$ by definition,
p is the probability of a success on any of the n trials,
$p^{x}$ is the probability of having x successes in n trials,
1−p is the probability of a failure on any of the n trials,
$(1-p)^{n-x}$ is the probability of getting n − x failures in n trials.
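The formula for p(x) described above can be evaluated directly. The following is a minimal sketch in Python using only the standard library; the function name `binomial_pmf` and the values n = 5 and p = 0.3 are illustrative assumptions, not taken from the text.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for a binomial random variable with n trials
    and probability of success p on each trial."""
    if not 0 <= x <= n:
        return 0.0
    ways = comb(n, x)  # n! / (x! (n - x)!), the number of ways to place x successes
    return ways * p**x * (1 - p) ** (n - x)


# Hypothetical illustration: n = 5 trials, p = 0.3
probs = [binomial_pmf(x, 5, 0.3) for x in range(6)]
total = sum(probs)  # the probabilities over x = 0, ..., n should add to 1
```

Summing the probabilities over all possible values of x is a quick sanity check that the pieces of the formula have been combined correctly.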
Examples of the binomial distribution are given in Figure 2.24. Note that a binomial distribution will have a longer tail to the right when p < 0.5, a longer tail to the left when p > 0.5, and is symmetric when p = 0.5.
Figure 2.24 Three binomial distributions: (a) n=25,p=0.1; (b) n=25,p=0.5; (c) n=25,p=0.9.
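The direction of the longer tail can be checked numerically for the three cases in Figure 2.24. The sketch below uses the third central moment as the measure of asymmetry, which is an assumption of this illustration (the text itself only describes the shape in words): a positive value indicates a longer right tail, a negative value a longer left tail, and zero a symmetric distribution.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def third_central_moment(n, p):
    """Sum of (x - mean)^3 * P(X = x) over x = 0, ..., n."""
    mean = n * p
    return sum((x - mean) ** 3 * binomial_pmf(x, n, p) for x in range(n + 1))


# The three cases shown in Figure 2.24: n = 25 with p = 0.1, 0.5, and 0.9
asymmetry = {p: third_central_moment(25, p) for p in (0.1, 0.5, 0.9)}
```

The values for p = 0.1 and p = 0.9 are equal in size and opposite in sign, which reflects the fact that the two distributions are mirror images of each other.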
Because the computations for the probabilities associated with a binomial random variable are tedious, it is best to use a statistical computing package such as MINITAB for computing binomial probabilities.
Example 2.31
Hair loss is a common side effect of chemotherapy. Suppose that there is an 80% chance that an individual will lose their hair during or after receiving chemotherapy. Let X be the number of individuals who retain their hair during or after receiving chemotherapy. If 10 individuals are selected at random, use the MINITAB output given in Table 2.10 to determine
Table 2.10 The Binomial Distribution for n = 10 Trials and p = 0.20
Binomial with n = 10 and p = 0.2

x | P(X = x) |
---|---|
0 | 0.107374 |
1 | 0.268435 |
2 | 0.301990 |
3 | 0.201327 |
4 | 0.088080 |
5 | 0.026424 |
6 | 0.005505 |
7 | 0.000786 |
8 | 0.000074 |
9 | 0.000004 |
10 | 0.000000 |
1 the probability that exactly seven will retain their hair (i.e., X = 7),
2 the probability that between four and eight (inclusive) will retain their hair (i.e., 4≤X≤8),
3 the probability that at most three will retain their hair (i.e., X≤3),
4 the probability that at least six will retain their hair (i.e., X≥6),
5 the most likely number of patients to retain their hair (i.e., the mode).
Solutions
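Based on the MINITAB output in Table 2.10,
1 the probability that exactly seven will retain their hair (i.e., X = 7) is P(X = 7) = 0.000786,
2 the probability that between four and eight (inclusive) will retain their hair (i.e., 4≤X≤8) is P(4≤X≤8) = 0.088080 + 0.026424 + 0.005505 + 0.000786 + 0.000074 = 0.120869,
3 the probability that at most three will retain their hair (i.e., X≤3) is P(X≤3) = 0.107374 + 0.268435 + 0.301990 + 0.201327 = 0.879126,
4 the probability that at least six will retain their hair (i.e., X≥6) is P(X≥6) = 0.005505 + 0.000786 + 0.000074 + 0.000004 + 0.000000 = 0.006369,
5 the most likely number of patients to retain their hair is X = 2, since P(X = 2) = 0.301990 is the largest probability in the table.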
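For readers without access to MINITAB, the quantities in Example 2.31 can also be computed with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative. Because the code works with unrounded probabilities, a result can differ in the sixth decimal place from a sum of the rounded entries in Table 2.10.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.20  # p is the chance of retaining hair, 1 - 0.80
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_exactly_7 = pmf[7]                             # P(X = 7)
p_4_to_8 = sum(pmf[4:9])                         # P(4 <= X <= 8)
p_at_most_3 = sum(pmf[:4])                       # P(X <= 3)
p_at_least_6 = sum(pmf[6:])                      # P(X >= 6)
mode = max(range(n + 1), key=lambda x: pmf[x])   # most likely value of X

# Print the distribution in the same layout as Table 2.10
for x, prob in enumerate(pmf):
    print(f"{x:2d}  {prob:.6f}")
```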
The mean of a binomial random variable based on n trials and probability of success p is $\mu = np$ and the standard deviation is $\sigma = \sqrt{np(1-p)}$. The mean of a binomial is the expected number of successes in n trials, and the values of a binomial random variable are concentrated near its mean. The standard deviation measures the spread about the mean and is largest when p = 0.5; as p moves away from 0.5 toward 0 or 1, the variability of a binomial random variable decreases. Furthermore, when np and n(1−p) are both greater than 5, the Empirical Rules apply and
roughly 68% of the binomial distribution lies between the values closest to $np - \sqrt{np(1-p)}$ and $np + \sqrt{np(1-p)}$,
roughly 95% of the binomial distribution lies between the values closest to $np - 2\sqrt{np(1-p)}$ and $np + 2\sqrt{np(1-p)}$,
roughly 99% of the binomial distribution lies between the values closest to $np - 3\sqrt{np(1-p)}$ and $np + 3\sqrt{np(1-p)}$.
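Both claims, that the standard deviation peaks at p = 0.5 and that the Empirical Rules give reasonable coverage, can be explored numerically. The sketch below is one way to do so; n = 50 and p = 0.3 are hypothetical values chosen so that np and n(1−p) both exceed 5, and "the values closest to" is interpreted here as rounding each endpoint to the nearest integer, which is an assumption.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a binomial random variable."""
    return n * p, sqrt(n * p * (1 - p))


def coverage_within(n, p, k):
    """Exact probability that X falls between the integers closest to
    mean - k*sd and mean + k*sd (inclusive)."""
    mu, sigma = binomial_mean_sd(n, p)
    lo = max(0, round(mu - k * sigma))
    hi = min(n, round(mu + k * sigma))
    return sum(binomial_pmf(x, n, p) for x in range(lo, hi + 1))


n = 50  # hypothetical number of trials
sds = {p: binomial_mean_sd(n, p)[1] for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
coverages = {k: coverage_within(n, 0.3, k) for k in (1, 2, 3)}
```

Comparing `coverages` with 68%, 95%, and 99% shows how closely the rule of thumb tracks the exact binomial probabilities; the discreteness of the distribution means the match is only approximate.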
Example 2.32
Suppose the relapse rate within 3 months of treatment at a drug rehabilitation clinic is known to be 40%. If the clinic has 25 patients, then the mean number of patients to relapse within 3 months is $\mu = 25\cdot 0.40 = 10$ and the standard deviation is $\sigma = \sqrt{25\cdot 0.40\cdot(1-0.40)} \approx 2.45$. Now, since np = 25(0.4) = 10 and n(1−p) = 25(0.6) = 15 are both greater than 5, the Empirical Rules apply. Because 10 − 2(2.45) = 5.10 and 10 + 2(2.45) = 14.90, roughly 95% of the time between 5 and 15 patients will relapse within 3 months of treatment. Using MINITAB, the actual percentage of a binomial distribution with n = 25 and p = 0.40 falling between 5 and 15 is about 98%.
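The figures in Example 2.32 can be reproduced without MINITAB. The following sketch, using only the Python standard library, computes the mean, the standard deviation, the two-standard-deviation interval rounded to the nearest integers, and the exact binomial probability of that interval; the variable names are illustrative.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 25, 0.40                      # 25 patients, 40% relapse rate
mu = n * p                           # mean number of relapses
sigma = sqrt(n * p * (1 - p))        # standard deviation

# Integers closest to mu - 2*sigma and mu + 2*sigma: 5 and 15
lo, hi = round(mu - 2 * sigma), round(mu + 2 * sigma)

# Exact probability that between lo and hi patients (inclusive) relapse
exact = sum(binomial_pmf(x, n, p) for x in range(lo, hi + 1))
```

The exact probability is somewhat higher than the 95% suggested by the rule of thumb, which illustrates that the Empirical Rules give only a rough guide for a discrete distribution.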
An important restriction in the setting for a binomial random variable is that the probability of success remains constant over the n trials. In many biomedical studies, the probability of success will be different for each individual in the experiment because the individuals are different. For example, in a study of the survival of patients having suffered heart attacks, the probability of survival will be influenced by many factors including severity of heart attack, delay in treatment, age, and ability to change diet and lifestyle following a heart attack. Because each individual is different, the probability of survival is not going to be constant over the n individuals in the study, and hence, the binomial probability model does not apply.
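To see what changes when the probability of success differs between individuals, the exact distribution of the number of successes can be built up one trial at a time. The sketch below assumes the trials are still independent, and the six survival probabilities are hypothetical values invented for illustration. It compares the variance of the resulting count with the variance of a binomial that uses the average probability for every individual.

```python
def count_distribution(probs):
    """Exact distribution of the number of successes in independent trials
    whose success probabilities may differ from trial to trial."""
    dist = [1.0]  # before any trial, the count is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # this trial is a failure
            new[k + 1] += mass * p     # this trial is a success
        dist = new
    return dist


# Hypothetical survival probabilities for six different patients
probs = [0.95, 0.90, 0.80, 0.60, 0.40, 0.25]
dist = count_distribution(probs)

mean = sum(k * m for k, m in enumerate(dist))
variance = sum((k - mean) ** 2 * m for k, m in enumerate(dist))

# Binomial variance if every patient had the average probability
p_bar = sum(probs) / len(probs)
binomial_variance = len(probs) * p_bar * (1 - p_bar)
```

In this illustration the mean number of survivors matches n times the average probability, but the variance is smaller than the binomial variance, so probabilities computed from a binomial model with a single p would misstate the spread.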