Biostatistics Decoded - A. Gouveia Oliveira
1.14 The Normal Distribution
Consider some attribute that may take only two values, say 1 and 2, and suppose those values occur with equal frequency. Technically speaking, we have a random variable taking the values 1 and 2 with equal probability; this is the probability distribution of that variable (see Figure 1.24, upper part). Consider also, say, four variables that behave exactly like this one, that is, with the same probability distribution. Now let us create a fifth variable that is the sum of all four variables. Can we predict what the probability distribution of this variable will be?
We can, and the result is also presented in Figure 1.24. We simply write down all the possible combinations of values of the four identical variables and see in each case what the value of the fifth variable is. If all four variables have value 1, then the fifth variable will have value 4. If three variables have value 1 and one has value 2, then the fifth variable will have value 5. This may occur in four different ways: either the first variable has the value 2, or the second, or the third, or the fourth. If two variables have the value 1 and two have the value 2, then the sum will be 6, and this may occur in six different ways. If one variable has value 1 and three have value 2, then the result will be 7, and this may occur in four different ways. Finally, if all four variables have value 2, the result will be 8, and this can occur in only one way.
Figure 1.24 The origin of the normal distribution.
So, of the 16 different possible ways or combinations, in one the value of the fifth variable is 4, in four it is 5, in six it is 6, in four it is 7, and in one it is 8. If we now graph the relative frequency of each of these results, we obtain the graph shown in the lower part of Figure 1.24. This is the graph of the probability distribution of the fifth variable. Do you recognize the bell shape?
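The counting argument above can be checked with a short enumeration. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original text. It lists all 16 combinations, counts how many give each sum, and converts the counts to relative frequencies.

```python
from collections import Counter
from itertools import product

# Every way the four variables can take the values 1 or 2 (2**4 = 16 combinations).
combinations = list(product([1, 2], repeat=4))

# Count how many combinations produce each value of the sum (the fifth variable).
ways = Counter(sum(combo) for combo in combinations)

# Relative frequency of each sum, that is, its probability.
distribution = {total: ways[total] / len(combinations) for total in sorted(ways)}

for total, probability in distribution.items():
    print(total, ways[total], probability)
```

The counts printed for the sums 4 to 8 are 1, 4, 6, 4, and 1, matching the tally in the text.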
If we repeat the experiment with not four, but a much larger number of variables, the variable that results from adding all those variables will have not just five different values, but many more. Consequently, the graph will be smoother and more bell-shaped. The same will happen if we add variables taking more than two values.
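To see how the number of possible sums grows, the enumeration can be replaced by a step-by-step calculation that adds one variable at a time. This is a sketch only: the function name `sum_distribution` is hypothetical, and it assumes that the variables are independent and that each value is equally likely, as in the example above.

```python
from collections import defaultdict

def sum_distribution(values, n):
    """Exact distribution of the sum of n independent variables,
    each taking the given values with equal probability (an assumption)."""
    distribution = {0: 1.0}            # before adding anything, the sum is 0
    p = 1.0 / len(values)              # probability of each single value
    for _ in range(n):
        updated = defaultdict(float)
        # Add one more variable: every current sum can move to sum + value.
        for total, probability in distribution.items():
            for value in values:
                updated[total + value] += probability * p
        distribution = dict(updated)
    return distribution

two_valued = sum_distribution([1, 2], 30)       # 30 variables taking 1 or 2
three_valued = sum_distribution([1, 2, 3], 30)  # 30 variables taking 1, 2, or 3

print(len(two_valued), len(three_valued))       # number of distinct sums in each case
```

With `n = 4` and values 1 and 2, the function reproduces the five-value distribution of the worked example. Plotting the probabilities for larger `n` gives progressively smoother bell-shaped graphs.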
As the number of variables grows without limit, the variable resulting from adding them can take more and more values, and the graph of its probability distribution approaches a perfectly smooth curve. This curve is called the normal curve. It is also called the Gaussian curve, after the German mathematician Carl Friedrich Gauss, who described it.
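The approach to the normal curve can be illustrated numerically. The sketch below compares the exact probabilities of the sum of n variables, each taking the value 1 or 2 with equal probability, with the height of a normal curve that has the same mean and standard deviation. The density formula and the function names are not taken from the text; they are standard results and illustrative names, and the variables are assumed to be independent.

```python
from math import comb, exp, pi, sqrt

def exact_probability(n, total):
    """P(sum == total) for n independent variables, each 1 or 2 with probability 1/2."""
    k = total - n                      # number of variables that took the value 2
    return comb(n, k) / 2 ** n

def normal_density(x, mean, sd):
    """Height of the normal curve with the given mean and standard deviation."""
    return exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

def largest_gap(n):
    """Largest difference between the exact probabilities and the matched normal curve."""
    mean = 1.5 * n                     # each variable has mean 1.5
    sd = sqrt(0.25 * n)                # each variable has variance 0.25
    return max(abs(exact_probability(n, total) - normal_density(total, mean, sd))
               for total in range(n, 2 * n + 1))

for n in (4, 16, 64, 256):
    print(n, largest_gap(n))
```

The printed gap shrinks as n increases, which is the numerical counterpart of the graph becoming smoother and more bell-shaped.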