Biostatistics Decoded - A. Gouveia Oliveira

1.17 Probability Distribution of Sample Means

The reason for the pattern of variation of sample means observed in Section 1.13 is easy to understand. A mean is calculated by summing a number of observations on a variable and dividing the result by the number of observations. Normally, we regard the values of an attribute as observations from a single variable. However, we could equally view each value as an observation from a distinct variable, with all the variables having an identical distribution. For example, suppose we have a sample of size 100. We can think of it as 100 independent observations from a single random variable, or as single observations on 100 variables, all with an identical distribution. This is illustrated in Figure 1.28. What do we have there: one observation on a single variable (the total of a throw of six dice), or one observation on each of six identically distributed variables (the value of a throw of one die)? Either way of looking at it is right.

So what would be the consequences of that change of perspective? From this point of view, a sample mean is the sum of a large number of observations from identically distributed variables, each observation being divided by a constant, namely the sample size. Under these circumstances the central limit theorem applies and, therefore, we must conclude that sample means have an approximately normal distribution, regardless of the distribution of the attribute being studied.
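The claim can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the attribute is taken to be exponential with mean 1 (a strongly right-skewed choice made for this example, not one from the text), and the sample size of 100, the number of samples, and the helper `skewness` are likewise hypothetical. It compares the asymmetry of the attribute with the asymmetry of the means of many samples.

```python
import numpy as np


def skewness(x):
    """Third central moment divided by the cubed standard deviation.

    The value is 0 for a symmetric distribution such as the normal.
    """
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Assumed attribute: exponential with mean 1, strongly right-skewed.
attribute = rng.exponential(scale=1.0, size=100_000)

# 20,000 samples of 100 independent observations each; one mean per sample.
samples = rng.exponential(scale=1.0, size=(20_000, 100))
sample_means = samples.mean(axis=1)

print(f"skewness of the attribute:    {skewness(attribute):.2f}")
print(f"skewness of the sample means: {skewness(sample_means):.2f}")
print(f"mean of the sample means:     {sample_means.mean():.3f}")
```

The attribute itself should show a large positive skewness, while the sample means should be nearly symmetric and centered on the attribute mean, which is what an approximately normal distribution of means would look like.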

Because the normal distribution of sample means is a consequence of the central limit theorem, certain restrictions apply. According to the theorem, this result is valid only under two conditions. First, there must be a large number of variables. Second, the variables must be mutually independent. Transposed to the case of sample means, these restrictions imply that a normal distribution can be expected only if there is a large number of observations and the observations are mutually independent.
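A rough way to see why both conditions matter is to break each one in turn. In the sketch below, everything is an assumption made for illustration: the exponential attribute, the sample sizes of 2 and 200, and the way dependence is created (every observation in a sample shares one skewed component, so the observations are identically distributed but not independent).

```python
import numpy as np


def skewness(x):
    """Third central moment divided by the cubed standard deviation."""
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


rng = np.random.default_rng(1)  # fixed seed so the run is reproducible
n_samples = 20_000

# First condition violated: only 2 independent observations per sample.
few = rng.exponential(size=(n_samples, 2)).mean(axis=1)

# Second condition violated: 200 observations per sample, but all of them
# share the same skewed component, so they are not mutually independent.
shared = rng.exponential(size=(n_samples, 1))
noise = rng.normal(scale=0.5, size=(n_samples, 200))
dependent = (shared + noise).mean(axis=1)

# Both conditions met: 200 independent observations per sample.
many = rng.exponential(size=(n_samples, 200)).mean(axis=1)

print(f"skewness, 2 independent observations:   {skewness(few):.2f}")
print(f"skewness, 200 dependent observations:   {skewness(dependent):.2f}")
print(f"skewness, 200 independent observations: {skewness(many):.2f}")
```

Only the last set of means should come out close to symmetric. In the dependent case, averaging removes the independent noise but leaves the shared component untouched, so a larger sample does not help.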

In the case of small samples, however, the means will also have a normal distribution provided the attribute has a normal distribution. This is not because of the central limit theorem, but because of the properties of the normal distribution. Since the means are sums of observations on identically distributed normal variables, each divided by the sample size, the sample means have a normal distribution whatever the number of observations, that is, whatever the sample size.
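The small-sample case can be sketched in the same way. The example below assumes a sample size of 3 and two hypothetical attributes chosen only for contrast: a normal one (mean 120 and standard deviation 15 are arbitrary values) and an exponential one. It is a simulation check, not a proof of the property.

```python
import numpy as np


def skewness(x):
    """Third central moment divided by the cubed standard deviation."""
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


rng = np.random.default_rng(2)  # fixed seed so the run is reproducible
n_samples, n_obs = 20_000, 3    # many samples, each of only 3 observations

# Normally distributed attribute (arbitrary mean and standard deviation).
means_normal = rng.normal(loc=120.0, scale=15.0, size=(n_samples, n_obs)).mean(axis=1)

# Skewed attribute, same small sample size, for comparison.
means_skewed = rng.exponential(scale=1.0, size=(n_samples, n_obs)).mean(axis=1)

print(f"skewness of means, normal attribute: {skewness(means_normal):.2f}")
print(f"skewness of means, skewed attribute: {skewness(means_skewed):.2f}")
```

With a normal attribute the means should be symmetric even at this sample size, whereas with the skewed attribute three observations are too few for the central limit theorem to have taken effect.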


Figure 1.28 The total obtained from the throw of six dice may be seen as the sum of observations on six identically distributed variables.
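The two views of the dice throw can be made concrete by building the distribution of the total from the distribution of a single die. The sketch below assumes fair dice thrown independently (an assumption of this example) and uses the fact that the distribution of a sum of independent variables is the convolution of their distributions.

```python
import numpy as np

# Probability of each face (1 to 6) of one fair die.
one_die = np.full(6, 1 / 6)

# Add one die at a time: each convolution gives the distribution of the
# running total after including one more identically distributed variable.
pmf = one_die
for _ in range(5):
    pmf = np.convolve(pmf, one_die)

totals = np.arange(6, 37)  # possible totals of six dice: 6, 7, ..., 36
mean_total = float((totals * pmf).sum())

print(f"number of possible totals: {len(pmf)}")
print(f"mean total:                {mean_total:.2f}")
print(f"most likely total:         {totals[np.argmax(pmf)]}")
```

A single die has a flat distribution, yet the total of six dice is already symmetric and peaked in the middle, which is the pattern the change of perspective predicts.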
