1.18 The Standard Error of the Mean
We now know that the means of large samples may be regarded as observations from a random variable with normal distribution. We also know that the normal distribution is completely characterized by its mean and variance. The next step in the investigation of sampling distributions, therefore, must be to find out whether the mean and variance of the distribution of sample means can be determined.
We can conduct an experiment simulating a sampling procedure. With the help of the random number generator of a computer, we can create a random variable with a normal distribution having mean 0 and variance 1. Incidentally, this is called a standard normal variable. Then, we obtain a large number of random samples of size 4 and calculate the means of those samples. Next, we calculate the mean and standard deviation of the sample means. We repeat the procedure with samples of size 9, 16, and 25. The results of the experiment are shown in Figure 1.29.
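This experiment is easy to reproduce. The sketch below is a minimal version of it, assuming NumPy is available; the seed, the number of samples, and the printing format are arbitrary choices, and the exact figures will vary slightly from run to run.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed so the run is reproducible
n_samples = 100_000                   # number of random samples per sample size

for n in (4, 9, 16, 25):
    # draw n_samples random samples of size n from a standard normal variable
    samples = rng.standard_normal(size=(n_samples, n))
    sample_means = samples.mean(axis=1)          # one mean per sample
    print(f"size {n:2d}: mean of sample means = {sample_means.mean():+.3f}, "
          f"SD of sample means = {sample_means.std():.3f}")
```

With a large number of samples, the mean of the sample means should come out close to 0 and their standard deviations close to the values discussed next.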
As expected, since the variable we used had a normal distribution, the sample means also have a normal distribution. We can see that the average value of the sample means is, in all cases, the same as the population mean, that is, 0. However, the standard deviations of the sample means are not the same in the four runs of the experiment. In samples of size 4 the standard deviation of the sample means is 0.50, in samples of size 9 it is 0.33, in samples of size 16 it is 0.25, and in samples of size 25 it is 0.20.
Figure 1.29 Distribution of sample means of different sample sizes.
If we look more closely at these results, we realize that those values have something in common. Thus, 0.50 is 1 divided by 2, 0.33 is 1 divided by 3, 0.25 is 1 divided by 4, and 0.20 is 1 divided by 5. Now, can you see the relation between the divisors and the sample size, that is, 2 and 4, 3 and 9, 4 and 16, 5 and 25? The divisors are the square root of the sample size and 1 is the value of the population standard deviation. This means that the standard deviation of the sample means is equal to the population standard deviation divided by the square root of the sample size. Therefore, there is a fixed relationship between the standard deviation of the sample means of an attribute and the standard deviation of that attribute, where the former is equal to the latter divided by the square root of the sample size.
In the next section we will present an explanation for this relationship, but for now let us consolidate some of the concepts we have discussed so far.
The standard deviation of the sample means has its own name: it is called the standard error of the mean or, simply, the standard error. If the standard error is equal to the population standard deviation divided by the square root of the sample size, then the variance of the sample means is equal to the population variance divided by the sample size.
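As a minimal numerical illustration of these two statements, the short sketch below takes a population standard deviation of 1 and a sample size of 25, the values used in the experiment above; both numbers are only there to make the arithmetic concrete.

```python
import math

sigma = 1.0   # population standard deviation (1 in the experiment above)
n = 25        # sample size

standard_error = sigma / math.sqrt(n)    # standard deviation of the sample means
variance_of_means = sigma ** 2 / n       # variance of the sample means

print(standard_error)      # 0.2
print(variance_of_means)   # 0.04
```

The printed values, 0.20 and 0.04, match the standard deviation observed for the samples of size 25 in Figure 1.29 and its square.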
Now we can begin to see why people tend to get confused with statistics. We have been talking about different means and different standard deviations, and students often become disoriented with so many measures. Let us review the meaning of each one of those measures.
There is the sample mean, which in general is not equal in value to the population mean. Sample means have a probability distribution whose mean has the same value as the population mean.
There is a statistical notation to represent the quantities we have encountered so far, whereby population parameters, which are constants and have unknown values, are represented by Greek characters, while statistics obtained from samples, which are variables with known values, are represented by Latin characters. Therefore, the value of the sample mean is represented by the letter m, and the value of the population mean by the letter μ (“m” in the Greek alphabet). Following the same rule, the symbol for the sample proportion is p and for the population proportion is π (“p” in the Greek alphabet).
Next, there is the sample standard deviation, which in general is not equal in value to the population standard deviation. Sample means have a distribution whose standard deviation, also known as the standard error, is different from the sample standard deviation and from the population standard deviation. The usual notation for the sample standard deviation is the letter s, and for the population standard deviation the letter σ (“s” in the Greek alphabet). There is no specific notation for the standard error.
Then there is the sample variance, which in general is also not equal in value to the population variance. These quantities are usually represented by the symbols s² and σ², respectively. In the case of proportions, the sample and population variances should also be represented by s² and σ², but instead the general practice is to represent them by the formulas used for their computation. Therefore, the sample variance of a proportion is represented by p(1 − p) and the population variance by π(1 − π).
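A quick way to see where p(1 − p) comes from is to code the attribute as a 0/1 variable: the variance of such a variable, computed with the divide-by-n formula, is exactly p(1 − p). The sketch below checks this on a small made-up binary sample, assuming NumPy; the data are purely illustrative.

```python
import numpy as np

# hypothetical binary sample: 1 means the attribute is present, 0 means it is absent
x = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 1])

p = x.mean()            # sample proportion (0.6 for this sample)
print(p * (1 - p))      # variance written as p(1 - p)
print(x.var())          # np.var divides by n by default, the formula that matches p(1 - p)
# both lines print 0.24 (up to floating point) for this sample
```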
If one looks at the formulae for each of the above statistics, it becomes readily apparent why the sample statistics do not have the same value as their population counterparts. The reason is that they are computed differently, as shown in Figure 1.30.
Figure 1.30 Comparison of the computation of sample and population statistics.
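Figure 1.30 itself is not reproduced here, but the best-known computational difference between the two sets of formulas concerns the variance: the population variance divides the sum of squared deviations by N, while the sample variance divides it by n − 1. The sketch below, on made-up data and assuming NumPy, shows the two computations side by side; it is a generic illustration under that assumption, not a transcription of the figure.

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])   # made-up data

# treating the values as a complete population: the divisor is N
population_variance = np.var(values, ddof=0)

# treating the same values as a sample: the divisor is n - 1
sample_variance = np.var(values, ddof=1)

print(population_variance, sample_variance)   # the sample formula gives the larger value
```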
Sample means also have a variance, which is the square of the standard error, but the variance of sample means has neither a specific name nor a specific notation.
From all of the above, we can conclude the following about sampling distributions:
Sample means have a normal distribution, regardless of the distribution of the attribute, on the condition that the samples are large (see the sketch after this list).
The means of small samples have a normal distribution only if the attribute itself has a normal distribution.
The mean of the sample means is the same as the population mean, regardless of the distribution of the variable or the sample size.
The standard deviation of the sample means, or standard error, is equal to the population standard deviation divided by the square root of the sample size, regardless of the distribution of the variable or the sample size.
Both the standard deviation and the standard error are measures of dispersion: the first measures the dispersion of the values of an attribute and the second measures the dispersion of the sample means of an attribute.
The above results are valid only if the observations in the sample are mutually independent.
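The first conclusion can be checked with the same kind of simulation used earlier, this time starting from a clearly non-normal attribute. The sketch below uses an exponential distribution purely as an example of a skewed attribute (its population mean and standard deviation are both 1); the seed and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n_samples, n = 50_000, 100           # many samples, each of a fairly large size

# an exponential attribute: strongly skewed, with population mean 1 and SD 1
samples = rng.exponential(scale=1.0, size=(n_samples, n))
means = samples.mean(axis=1)

print("mean of sample means:", means.mean())   # close to the population mean, 1
print("SD of sample means:  ", means.std())    # close to sigma / sqrt(n) = 1 / 10 = 0.1

# skewness: about 2 for the attribute itself, far smaller for the sample means,
# consistent with the means being approximately normally distributed
z_attr = (samples - samples.mean()) / samples.std()
z_mean = (means - means.mean()) / means.std()
print("skewness of the attribute:   ", (z_attr ** 3).mean())
print("skewness of the sample means:", (z_mean ** 3).mean())
```

Even though the attribute is strongly skewed, the mean of the sample means stays close to the population mean, their standard deviation stays close to the population standard deviation divided by the square root of the sample size, and their distribution is far more symmetric than the attribute itself, in line with the conclusions above.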