Experimental Design and Statistical Analysis for Pharmacology and the Biomedical Sciences - Paul J. Mitchell
The Central Limit Theorem
Luckily, however, the small differences that arise as a result of taking samples from a population are not a huge issue, thanks to what is known as the Central Limit Theorem. This states that, given a large enough sample size, the sampling distribution of the sample mean will approximate to a normal distribution regardless of the variable's distribution in the given population (provided that the population has a finite variance). I know I have not described or explained the nature of the normal distribution as yet (sorry!), but have a quick look at Figure 4.7 later in this chapter and compare the shape to the distributions of data sets shown in Figures 5.3 and 5.4 in Chapter 5; can you see the differences in shape?
So, what does this theorem mean? Well, for any set of observations we can easily produce a scatterplot of the magnitude of the observation on the x‐axis against the frequency of occurrence of each value on the y‐axis. The resulting scatterplot is called the frequency distribution for that variable. Interestingly, the values of a variable in any given population may follow different distributions, including a normal distribution (Figure 4.7) or distributions that show a right or left skew (Figures 5.3 and 5.4, respectively).
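As a minimal sketch of how a frequency distribution can be tabulated, the Python snippet below counts how often each value occurs in a small set of observations and prints a crude text plot. The observations are made-up illustrative values (not data from this book), and the helper name `frequency_distribution` is simply a label chosen for this example.

```python
from collections import Counter

# Made-up illustrative observations (an assumption, not data from the book)
observations = [3, 5, 4, 5, 6, 5, 4, 7, 5, 4, 3, 6]


def frequency_distribution(values):
    """Return (value, frequency) pairs sorted by the magnitude of the value."""
    counts = Counter(values)
    return sorted(counts.items())


freq = frequency_distribution(observations)

# Crude text plot: one asterisk per occurrence of each value
for value, count in freq:
    print(f"{value}: {'*' * count}")
```

Each row of the printout corresponds to one point on the frequency distribution: the value on the x‐axis and the number of asterisks as its frequency on the y‐axis.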
However, suppose we take a random sample of sufficiently many observations from a population and record the mean of those observations (this is what is known as the sample mean), and then repeat this process many times (making sure we return the sampled values each time so that the population size and distribution are maintained). The distribution of the resulting sample means (if plotted as a histogram) will approximate to a normal distribution, irrespective of the inherent distribution of the values in the original population. The distribution shown by this histogram is known as the sampling distribution of the mean.
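The repeated-sampling procedure described above can be tried out with a short simulation. The sketch below is only an illustration under stated assumptions: the population is a hypothetical right-skewed one (values drawn from an exponential distribution with mean 1), the sample size of 30 and the 2,000 repeats are arbitrary choices, and `skewness` and `sample_means` are helper names invented for this example. It uses only Python's standard `random` and `statistics` modules.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the sketch gives repeatable results

# Hypothetical right-skewed population: 100,000 values drawn from an
# exponential distribution with mean 1 (an assumption, not data from the book).
population = [random.expovariate(1.0) for _ in range(100_000)]


def skewness(values):
    """Skewness: about 0 for a symmetric distribution, > 0 for a right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def sample_means(pop, sample_size, n_repeats):
    """Draw n_repeats random samples of sample_size values; return their means.

    random.sample leaves pop unchanged, so every sample is drawn from the
    full, unaltered population.
    """
    return [
        statistics.fmean(random.sample(pop, sample_size))
        for _ in range(n_repeats)
    ]


means = sample_means(population, sample_size=30, n_repeats=2_000)

print(f"population:   mean={statistics.fmean(population):.3f}, "
      f"skewness={skewness(population):.2f}")
print(f"sample means: mean={statistics.fmean(means):.3f}, "
      f"skewness={skewness(means):.2f}")
```

If the theorem holds for this population, the sample means should centre on the population mean while showing far less skew than the population itself; plotting `means` as a histogram would give an approximately bell-shaped curve.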
Unfortunately, the shape of the sampling distribution depends on the sample size, i.e. the number of observations taken each time from the population. As a rule of thumb, a sample size of about 30 is often sufficient for the sampling distribution of the mean to approximate to a normal distribution, although strongly skewed populations may need larger samples. With smaller sample sizes, the sampling distribution may differ noticeably from the normal distribution. In addition, because the population standard deviation usually has to be estimated from the sample itself, the standardised sample mean is then described by a t‐distribution (strictly, when the population is itself normally distributed), whose shape depends on the sample size (see Figure 4.9; notice that as the sample size increases, so the shape of the curve approximates to a normal distribution!). The Central Limit Theorem is important in statistics since it links the distribution of the variable in the population to the sampling distribution of the mean. Furthermore, it is vital to understand the theorem when we start to consider the confidence intervals of different statistical parameters (see later in Chapter 22).
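To see how the sample size affects the sampling distribution of the mean, the simulation can be repeated for several sample sizes. The following sketch makes the same kind of assumptions as any toy example: observations are drawn directly from a hypothetical exponential population (mean 1, standard deviation 1), the sample sizes 5, 30 and 100 and the 5,000 repeats are arbitrary, and `simulate_means` and `skewness` are names invented here. It compares the spread of the sample means with the value expected from the standard error of the mean, σ/√n, and reports how much skew remains.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so that the sketch gives repeatable results


def skewness(values):
    """Skewness: about 0 for a symmetric distribution, > 0 for a right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def simulate_means(sample_size, n_repeats=5_000):
    """Means of n_repeats samples, each of sample_size exponential values.

    The exponential population (mean 1, SD 1) is an assumption chosen
    because it is strongly right-skewed.
    """
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_repeats)
    ]


results = {}
for n in (5, 30, 100):
    means = simulate_means(n)
    spread = statistics.pstdev(means)
    results[n] = (spread, skewness(means))
    print(f"n={n:>3}: SD of sample means={spread:.3f} "
          f"(expected about {1 / math.sqrt(n):.3f}), "
          f"skewness={results[n][1]:.2f}")
```

With this population, the spread of the sample means should shrink roughly in proportion to 1/√n, and the residual skew should fall as the sample size grows, which is the behaviour the rule of thumb about a sample size of 30 is trying to capture.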