Applied Univariate, Bivariate, and Multivariate Statistics, by Daniel J. Denis

2.11.1 Sampling Distribution of the Mean

Since we regularly calculate and analyze sample means in our data, we are often interested in the sampling distribution of the mean. If we regularly computed medians, we would be equally interested in the sampling distribution of the median.

Recall that when we consider any distribution, whether theoretical or empirical, we are usually especially interested in knowing two things about that distribution: a measure of central tendency and a measure of dispersion or variability. Why do we want to know such things? We want to know these two things because they help summarize our observations, so that instead of looking at each individual data point to get an adequate description of the objects under study, we can simply request the mean and standard deviation as telling the story (albeit an incomplete one) of the obtained observations. Similarly, when we derive a sampling distribution, we are interested in the mean and standard deviation of that theoretical distribution of a statistic.

We already know how to calculate means and standard deviations for real empirical distributions. However, we do not know how to calculate means and standard deviations for sampling distributions. It seems reasonable that the mean and standard deviation of a sampling distribution should depend in some way on the given population from which we are sampling. For instance, if we are sampling from a population that has a mean μ = 20.0 and population standard deviation σ = 5, it seems plausible that the sampling distribution of the mean should look different than if we were sampling from a population with μ = 10.0 and σ = 2. It makes sense that different populations should give rise to different theoretical sampling distributions.

What we need then is a way to specify the sampling distribution of the mean for a given population. That is, if we draw sample means from this population, what does the sampling distribution of the mean look like for this population? To answer this question, we need both the expectation of the sampling distribution (i.e., its mean) as well as the standard deviation of the sampling distribution (i.e., its standard error (SE)). We know that the expectation of the sample mean is equal to the population mean $\mu$. That is, $E(\bar{y}) = \mu$. For example, for a sample mean $\bar{y}$ drawn from a population with $\mu = 20.0$, the expected value of the sample mean is equal to the population mean of 20.0.

To understand why $E(\bar{y}) = \mu$ should be true, consider first how the sample mean is defined:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Incorporating this into the expectation for $\bar{y}$, we have:

$$E(\bar{y}) = E\!\left(\frac{y_1 + y_2 + \cdots + y_n}{n}\right)$$
There is a rule of expectations that says that the expectation of the sum of random variables is equal to the sum of individual expectations. This being the case, we can write the expectation of the sample mean as:

$$E(\bar{y}) = \frac{E(y_1) + E(y_2) + \cdots + E(y_n)}{n}$$
Since the expectation of each of $y_1$ through $y_n$ is $E(y_1) = \mu$, $E(y_2) = \mu$, $\ldots$, $E(y_n) = \mu$, we can write

$$E(\bar{y}) = \frac{\mu + \mu + \cdots + \mu}{n} = \frac{n\mu}{n}$$
We note that the $n$ values in the numerator and denominator cancel, and so we end up with

$$E(\bar{y}) = \mu$$
Using the fact that $E(y_i) = \mu$, we can also say that the expected value of a sampling distribution of the mean is equal to the mean of the population from which we did the theoretical sampling. That is, $E(\bar{y}) = \mu_{\bar{y}} = \mu$ is true, since given $E(\bar{y}) = \mu$, it stands that if we have, say, five sample means $\bar{y}_1, \bar{y}_2, \bar{y}_3, \bar{y}_4, \bar{y}_5$, the expectation of each of these means should be equal to $\mu$, from which we can easily deduce $\mu_{\bar{y}} = \mu$. That is, the mean of all the sample means we could draw is equal to the population mean.
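This claim is easy to check empirically. Below is a minimal simulation sketch (assuming, for illustration, a normal population with $\mu = 20.0$ and $\sigma = 5$, the values used earlier): it draws many sample means and confirms that their average sits very close to $\mu$.

```python
import random
import statistics

# Illustrative population parameters (echoing the text's earlier example)
random.seed(1)
mu, sigma = 20.0, 5.0

# Draw many sample means, each computed from a sample of size n
n, n_means = 10, 20000
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(n_means)
]

# The mean of the sample means should be close to mu, i.e., E(ybar) = mu
mean_of_means = statistics.mean(sample_means)
print(round(mean_of_means, 1))
```

With 20,000 simulated means, the average of the sample means lands within a few hundredths of 20.0, as the expectation argument predicts.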

We now need a measure of the dispersion of a sampling distribution of the mean. At first glance, it may seem reasonable to assume that the variance of the sampling distribution of means should equal the variance of the population from which the sample means were drawn. However, this is not the case. What is true is that the variance of the sampling distribution of means will be equal to only a fraction of the population variance. It will be equal to $\frac{1}{n}$ of it, where $n$ is equal to the size of the samples we are collecting for each sample mean. Hence, the variance of means of the sampling distribution is equal to

$$\sigma^2_M = \frac{1}{n}\sigma^2$$

or simply,

$$\sigma^2_M = \frac{\sigma^2}{n}$$
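Although the full proof belongs to mathematical statistics texts, the key step can be sketched here (assuming the $y_i$ are drawn independently, each with variance $\sigma^2$):

```latex
\sigma^2_M = \operatorname{var}(\bar{y})
           = \operatorname{var}\!\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right)
           = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(y_i)
           = \frac{1}{n^2}\, n\sigma^2
           = \frac{\sigma^2}{n}
```

The independence of the draws is what allows the variance of the sum to split into the sum of the variances.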
The mathematical proof of this statistical fact is in most mathematical statistics texts. A version of the proof can also be found in Hays (1994). The idea, however, can be easily and perhaps even more intuitively understood by recourse to what happens as $n$ changes. We consider first the most trivial and unrealistic of examples to strongly demonstrate the point. Suppose that we calculate the sample mean from a sample of size $n = 1$, sampled from a population with $\mu = 10.0$ and $\sigma^2 = 2.0$. Suppose the sample mean we obtain is equal to 4.0. Therefore, the sampling variance of the corresponding sampling distribution is equal to:

$$\sigma^2_M = \frac{\sigma^2}{n} = \frac{2.0}{1} = 2.0$$
That is, the variance in means that you can expect to see if you sampled an infinite number of means based on samples of size n = 1 repeatedly from this population is equal to 2. Notice that 2 is exactly equal to the original population variance. In this case, the variance in means is based on only a single data point.

Consider now the case where $n > 1$. Suppose we now sampled a mean from the same population based on a sample of size $n = 2$, yielding

$$\sigma^2_M = \frac{\sigma^2}{n} = \frac{2.0}{2} = 1.0$$
What has happened? The variance in sample means has decreased to 1/2 of the original population variance (i.e., 1/2 of 2 is 1). Why is this decrease reasonable? It makes sense, because we already know from the law of large numbers that as the sample size grows larger, our estimate of a parameter gets closer and closer to its true value. That is, for a consistent estimator, our estimate of the true population mean (i.e., the expectation) should get better and better as sample size increases. This is exactly what happens as we increase $n$: our precision in estimating the parameter increases. In other words, the sampling variance of the estimator decreases. It's less variable; it doesn't "bounce around as much" on average from sample to sample.
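The $1/n$ shrinkage can also be verified by simulation. The sketch below (assuming, for illustration, a normal population with $\mu = 10.0$ and $\sigma^2 = 2.0$, as in the example above) compares the empirical variance of sample means against $\sigma^2/n$ for several sample sizes:

```python
import random
import statistics

# Illustrative population echoing the text: mu = 10.0, sigma^2 = 2.0
random.seed(2)
mu, var = 10.0, 2.0
sigma = var ** 0.5

def sampling_variance(n, n_means=20000):
    """Empirical variance of sample means, each based on n draws."""
    means = [
        statistics.mean(random.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_means)
    ]
    return statistics.pvariance(means)

# Empirical variance of the means shrinks toward var / n as n grows
for n in (1, 2, 10, 100):
    print(n, round(sampling_variance(n), 2), var / n)
```

For $n = 1$ the simulated variance of means comes out near 2.0 (the population variance itself), for $n = 2$ near 1.0, and so on down the $\sigma^2/n$ schedule.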

Analogous to how we defined the standard deviation as the square root of the variance, it is also useful to take the square root of the variance of means:

$$\sigma_M = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$$
which we call the standard error of the mean, $\sigma_M$. The standard error of the mean is the standard deviation of the sampling distribution of the mean. Lastly, it is important to recognize that $\sigma_M$ is not "the" standard error. It is merely the standard error of the mean. Other statistics will have different SEs.
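As a small illustration that each statistic carries its own SE, the simulation sketch below (assuming a normal population, for which theory gives the large-sample SE of the median as approximately $1.2533\,\sigma/\sqrt{n}$, larger than the mean's $\sigma/\sqrt{n}$) compares the two:

```python
import random
import statistics

# Illustrative normal population: mu = 20.0, sigma = 5.0
random.seed(3)
mu, sigma, n, n_reps = 20.0, 5.0, 25, 20000

# For each replication, record both the sample mean and the sample median
means, medians = [], []
for _ in range(n_reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

se_mean = statistics.pstdev(means)      # theory: sigma / sqrt(n) = 5 / 5 = 1.0
se_median = statistics.pstdev(medians)  # larger for a normal population
print(round(se_mean, 2), round(se_median, 2))
```

The simulated SE of the mean lands near 1.0 while the SE of the median comes out noticeably larger, underscoring that "the" standard error always refers to a particular statistic.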
