Applied Regression Modeling, Iain Pardoe

1.4.2 Central limit theorem—t‐version

One major drawback to the normal version of the central limit theorem is that to use it we have to assume that we know the value of the population standard deviation, $\mathrm{SD}(Y)$. A generalization of the standard normal distribution called Student's t‐distribution solves this problem. The density curve for a t‐distribution looks very similar to a normal density curve, but the tails tend to be a little “thicker,” that is, t‐distributions are a little more spread out than the normal distribution. This “extra variability” is controlled by a positive integer called the degrees of freedom. The smaller this number, the more spread out the t‐distribution density curve (conversely, the higher the degrees of freedom, the more closely it resembles a normal density curve).
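
To see the “thicker tails” numerically, the following Python sketch compares the t density with the standard normal density at the center ($x=0$) and in the tail ($x=3$) for a few degrees of freedom. The density formula is the standard one for Student's t; the helper name `t_pdf` and the particular degrees of freedom are our own illustrative choices.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work on the log scale with lgamma to avoid overflow for large df.
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


normal = NormalDist()  # standard normal: mean 0, standard deviation 1
for df in (2, 5, 29, 100):
    peak_ratio = t_pdf(0.0, df) / normal.pdf(0.0)  # below 1: lower peak
    tail_ratio = t_pdf(3.0, df) / normal.pdf(3.0)  # above 1: thicker tail
    print(f"df = {df:3d}: ratio at 0 = {peak_ratio:.3f}, ratio at 3 = {tail_ratio:.3f}")
```

The ratios at 0 should be below 1 and rise toward 1 as the degrees of freedom increase, while the ratios at 3 should be above 1 and fall toward 1, which is the pattern described in the text.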

For example, the following table shows critical values (i.e., horizontal axis values or percentiles) and tail areas for a t‐distribution with 29 degrees of freedom.

Probabilities (tail areas) and percentiles (critical values) for a t‐distribution with 29 degrees of freedom

Upper‐tail area             0.1     0.05    0.025   0.01    0.005   0.001
Critical value of $t_{29}$  1.311   1.699   2.045   2.462   2.756   3.396
Two‐tail area               0.2     0.1     0.05    0.02    0.01    0.002

Compared with the corresponding table for the normal distribution in Section 1.2, the critical values in this table are slightly larger (for example, 1.311 rather than 1.282 for an upper‐tail area of 0.1).
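
The table values can be checked numerically. The following Python sketch is not how statistical software usually computes t percentiles (that typically relies on the regularized incomplete beta function); it simply integrates the t density with Simpson's rule and inverts the result by bisection. The helper names `t_pdf`, `t_cdf`, and `t_quantile`, the number of integration steps, and the bisection bracket $[0, 10]$ are our own illustrative choices, adequate for the degrees of freedom and tail areas in this table.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about zero.

    steps must be even; 1000 is ample for |x| up to about 10.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: float) -> float:
    """Critical value c with P(T <= c) = p, for 0.5 <= p < 1, by bisection.

    Assumes the answer lies in [0, 10], which holds for this table.
    """
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


standard_normal = NormalDist()
for upper_tail in (0.1, 0.05, 0.025, 0.01, 0.005, 0.001):
    t_crit = t_quantile(1 - upper_tail, df=29)
    z_crit = standard_normal.inv_cdf(1 - upper_tail)
    print(f"upper tail {upper_tail:<6} t_29: {t_crit:.3f}   normal: {z_crit:.3f}")
```

Each $t_{29}$ value should agree with the table to three decimal places, and each should exceed the matching normal critical value.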

We will use the t‐distribution from this point on because it will allow us to use an estimate of the population standard deviation (rather than having to assume this value). A reasonable estimate to use is the sample standard deviation, $s_Y$. Since we will be using an estimate of the population standard deviation, we will be a little less certain about our probability calculations. This is why the t‐distribution needs to be a little more spread out than the normal distribution: to adjust for this extra uncertainty. This extra uncertainty will be of particular concern when we are unsure whether our sample standard deviation is a good estimate of the population standard deviation (i.e., in small samples). So, it makes sense that the degrees of freedom increase as the sample size increases. In this particular application, we will use the t‐distribution with $n-1$ degrees of freedom in place of a standard normal distribution in the following t‐version of the central limit theorem.
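
A small simulation makes this extra uncertainty concrete. The following Python sketch draws repeated samples from a normal population, standardizes each sample mean using the sample standard deviation in place of the population standard deviation, and records how often the result exceeds the normal 90th percentile (about 1.282). The population mean 280 and standard deviation 50, the seeds, the number of repetitions, and the function name `tail_fraction_with_s` are hypothetical choices for illustration only.

```python
import math
import random
from statistics import NormalDist


def tail_fraction_with_s(n: int, reps: int, crit: float, rng: random.Random) -> float:
    """Fraction of simulated (m - mu) / (s / sqrt(n)) values exceeding crit.

    Samples come from a normal population with hypothetical mean 280 and SD 50.
    """
    mu, sigma = 280.0, 50.0
    exceed = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((y - m) ** 2 for y in sample) / (n - 1))  # sample SD
        if (m - mu) / (s / math.sqrt(n)) > crit:
            exceed += 1
    return exceed / reps


rng = random.Random(1)
z90 = NormalDist().inv_cdf(0.90)  # normal 90th percentile, about 1.282
for n in (5, 30):
    frac = tail_fraction_with_s(n, 20_000, z90, rng)
    print(f"n = {n:2d}: fraction above {z90:.3f} = {frac:.3f} (nominal 0.10)")
```

If the normal distribution were still appropriate, both fractions would be close to 0.10. Instead, the fraction for $n=5$ should come out near 0.135 (the area beyond 1.282 under a t‐distribution with 4 degrees of freedom), while the fraction for $n=30$ should be only slightly above 0.10, which is the small‐sample effect that using $n-1$ degrees of freedom corrects for.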

Suppose that a random sample of $n$ data values, represented by $Y_1, Y_2, \ldots, Y_n$, comes from a population that has a mean of $\mathrm{E}(Y)$. Imagine taking a large number of random samples of $n$ data values and calculating the mean and standard deviation for each sample. As before, we will let $M_Y$ represent the imagined list of repeated sample means, and similarly, we will let $S_Y$ represent the imagined list of repeated sample standard deviations. Define

$$
t = \frac{M_Y - \mathrm{E}(Y)}{S_Y/\sqrt{n}}.
$$

Under very general conditions, $t$ has an approximate t‐distribution with $n-1$ degrees of freedom. The two differences from the normal version of the central limit theorem that we used before are that the repeated sample standard deviations, $S_Y$, replace an assumed population standard deviation, $\mathrm{SD}(Y)$, and that the resulting sampling distribution is a t‐distribution (not a normal distribution).
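
The theorem can be checked by simulation. The following Python sketch assumes, hypothetically, a normal population with mean 280 and standard deviation 50; for a normal population the t‐distribution with $n-1$ degrees of freedom holds exactly, so the simulated tail fractions should sit close to the table values for 29 degrees of freedom. The function name `simulate_t_statistics`, the seed, and the number of repetitions are illustrative choices.

```python
import math
import random


def simulate_t_statistics(n: int, reps: int, mu: float, sigma: float, seed: int) -> list[float]:
    """Draw reps random samples of size n from a normal population and
    return t = (m - mu) / (s / sqrt(n)) for each sample."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((y - m) ** 2 for y in sample) / (n - 1))  # sample SD
        stats.append((m - mu) / (s / math.sqrt(n)))
    return stats


# Hypothetical population: normal with mean 280 and SD 50 (illustrative only).
t_values = simulate_t_statistics(n=30, reps=20_000, mu=280.0, sigma=50.0, seed=7)
upper = sum(t > 1.311 for t in t_values) / len(t_values)          # table: 0.10
two_tail = sum(abs(t) > 2.045 for t in t_values) / len(t_values)  # table: 0.05
print(f"P(t > 1.311) ~ {upper:.3f}, P(|t| > 2.045) ~ {two_tail:.3f}")
```

With $n=30$, roughly 10% of the simulated $t$ values should exceed 1.311 and roughly 5% should exceed 2.045 in absolute value, matching the upper‐tail and two‐tail areas in the table for 29 degrees of freedom.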

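To illustrate, let us repeat the calculations from Section 1.4.1 based on an assumed population mean, $\mathrm{E}(Y) = 280$, but rather than using an assumed population standard deviation, $\mathrm{SD}(Y)$, we will instead use our observed sample standard deviation, $s_Y = 53.8656$, for $n = 30$. To find the 90th percentile of the sampling distribution of the mean sale price, $M_Y$:

$$
\begin{aligned}
0.90 &= \Pr(t_{29} < 1.311) \\
&= \Pr\!\left(\frac{M_Y - \mathrm{E}(Y)}{s_Y/\sqrt{n}} < 1.311\right) \\
&= \Pr\!\left(M_Y < \mathrm{E}(Y) + 1.311\,\frac{s_Y}{\sqrt{n}}\right) \\
&= \Pr\!\left(M_Y < 280 + 1.311\,\frac{53.8656}{\sqrt{30}}\right) \\
&= \Pr(M_Y < 292.893).
\end{aligned}
$$
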
Thus, the 90th percentile of the sampling distribution of $M_Y$ is 293 thousand dollars (to the nearest thousand dollars).
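
The arithmetic in the final steps of this calculation is easy to reproduce. A minimal Python sketch, using the values stated in the text (the variable names are our own):

```python
import math

mean_assumed = 280.0  # assumed population mean E(Y)
s_y = 53.8656         # observed sample standard deviation
n = 30                # sample size, so 29 degrees of freedom
t_crit_90 = 1.311     # 90th percentile of t_29, from the table

standard_error = s_y / math.sqrt(n)  # s_Y / sqrt(n)
percentile_90 = mean_assumed + t_crit_90 * standard_error
print(f"standard error = {standard_error:.4f}, 90th percentile = {percentile_90:.3f}")
```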

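Turning this around, what is the probability that $M_Y$ is greater than 292.893?

$$
\begin{aligned}
\Pr(M_Y > 292.893) &= \Pr\!\left(\frac{M_Y - \mathrm{E}(Y)}{s_Y/\sqrt{n}} > \frac{292.893 - 280}{53.8656/\sqrt{30}}\right) \\
&= \Pr(t_{29} > 1.311) \\
&= 0.10.
\end{aligned}
$$

So, the probability that $M_Y$ is greater than 292.893 is 0.10.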
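
The same probability can be obtained directly from the t‐distribution rather than from the table. The following Python sketch approximates the upper‐tail area by integrating the t density with Simpson's rule; this numerical approach, and the helper names `t_pdf` and `t_upper_tail`, are our own illustrative choices rather than how statistical packages compute t probabilities.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(x: float, df: float, steps: int = 1000) -> float:
    """P(T > x) for x >= 0: 0.5 minus the Simpson's-rule area from 0 to x."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Standardize the cutoff 292.893 using E(Y) = 280, s_Y = 53.8656, n = 30.
t_value = (292.893 - 280) / (53.8656 / math.sqrt(30))
print(f"t = {t_value:.3f}, Pr(M_Y > 292.893) = {t_upper_tail(t_value, df=29):.3f}")
```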

So far, we have focused on the sampling distribution of sample means, $M_Y$, but what we would really like to do is infer what the observed sample mean, $m_Y$, tells us about the population mean, $\mathrm{E}(Y)$. Thus, while the preceding calculations have been useful for building up intuition about sampling distributions and manipulating probability statements, their main purpose has been to prepare the ground for the next two sections, which cover how to make statistical inferences about the population mean, $\mathrm{E}(Y)$.