
2.8 VARIANCE


Returning to our discussion of moments, the variance is the second moment of a distribution. For the discrete case, variance is defined as:

$$\sigma^2 = E\left[(y_i - \mu)^2\right] = \sum_{i} (y_i - \mu)^2 \, p(y_i)$$
while for the continuous case,

$$\sigma^2 = \int_{-\infty}^{\infty} (y - \mu)^2 f(y) \, dy$$
Since E(y_i) = μ, it stands that we may also write the variance as E[y_i − E(y_i)]². We can also express σ² as E(y_i²) − μ², since, when we distribute expectations, we obtain:

$$\sigma^2 = E(y_i - \mu)^2 = E(y_i^2) - 2\mu E(y_i) + \mu^2 = E(y_i^2) - 2\mu^2 + \mu^2 = E(y_i^2) - \mu^2$$
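As a quick numerical check of this identity, we can compare the two forms in R. This is a minimal sketch: the vector y below is arbitrary illustrative data (not Galton's), and we average by n so that both sides behave as population-style moments:

> y <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary illustrative data
> mu <- mean(y)                    # treat the sample mean as mu
> mean((y - mu)^2)                 # E(y - mu)^2, averaging by n
[1] 4
> mean(y^2) - mu^2                 # E(y^2) - mu^2, the same quantity
[1] 4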
Recall that the uncorrected and biased sample variance is given by:

$$S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n}$$
As earlier noted, taking the expectation of S², we find that E(S²) ≠ σ². The actual expectation of S² is equal to:

$$E(S^2) = \left(\frac{n-1}{n}\right)\sigma^2$$
which implies that the degree to which S² is biased is equal to:

$$E(S^2) - \sigma^2 = \left(\frac{n-1}{n}\right)\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$$
We have said that S² is biased, but you may have noticed that as n increases, (n − 1)/n approaches 1, and so E(S²) approaches σ² as n increases without bound. This was our basis for earlier writing E(S²) → σ² as n → ∞. That is, we say that the estimator S², though biased in small samples, is asymptotically unbiased because its expectation is equal to σ² as n → ∞.
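The bias is easy to see empirically. The following is a sketch under assumed settings (not from the text): we draw many small samples of size n = 5 from a normal population with known σ² = 4, compute the biased estimate S² for each, and average:

> set.seed(1)                  # for reproducibility
> n <- 5; sigma2 <- 4          # small n makes the bias visible
> S2 <- replicate(10000, {
+   y <- rnorm(n, mean = 0, sd = sqrt(sigma2))
+   sum((y - mean(y))^2) / n   # biased estimator S^2
+ })
> mean(S2)                     # close to ((n - 1)/n) * sigma2 = 3.2, not 4

With n = 5, the average of the S² values lands near 3.2 rather than the true σ² = 4, in line with E(S²) = ((n − 1)/n)σ².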

When we lose a degree of freedom in the denominator and rename S² to s², we get:

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$
Recall that when we take the expectation of s², we find that E(s²) = σ² (see Wackerly, Mendenhall, and Scheaffer (2002, pp. 372–373) for a proof).
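R's built-in var() function uses the n − 1 denominator, which we can confirm by computing s² by hand. Again, a minimal sketch on an arbitrary illustrative vector:

> y <- c(2, 4, 4, 4, 5, 5, 7, 9)         # arbitrary illustrative data
> sum((y - mean(y))^2) / (length(y) - 1) # s^2 computed by hand
[1] 4.571429
> var(y)                                 # identical: var() divides by n - 1
[1] 4.571429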

The population standard deviation is given by the positive square root of σ², that is, σ = √σ². Analogously, the sample standard deviation is given by s = √s².
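The correspondence s = √s² is likewise easy to confirm in R (continuing with the same illustrative vector):

> y <- c(2, 4, 4, 4, 5, 5, 7, 9)
> sqrt(var(y))   # square root of the sample variance
[1] 2.13809
> sd(y)          # identical
[1] 2.13809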

Recall the interpretation of a standard deviation. It tells us, on average, how much scores deviate from the mean. In computing a measure of dispersion, we initially squared deviations so that our measure would not equal zero for every set of observations, since the sum of unsquared deviations about the mean is always equal to 0. Taking the average of this sum of squares gave us the variance, but since the variance is in squared units, we take its square root to return to the original, "unsquared" units. This is how the standard deviation comes about. Studying the analysis of variance, the topic of the following chapter, will help in "cementing" some of these ideas about variance and the squaring of deviations, since ANOVA is all about generating different sums of squares and their averages, which go by the name of mean squares.
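The fact that deviations about the mean always sum to zero is a one-line check in R (same illustrative vector as above):

> y <- c(2, 4, 4, 4, 5, 5, 7, 9)
> sum(y - mean(y))   # exactly 0 here; in general, 0 up to floating-point rounding
[1] 0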

The variance and standard deviation are easily obtained in R. We compute both for parent in Galton's data:

> var(parent)
[1] 3.194561
> sd(parent)
[1] 1.787333

One may also wish to compute what is known as the coefficient of variation, which is the ratio of the standard deviation to the mean. We can estimate this coefficient for parent and child respectively in Galton's data:

> cv.parent <- sd(parent)/mean(parent)
> cv.parent
[1] 0.02616573
> cv.child <- sd(child)/mean(child)
> cv.child
[1] 0.03698044

Computing the coefficient of variation is a way of comparing the variability of competing distributions relative to each distribution's mean. We can see that the dispersion of child relative to its mean (0.037) is slightly larger than that of parent relative to its mean (0.026).
