2.25 COMPOSITE VARIABLES: LINEAR COMBINATIONS
In many statistical techniques, especially multivariate ones, statistical analyses take place not on individual variables, but rather on linear combinations of variables. A linear combination in linear algebra can be denoted simply as:

$$\ell = \mathbf{a}'\mathbf{y} = a_1 y_1 + a_2 y_2 + \cdots + a_p y_p$$

where a′ = (a1, a2, …, ap). These values are scalars, and serve to weight the respective values of y1 through yp, which are the variables.
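As a quick numerical illustration, the short sketch below (using NumPy, with made-up weights and observed values) computes a single linear combination ℓ = a′y as the weighted sum of the variables.

```python
import numpy as np

# Hypothetical example: p = 3 variables observed on one case, weighted by
# scalars a1, a2, a3 to form the linear combination l = a'y.
a = np.array([0.5, -1.0, 2.0])   # weight vector a' = (a1, a2, a3)
y = np.array([4.0, 2.0, 1.0])    # observed values y1, y2, y3

ell = a @ y                      # l = a1*y1 + a2*y2 + a3*y3
print(ell)                       # 0.5*4 - 1.0*2 + 2.0*1 = 2.0
```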
Just as we did for “ordinary” variables, we can compute a number of central tendency and dispersion statistics on linear combinations. For instance, we can compute the mean of a linear combination ℓi as

$$\bar{\ell} = \mathbf{a}'\bar{\mathbf{y}}$$

We can also compute the sample variance of a linear combination:

$$s_{\ell}^{2} = \mathbf{a}'\mathbf{S}\mathbf{a}$$

for ℓi = a′yi, i = 1, 2, …, n, and where S is the sample covariance matrix. Though the form a′Sa for the variance may be difficult to decipher at this point, it will become clearer when we consider techniques such as principal components later in the book.
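The following sketch, again with made-up data, checks these two identities numerically: the mean of the combination scores equals a′ȳ, and their sample variance equals a′Sa.

```python
import numpy as np

# Hypothetical data matrix Y: n = 5 observations on p = 3 variables.
rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 3))
a = np.array([1.0, -2.0, 0.5])

# Scores of the linear combination for each observation: l_i = a'y_i
ell = Y @ a

# Mean of the linear combination equals a' applied to the vector of means.
ybar = Y.mean(axis=0)
print(np.isclose(ell.mean(), a @ ybar))         # True

# Sample variance of the linear combination equals a'Sa,
# where S is the sample covariance matrix of Y.
S = np.cov(Y, rowvar=False)                     # n - 1 in the denominator
print(np.isclose(ell.var(ddof=1), a @ S @ a))   # True
```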
For two linear combinations,

$$\ell_1 = \mathbf{a}'\mathbf{y} = a_1 y_1 + a_2 y_2 + \cdots + a_p y_p$$

and

$$\ell_2 = \mathbf{b}'\mathbf{y} = b_1 y_1 + b_2 y_2 + \cdots + b_p y_p$$

we can obtain the sample covariance between such linear combinations as follows:

$$s_{\ell_1 \ell_2} = \mathbf{a}'\mathbf{S}\mathbf{b}$$

The correlation of these linear combinations (Rencher and Christensen, 2012, p. 76) is simply the standardized version of sℓ1ℓ2:

$$r_{\ell_1 \ell_2} = \frac{\mathbf{a}'\mathbf{S}\mathbf{b}}{\sqrt{(\mathbf{a}'\mathbf{S}\mathbf{a})(\mathbf{b}'\mathbf{S}\mathbf{b})}}$$
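A brief numerical check of these formulas is sketched below, using arbitrary weight vectors a and b and simulated data: the covariance a′Sb and its standardized version agree with computing the two sets of combination scores directly.

```python
import numpy as np

# Hypothetical weight vectors for two linear combinations of the same variables.
rng = np.random.default_rng(2)
Y = rng.normal(size=(50, 4))           # n = 50 observations, p = 4 variables
a = np.array([1.0, 0.5, -1.0, 2.0])
b = np.array([0.0, 1.0, 1.0, -0.5])

S = np.cov(Y, rowvar=False)            # sample covariance matrix of Y

# Sample covariance between l1 = a'y and l2 = b'y is a'Sb.
cov_l1_l2 = a @ S @ b

# Their correlation is a'Sb standardized by the two standard deviations.
r_l1_l2 = cov_l1_l2 / np.sqrt((a @ S @ a) * (b @ S @ b))

# Check against computing the combination scores directly.
l1, l2 = Y @ a, Y @ b
print(np.isclose(cov_l1_l2, np.cov(l1, l2)[0, 1]))    # True
print(np.isclose(r_l1_l2, np.corrcoef(l1, l2)[0, 1])) # True
```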
As we will see later in the book, if rℓ1ℓ2 is the maximum correlation between linear combinations on the same variables, it is called the canonical correlation, discussed in Chapter 12. The correlation between linear combinations plays a central role in multivariate analysis. Substantively, and geometrically, linear combinations can be interpreted as “projections” of one or more variables onto new dimensions. For instance, in simple linear regression, the fitting of a least-squares line is such a projection: points are projected onto the line (or, in higher dimensions, the “surface”) such that the sum of squared deviations from that line or surface is kept to a minimum.
If we can assume multivariate normality of a distribution, that is, Y ∼ N[μ, Σ], then we know that linear combinations of Y are also normally distributed, along with a host of other useful statistical properties (see Timm, 2002, pp. 86–88). In multivariate methods especially, we regularly need to make assumptions about such linear combinations, and it helps to know that so long as we can assume multivariate normality, we have some idea of how such linear combinations will be distributed.
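As a rough illustration of this property, the sketch below draws a large sample from a multivariate normal distribution with made-up μ and Σ, forms a linear combination a′y, and compares its empirical mean and variance with the theoretical values a′μ and a′Σa.

```python
import numpy as np

# Sketch: if Y ~ N(mu, Sigma), then a'Y ~ N(a'mu, a'Sigma a).
# The values of mu, Sigma, and a below are invented for illustration only.
rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
a = np.array([1.0, 2.0, -1.0])

Y = rng.multivariate_normal(mu, Sigma, size=100_000)
ell = Y @ a

# Empirical mean and variance of the linear combination should be close
# to the theoretical values a'mu and a'Sigma a.
print(ell.mean(), a @ mu)               # both near -3.0
print(ell.var(ddof=1), a @ Sigma @ a)   # both near the same value
```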