EFA and Other Multivariate Data Reduction Techniques
Factor analysis belongs to a larger family of statistical procedures known collectively as data reduction techniques. In general, all data reduction techniques are designed to take a larger set of observed variables and combine them in some way so as to yield a smaller set of variables. The differences among these methods lie in the criteria used to combine the initial set of variables. We discuss this criterion for EFA at some length in Chapter 3, namely the effort to find a factor structure that yields accurate estimates of the covariance matrix of the observed variables using a smaller set of latent variables. Another statistical analysis with the goal of reducing the number of observed variables to a smaller number of unobserved variates is discriminant analysis (DA). DA is used in situations where a researcher has two or more groups in the sample (e.g., treatment and control groups) and would like to gain insights into how the groups differ on a set of measured variables. However, rather than examining each variable separately, it is more statistically efficient to consider them collectively. In order to reduce the number of variables to consider in this case, DA can be used. As with EFA, DA uses a heuristic to combine the observed variables with one another into a smaller set of latent variables that are called discriminant functions. In this case, the algorithm finds the combination(s) that maximize the group mean differences on these functions. The number of possible discriminant functions is the minimum of $p$ and $J-1$, where $p$ is the number of observed variables and $J$ is the number of groups. The functions resulting from DA are orthogonal to one another, meaning that they reflect different aspects of the shared group variance associated with the observed variables. The discriminant functions in DA can be expressed as follows:
$D_{fi} = w_{f1}x_{1i} + w_{f2}x_{2i} + \cdots + w_{fp}x_{pi}$ (Equation 1.1)
where
$D_{fi}$ = Value of discriminant function $f$ for individual $i$
$w_{fp}$ = Discriminant weight relating function $f$ and variable $p$
$x_{pi}$ = Value of variable $p$ for individual $i$.
For each of these discriminant functions ($D_f$), there is a set of weights that are akin to regression coefficients, as well as correlations between the observed variables and the functions. Interpretation of the DA results usually involves an examination of these correlations. An observed variable having a large correlation with a discriminant function is said to be associated with that function, in much the same way that indicator variables with large loadings are said to be associated with a particular factor. Quite frequently, DA is used as a follow-up procedure to a statistically significant multivariate analysis of variance (MANOVA). When a discriminant function exhibits a statistically significant group mean difference, the variables most strongly associated with that function can be interpreted as contributing to the group difference it reflects. In this way, the functions can be characterized just as factors are, by considering the variables that are most strongly associated with them.
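To make this concrete, the short sketch below fits a discriminant analysis to simulated data using scikit-learn and then computes the structure correlations used to interpret the functions. The data, variable names, and the amount of group separation built into them are purely illustrative assumptions, not an example from this book.

```python
# A minimal sketch of discriminant analysis as data reduction,
# using scikit-learn and NumPy; data and names are illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Illustrative data: p = 4 observed variables, J = 3 groups,
# so at most min(p, J - 1) = 2 discriminant functions.
X = rng.normal(size=(150, 4))
groups = rng.integers(0, 3, size=150)
X[groups == 1, :2] += 1.0      # build in some group separation
X[groups == 2, 2:] += 1.5

da = LinearDiscriminantAnalysis()
scores = da.fit(X, groups).transform(X)   # D_f values for each person
print(scores.shape)                       # (150, 2) -> two functions

# Structure correlations: correlations between each observed variable
# and each discriminant function, used to interpret the functions.
structure = np.array([
    [np.corrcoef(X[:, j], scores[:, f])[0, 1] for f in range(scores.shape[1])]
    for j in range(X.shape[1])
])
print(np.round(structure, 2))
```

Variables with large entries in a column of this structure matrix would be the ones used to characterize that discriminant function, much as large loadings are used to characterize a factor.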
Canonical correlation (CC) works in much the same fashion as DA, except that rather than having a set of continuous observed variables and a categorical grouping variable, CC is used when there are two sets of continuous variables whose relationship we want to understand. As an example, consider a researcher who has collected intelligence test data that yields five subtest scores. In addition, she has also measured executive functioning for each subject in the sample, using an instrument that yields four subtests. The research question to be addressed in this study is, how strongly related are the measures of intelligence and executive functioning? Certainly, individual correlation coefficients could be used to examine how pairs of these variables are related to one another. However, the research question in this case is really about the extent and nature of the relationship between the two sets of variables. CC is designed to answer just this question, by combining each set into what are known as canonical variates. As with DA, these canonical variates are orthogonal to one another so that they extract all of the shared variance between the two sets. However, whereas DA creates the discriminant functions by finding the linear combinations of the observed indicators that maximize group mean differences on the functions, CC finds the linear combinations for each variable set that maximize the correlation between the resulting canonical variates. Just as with DA, each observed variable is assigned a weight that is used in creating the canonical variates. The canonical variate is expressed as in Equation 1.2.
$C_{vi} = w_{c1}x_{1i} + w_{c2}x_{2i} + \cdots + w_{cp}x_{pi}$ (Equation 1.2)
where
$C_{vi}$ = Value of canonical variate $v$ for individual $i$
$w_{cp}$ = Canonical weight relating variate $v$ and variable $p$
$x_{pi}$ = Value of variable $p$ for individual $i$.
Note how similar Equation 1.1 is to Equation 1.2. In both cases, the observed variables are combined to create one or more linear combination scores. The difference between the two approaches lies in the criteria used to obtain the weights. As noted above, for DA the criterion is maximizing group separation on the means of $D_f$, whereas for CC the criterion is maximizing the correlation between the $C_v$ for the two sets of variables.
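As a brief illustration of how canonical variates and their correlations can be obtained in practice, the sketch below uses scikit-learn's CCA on simulated "intelligence" and "executive functioning" scores mirroring the example above; the data and the induced cross-set relationship are assumptions made solely for the illustration.

```python
# A minimal sketch of canonical correlation between two variable sets,
# using scikit-learn's CCA; the data here are simulated for illustration.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)

n = 200
intel = rng.normal(size=(n, 5))        # five intelligence subtests
exec_fn = rng.normal(size=(n, 4))      # four executive functioning subtests
exec_fn[:, 0] += 0.8 * intel[:, 0]     # induce some cross-set relationship

# At most min(5, 4) = 4 pairs of canonical variates.
cca = CCA(n_components=4)
intel_cv, exec_cv = cca.fit(intel, exec_fn).transform(intel, exec_fn)

# Canonical correlations: correlation between each matched pair of variates.
can_corrs = [np.corrcoef(intel_cv[:, k], exec_cv[:, k])[0, 1] for k in range(4)]
print(np.round(can_corrs, 2))
```

The first canonical correlation summarizes the strongest linear relationship between the two sets, with each subsequent pair of variates capturing residual shared variance orthogonal to the earlier pairs.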
The final statistical model that we will contrast with EFA is partial least squares (PLS), which is similar to CC in that it seeks to find linear combinations of two sets of variables such that the relationship between the sets is maximized. This goal stands in contrast to EFA, in which the criterion for determining factor loadings is the accuracy with which the observed variable covariance/correlation matrix is reproduced. PLS differs from CC in that the criterion it uses to obtain weights involves both maximizing the relationship between the two sets of variables and maximizing the variance explained for the variables within each set; CC does not involve this latter goal. Note that PCA, which we discuss in Chapter 3, also involves maximizing the variance explained within a set of observed variables. Thus, PLS combines, in a sense, the criteria of both CC and PCA (maximizing relationships among variable sets and maximizing explained variance within variable sets) in order to obtain linear combinations of each set of variables.
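For comparison with the CC sketch above, the example below applies PLS to two simulated variable sets using scikit-learn's PLSCanonical; as before, the data and names are illustrative assumptions rather than anything drawn from the book.

```python
# A minimal sketch contrasting PLS with CC on two simulated variable sets,
# using scikit-learn's PLSCanonical; data and names are illustrative.
import numpy as np
from sklearn.cross_decomposition import PLSCanonical

rng = np.random.default_rng(2)

n = 200
set_x = rng.normal(size=(n, 5))
set_y = rng.normal(size=(n, 4))
set_y[:, 0] += 0.8 * set_x[:, 0]       # induce a cross-set relationship

# PLS weights balance between-set relationship with within-set variance,
# so the component scores both relate across sets and summarize variance
# inside each set (the PCA-like part of the criterion).
pls = PLSCanonical(n_components=2)
x_scores, y_scores = pls.fit(set_x, set_y).transform(set_x, set_y)

for k in range(2):
    r = np.corrcoef(x_scores[:, k], y_scores[:, k])[0, 1]
    print(f"component {k + 1}: cross-set correlation = {r:.2f}")
```

Running this alongside the CCA sketch on the same data shows the practical difference in criteria: CCA components maximize the cross-set correlations themselves, whereas the PLS components trade some of that correlation for better coverage of the variance within each variable set.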