Exploratory Factor Analysis - W. Holmes Finch
Comparison of Exploratory and Confirmatory Factor Analysis
Factor analysis models, as a whole, exist on a continuum. At one extreme is the purely exploratory model, which incorporates no a priori information, such as the possible number of factors or how indicators are associated with factors. At the other extreme lies the purely confirmatory factor model, in which the number of factors, as well as the way in which the observed indicators group onto these factors, is specified by the researcher. These modeling frameworks differ both conceptually and statistically. From a conceptual standpoint, exploratory models are used when the researcher has little or no prior information regarding the latent structure underlying a set of observed indicators. For example, if very little prior empirical work has been done with a set of indicators, or there is not much in the way of a theoretical framework for a factor model, then by necessity the researcher would need to engage in an exploratory investigation of the underlying factor structure. In other words, without prior information on which to base the factor analysis, the researcher cannot make any presuppositions regarding what the structure might look like, even with regard to the number of factors underlying the observed indicators. In other situations, there may be a strong theoretical basis upon which a hypothesized latent structure rests, such as when a scale has been developed using well-established theories. However, if very little prior empirical work exists exploring this structure, the researcher may not be able to use a more confirmatory approach and thus would rely on exploratory factor analysis (EFA) to examine several possible factor solutions, with the number of latent variables possibly limited by the theoretical framework upon which the model is based.
Conceptually, a confirmatory factor analysis (CFA) approach would be used when there is both a strong theoretical expectation regarding the factor structure and prior empirical evidence (usually in the form of multiple EFA studies) supporting this structure. In such cases, CFA is used to (a) ascertain how well the hypothesized latent variable model fits the observed data and (b) compare a small number of models with one another in order to identify the one that yields the best fit to the data.
From a statistical perspective, EFA and CFA differ in terms of the constraints that are placed upon the factor structure prior to estimation of the model parameters. With EFA there are few, if any, constraints placed on the model parameters. Observed indicators are typically allowed to have nonzero relationships with all of the factors, and the number of factors is not constrained to be a particular number. Thus, the entire EFA enterprise is concerned with answering the question of how many factors underlie an observed set of indicators, and what structure the relationship between factors and indicators takes. In contrast, CFA models are highly constrained. In most instances, each indicator variable is allowed to be associated with only a single factor, with relationships to all other factors set to 0. Furthermore, the specific factor upon which an indicator is allowed to load is predetermined by the researcher. This is why having a strong theory and prior empirical evidence is crucial to the successful fitting of CFA models. Without such strong prior information, the researcher may have difficulty in properly defining the latent structure, potentially creating a situation in which an improper model is fit to the data. The primary difficulty with fitting an incorrect model is that it may appear to fit the data reasonably well, based on statistical indices, and yet may not be the correct model. Without earlier exploration of the likely latent structure, however, it would not be possible for the researcher to know this. CFA does have the advantage of being a fully determined model, which is not the case with EFA, as we have already discussed. Thus, it is possible to come to more definitive determinations regarding which of several CFA models provides the best fit to a set of data because they can be compared directly using familiar tools such as statistical hypothesis testing. Conversely, determining the optimal EFA model for a set of data is often not a straightforward or clear process, as we will see later in the book.
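The contrast in constraints described above can be made concrete with a small sketch. The six indicators, the two factors, and the particular assignment of indicators to factors are illustrative assumptions, not taken from the text. In each pattern, True marks a loading that is freely estimated and False marks one fixed at 0.

```python
# Hypothetical setup: 6 observed indicators, 2 factors (illustrative only).
n_indicators, n_factors = 6, 2

# EFA: every indicator may have a nonzero loading on every factor.
efa_free = [[True] * n_factors for _ in range(n_indicators)]

# CFA: the researcher decides in advance which single factor each indicator
# loads on; every other loading is fixed at 0. The assignment below
# (indicators 0-2 -> factor 0, indicators 3-5 -> factor 1) is assumed.
assigned_factor = [0, 0, 0, 1, 1, 1]
cfa_free = [[f == assigned_factor[i] for f in range(n_factors)]
            for i in range(n_indicators)]


def count_free(pattern):
    """Number of loadings left free to be estimated."""
    return sum(sum(row) for row in pattern)


print("free loadings, EFA:", count_free(efa_free))
print("free loadings, CFA:", count_free(cfa_free))
```

Under these assumptions the EFA pattern leaves all 12 loadings free, whereas the CFA pattern leaves only 6, one per indicator, with the rest fixed at 0 by the researcher.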
In summary, EFA and CFA sit at opposite ends of a modeling continuum, separated by the amount of prior information and theory available to the researcher. The more such information and the stronger the theory, the more appropriate CFA will be. Conversely, the less such prior evidence is available, and the weaker the theories about the latent structure, the more appropriate EFA will be. Finally, researchers should take care not to use both EFA and CFA on the same set of data. In cases where a small set of CFA models does not fit a set of sample data well, a researcher might use EFA in order to investigate potential alternative models. This is certainly an acceptable approach; however, the same set of data used to investigate these EFA-based alternatives should not then be used with an additional CFA model to validate what exploration has suggested might be optimal models. In such cases, the researcher would need to obtain a new sample upon which the CFA would be fit in order to investigate the plausibility of the EFA findings. If the same data were used for both analyses, the CFA would likely yield spuriously good fit, because the factor structure being tested was itself derived from that sample through the EFA.
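The text calls for a new sample for the confirmatory step. When a single large sample is all that is available, one commonly used alternative (an assumption here, not something the text prescribes) is to split it at random before any analysis, so that the EFA and the CFA never see the same cases. A minimal sketch, in which the helper split_sample and the sample size of 500 are hypothetical:

```python
import random


def split_sample(n_rows, seed=0):
    """Randomly split row indices into two disjoint halves.

    The first half is meant for exploration (EFA); the second is held
    back for confirmation (CFA). Hypothetical helper for illustration.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    half = n_rows // 2
    return sorted(indices[:half]), sorted(indices[half:])


efa_rows, cfa_rows = split_sample(500)

# No case appears in both halves, so the CFA is not fit to data that
# already shaped the structure being tested.
assert not set(efa_rows) & set(cfa_rows)
print(len(efa_rows), len(cfa_rows))
```

The split has to be made before any exploration begins; a holdout chosen after looking at the EFA results would no longer be independent of them.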