Applied Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis

2.17 PSYCHOMETRIC VALIDITY, RELIABILITY: A COMMON USE OF CORRELATION COEFFICIENTS


Correlation coefficients, specifically the Pearson correlation, are employed in virtually all fields of study, and without the invention or discovery of correlation, most modern‐day statistics would simply not exist. This is especially true of psychometrics, the science that deals with the measurement of psychological qualities such as intelligence, self‐esteem, and motivation, among others. Psychometrics features the development of psychometric tests purported to measure the construct of interest. For an excellent general introduction to psychometrics, consult McDonald (1999).

When developing psychometric instruments, two statistical characteristics of these tests are especially important: (1) validity, and (2) reliability. Validity of a test takes many forms, including face validity, criterion validity, and, most notably, construct validity. Construct validity concerns whether a purported psychometric test actually measures what it was designed to measure, and one way of evaluating it is to correlate the newly developed measure with an existing measure that is already known to successfully measure the construct.

For example, in the area of depression assessment, the Beck Depression Inventory (BDI) is a popular self‐report measure often used in evaluating one's level or symptoms of depression. Now, if we were to develop a new test, in order to learn whether that new test measures something called “depression,” we may wish to compute a Pearson correlation of that measure with the BDI. To the extent that the correlation is relatively high, we might tentatively conclude that the new measure is assessing the same (or at least a similar) construct as the BDI. Not surprisingly, correlations used in this way often go by the name of validities in the psychometric literature. If a test lacks construct validity, then there is little guarantee that it is measuring the construct under investigation. Fields such as psychology depend on such construct validation to gain some assurance that their measures are tapping into what they are most interested in. Clinical psychology, especially, depends on the strength of such things as construct validity for confidence that its diagnostic tests are measuring what they are thought to measure. Without psychometrics, clinical testing would be no more advanced than the folk or “pop” psychology tests we often find on the internet, which are usually wholly unscientific.

The second area of concern, that of reliability, is just as important. Two commonly used forms of reliability in psychometrics are test–retest reliability and internal consistency reliability. Test–retest reliability evaluates the consistency of test scores across two or more measurement time points. For example, if I measured your IQ today, and the test was worth its salt, I should expect that a measurement of your IQ a month from now would, within a reasonable margin of error, generate a similar score, assuming it was administered under standardized conditions both times. If not, we might doubt the test's reliability. The Pearson correlation coefficient is commonly used to evaluate test–retest reliability, where a relatively high coefficient between testings is desirable. In addition to test–retest reliability, we often would like a measure of what is known as the internal consistency of a measure, which, though having potentially several competing meanings (e.g., see Tang et al., 2014), can be considered to assess how well items on a scale “hang together,” which is informal language for whether or not items on a test are interrelated (Schmitt, 1996). For this assessment, we can compute Cronbach's alpha, which we will now briefly demonstrate in SPSS.
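Before turning to the software, it may help to recall how coefficient alpha is usually defined in the psychometric literature. The symbols below are introduced here for illustration and do not appear in the original text: k is the number of items, s_i^2 is the sample variance of item i, s_T^2 is the sample variance of the total (summed) score, and \bar{r} is the mean of the k(k-1)/2 distinct inter-item correlations. The first expression is the raw coefficient; the second is the form commonly given for alpha based on standardized items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_T^{2}}\right),
\qquad
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
```

Read this way, alpha grows as the items covary more strongly relative to their individual variances, which is the formal counterpart of items “hanging together.”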

As a very small‐scale example, suppose we have a test having only five items (items 1 through 5 in the SPSS data view), and would like to assess the internal consistency of the measure using Cronbach's alpha. Suppose the scores on the items are as follows:

Case  Item_1  Item_2  Item_3  Item_4  Item_5
1      10.00   12.00   15.00   11.00   12.00
2      12.00   18.00   12.00   12.00    1.00
3       8.00   16.00   14.00   14.00    4.00
4       6.00    8.00   16.00    8.00    6.00
5       4.00    7.00    8.00    7.00    5.00
6       6.00    6.00    3.00    7.00    3.00
7       3.00    4.00    6.00    5.00    8.00
8       7.00    3.00    7.00    9.00    9.00
9       8.00    9.00    4.00   10.00   10.00
10      9.00    5.00    6.00   11.00   12.00

To compute Cronbach's alpha, and obtain a handful of statistics useful for conducting an item analysis, we code in SPSS:

RELIABILITY
  /VARIABLES=Item_1 Item_2 Item_3 Item_4 Item_5
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE CORR
  /SUMMARY=TOTAL.

The MODEL = ALPHA statement requests that SPSS compute Cronbach's alpha. Selected output follows:

Reliability Statistics
Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
0.633              0.691                                          5

Item Statistics
         Mean     Std. Deviation   N
Item_1   7.3000   2.71006          10
Item_2   8.8000   5.05085          10
Item_3   9.1000   4.74810          10
Item_4   9.4000   2.71621          10
Item_5   7.0000   3.80058          10

Inter‐Item Correlation Matrix
         Item_1   Item_2   Item_3   Item_4   Item_5
Item_1    1.000    0.679    0.351    0.827    0.022
Item_2    0.679    1.000    0.612    0.743   −0.463
Item_3    0.351    0.612    1.000    0.462   −0.129
Item_4    0.827    0.743    0.462    1.000   −0.011
Item_5    0.022   −0.463   −0.129   −0.011    1.000

We can see that SPSS reports a raw alpha of 0.633 and an alpha of 0.691 based on standardized items. SPSS also reports item statistics, which include the mean and standard deviation of each item, as well as the inter‐item correlation matrix, which, not surprisingly, has values of 1.0 down the main diagonal (i.e., the correlation of an item with itself is equal to 1.0).
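As a cross-check outside SPSS, the two alpha coefficients can be recomputed directly from the ten rows of item scores given earlier. The following is a minimal sketch in plain Python, not the routine SPSS uses internally; the names SCORES, pearson, cronbach_alpha, and standardized_alpha are illustrative and not part of the original text. It assumes the usual definitions: raw alpha from the item variances and the variance of the summed score, and standardized alpha from the mean inter-item correlation.

```python
from statistics import mean, variance

# Item scores from the data table: one row per respondent, columns Item_1..Item_5.
SCORES = [
    [10, 12, 15, 11, 12],
    [12, 18, 12, 12, 1],
    [8, 16, 14, 14, 4],
    [6, 8, 16, 8, 6],
    [4, 7, 8, 7, 5],
    [6, 6, 3, 7, 3],
    [3, 4, 6, 5, 8],
    [7, 3, 7, 9, 9],
    [8, 9, 4, 10, 10],
    [9, 5, 6, 11, 12],
]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Raw alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    k = len(items)
    item_var_sum = sum(variance(col) for col in items)  # sample variances (n - 1)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def standardized_alpha(rows):
    """Alpha based on standardized items, via the mean inter-item correlation."""
    items = list(zip(*rows))
    k = len(items)
    rs = [pearson(items[i], items[j]) for i in range(k) for j in range(i + 1, k)]
    r_bar = mean(rs)
    return k * r_bar / (1 + (k - 1) * r_bar)


if __name__ == "__main__":
    items = list(zip(*SCORES))
    print(f"raw alpha:          {cronbach_alpha(SCORES):.3f}")
    print(f"standardized alpha: {standardized_alpha(SCORES):.3f}")
    print(f"r(Item_1, Item_4):  {pearson(items[0], items[3]):.3f}")
```

Run as a script, the printed values should agree with the SPSS output above (0.633, 0.691, and the Item_1 with Item_4 correlation of 0.827) to three decimals. Only the standard library is used, so the arithmetic behind each coefficient stays visible.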

Next, SPSS features Item‐Total Statistics, which contain useful information for deciding whether to drop items in an effort to improve reliability:

Item‐Total Statistics
         Scale Mean if   Scale Variance    Corrected Item‐Total   Squared Multiple   Cronbach's Alpha
         Item Deleted    if Item Deleted   Correlation            Correlation        if Item Deleted
Item_1   34.3000         108.900            0.712                 0.726              0.478
Item_2   32.8000          80.400            0.558                 0.841              0.476
Item_3   32.5000          88.278            0.512                 0.448              0.507
Item_4   32.2000         104.844            0.796                 0.776              0.445
Item_5   34.6000         164.267           −0.228                 0.541              0.824

The most relevant column of the above is the last one on the far right, “Cronbach's Alpha if Item Deleted.” It reports the value alpha would take if the given item were excluded from the scale. We can see that for items 1 through 4, alpha would decrease from 0.633 if the given item were excluded, but for item 5, alpha would increase to 0.824. Note also that item 5 is the only item with a negative corrected item‐total correlation (−0.228). If we drop item 5 then, we should expect alpha to increase. We recompute alpha after removing item 5:

RELIABILITY
  /VARIABLES=Item_1 Item_2 Item_3 Item_4
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE CORR
  /SUMMARY=TOTAL.

Reliability Statistics
Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
0.824              0.863                                          4

As we can see, alpha did indeed increase to 0.824, as the previous output indicated it would. Hence, according to coefficient alpha, dropping item 5 may be worthwhile in the hopes of improving the instrument and making its items a bit more interrelated.
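The item analysis can likewise be sketched outside SPSS. The snippet below is an illustrative sketch, not SPSS's own procedure, and its function names are assumptions introduced here. For each item it recomputes two columns of the Item‐Total Statistics table: the corrected item‐total correlation (the item correlated with the sum of the remaining items) and alpha with that item removed.

```python
from statistics import mean, variance

# Item scores from the data table: one row per respondent, columns Item_1..Item_5.
SCORES = [
    [10, 12, 15, 11, 12],
    [12, 18, 12, 12, 1],
    [8, 16, 14, 14, 4],
    [6, 8, 16, 8, 6],
    [4, 7, 8, 7, 5],
    [6, 6, 3, 7, 3],
    [3, 4, 6, 5, 8],
    [7, 3, 7, 9, 9],
    [8, 9, 4, 10, 10],
    [9, 5, 6, 11, 12],
]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Raw alpha: k/(k-1) * (1 - sum of item variances / variance of total)."""
    items = list(zip(*rows))
    k = len(items)
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def item_total_statistics(rows):
    """Return (corrected item-total r, alpha if item deleted) for each item."""
    k = len(rows[0])
    stats = []
    for j in range(k):
        # Scores on every item except item j, per respondent.
        rest = [[v for i, v in enumerate(row) if i != j] for row in rows]
        rest_totals = [sum(r) for r in rest]
        item = [row[j] for row in rows]
        stats.append((pearson(item, rest_totals), cronbach_alpha(rest)))
    return stats


if __name__ == "__main__":
    for j, (r_it, a_del) in enumerate(item_total_statistics(SCORES), start=1):
        print(f"Item_{j}: corrected item-total r = {r_it:6.3f}, "
              f"alpha if deleted = {a_del:.3f}")
```

The last entry corresponds to item 5, where the sketch should give a negative item‐total correlation and an alpha‐if‐deleted of about 0.824, matching the recomputed four‐item alpha reported above.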

Though we have provided an easy demonstration of Cronbach's alpha, it would be negligent not to issue a few cautions and caveats regarding its everyday use. According to Green and Yang (2009), the routine use of coefficient alpha for assessing reliability should be discouraged because the assumptions of the statistic are rarely met, and hence the statistic can exhibit a high degree of bias. What is more, according to a now classic paper by Schmitt (1996), alpha should not be used to conclude anything about the unidimensionality of a test, and thus should not be interpreted as such. Confirmatory factor analysis models (Chapter 15) are typically better suited for assessing and establishing the dimensionality of a set of items. In addition, cut‐offs for what counts as low versus high internal consistency can be very difficult to define, and as argued by Schmitt, scales with low levels of alpha may still be useful. Hence, though alpha is easily computable in SPSS and other software, the reader should be cautious about its unrestricted use. For more details on how it should be used, in addition to the aforementioned sources, Cortina (1993) and Miller (1995) are very informative and should be read before you adopt alpha as a regular part of your statistical toolkit.
