Applied Univariate, Bivariate, and Multivariate Statistics - Daniel J. Denis

2.19 OTHER CORRELATION COEFFICIENTS

Once we learn of Pearson's r, it often becomes the only correlation coefficient in our vocabulary, and too often the very concept of a correlation, rather than merely its calculation, is automatically linked to Pearson's r. Yet Pearson's r is but one of many correlation coefficients available in applied research. Recall that Pearson's r captures linear relationships between (typically) continuous variables. If the relationship is not linear, if one or more variables are not continuous, or if the data are in the form of ranks, then other correlation coefficients are generally more suitable. We briefly review Spearman's rho, although a host of other correlation coefficients exist that are well suited to particular types of data.⁸

Spearman's r_s (“rho”), named after Charles Spearman who developed the coefficient in 1904,⁹ is a correlation coefficient suitable for data on two variables that are expressed in terms of ranks rather than actual measurements on a continuous scale. Mathematically, the Spearman correlation coefficient is equivalent to a Pearson r computed on the ranked data. When the original measurements are not themselves ranks, however, the two coefficients can differ in important ways. Spearman's r_s can be defined as:

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} (R_{x_i} - R_{y_i})^2}{n(n^2 - 1)}$$

where R_{x_i} and R_{y_i} are the ranks on x_i and y_i for the i^th individual in the data, (R_{x_i} - R_{y_i})^2 are squared rank deviations, and n is the number of pairs of ranks (Kirk, 2008). When we compute r_s on the Galton data, we obtain:

> cor.test(parent, child, method = "spearman")

        Spearman's rank correlation rho

data:  parent and child
S = 76569964, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.4251345

We see that r_s of 0.425 is slightly smaller than the Pearson r of 0.459.

To understand why Spearman's rank correlation and Pearson coefficient differ, consider data (Table 2.5) on the rankings of favorite movies for two individuals. In parentheses are subjective scores of “favorability” of these movies, scaled 1–10, where 1 = least favorable and 10 = most favorable.

From the table, we can see that Bill's favorite movie is Star Wars (rating of 10.0), while his least favorite is Batman (rating of 2.1). Mary's favorite movie is Scarface (rating of 9.7), while her least favorite is Batman (rating of 7.6). We will refer to these subjective scores in a moment. For now, we focus only on the ranks. For instance, Bill ranks Scarface third, while Mary ranks Star Wars third.

Table 2.5 Favorability of Movies for Two Individuals in Terms of Ranks

Movie	Bill	Mary
Batman	5 (2.1)	5 (7.6)
Star Wars	1 (10.0)	3 (9.0)
Scarface	3 (8.4)	1 (9.7)
Back to the Future	4 (7.6)	4 (8.5)
Halloween	2 (9.5)	2 (9.6)

Actual scores on the favorability measure are in parentheses.

To compute Spearman's r_s in R the “long way,” we generate two vectors that contain the respective rankings:

> bill <- c(5, 1, 3, 4, 2)
> mary <- c(5, 3, 1, 4, 2)

Because the data are already in the form of ranks, both Pearson r and Spearman rho will agree:

> cor(bill, mary)
[1] 0.6
> cor(bill, mary, method = "spearman")
[1] 0.6
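As a check on the value above, the following is a minimal sketch that applies the usual rank-difference form of Spearman's coefficient, r_s = 1 - 6 Σ d_i^2 / (n(n^2 - 1)), directly to the two ranking vectors. The names d, n, and r_s_manual are introduced here purely for illustration, and the shortcut formula is assumed to apply because neither set of rankings contains ties.

```r
# Rankings from Table 2.5, repeated so the block runs on its own
bill <- c(5, 1, 3, 4, 2)
mary <- c(5, 3, 1, 4, 2)

d <- bill - mary      # rank difference for each movie
n <- length(bill)     # number of pairs of ranks

# Rank-difference formula (exact only when there are no tied ranks)
r_s_manual <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
r_s_manual            # 0.6

# Agrees with R's built-in Spearman computation
all.equal(r_s_manual, cor(bill, mary, method = "spearman"))
```

Here the squared differences sum to 8, so the formula gives 1 - 48/120 = 0.6, matching both calls to cor().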

Note that by default, R returns the Pearson correlation coefficient. One has to specify method = "spearman" to get r_s. Consider now what happens when we correlate, instead of the rankings, the actual subjective favorability scores corresponding to the respective ranks. That is, each person's scores are listed in order from lowest to highest, so that they are paired by rank position rather than by movie. When we plot the favorability data, we obtain:

> bill.sub <- c(2.1, 7.6, 8.4, 9.5, 10.0)
> mary.sub <- c(7.6, 8.5, 9.0, 9.6, 9.7)
> plot(mary.sub, bill.sub)

Note that although the relationship is not perfectly linear, each increase in Bill's subjective score is nonetheless associated with an increase in Mary's subjective score. When we compute Pearson's r on these data, we obtain:

> cor(bill.sub, mary.sub)
[1] 0.9551578

However, when we compute r_s, we get:

> cor(bill.sub, mary.sub, method = "spearman")
[1] 1
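One way to see where this value comes from is to rank the scores explicitly and then apply Pearson's r to the ranks. The sketch below does this with R's rank() function. The names rank_bill, rank_mary, pearson_on_ranks, and spearman are illustrative only and do not appear in the original analysis.

```r
# Subjective scores, repeated so the block runs on its own
bill.sub <- c(2.1, 7.6, 8.4, 9.5, 10.0)
mary.sub <- c(7.6, 8.5, 9.0, 9.6, 9.7)

# Convert each set of scores to ranks (1 = lowest score)
rank_bill <- rank(bill.sub)
rank_mary <- rank(mary.sub)

# Pearson's r computed on the ranks ...
pearson_on_ranks <- cor(rank_bill, rank_mary)

# ... matches Spearman's r_s computed on the raw scores
spearman <- cor(bill.sub, mary.sub, method = "spearman")
all.equal(pearson_on_ranks, spearman)
```

Because both vectors are in increasing order, both sets of ranks are 1 through 5, and the correlation between identical rank vectors is 1.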

Spearman's r_s is equal to 1.0 because the relationship between the two sets of scores is perfectly monotonically increasing (i.e., each increase in movie preference along the abscissa corresponds to an increase in movie preference along the ordinate). Pearson's r, by contrast, is less than 1.0 because it captures the linear relationship among variables and not simply a monotonically increasing one. Hence, a high‐magnitude Spearman coefficient essentially tells us that two variables are “moving together,” but it does not necessarily imply that the relationship is a linear one. A similar measure of rank correlation is Kendall's rank‐order correlation. See Siegel and Castellan (1988, p. 245) for details.
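The two points in this paragraph can be illustrated with a short sketch, offered under stated assumptions rather than as part of the original analysis. First, applying a monotone transformation (here exp(), chosen arbitrarily for illustration) to Bill's scores changes Pearson's r but leaves Spearman's r_s at 1. Second, Kendall's coefficient is available through the same cor() function via method = "kendall". All object names other than bill, mary, bill.sub, and mary.sub are hypothetical.

```r
# Data repeated so the block runs on its own
bill     <- c(5, 1, 3, 4, 2)
mary     <- c(5, 3, 1, 4, 2)
bill.sub <- c(2.1, 7.6, 8.4, 9.5, 10.0)
mary.sub <- c(7.6, 8.5, 9.0, 9.6, 9.7)

# A monotone transformation preserves the ordering of the scores,
# so Spearman's r_s is unaffected while Pearson's r changes
pearson_exp  <- cor(exp(bill.sub), mary.sub)
spearman_exp <- cor(exp(bill.sub), mary.sub, method = "spearman")

# Kendall's rank-order correlation on the rankings and on the scores
tau_ranks  <- cor(bill, mary, method = "kendall")
tau_scores <- cor(bill.sub, mary.sub, method = "kendall")
```

For the rankings, 7 of the 10 pairs of movies are ordered the same way by Bill and Mary and 3 are ordered oppositely, so Kendall's coefficient is (7 - 3)/10 = 0.4, somewhat lower than the r_s of 0.6 for the same data.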
