Introduction to Linear Regression Analysis - Douglas C. Montgomery
2.6 COEFFICIENT OF DETERMINATION
The quantity

R2 = SSR/SST = 1 − SSRes/SST

is called the coefficient of determination. Since SST is a measure of the variability in y without considering the effect of the regressor variable x, and SSRes is a measure of the variability in y remaining after x has been taken into account, R2 is often called the proportion of variation explained by the regressor x. Because 0 ≤ SSRes ≤ SST, it follows that 0 ≤ R2 ≤ 1. Values of R2 that are close to 1 imply that most of the variability in y is explained by the regression model. For the regression model for the rocket propellant data in Example 2.1, we have
R2 = 1 − SSRes/SST = 0.9018

that is, 90.18% of the variability in strength is accounted for by the regression model.
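As a concrete illustration, here is a minimal sketch (using NumPy, with made-up data rather than the rocket propellant data of Example 2.1) of computing R2 from a straight-line least-squares fit:

```python
import numpy as np

def coefficient_of_determination(x, y):
    """R2 = 1 - SSRes/SST for a straight-line least-squares fit."""
    b1, b0 = np.polyfit(x, y, 1)              # slope, intercept
    y_hat = b0 + b1 * x
    ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares SSRes
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares SST
    return 1.0 - ss_res / ss_tot

# Hypothetical data, chosen to be nearly linear
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
print(coefficient_of_determination(x, y))     # close to 1
```

Because the data here lie nearly on a straight line, the printed R2 is close to 1; noisier data would give a smaller value.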
The statistic R2 should be used with caution, since it is always possible to make R2 large by adding enough terms to the model. For example, if there are no repeat points (more than one y value at the same x value), a polynomial of degree n − 1 will give a “perfect” fit (R2 = 1) to n data points. When there are repeat points, R2 can never be exactly equal to 1 because the model cannot explain the variability related to “pure” error.
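The overfitting point can be demonstrated directly: with n points at distinct x values, a polynomial of degree n − 1 interpolates the data exactly, driving R2 to 1 even when the data are noisy. A small sketch with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(6.0)                              # 6 distinct x values
y = 3.0 + 2.0 * x + rng.normal(scale=5.0, size=6)  # linear trend plus noise

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Straight-line fit: R2 < 1 because of the noise
line = np.polyval(np.polyfit(x, y, 1), x)
# Degree n - 1 = 5 polynomial: interpolates all 6 points, so R2 = 1
poly = np.polyval(np.polyfit(x, y, 5), x)

print(r_squared(y, line))   # less than 1
print(r_squared(y, poly))   # essentially 1
```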
Although R2 cannot decrease if we add a regressor variable to the model, this does not necessarily mean the new model is superior to the old one. Unless the error sum of squares in the new model is reduced by an amount at least equal to the original error mean square, the new model will have a larger error mean square than the old one because of the loss of one degree of freedom for error; in that case, the new model is actually worse than the old one.
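A sketch of this effect, using a simulated second regressor x2 that is pure noise: adding it can never increase SSRes, but the residual mean square MSRes = SSRes/(n − p) may still grow because a degree of freedom is lost. (The data and parameter values here are arbitrary.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x1 = rng.uniform(0, 10, n)
y = 5.0 + 2.0 * x1 + rng.normal(scale=2.0, size=n)
x2 = rng.uniform(0, 10, n)   # pure noise, unrelated to y

def fit_stats(regressors, y):
    """Return SSRes and MSRes for a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    return ss_res, ss_res / (len(y) - X.shape[1])   # SSRes, MSRes

ss1, ms1 = fit_stats([x1], y)        # model with x1 only
ss2, ms2 = fit_stats([x1, x2], y)    # model with x1 and the noise regressor

print(ss2 <= ss1)    # SSRes never increases when a regressor is added
print(ms1, ms2)      # MSRes may nonetheless be larger for the bigger model
```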
The magnitude of R2 also depends on the range of variability in the regressor variable. Provided the assumed model form is correct, R2 generally increases as the spread of the x's increases and decreases as the spread of the x's decreases. By the delta method (see also Hahn 1973), one can show that the expected value of R2 from a straight-line regression is approximately

E(R2) ≈ [β1²Sxx/(n − 1)] / [β1²Sxx/(n − 1) + σ²]
Clearly the expected value of R2 will increase (decrease) as Sxx (a measure of the spread of the x’s) increases (decreases). Thus, a large value of R2 may result simply because x has been varied over an unrealistically large range. On the other hand, R2 may be small because the range of x was too small to allow its relationship with y to be detected.
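This dependence on Sxx is easy to see by simulation. The sketch below (with arbitrary parameter values) fits straight lines to data generated with the same slope and error variance but different spreads of x, and averages R2 over many replicates:

```python
import numpy as np

rng = np.random.default_rng(2)

def avg_r2(x_spread, beta1=2.0, sigma=5.0, n=30, reps=200):
    """Average R2 of straight-line fits when x spans [0, x_spread]."""
    x = np.linspace(0.0, x_spread, n)
    r2_values = []
    for _ in range(reps):
        y = 1.0 + beta1 * x + rng.normal(scale=sigma, size=n)
        y_hat = np.polyval(np.polyfit(x, y, 1), x)
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2_values.append(1.0 - ss_res / ss_tot)
    return np.mean(r2_values)

print(avg_r2(2.0))    # narrow x range (small Sxx): small average R2
print(avg_r2(20.0))   # wide x range (large Sxx): much larger average R2
```

Same slope, same error variance; only the spread of the x's differs, yet the average R2 changes dramatically.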
There are several other misconceptions about R2. In general, R2 does not measure the magnitude of the slope of the regression line. A large value of R2 does not imply a steep slope. Furthermore, R2 does not measure the appropriateness of the linear model, for R2 will often be large even though y and x are nonlinearly related. For example, R2 for the regression equation in Figure 2.3b will be relatively large even though the linear approximation is poor. Remember that although R2 is large, this does not necessarily imply that the regression model will be an accurate predictor.
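For instance, the following sketch (with made-up quadratic data, standing in for a situation like Figure 2.3b) fits a straight line to data that are clearly nonlinear in x, yet the resulting R2 is still large:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = x ** 2 + rng.normal(scale=5.0, size=50)   # true relationship is quadratic

# Straight-line least-squares fit to the curved data
y_hat = np.polyval(np.polyfit(x, y, 1), x)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(round(r2, 3))   # large, even though the linear approximation is poor
```

A residual plot would immediately reveal the curvature that the single R2 number hides.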