Introduction to Linear Regression Analysis - Douglas C. Montgomery
2.3.3 Analysis of Variance
We may also use an analysis-of-variance approach to test significance of regression. The analysis of variance is based on a partitioning of total variability in the response variable $y$. To obtain this partitioning, begin with the identity
$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \tag{2.31}$$
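Squaring both sides of Eq. (2.31) and summing over all $n$ observations produces
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$$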
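Note that the third term on the right-hand side of this expression can be rewritten as
$$2\sum_{i=1}^{n} (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = 2\sum_{i=1}^{n} \hat{y}_i (y_i - \hat{y}_i) - 2\bar{y}\sum_{i=1}^{n} (y_i - \hat{y}_i) = 2\sum_{i=1}^{n} \hat{y}_i e_i - 2\bar{y}\sum_{i=1}^{n} e_i = 0$$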
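since the sum of the residuals $e_i = y_i - \hat{y}_i$ is always zero (property 1, Section 2.2.2) and the sum of the residuals weighted by the corresponding fitted value is also zero (property 5, Section 2.2.2). Therefore,
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{2.32}$$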
The left-hand side of Eq. (2.32) is the corrected sum of squares of the observations, $SS_T$, which measures the total variability in the observations. The two components of $SS_T$ measure, respectively, the amount of variability in the observations $y_i$ accounted for by the regression line and the residual variation left unexplained by the regression line. We recognize $SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ as the residual or error sum of squares from Eq. (2.16). It is customary to call $SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ the regression or model sum of squares.
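As a quick numerical check of this partition, the following minimal Python sketch fits a least-squares line to a small made-up data set and confirms that the cross-product term vanishes, so the total sum of squares splits into a regression part and a residual part. The data values and the helper name `anova_partition` are assumptions for illustration only; they do not come from the text.

```python
def anova_partition(x, y):
    """Return (SS_T, SS_R, SS_Res, cross) for a simple linear regression fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares estimates of the slope and intercept.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    # The three sums of squares, computed directly from their definitions.
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_r = sum((fi - y_bar) ** 2 for fi in y_hat)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

    # Cross-product term that appears after squaring the identity and summing.
    cross = 2 * sum((fi - y_bar) * (yi - fi) for yi, fi in zip(y, y_hat))
    return ss_t, ss_r, ss_res, cross


# Illustrative data (assumed, not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_t, ss_r, ss_res, cross = anova_partition(x, y)
print("cross term is zero:", abs(cross) < 1e-9)
print("SS_T = SS_R + SS_Res:", abs(ss_t - (ss_r + ss_res)) < 1e-9)
```

Both checks use a small tolerance rather than exact equality because the sums are accumulated in floating point.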
Equation (2.32) is the fundamental analysis-of-variance identity for a regression model. Symbolically, we usually write
$$SS_T = SS_R + SS_{Res} \tag{2.33}$$
Comparing Eq. (2.33) with Eq. (2.18), $SS_{Res} = SS_T - \hat{\beta}_1 S_{xy}$, we see that the regression sum of squares may be computed as
$$SS_R = \hat{\beta}_1 S_{xy} \tag{2.34}$$
The degree-of-freedom breakdown is determined as follows. The total sum of squares, $SS_T$, has $df_T = n - 1$ degrees of freedom because one degree of freedom is lost as a result of the constraint $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ on the deviations $y_i - \bar{y}$. The model or regression sum of squares, $SS_R$, has $df_R = 1$ degree of freedom because $SS_R$ is completely determined by one parameter, namely, $\hat{\beta}_1$ [see Eq. (2.34)]. Finally, we noted previously that $SS_{Res}$ has $df_{Res} = n - 2$ degrees of freedom because two constraints are imposed on the deviations $y_i - \hat{y}_i$ as a result of estimating $\hat{\beta}_0$ and $\hat{\beta}_1$. Note that the degrees of freedom have an additive property:
$$df_T = df_R + df_{Res}, \qquad n - 1 = 1 + (n - 2)$$
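We can use the usual analysis-of-variance $F$ test to test the hypothesis $H_0\colon \beta_1 = 0$. Appendix C.3 shows that (1) $SS_{Res}/\sigma^2 = (n - 2)MS_{Res}/\sigma^2$ follows a $\chi^2_{n-2}$ distribution; (2) if the null hypothesis $H_0\colon \beta_1 = 0$ is true, then $SS_R/\sigma^2$ follows a $\chi^2_1$ distribution; and (3) $SS_{Res}$ and $SS_R$ are independent. By the definition of an $F$ statistic given in Appendix C.1,
$$F_0 = \frac{SS_R/df_R}{SS_{Res}/df_{Res}} = \frac{SS_R/1}{SS_{Res}/(n - 2)} = \frac{MS_R}{MS_{Res}}$$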
follows the $F_{1,n-2}$ distribution. Appendix C.3 also shows that the expected values of these mean squares are
$$E(MS_{Res}) = \sigma^2, \qquad E(MS_R) = \sigma^2 + \beta_1^2 S_{xx}$$
These expected mean squares indicate that if the observed value of $F_0$ is large, then it is likely that the slope $\beta_1 \neq 0$. Appendix C.3 also shows that if $\beta_1 \neq 0$, then $F_0$ follows a noncentral $F$ distribution with 1 and $n - 2$ degrees of freedom and a noncentrality parameter of
$$\lambda = \frac{\beta_1^2 S_{xx}}{\sigma^2}$$
This noncentrality parameter also indicates that the observed value of $F_0$ should be large if $\beta_1 \neq 0$. Therefore, to test the hypothesis $H_0\colon \beta_1 = 0$, compute the test statistic $F_0$ and reject $H_0$ if
$$F_0 > F_{\alpha,1,n-2}$$
The test procedure is summarized in Table 2.4.
TABLE 2.4 Analysis of Variance for Testing Significance of Regression
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | $F_0$
Regression | $SS_R = \hat{\beta}_1 S_{xy}$ | 1 | $MS_R$ | $MS_R/MS_{Res}$
Residual | $SS_{Res} = SS_T - \hat{\beta}_1 S_{xy}$ | $n - 2$ | $MS_{Res}$ |
Total | $SS_T$ | $n - 1$ | |
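The entries of this table can be computed in a few lines. The sketch below is one possible way to do it in plain Python, not a reference implementation: it obtains the regression sum of squares as the estimated slope times $S_{xy}$, gets the residual sum of squares by subtraction, and rejects $H_0\colon \beta_1 = 0$ when $F_0$ exceeds the upper-$\alpha$ critical value of the $F$ distribution with 1 and $n - 2$ degrees of freedom. The data, the function name `regression_anova`, and the choice $\alpha = 0.05$ are assumptions for illustration; the critical value 5.32 for 1 and 8 degrees of freedom is read from a standard $F$ table.

```python
def regression_anova(x, y):
    """Return the ANOVA quantities for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                 # least-squares slope
    b0 = y_bar - b1 * x_bar          # least-squares intercept

    ss_t = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    ss_r = b1 * s_xy                            # regression sum of squares
    ss_res = ss_t - ss_r                        # residual sum of squares

    df_r, df_res = 1, n - 2
    ms_r = ss_r / df_r
    ms_res = ss_res / df_res
    return {
        "b0": b0, "b1": b1,
        "ss_r": ss_r, "ss_res": ss_res, "ss_t": ss_t,
        "df_r": df_r, "df_res": df_res, "df_t": n - 1,
        "ms_r": ms_r, "ms_res": ms_res,
        "f0": ms_r / ms_res,
    }


# Illustrative data (assumed, not from the text): n = 10 observations.
x = [float(i) for i in range(1, 11)]
y = [2.3, 4.1, 5.8, 8.4, 9.7, 12.2, 13.9, 16.1, 17.8, 20.3]
table = regression_anova(x, y)

# Upper 5% point of F with 1 and n - 2 = 8 degrees of freedom (standard F table).
F_CRIT = 5.32

print(f"{'Source':<12}{'SS':>12}{'df':>6}{'MS':>12}{'F0':>12}")
print(f"{'Regression':<12}{table['ss_r']:>12.4f}{table['df_r']:>6d}"
      f"{table['ms_r']:>12.4f}{table['f0']:>12.2f}")
print(f"{'Residual':<12}{table['ss_res']:>12.4f}{table['df_res']:>6d}"
      f"{table['ms_res']:>12.4f}")
print(f"{'Total':<12}{table['ss_t']:>12.4f}{table['df_t']:>6d}")

reject_h0 = table["f0"] > F_CRIT
print("Reject H0: beta1 = 0" if reject_h0 else "Fail to reject H0: beta1 = 0")
```

The critical value is hard-coded to keep the sketch free of third-party dependencies. For other sample sizes or significance levels it would need to be looked up again or obtained from a statistics library.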