Introduction to Linear Regression Analysis - Douglas C. Montgomery

2.3.3 Analysis of Variance


We may also use an analysis-of-variance approach to test significance of regression. The analysis of variance is based on a partitioning of total variability in the response variable y. To obtain this partitioning, begin with the identity

$$ y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \qquad (2.31) $$

Squaring both sides of Eq. (2.31) and summing over all n observations produces

$$ \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + 2\sum_{i=1}^{n}(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) $$

Note that the third term on the right-hand side of this expression can be rewritten as

$$ 2\sum_{i=1}^{n}(\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = 2\sum_{i=1}^{n}\hat{y}_i(y_i - \hat{y}_i) - 2\bar{y}\sum_{i=1}^{n}(y_i - \hat{y}_i) = 2\sum_{i=1}^{n}\hat{y}_i e_i - 2\bar{y}\sum_{i=1}^{n} e_i = 0 $$

since the sum of the residuals is always zero (property 1, Section 2.2.2) and the sum of the residuals weighted by the corresponding fitted value is also zero (property 5, Section 2.2.2). Therefore,

$$ \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \qquad (2.32) $$

The left-hand side of Eq. (2.32) is the corrected sum of squares of the observations, $SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$, which measures the total variability in the observations. The two components of $SS_T$ measure, respectively, the amount of variability in the observations $y_i$ accounted for by the regression line and the residual variation left unexplained by the regression line. We recognize $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = SS_{\mathrm{Res}}$ as the residual or error sum of squares from Eq. (2.16). It is customary to call $\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 = SS_R$ the regression or model sum of squares.

Equation (2.32) is the fundamental analysis-of-variance identity for a regression model. Symbolically, we usually write

$$ SS_T = SS_R + SS_{\mathrm{Res}} \qquad (2.33) $$

Comparing Eq. (2.33) with Eq. (2.18) we see that the regression sum of squares may be computed as

$$ SS_R = \hat{\beta}_1 S_{xy} \qquad (2.34) $$
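As a quick numerical sketch, the partition (2.32)–(2.33) and the computational form (2.34) can be checked on made-up data (Python with NumPy assumed available; the data values below are illustrative, not from the text):

```python
# Numerical check of the ANOVA identity SST = SSR + SSRes and of
# SSR = b1 * Sxy for simple linear regression, on illustrative data.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 25)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=x.size)

# Least-squares estimates in the Section 2.2 notation
Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)      # total (corrected) sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)   # regression sum of squares
SSRes = np.sum((y - yhat) ** 2)        # residual sum of squares

assert np.isclose(SST, SSR + SSRes)    # identity (2.33)
assert np.isclose(SSR, b1 * Sxy)       # computational form (2.34)
```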

The degree-of-freedom breakdown is determined as follows. The total sum of squares, $SS_T$, has $df_T = n - 1$ degrees of freedom because one degree of freedom is lost as a result of the constraint $\sum_{i=1}^{n}(y_i - \bar{y}) = 0$ on the deviations $y_i - \bar{y}$. The model or regression sum of squares, $SS_R$, has $df_R = 1$ degree of freedom because $SS_R$ is completely determined by one parameter, namely, $\hat{\beta}_1$ [see Eq. (2.34)]. Finally, we noted previously that $SS_{\mathrm{Res}}$ has $df_{\mathrm{Res}} = n - 2$ degrees of freedom because two constraints are imposed on the deviations $y_i - \hat{y}_i$ as a result of estimating $\hat{\beta}_0$ and $\hat{\beta}_1$. Note that the degrees of freedom have an additive property:

$$ df_T = df_R + df_{\mathrm{Res}}, \qquad n - 1 = 1 + (n - 2) \qquad (2.35) $$

We can use the usual analysis-of-variance F test to test the hypothesis $H_0\colon \beta_1 = 0$. Appendix C.3 shows that (1) $SS_{\mathrm{Res}}/\sigma^2 = (n-2)MS_{\mathrm{Res}}/\sigma^2$ follows a $\chi^2_{n-2}$ distribution; (2) if the null hypothesis $H_0\colon \beta_1 = 0$ is true, then $SS_R/\sigma^2$ follows a $\chi^2_1$ distribution; and (3) $SS_{\mathrm{Res}}$ and $SS_R$ are independent. By the definition of an F statistic given in Appendix C.1,

$$ F_0 = \frac{SS_R / df_R}{SS_{\mathrm{Res}} / df_{\mathrm{Res}}} = \frac{SS_R / 1}{SS_{\mathrm{Res}} / (n-2)} = \frac{MS_R}{MS_{\mathrm{Res}}} \qquad (2.36) $$

follows the $F_{1,n-2}$ distribution under $H_0$. Appendix C.3 also shows that the expected values of these mean squares are

$$ E(MS_{\mathrm{Res}}) = \sigma^2, \qquad E(MS_R) = \sigma^2 + \beta_1^2 S_{xx} $$

These expected mean squares indicate that if the observed value of F0 is large, then it is likely that the slope β1 ≠ 0. Appendix C.3 also shows that if β1 ≠ 0, then F0 follows a noncentral F distribution with 1 and n − 2 degrees of freedom and a non-centrality parameter of

$$ \lambda = \frac{\beta_1^2 S_{xx}}{\sigma^2} $$

This noncentrality parameter also indicates that the observed value of F0 should be large if β1 ≠ 0. Therefore, to test the hypothesis H0: β1 = 0, compute the test statistic F0 and reject H0 if

$$ F_0 > F_{\alpha,1,n-2} $$

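As a sketch, the power of this test against an assumed nonzero slope follows from the noncentral F distribution described above (SciPy assumed available; the sample size, slope, variance, and $S_{xx}$ values below are illustrative assumptions):

```python
# Power of the significance-of-regression F test under an assumed
# true slope beta1, via the noncentral F distribution.
# All numeric values here are illustrative assumptions.
from scipy import stats

n = 25
alpha = 0.05
beta1 = 0.5        # assumed true slope (hypothetical value)
sigma2 = 4.0       # assumed error variance
Sxx = 200.0        # assumed spread of the regressor

lam = beta1**2 * Sxx / sigma2               # noncentrality parameter
f_crit = stats.f.ppf(1 - alpha, 1, n - 2)   # reject H0 if F0 > f_crit
power = 1 - stats.ncf.cdf(f_crit, 1, n - 2, lam)
print(f"critical value {f_crit:.3f}, power {power:.3f}")
```

A larger $\beta_1^2 S_{xx}/\sigma^2$ pushes the noncentral F distribution to the right and hence raises the power, matching the observation that $F_0$ should be large when $\beta_1 \neq 0$.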
The test procedure is summarized in Table 2.4.

TABLE 2.4 Analysis of Variance for Testing Significance of Regression

Source of Variation | Sum of Squares            | Degrees of Freedom | Mean Square | F0
Regression          | SS_R = β̂1 Sxy            | 1                  | MS_R        | MS_R / MS_Res
Residual            | SS_Res = SS_T − β̂1 Sxy   | n − 2              | MS_Res      |
Total               | SS_T                      | n − 1              |             |
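The Table 2.4 computation can be sketched end to end as follows (Python with NumPy and SciPy assumed available; the data are illustrative, not from the text):

```python
# Build the significance-of-regression ANOVA table for simple linear
# regression on illustrative data, following Eqs. (2.34)-(2.36).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 20)
y = 1.0 + 0.8 * x + rng.normal(scale=2.0, size=x.size)
n = x.size

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx

SST = np.sum((y - y.mean()) ** 2)
SSR = b1 * Sxy                  # Eq. (2.34)
SSRes = SST - SSR               # Eq. (2.33) rearranged

MSR = SSR / 1                   # df_R = 1
MSRes = SSRes / (n - 2)         # df_Res = n - 2
F0 = MSR / MSRes                # Eq. (2.36)
p_value = stats.f.sf(F0, 1, n - 2)   # tail area beyond F0

print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}")
print(f"{'Regression':<12}{SSR:>10.3f}{1:>5}{MSR:>10.3f}  F0 = {F0:.3f}")
print(f"{'Residual':<12}{SSRes:>10.3f}{n - 2:>5}{MSRes:>10.3f}")
print(f"{'Total':<12}{SST:>10.3f}{n - 1:>5}")
print(f"p-value = {p_value:.4g}")
```

Rejecting $H_0$ when the p-value is below $\alpha$ is equivalent to the rule $F_0 > F_{\alpha,1,n-2}$.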