Handbook of Regression Analysis With Applications in R, by Samprit Chatterjee

1.2.3 ASSUMPTIONS


The least squares criterion will not necessarily yield sensible results unless certain assumptions hold. One is given in (1.1) — the linear model should be appropriate. In addition, the following assumptions are needed to justify using least squares regression.
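To make the least squares criterion concrete, the following is a minimal sketch of fitting a simple linear model y = β₀ + β₁x + ε by the closed-form least squares solution. The data and variable names here are illustrative, not from the book (which uses R; this sketch is plain Python for self-containment).

```python
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors
    for the simple linear model y = b0 + b1*x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared x-deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data with a roughly linear trend.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
# Residuals are what the assumptions below are about; with an intercept
# in the model, the fitted residuals always sum to (numerically) zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
```

Note that the residuals summing to zero is an algebraic consequence of including an intercept; it does not by itself verify assumption 1 below, which concerns the unobserved errors, not the fitted residuals.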

1 The expected value of the errors is zero (E(εᵢ) = 0 for all i). That is, it cannot be true that for certain observations the model is systematically too low, while for others it is systematically too high. A violation of this assumption will lead to difficulties in estimating β₀. More importantly, this reflects that the model does not include a necessary systematic component, which has instead been absorbed into the error terms.

2 The variance of the errors is constant (V(εᵢ) = σ² for all i). That is, it cannot be true that the strength of the model is greater for some parts of the population (smaller σ) and less for other parts (larger σ). This assumption of constant variance is called homoscedasticity, and its violation (nonconstant variance) is called heteroscedasticity. A violation of this assumption means that the least squares estimates are not as efficient as they could be in estimating the true parameters, and better estimates are available. More importantly, it also results in poorly calibrated confidence and (especially) prediction intervals.

3 The errors are uncorrelated with each other. That is, it cannot be true that knowing that the model underpredicts (for example) for one particular observation says anything at all about what it does for any other observation. This violation most often occurs in data that are ordered in time (time series data), where errors that are near each other in time are often similar to each other (such time‐related correlation is called autocorrelation). Violation of this assumption means that the least squares estimates are not as efficient as they could be in estimating the true parameters, and more importantly, its presence can lead to very misleading assessments of the strength of the regression.

4 The errors are normally distributed. This is needed if we want to construct any confidence or prediction intervals, or hypothesis tests, which we usually do. If this assumption is violated, hypothesis tests and confidence and prediction intervals can be very misleading.

Since violation of these assumptions can potentially lead to completely misleading results, a fundamental part of any regression analysis is to check them using various plots, tests, and diagnostics.
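As a hedged sketch of what such checks might look like numerically, the snippet below computes two simple diagnostics from a vector of residuals: a crude constant-variance check comparing the spread in the two halves of the data, and the Durbin-Watson statistic for autocorrelation (values near 2 suggest uncorrelated errors). The residuals and the half-split rule are illustrative assumptions, not the book's procedure; in practice assumptions 1 and 4 are usually assessed graphically, with residual plots and normal quantile-quantile plots.

```python
def variance(e):
    """Plain (biased) sample variance of a residual vector."""
    m = sum(e) / len(e)
    return sum((ei - m) ** 2 for ei in e) / len(e)

def durbin_watson(e):
    """Durbin-Watson statistic for residuals ordered in time.
    Ranges over [0, 4]; values near 2 suggest no autocorrelation."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

# Hypothetical residuals, assumed to be in time order.
e = [0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.2]

# Assumption 2 (homoscedasticity): a ratio far from 1 would be a
# warning sign of nonconstant variance across the data.
half = len(e) // 2
spread_ratio = variance(e[:half]) / variance(e[half:])

# Assumption 3 (uncorrelated errors): Durbin-Watson near 2 is consistent
# with no autocorrelation; values near 0 or 4 suggest strong correlation.
dw = durbin_watson(e)
```

This is only a numeric sketch; the plots, formal tests, and diagnostics the text refers to are developed later in the book.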

