Handbook of Regression Analysis With Applications in R - Samprit Chatterjee

1.2 Concepts and Background Material

1.2.1 THE LINEAR REGRESSION MODEL


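The data consist of $n$ observations, which are sets of observed values $\{x_{1i}, x_{2i}, \ldots, x_{pi}, y_i\}$ that represent a random sample from a larger population. It is assumed that these observations satisfy a linear relationship,

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i, \tag{1.1}$$

where the $\beta$ coefficients are unknown parameters, and the $\varepsilon_i$ are random error terms. By a linear model, it is meant that the model is linear in the parameters; a quadratic model,

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,$$

paradoxically enough, is a linear model, since $x$ and $x^2$ are just versions of $x_1$ and $x_2$.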
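The sense in which a quadratic model is "linear in the parameters" can be checked numerically. The following is a minimal sketch in Python using only the standard library; the function name `quadratic_mean` and the coefficient values are hypothetical, chosen purely for illustration. It treats the predictor and its square as two separate predictors and verifies that the mean function obeys superposition in the coefficients, but not in the predictor itself.

```python
def quadratic_mean(beta, x):
    """Mean of a quadratic model: beta[0] + beta[1]*x + beta[2]*x^2."""
    # The predictor and its square play the role of two separate predictors.
    features = [1.0, x, x ** 2]
    return sum(b * f for b, f in zip(beta, features))


# Two hypothetical coefficient vectors and two predictor values.
beta_a = [1.0, 2.0, 3.0]
beta_b = [0.5, -1.0, 4.0]
x0, x1 = 2.0, 3.0

# Linear in the parameters: the mean at (beta_a + beta_b) equals
# the sum of the means at beta_a and at beta_b.
beta_sum = [a + b for a, b in zip(beta_a, beta_b)]
lhs = quadratic_mean(beta_sum, x0)
rhs = quadratic_mean(beta_a, x0) + quadratic_mean(beta_b, x0)
linear_in_parameters = abs(lhs - rhs) < 1e-12

# Not linear in the predictor: the mean at (x0 + x1) differs from
# the sum of the means at x0 and at x1.
lhs_x = quadratic_mean(beta_a, x0 + x1)
rhs_x = quadratic_mean(beta_a, x0) + quadratic_mean(beta_a, x1)
linear_in_x = abs(lhs_x - rhs_x) < 1e-12

print(linear_in_parameters, linear_in_x)  # prints: True False
```

The check only illustrates the definition; it says nothing about how the coefficients would be estimated from data.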

It is important to recognize that this, or any statistical model, is not viewed as a true representation of reality; rather, the goal is that the model be a useful representation of reality. A model can be used to explore the relationships between variables and make accurate forecasts based on those relationships even if it is not the “truth.” Further, any statistical model is only temporary, representing a provisional version of views about the random process being studied. Models can, and should, change, based on analysis using the current model, selection among several candidate models, the acquisition of new data, new understanding of the underlying random process, and so on. Further, it is often the case that there are several different models that are reasonable representations of reality. Having said this, we will sometimes refer to the “true” model, but this should be understood as referring to the underlying form of the currently hypothesized representation of the regression relationship.


FIGURE 1.1: The simple linear regression model. The solid line corresponds to the true regression line, and the dotted lines correspond to the random errors $\varepsilon_i$.

The special case of (1.1) with $p = 1$ corresponds to the simple regression model, and is consistent with the representation in Figure 1.1. The solid line is the true regression line, the expected value of $y$ given the value of $x$. The dotted lines are the random errors $\varepsilon_i$ that account for the lack of a perfect association between the predictor and the target variables.
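The picture described above can be mimicked with a small simulation. This is a sketch under stated assumptions, not something taken from the text: the "true" intercept, slope, and error standard deviation are hypothetical values, and the errors are assumed to be independent normal draws, which the passage itself does not require. It generates observations from a simple regression model and confirms that each random error is the vertical distance between an observation and the true regression line.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical "true" intercept, slope, and error standard deviation.
beta0, beta1, sigma = 2.0, 0.5, 1.0

# Predictor values 1, 2, ..., 20.
x = [float(i) for i in range(1, 21)]

# True regression line: the expected value of the target at each predictor value.
expected_y = [beta0 + beta1 * xi for xi in x]

# Random errors (assumption: independent normal draws with mean zero).
errors = [random.gauss(0.0, sigma) for _ in x]

# Observed target values scatter around the true line.
y = [mean_i + err_i for mean_i, err_i in zip(expected_y, errors)]

# Each error is recovered as the vertical gap between observation and true line.
vertical_gaps = [yi - mean_i for yi, mean_i in zip(y, expected_y)]
max_mismatch = max(abs(g - e) for g, e in zip(vertical_gaps, errors))
print(max_mismatch < 1e-12)
```

In practice the true line is unknown, so these gaps cannot be computed from real data; the simulation is possible only because the parameters were chosen by hand.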

