Elementary Regression Modeling - Roger A. Wojtkiewicz
Odds and Log Odds
Linear regression, also known as ordinary least-squares regression, is appropriate when the dependent variable is interval. It is possible to apply ordinary least-squares regression to a nominal or ordinal variable coded 0 or 1 (called a dichotomous variable). However, there are statistical problems in using linear regression with a dichotomous dependent variable. One problem is that the predicted values may fall outside the range of possible values, such as a negative proportion or a proportion greater than 1.0. Another problem is that the standard errors that linear regression estimates for the regression coefficients are biased: on average, they do not capture the population standard error.2
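The out-of-range problem can be seen in a small sketch. The eight data points below are made up for illustration and are not from the book, and `fit_line` is a hypothetical helper that performs an ordinary least-squares fit with a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: x is an interval predictor, y is coded 0 or 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]

print(round(predicted[0], 3))   # about -0.167, a negative "proportion"
print(round(predicted[-1], 3))  # about 1.167, a "proportion" above 1.0
```

With these made-up values, the fitted line dips below 0 at the smallest x and rises above 1.0 at the largest x, even though every observed value is 0 or 1.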
Linear regression with an interval dependent variable can be viewed as a method for comparing means. Research life would be simpler if we could use linear regression with a dichotomous dependent variable as a method for comparing proportions. Since using linear regression with a dichotomous variable has major drawbacks, statisticians have devised an approach for analyzing dichotomous variables called logistic regression that alleviates these drawbacks.
We can view logistic regression as a method for comparing log odds. Working with log odds makes logistic regression more difficult to understand because we do not typically use log odds for analyzing frequency distributions.
Whereas the proportion compares f with the total N in a sample or a subgroup of a sample, the odds compare f with “not f.” That is, if f is the frequency in a group, not f is the frequency not in the group, or N – f:

$$\text{odds} = \frac{f_1}{f_2}$$

where
$f_1$ is the frequency in the group
$f_2$ is the frequency not in the group
In horse racing, the odds describe a horse’s chances of losing. Odds of 4:1 indicate that a horse is expected to lose four times for each win.
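As a minimal sketch of the difference between a proportion and the odds, the horse-racing example can be written out with four losses and one win. The function names `proportion` and `odds` are illustrative choices, not names from the book.

```python
def proportion(f1, f2):
    """Frequency in the group divided by the total N = f1 + f2."""
    return f1 / (f1 + f2)


def odds(f1, f2):
    """Frequency in the group divided by the frequency not in the group."""
    return f1 / f2


# Horse-racing example: four expected losses for each win.
losses, wins = 4, 1

print(proportion(losses, wins))  # 0.8
print(odds(losses, wins))        # 4.0
```

The same two frequencies give a proportion of losing of 0.8 but odds of losing of 4.0, because the denominator changes from the total to the "not in the group" count.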
However, the dependent variable in logistic regression is not the odds, which would be difficult enough to interpret, but the log odds, where the log is taken to the base e and is thus the natural logarithm (ln):

$$\text{log odds} = \ln\left(\frac{f_1}{f_2}\right)$$

where
ln is the natural logarithm to the base e
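A short sketch may help make the log odds less abstract. It assumes the two frequencies are both greater than zero, and `log_odds` is a hypothetical helper name.

```python
import math


def log_odds(f1, f2):
    """Natural logarithm of the odds f1 / f2 (both frequencies must be > 0)."""
    return math.log(f1 / f2)


print(round(log_odds(4, 1), 3))  # 1.386
print(log_odds(1, 1))            # 0.0
print(round(log_odds(1, 4), 3))  # -1.386
```

Even odds give a log odds of 0, and reversing the two frequencies changes only the sign, so log odds are symmetric around zero in a way that the odds themselves are not.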
Logistic regression is the standard regression method when the dependent variable is dichotomous. It provides predicted values within the range of possible values, and unbiased standard errors can be calculated for the regression coefficients.
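One way to see why the predicted values stay within range is to convert a predicted log odds back to a proportion. Since the odds equal p / (1 - p), a log odds of L corresponds to p = 1 / (1 + e^(-L)). The sketch below uses made-up coefficients (an intercept of -4.5 and a slope of 1.0) that are assumptions for illustration, not estimates from the book.

```python
import math


def proportion_from_log_odds(value):
    """Convert a log odds value back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical coefficients on the log odds scale (illustrative only).
intercept, slope = -4.5, 1.0

xs = range(-10, 21)
predicted = [proportion_from_log_odds(intercept + slope * x) for x in xs]

# Every predicted proportion lies strictly between 0 and 1.
print(all(0.0 < p < 1.0 for p in predicted))  # True
```

However far the predictor moves in either direction, the predicted log odds can grow without bound while the implied proportion only approaches 0 or 1.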
I emphasize linear regression and logistic regression in this book. Understanding these two procedures provides a solid base for further use of regression analysis.
Table 2.1