Business Experiments with R - B. D. McCullough
Software Details
To reproduce Figure 1.2, load the data file credit.csv
…
boxplot(limit ~ default, xlab="default", ylab="credit limit", data=df)
We have thus far looked at how the four variables are associated with default, individually. How might we examine the effects of all the variables at one time in order to answer the two fundamental questions?
The answer, of course, is to use regression to relate default to all four variables at once. Since default is a categorical variable with two levels, linear regression is not appropriate. We would have to use logistic regression instead. As for the independent variables, credit limit and age are continuous and require no special treatment before being included in the regression (though it may be advantageous to turn each into a categorical variable with, say, categories “low,” “medium,” and “high”). Sex and marital status are categorical variables and will have to be included as dummy variables. If you are unfamiliar with the creation of dummy variables, sex can be represented by a single dummy variable that equals 1 for one sex and 0 for the other.
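Marital status (married, single, or divorced/widowed) will be represented by two dummy variables. Each of the three categories corresponds to a different pair of values: a married person, a person who is divorced/widowed, and a single person each receive their own combination, and one of the three categories serves as the baseline for which both dummy variables equal 0.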
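The dummy coding and the logistic regression described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the book's own code: the data are simulated rather than read from credit.csv, the column names `age`, `sex`, and `marital` and the dummy names `d_sex`, `d_married`, and `d_divwid` are hypothetical, the credit limit is simulated in thousands, and the choice of "male" as 1 and "single" as the baseline category is arbitrary and may differ from the book's coding.

```r
# Simulated stand-in for the credit data (assumption: NOT the real credit.csv).
set.seed(1)
n <- 500
df <- data.frame(
  limit   = round(runif(n, 10, 500)),              # credit limit, in thousands (assumed scale)
  age     = sample(21:75, n, replace = TRUE),
  sex     = sample(c("female", "male"), n, replace = TRUE),
  marital = sample(c("married", "single", "divorced/widowed"), n, replace = TRUE)
)
# Simulated binary outcome: 1 = default, 0 = no default.
df$default <- rbinom(n, 1, plogis(-0.5 - 0.004 * df$limit))

# One dummy for sex (assumed coding: 1 = male, 0 = female).
df$d_sex <- as.integer(df$sex == "male")

# Two dummies for the three marital categories (assumed baseline: single).
df$d_married <- as.integer(df$marital == "married")
df$d_divwid  <- as.integer(df$marital == "divorced/widowed")

# Logistic regression of default on all four variables, using the manual dummies.
fit_manual <- glm(default ~ limit + age + d_sex + d_married + d_divwid,
                  family = binomial, data = df)

# Equivalent fit: declare the variables as factors and let R build the dummies.
# The first level of each factor is the baseline category.
df$sex     <- factor(df$sex, levels = c("female", "male"))
df$marital <- factor(df$marital,
                     levels = c("single", "married", "divorced/widowed"))
fit_factor <- glm(default ~ limit + age + sex + marital,
                  family = binomial, data = df)

summary(fit_factor)
```

In practice the factor version is usually more convenient, since `glm()` creates the dummy variables automatically and changing the baseline only requires reordering the factor levels. The manual version is shown to make the coding explicit.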