Business Experiments with R - B. D. McCullough
Try it!
We encourage you to replicate the analysis in this chapter using the data in the file credit.csv. Crosstabs can be computed in a spreadsheet using pivot tables, and most statistical tools also have a cross‐tabulation function.
df <- read.csv("credit.csv", header = TRUE)
# Table 1.1
table1 <- table(df$default, df$sex)  # to get the counts
table1                               # to print out the table
prop.table(table1, 2)                # to get column proportions
prop.table(table1, 1)                # to get row proportions
In addition to the categorical variables in our data set like sex and marital status, we also have continuous variables like age. The boxplots in Figure 1.2 suggest that persons who do not default have higher credit limits than persons who default, while age appears to have no association with default status.
Figure 1.2 Boxplot of default vs. non‐default for credit limit and ages.
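Boxplots like those in Figure 1.2 can be drawn with base R. The following is only a minimal sketch, not the book's own code. The column names `limit` and `age` and the helper `boxplots_by_default` are hypothetical; only `default` and `sex` are confirmed by the code above, so adjust the names to match the actual columns in credit.csv.

```r
# Sketch: side-by-side boxplots of credit limit and age by default status.
# ASSUMPTION: the column names "limit" and "age" are hypothetical; only
# "default" and "sex" appear in the code earlier in this section.
boxplots_by_default <- function(df, limit_col = "limit", age_col = "age",
                                default_col = "default") {
  # Fail early with a clear message if the assumed columns are absent
  needed <- c(limit_col, age_col, default_col)
  missing_cols <- setdiff(needed, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }

  grp <- factor(df[[default_col]])

  old_par <- par(mfrow = c(1, 2))  # two panels side by side
  on.exit(par(old_par))            # restore graphics settings on exit

  boxplot(df[[limit_col]] ~ grp, xlab = "Default", ylab = "Credit limit",
          main = "Credit limit by default status")
  boxplot(df[[age_col]] ~ grp, xlab = "Default", ylab = "Age",
          main = "Age by default status")

  # Return group medians so the visual impression can be checked numerically
  invisible(list(
    limit_median = tapply(df[[limit_col]], grp, median, na.rm = TRUE),
    age_median   = tapply(df[[age_col]], grp, median, na.rm = TRUE)
  ))
}

# Run only if the data file is present in the working directory
if (file.exists("credit.csv")) {
  df <- read.csv("credit.csv", header = TRUE)
  meds <- boxplots_by_default(df)  # pass other column names if needed
  print(meds)
}
```

The returned medians give a quick numerical check on what the plots show: if the median credit limit differs noticeably between the two groups while the median age does not, that matches the reading of the figure.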
If it is really the case that persons with higher credit limits are less likely to default, can we decrease the default rate simply by giving everybody a higher credit limit?