Business Experiments with R - B. D. McCullough

Section 1.2 “Case: Credit Card Defaults”


• The credit card data set is the “default of credit card clients Data Set” from https://archive.ics.uci.edu/ml/index.html. For the education variable, values 4, 5, and 6 were recoded as “other”; the same was done for values 0 and 4 of the marriage variable.

• It is not a good idea to run a linear regression with the variable default on the left‐hand side because default is binary (takes on only the values zero and one) and linear regression is for continuous dependent variables. There is a special method for binary dependent variables called “logistic regression,” but that's something for an advanced statistics course.

• The idea of the garden of forking paths is discussed clearly and nontechnically in Gelman and Loken (2014), an article that was included in Best Math Writing of 2015. Although nontechnical, it's an excellent read for the statistically inclined person, too.

• In general, lurking variables affect observational data and confounding variables affect designed experiments. A lurking variable connects two otherwise unconnected variables, creating the appearance of a causal relation between them. Consider the firefighter example, where the number of firefighters is highly correlated with the damage caused by the fire. Adding more firefighters doesn't increase the amount of damage (the variables are really unconnected). Rather, the lurking variable “intensity of the fire” connects them. A lurking variable (say, z) creates the illusion of a causal relationship between two other variables, x and y. A good article on how to detect lurking variables is Joiner (1981).

• Variables are confounded when we cannot separate their respective effects on the response. A confounding variable x has an effect on the response y, but another variable z also has an effect on y, and we are unable to separate the effects of x and z. For example, y might be store sales and x is a store promotion, while z is bad weather. We cannot determine the true effect of the promotion on sales because it is confounded with the weather.
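The firefighter example from the lurking-variable note can be made concrete with a small simulation. The sketch below is in Python rather than the book's R, and everything in it is an assumption made for illustration: the data-generating process, the coefficients, the noise levels, and the helper names `pearson` and `residuals`. Fire intensity drives both the number of firefighters and the damage, and firefighters have no direct effect on damage. The raw correlation should come out strongly positive, while the correlation that remains after removing the part explained by intensity should be close to zero.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(y, z):
    """Residuals of y after a least-squares straight-line fit on z."""
    n = len(y)
    mean_y = sum(y) / n
    mean_z = sum(z) / n
    slope = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
             / sum((zi - mean_z) ** 2 for zi in z))
    return [yi - mean_y - slope * (zi - mean_z) for zi, yi in zip(z, y)]


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical data-generating process (invented for illustration):
# intensity drives both variables, and the number of firefighters
# has NO direct effect on damage.
intensity = [rng.uniform(1, 10) for _ in range(2000)]
firefighters = [3 * z + rng.gauss(0, 2) for z in intensity]
damage = [5 * z + rng.gauss(0, 4) for z in intensity]

# Correlation as it would appear in observational data.
raw_corr = pearson(firefighters, damage)

# Remove the part of each variable explained by intensity, then
# correlate what is left (a partial correlation).
partial_corr = pearson(residuals(firefighters, intensity),
                       residuals(damage, intensity))

print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```

In real observational data the lurking variable is usually unmeasured, so this adjustment is not available; the simulation only shows why the raw correlation alone cannot be read as a causal effect.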

Confounding can occur in a poorly designed experiment. Suppose you wish to determine the effects of fees and interest rates on credit card use. Suppose you offer a low fee and low interest rate to one group and a high fee and a high interest rate to the other group. The first group will have more credit card use and the second group less use, but you won't be able to tell whether the low fee or the low rate caused more use in the first group or whether the high fee and the high interest rate caused less use in the second group. The rate and the fee are confounded.

On the other hand, we could isolate the effect of rate by offering low fee and high rate to the first group and low fee and low rate to the second group. In more advanced designs, we sometimes will have many effects and be unable to isolate them all. In such a situation, we will deliberately confound the effects that we don't care about so much so that we can isolate the effects that we do care about. We will address this in Chapter 8.
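The fee and rate designs can be compared with a few lines of arithmetic. The sketch below is in Python rather than the book's R, and it rests on labeled assumptions: the effect sizes are invented, the effects are taken to be additive with no interaction, there is no noise, and the name `expected_use` is hypothetical. It shows that the confounded design reveals only the sum of the two effects, whereas holding the fee fixed recovers the rate effect by itself.

```python
# Hypothetical true effects on monthly card use (arbitrary units).
# These numbers are invented for illustration only.
BASE_USE = 100.0
HIGH_FEE_EFFECT = -8.0    # change in use when the fee is high
HIGH_RATE_EFFECT = -15.0  # change in use when the rate is high


def expected_use(fee, rate):
    """Expected card use for a group offered the given fee and rate,
    assuming additive effects and no interaction."""
    use = BASE_USE
    if fee == "high":
        use += HIGH_FEE_EFFECT
    if rate == "high":
        use += HIGH_RATE_EFFECT
    return use


# Confounded design: fee and rate always change together, so the gap
# between the groups is the SUM of the two effects.
confounded_gap = expected_use("high", "high") - expected_use("low", "low")

# Design that isolates rate: the fee is held low in both groups, so the
# gap between the groups is the rate effect alone.
rate_gap = expected_use("low", "high") - expected_use("low", "low")

# A different pair of effects (fee -20, rate -3) gives the same total,
# so the confounded design cannot distinguish the two scenarios.
alternative_total = -20.0 + -3.0

print("gap in confounded design:", confounded_gap)
print("gap with fee held fixed: ", rate_gap)
print("same confounded gap under other effects:",
      alternative_total == confounded_gap)
```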
