Business Experiments with R - B. D. McCullough - Page 16

Software Details

Reproduce the above graphs using the data file WorldBankData.csv …

Below is code for the first graph. To create the next graph, you will have to create a new variable, the natural logarithm of newspapersper1000.

df <- read.csv("WorldBankData.csv")  # "df" is the data frame.

# First graph: life expectancy against newspapers per 1000,
# with a least-squares line (dashed) and a lowess smooth (solid).
plot(df$newspapersper1000, df$lifeexp, xlab="Newspapers per1000",
     ylab="Life Expectancy", pch=19, cex.axis=1.5, cex.lab=1.15)
abline(lm(lifeexp ~ newspapersper1000, data=df), lty=2)
lines(lowess(df$newspapersper1000, df$lifeexp))

# Second graph: the same plot with newspapers per 1000 on the log scale.
plot(log(df$newspapersper1000), df$lifeexp, xlab="log(Newspapers per1000)",
     ylab="Life Expectancy", pch=19, cex.axis=1.5, cex.lab=1.15)
abline(lm(lifeexp ~ log(newspapersper1000), data=df), lty=2)
lines(lowess(log(df$newspapersper1000), df$lifeexp))
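The code above takes the logarithm inline each time it is needed. If you would rather create the new variable explicitly, as the instructions suggest, a minimal sketch might look like the following. The small data frame is a made-up stand-in for WorldBankData.csv so that the example runs on its own, and the column name lognewspapers is a hypothetical choice, not one taken from the data file.

```r
# Made-up stand-in for WorldBankData.csv (illustrative values only).
df <- data.frame(
  newspapersper1000 = c(2, 15, 60, 150, 400),
  lifeexp           = c(48, 57, 66, 72, 78)
)

# Store the natural logarithm as its own column; log() uses base e by default.
# "lognewspapers" is a hypothetical name chosen for this sketch.
df$lognewspapers <- log(df$newspapersper1000)

# The stored column can be used directly in a formula ...
fit_stored <- lm(lifeexp ~ lognewspapers, data = df)

# ... and gives the same fit as taking the log inside the formula.
fit_inline <- lm(lifeexp ~ log(newspapersper1000), data = df)

coef(fit_stored)
coef(fit_inline)
```

Storing the variable once keeps the later plot() and lm() calls shorter, while the inline form leaves the data frame unchanged; either should produce the same graph.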

To analyze these data, we can run a regression of life expectancy (LE) in years against the natural logarithm of the number of newspapers per 1000 persons (LN) for a large number of countries in a given year. The results are

(1.1)

where standard errors are in parentheses, so both coefficients have very high t-statistics and are statistically significant. This means that there is a statistical association between life expectancy and the number of newspapers per 1000 people. But does this show that a country having more newspapers leads to longer lives for its citizens? Common sense says probably not. The natural logarithm of the number of newspapers is probably a proxy for other variables that drive life expectancy; countries that can afford newspapers can probably also afford better food, housing, and medical services. What we are observing is most likely a mere correlation, and, unfortunately, this sort of observational analysis should not be interpreted as causal.
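To see how a proxy variable can produce a large t-statistic without any causal effect, here is a small simulation sketch. Everything in it is an assumption made for illustration: the data are simulated rather than taken from WorldBankData.csv, log_income is a hypothetical confounder, and the numeric parameters are arbitrary. Life expectancy is generated from income alone, so newspapers have no effect on it by construction.

```r
set.seed(1)
n <- 500

# Hypothetical confounder: log income per capita (not a variable from the text).
log_income <- rnorm(n, mean = 8, sd = 1)

# Newspapers depend on income; life expectancy depends on income ONLY.
log_news <- -6 + 1.0 * log_income + rnorm(n, sd = 0.5)
lifeexp  <- 30 + 5.0 * log_income + rnorm(n, sd = 3)

sim <- data.frame(lifeexp, log_news, log_income)

# Naive regression of life expectancy on log newspapers alone.
naive <- summary(lm(lifeexp ~ log_news, data = sim))$coefficients

# Same regression, now also controlling for the confounder.
adjusted <- summary(lm(lifeexp ~ log_news + log_income, data = sim))$coefficients

# Compare the estimate, standard error, and t-statistic for log_news.
naive["log_news", c("Estimate", "Std. Error", "t value")]
adjusted["log_news", c("Estimate", "Std. Error", "t value")]
```

In the naive fit, log_news picks up the effect of income and its t-statistic is large; once log_income is included, the log_news coefficient should shrink toward zero. With real observational data the relevant confounders are rarely all measured, which is why a regression of this kind cannot by itself establish causation.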
