Probability with R - Jane M. Horgan - Page 56

3.5 THE LINE OF BEST FIT


Returning to Fig. 3.14, we can see that there is a linear trend in these data. One variable increases with the other; not surprisingly, students doing well in Programming in Semester 1 are likely to do well also in Programming in Semester 2, and those doing badly in Semester 1 will tend to do badly in Semester 2. We might ask whether it is possible to estimate the Semester 2 results from those obtained in Semester 1.

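In the case of the Programming subjects, we have a set of points $(x_i, y_i)$, where $x_i$ and $y_i$ are a student's marks in Programming 1 and Programming 2, respectively. Having established from the scatter plot that a linear trend exists, we attempt to find the line that best fits the data. In R,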

lm(prog2 ~ prog1)

calculates what is referred to as the linear model (lm) of $y$ (prog2) on $x$ (prog1), or simply the line

$y = a + bx$

that best fits the data, where $a$ is the intercept and $b$ is the slope.

The output is

Call:
lm(formula = prog2 ~ prog1)

Coefficients:
(Intercept)        prog1
     -5.455        0.960

Therefore, the line that best fits these data is

$y = -5.455 + 0.960x$

To draw this line on the scatter diagram, write

plot(prog1, prog2)
abline(lm(prog2 ~ prog1))

which gives Fig. 3.16.


Figure 3.16 The Line of Best Fit
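To see what lm() is computing, the following minimal sketch works out the least-squares slope and intercept from the standard closed-form formulas and compares them with the lm() result. The six pairs of marks are made up purely for illustration and are not the data behind Fig. 3.16, so the coefficients will differ from those reported above.

```r
# Hypothetical marks for six students (illustrative only, NOT the book's data)
prog1 <- c(35, 48, 52, 60, 71, 84)
prog2 <- c(30, 41, 50, 49, 66, 75)

# Least-squares estimates from the closed-form formulas:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope <- cov(prog1, prog2) / var(prog1)
intercept <- mean(prog2) - slope * mean(prog1)

# The same estimates obtained from lm()
fit <- lm(prog2 ~ prog1)
coef(fit)

# The two approaches should agree up to floating-point error
all.equal(unname(coef(fit)), c(intercept, slope))
```

The final call should return TRUE, since lm() fits a simple linear model by ordinary least squares.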

The line of best fit may be used to make predictions. For example, we might predict how students will do in Semester 2 from the results that they obtained in Semester 1. If the mark in Programming 1 for a particular student is 70, that student would be expected to do well in Programming 2 also, with an estimated mark of $-5.455 + 0.960 \times 70 = 61.745$, or about 62. A student doing badly in Programming 1, with a mark of 30 say, would also be expected to do badly in Programming 2, with an estimated mark of $-5.455 + 0.960 \times 30 = 23.345$, or about 23. These predictions may not be exact but, if the linear trend is strong and past trends continue, they will be reasonably close.
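The arithmetic behind such predictions can be sketched in R using only the coefficients reported in the lm() output. The helper name predict_prog2 is hypothetical, introduced here for illustration; it is not part of R or of the book.

```r
# Coefficients as reported in the lm() output for prog2 ~ prog1
intercept <- -5.455
slope <- 0.960

# Hypothetical helper: estimated Programming 2 mark for a given
# Programming 1 mark, read off the fitted line
predict_prog2 <- function(prog1_mark) {
  intercept + slope * prog1_mark
}

# Estimated Programming 2 marks for Programming 1 marks of 70 and 30
predict_prog2(c(70, 30))  # 61.745 and 23.345
```

When the fitted model object itself is available, R's predict() function, called with a newdata data frame containing a prog1 column, should give the same kind of estimate without retyping the coefficients.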

A word of warning is appropriate here. The estimated values are based on the assumption that the past trend continues. This may not always be the case. For example, students who do badly in Semester 1 may get such a shock that they work harder in Semester 2, and change the pattern. Similarly, students getting high marks in Semester 1 may be lulled into a false sense of security and take it easy in Semester 2. Consequently, they may not do as well as expected. Hence, the Semester 1 trends may not continue, and the model may no longer be valid.

