Introduction to Linear Regression Analysis - Douglas C. Montgomery
2.9 USING SAS® AND R FOR SIMPLE LINEAR REGRESSION
The purpose of this section is to introduce readers to SAS and to R. Appendix D gives more details about using SAS, including how to import data from both text and Excel files. Appendix E introduces the R statistical software package. R is becoming increasingly popular because it can be downloaded free of charge over the Internet.
Table 2.7 gives the SAS source code to analyze the rocket propellant data that we have been analyzing throughout this chapter. Appendix D explains in detail how to enter the data into SAS. The statement PROC REG tells the software that we wish to perform an ordinary least-squares linear regression analysis. The “model” statement specifies the model and tells the software which analyses to perform. The variable name to the left of the equal sign is the response. The variables to the right of the equal sign but before the solidus (/) are the regressors. The information after the solidus specifies additional analyses. By default, SAS prints the analysis-of-variance table and the tests on the individual coefficients. In this case, we have specified three options: “p” asks SAS to print the predicted values, “clm” (which stands for confidence limit, mean) asks SAS to print the confidence band for the mean response, and “cli” (which stands for confidence limit, individual observations) asks SAS to print the prediction band for individual observations.
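For reference, the two bands requested by “clm” and “cli” can be written in the usual form for simple linear regression. The notation is an assumption made here rather than something defined in this section: x_0 is the regressor value of interest, MS_Res is the residual mean square, and S_xx is the corrected sum of squares of the regressor.

```latex
% 100(1 - \alpha) percent confidence interval on the mean response at x_0 ("clm")
\hat{y}_0 \pm t_{\alpha/2,\,n-2}
  \sqrt{MS_{Res}\left(\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right)}

% 100(1 - \alpha) percent prediction interval for a new observation at x_0 ("cli")
\hat{y}_0 \pm t_{\alpha/2,\,n-2}
  \sqrt{MS_{Res}\left(1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}\right)}
```

The extra 1 under the second square root accounts for the variance of the new observation itself, which is why the prediction band is always wider than the confidence band.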
Table 2.8 gives the SAS output for this analysis. PROC REG always produces the analysis-of-variance table and the information on the parameter estimates. The “p clm cli” options on the model statement produced the remainder of the output file.
TABLE 2.7 SAS Code for Rocket Propellant Data
data rocket;
input shear age;
cards;
2158.70 15.50
1678.15 23.75
2316.00 8.00
2061.30 17.00
2207.50 5.50
1708.30 19.00
1784.70 24.00
2575.00 2.50
2357.90 7.50
2256.70 11.00
2165.20 13.00
2399.55 3.75
1779.80 25.99
2336.75 9.75
1765.30 22.00
2053.50 18.00
2414.40 6.00
2200.50 12.50
2654.20 2.00
1753.70 21.50
proc reg;
model shear=age/p clm cli;
run;
TABLE 2.8 SAS Output for Analysis of Rocket Propellant Data.
SAS also produces a log file that provides a brief summary of the SAS session. The log file is almost essential for debugging SAS code. Appendix D provides more details about this file.
R is a popular statistical software package, primarily because it is freely available at www.r-project.org. R Commander provides an easier-to-use interface to R. R itself is a high-level programming language. Most of its commands are prewritten functions. It can also run loops and call routines written in other languages, for example, C. Because it is primarily a programming language, it often presents challenges to novice users. The purpose of this section is to show the reader how to use R to analyze simple linear regression data sets.
The first step is to create the data set. The easiest way is to input the data into a text file using spaces for delimiters. Each row of the data file is a record. The top row should give the names for each variable. All other rows are the actual data records. For example, consider the rocket propellant data from Example 2.1 given in Table 2.1. Let propellant.txt be the name of the data file. The first row of the text file gives the variable names:
strength age
The next row is the first data record, with spaces delimiting each data item:
2158.70 15.50
The R code to read the data into the package is:
prop <- read.table("propellant.txt", header=TRUE, sep="")
The object prop is the R data set, and "propellant.txt" is the original data file. The phrase header=TRUE tells R that the first row contains the variable names. The phrase sep="" tells R that the data items are delimited by white space, such as spaces.
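The read step can be tried without a propellant.txt file on disk. The sketch below is only an illustration: the temporary file and the use of just the first five records of Table 2.7 are assumptions made so that the example runs on its own.

```r
# Write a small space-delimited file: a header row, then one record per row.
# Only the first five records of Table 2.7 are used (illustrative subset).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "strength age",
  "2158.70 15.50",
  "1678.15 23.75",
  "2316.00 8.00",
  "2061.30 17.00",
  "2207.50 5.50"
), path)

# header = TRUE: the first row holds the variable names.
# sep = "": fields are separated by any run of white space.
prop <- read.table(path, header = TRUE, sep = "")

# Check that both columns were read as numeric variables.
str(prop)
```

Calling str(prop) right after the read is a quick way to catch a misplaced delimiter, which usually shows up as a column read as character data rather than as numbers.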
The commands
prop.model <- lm(strength~age, data=prop)
summary(prop.model)
tell R to estimate the model and to print the analysis of variance, the estimated coefficients, and their tests.
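The following sketch shows one way to obtain in R the same kinds of output that the SAS options “p”, “clm”, and “cli” request. It is not taken from the text: the data frame holds only the first five records of Table 2.7 as an illustrative subset, the 95% level is an assumed choice, and the object names conf.band and pred.band are hypothetical.

```r
# First five records of Table 2.7 (illustrative subset, not the full data set).
prop <- data.frame(
  strength = c(2158.70, 1678.15, 2316.00, 2061.30, 2207.50),
  age      = c(15.50, 23.75, 8.00, 17.00, 5.50)
)

# Fit the simple linear regression of strength on age.
prop.model <- lm(strength ~ age, data = prop)

summary(prop.model)  # coefficient estimates, t tests, R-squared, overall F test
anova(prop.model)    # analysis-of-variance table

# Fitted values with 95% limits on the mean response (compare SAS "p clm").
conf.band <- predict(prop.model, interval = "confidence", level = 0.95)

# 95% limits for an individual observation (compare SAS "cli").
# newdata is given explicitly; without it R warns that the intervals
# refer to future responses.
pred.band <- predict(prop.model, newdata = prop,
                     interval = "prediction", level = 0.95)

# Show the data next to the fitted values and the confidence limits.
head(cbind(prop, conf.band))
```

Both calls to predict() return a matrix with columns fit, lwr, and upr, so the two bands can be compared row by row.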
R Commander is an add-on package to R. It also is freely available. It provides an easy-to-use user interface, much like Minitab and JMP, to the parent R product. R Commander makes it much more convenient to use R; however, it does not provide much flexibility in its analysis. R Commander is a good way for users to get familiar with R. Ultimately, however, we recommend the use of the parent R product.
Figure 2.9 Two influential observations.
Figure 2.10 A point remote in x space.