Introduction to Linear Regression Analysis
Contents
Douglas C. Montgomery. Introduction to Linear Regression Analysis
Table of Contents
List of Illustrations
INTRODUCTION TO LINEAR REGRESSION ANALYSIS
PREFACE
CHANGES IN THE SIXTH EDITION
USING THE BOOK AS A TEXT
ACKNOWLEDGMENTS
ABOUT THE COMPANION WEBSITE
CHAPTER 1. INTRODUCTION. 1.1 REGRESSION AND MODEL BUILDING
1.2 DATA COLLECTION
Example 1.1
Retrospective Study
Observational Study
Designed Experiment
1.3 USES OF REGRESSION
1.4 ROLE OF THE COMPUTER
CHAPTER 2. SIMPLE LINEAR REGRESSION. 2.1 SIMPLE LINEAR REGRESSION MODEL
2.2 LEAST-SQUARES ESTIMATION OF THE PARAMETERS
2.2.1 Estimation of β0 and β1
Example 2.1 The Rocket Propellant Data
Computer Output
2.2.2 Properties of the Least-Squares Estimators and the Fitted Regression Model
2.2.3 Estimation of σ2
Example 2.2 The Rocket Propellant Data
2.2.4 Alternate Form of the Model
2.3 HYPOTHESIS TESTING ON THE SLOPE AND INTERCEPT
2.3.1 Use of t Tests
2.3.2 Testing Significance of Regression
Example 2.3 The Rocket Propellant Data
Minitab Output
2.3.3 Analysis of Variance
Example 2.4 The Rocket Propellant Data
More About the t Test
2.4 INTERVAL ESTIMATION IN SIMPLE LINEAR REGRESSION
2.4.1 Confidence Intervals on β0, β1, and σ2
Example 2.5 The Rocket Propellant Data
2.4.2 Interval Estimation of the Mean Response
Example 2.6 The Rocket Propellant Data
2.5 PREDICTION OF NEW OBSERVATIONS
Example 2.7 The Rocket Propellant Data
2.6 COEFFICIENT OF DETERMINATION
2.7 A SERVICE INDUSTRY APPLICATION OF REGRESSION
2.8 DOES PITCHING WIN BASEBALL GAMES?
2.9 USING SAS® AND R FOR SIMPLE LINEAR REGRESSION
2.10 SOME CONSIDERATIONS IN THE USE OF REGRESSION
2.11 REGRESSION THROUGH THE ORIGIN
Example 2.8 The Shelf-Stocking Data
2.12 ESTIMATION BY MAXIMUM LIKELIHOOD
2.13 CASE WHERE THE REGRESSOR x IS RANDOM
2.13.1 x and y Jointly Distributed
2.13.2 x and y Jointly Normally Distributed: Correlation Model
Example 2.9 The Delivery Time Data
PROBLEMS
CHAPTER 3. MULTIPLE LINEAR REGRESSION
3.1 MULTIPLE REGRESSION MODELS
3.2 ESTIMATION OF THE MODEL PARAMETERS. 3.2.1 Least-Squares Estimation of the Regression Coefficients
Example 3.1 The Delivery Time Data
Computer Output
3.2.2 A Geometrical Interpretation of Least Squares
3.2.3 Properties of the Least-Squares Estimators
3.2.4 Estimation of σ2
Example 3.2 The Delivery Time Data
3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression
3.2.6 Maximum-Likelihood Estimation
3.3 HYPOTHESIS TESTING IN MULTIPLE LINEAR REGRESSION
3.3.1 Test for Significance of Regression
Example 3.3 The Delivery Time Data
Minitab Output
R2 and Adjusted R2
3.3.2 Tests on Individual Regression Coefficients and Subsets of Coefficients
Example 3.4 The Delivery Time Data
Minitab Output
Example 3.5 The Delivery Time Data
3.3.3 Special Case of Orthogonal Columns in X
3.3.4 Testing the General Linear Hypothesis
Example 3.6 Testing Equality of Regression Coefficients
Example 3.7
3.4 CONFIDENCE INTERVALS IN MULTIPLE REGRESSION
3.4.1 Confidence Intervals on the Regression Coefficients
Example 3.8 The Delivery Time Data
3.4.2 CI Estimation of the Mean Response
Example 3.9 The Delivery Time Data
3.4.3 Simultaneous Confidence Intervals on Regression Coefficients
Example 3.10 The Rocket Propellant Data
Example 3.11 The Rocket Propellant Data
3.5 PREDICTION OF NEW OBSERVATIONS
Example 3.12 The Delivery Time Data
3.6 A MULTIPLE REGRESSION MODEL FOR THE PATIENT SATISFACTION DATA
3.7 DOES PITCHING AND DEFENSE WIN BASEBALL GAMES?
3.8 USING SAS AND R FOR BASIC MULTIPLE LINEAR REGRESSION
3.9 HIDDEN EXTRAPOLATION IN MULTIPLE REGRESSION
Example 3.13 Hidden Extrapolation—The Delivery Time Data
3.10 STANDARDIZED REGRESSION COEFFICIENTS
Unit Normal Scaling
Unit Length Scaling
Example 3.14 The Delivery Time Data
3.11 MULTICOLLINEARITY
3.12 WHY DO REGRESSION COEFFICIENTS HAVE THE WRONG SIGN?
PROBLEMS
CHAPTER 4. MODEL ADEQUACY CHECKING. 4.1 INTRODUCTION
4.2 RESIDUAL ANALYSIS. 4.2.1 Definition of Residuals
4.2.2 Methods for Scaling Residuals
Standardized Residuals
Studentized Residuals
PRESS Residuals
R-Student
Example 4.1 The Delivery Time Data
4.2.3 Residual Plots
Normal Probability Plot
Example 4.2 The Delivery Time Data
Plot of Residuals against the Fitted Values
Example 4.3 The Delivery Time Data
Plot of Residuals against the Regressor
Example 4.4 The Delivery Time Data
Plot of Residuals in Time Sequence
4.2.4 Partial Regression and Partial Residual Plots
Example 4.5 The Delivery Time Data
Some Comments on Partial Regression Plots
Partial Residual Plots
4.2.5 Using Minitab®, SAS, and R for Residual Analysis
4.2.6 Other Residual Plotting and Analysis Methods
Statistical Tests on Residuals
4.3 PRESS STATISTIC
Example 4.6 The Delivery Time Data
R2 for Prediction Based on PRESS
Using PRESS to Compare Models
4.4 DETECTION AND TREATMENT OF OUTLIERS
Example 4.7 The Rocket Propellant Data
4.5 LACK OF FIT OF THE REGRESSION MODEL
4.5.1 A Formal Test for Lack of Fit
Example 4.8 Testing for Lack of Fit
Example 4.9 Testing for Lack of Fit in JMP
4.5.2 Estimation of Pure Error from Near Neighbors
Example 4.10 The Delivery Time Data
PROBLEMS
CHAPTER 5. TRANSFORMATIONS AND WEIGHTING TO CORRECT MODEL INADEQUACIES. 5.1 INTRODUCTION
5.2 VARIANCE-STABILIZING TRANSFORMATIONS
Example 5.1 The Electric Utility Data
5.3 TRANSFORMATIONS TO LINEARIZE THE MODEL
Example 5.2 The Windmill Data
5.4 ANALYTICAL METHODS FOR SELECTING A TRANSFORMATION
5.4.1 Transformations on y: The Box-Cox Method
Computational Procedure
An Approximate Confidence Interval for λ
Example 5.3 The Electric Utility Data
5.4.2 Transformations on the Regressor Variables
Example 5.4 The Windmill Data
5.5 GENERALIZED AND WEIGHTED LEAST SQUARES
5.5.1 Generalized Least Squares
5.5.2 Weighted Least Squares
5.5.3 Some Practical Issues
Example 5.5 Weighted Least Squares
5.6 REGRESSION MODELS WITH RANDOM EFFECTS. 5.6.1 Subsampling
Example 5.6 The Helicopter Subsampling Study
5.6.2 The General Situation for a Regression Model with a Single Random Effect
Example 5.7 The Delivery Time Data Revisited
5.6.3 The Importance of the Mixed Model in Regression
PROBLEMS
CHAPTER 6. DIAGNOSTICS FOR LEVERAGE AND INFLUENCE. 6.1 IMPORTANCE OF DETECTING INFLUENTIAL OBSERVATIONS
6.2 LEVERAGE
Example 6.1 The Delivery Time Data
6.3 MEASURES OF INFLUENCE: COOK’S D
Example 6.2 The Delivery Time Data
6.4 MEASURES OF INFLUENCE: DFFITS AND DFBETAS
A Remark on Cutoff Values
Example 6.3 The Delivery Time Data
6.5 A MEASURE OF MODEL PERFORMANCE
Example 6.4 The Delivery Time Data
6.6 DETECTING GROUPS OF INFLUENTIAL OBSERVATIONS
6.7 TREATMENT OF INFLUENTIAL OBSERVATIONS
PROBLEMS
CHAPTER 7. POLYNOMIAL REGRESSION MODELS. 7.1 INTRODUCTION
7.2 POLYNOMIAL MODELS IN ONE VARIABLE. 7.2.1 Basic Principles
Example 7.1 The Hardwood Data
7.2.2 Piecewise Polynomial Fitting (Splines)
Example 7.2 Voltage Drop Data
Example 7.3 Piecewise Linear Regression
7.2.3 Polynomial and Trigonometric Terms
7.3 NONPARAMETRIC REGRESSION
7.3.1 Kernel Regression
7.3.2 Locally Weighted Regression (Loess)
Example 7.4 Applying Loess Regression to the Windmill Data
7.3.3 Final Cautions
7.4 POLYNOMIAL MODELS IN TWO OR MORE VARIABLES
7.5 ORTHOGONAL POLYNOMIALS
Example 7.5 Orthogonal Polynomials
PROBLEMS
CHAPTER 8. INDICATOR VARIABLES. 8.1 GENERAL CONCEPT OF INDICATOR VARIABLES
Example 8.1 The Tool Life Data
Example 8.2 The Tool Life Data
Example 8.3 An Indicator Variable with More Than Two Levels
Example 8.4 More Than One Indicator Variable
Example 8.5 Comparing Regression Models
a. Parallel Lines
b. Concurrent Lines
c. Coincident Lines
8.2 COMMENTS ON THE USE OF INDICATOR VARIABLES. 8.2.1 Indicator Variables versus Regression on Allocated Codes
8.2.2 Indicator Variables as a Substitute for a Quantitative Regressor
8.3 REGRESSION APPROACH TO ANALYSIS OF VARIANCE
PROBLEMS
CHAPTER 9. MULTICOLLINEARITY. 9.1 INTRODUCTION
9.2 SOURCES OF MULTICOLLINEARITY
9.3 EFFECTS OF MULTICOLLINEARITY
Example 9.1 The Acetylene Data
9.4 MULTICOLLINEARITY DIAGNOSTICS
9.4.1 Examination of the Correlation Matrix
9.4.2 Variance Inflation Factors
9.4.3 Eigensystem Analysis of X′X
9.4.4 Other Diagnostics
9.4.5 SAS and R Code for Generating Multicollinearity Diagnostics
9.5 METHODS FOR DEALING WITH MULTICOLLINEARITY
9.5.1 Collecting Additional Data
9.5.2 Model Respecification
9.5.3 Ridge Regression
Example 9.2 The Acetylene Data
Some Other Properties of Ridge Regression
Relationship to Other Estimators
Methods for Choosing k
Generalized Regression Techniques
9.5.4 Principal-Component Regression
Example 9.3 Principal-Component Regression for the Acetylene Data
9.5.5 Comparison and Evaluation of Biased Estimators
9.6 USING SAS TO PERFORM RIDGE AND PRINCIPAL-COMPONENT REGRESSION
PROBLEMS
CHAPTER 10. VARIABLE SELECTION AND MODEL BUILDING. 10.1 INTRODUCTION. 10.1.1 Model-Building Problem
10.1.2 Consequences of Model Misspecification
10.1.3 Criteria for Evaluating Subset Regression Models
Coefficient of Multiple Determination
Adjusted R2
Residual Mean Square
Mallows’s Cp Statistic
The Akaike Information Criterion and Bayesian Analogues (BICs)
Uses of Regression and Model Evaluation Criteria
10.2 COMPUTATIONAL TECHNIQUES FOR VARIABLE SELECTION
10.2.1 All Possible Regressions
Example 10.1 The Hald Cement Data
Efficient Generation of All Possible Regressions
10.2.2 Stepwise Regression Methods
Forward Selection
Example 10.2 Forward Selection—Hald Cement Data
Backward Elimination
Example 10.3 Backward Elimination—Hald Cement Data
Stepwise Regression
Example 10.4 Stepwise Regression—Hald Cement Data
General Comments on Stepwise-Type Procedures
Stopping Rules for Stepwise Procedures
10.3 STRATEGY FOR VARIABLE SELECTION AND MODEL BUILDING
10.4 CASE STUDY: GORMAN AND TOMAN ASPHALT DATA USING SAS
PROBLEMS
CHAPTER 11. VALIDATION OF REGRESSION MODELS. 11.1 INTRODUCTION
11.2 VALIDATION TECHNIQUES
11.2.1 Analysis of Model Coefficients and Predicted Values
Example 11.1 The Hald Cement Data
11.2.2 Collecting Fresh Data—Confirmation Runs
Example 11.2 The Delivery Time Data
11.2.3 Data Splitting
Example 11.3 The Delivery Time Data
11.3 DATA FROM PLANNED EXPERIMENTS
PROBLEMS
CHAPTER 12. INTRODUCTION TO NONLINEAR REGRESSION
12.1 LINEAR AND NONLINEAR REGRESSION MODELS. 12.1.1 Linear Regression Models
12.1.2 Nonlinear Regression Models
12.2 ORIGINS OF NONLINEAR MODELS
Example 12.1
Example 12.2
12.3 NONLINEAR LEAST SQUARES
Example 12.3 Normal Equations for a Nonlinear Model
Geometry of Linear and Nonlinear Least Squares
Maximum-Likelihood Estimation
12.4 TRANSFORMATION TO A LINEAR MODEL
Example 12.4 The Puromycin Data
12.5 PARAMETER ESTIMATION IN A NONLINEAR SYSTEM. 12.5.1 Linearization
Example 12.5 The Puromycin Data
Computer Programs
Estimation of σ2
Graphical Perspective on Linearization
12.5.2 Other Parameter Estimation Methods
Method of Steepest Descent
Fractional Increments
Marquardt’s Compromise
12.5.3 Starting Values
12.6 STATISTICAL INFERENCE IN NONLINEAR REGRESSION
Example 12.6 The Puromycin Data
Validity of Approximate Inference
12.7 EXAMPLES OF NONLINEAR REGRESSION MODELS
12.8 USING SAS AND R
PROBLEMS
CHAPTER 13. GENERALIZED LINEAR MODELS. 13.1 INTRODUCTION
13.2 LOGISTIC REGRESSION MODELS. 13.2.1 Models with a Binary Response Variable
13.2.2 Estimating the Parameters in a Logistic Regression Model
Example 13.1 The Pneumoconiosis Data
13.2.3 Interpretation of the Parameters in a Logistic Regression Model
Example 13.2 The Pneumoconiosis Data
13.2.4 Statistical Inference on Model Parameters
Likelihood Ratio Tests
Testing Goodness of Fit
Testing Hypotheses on Subsets of Parameters Using Deviance
Example 13.3 The Pneumoconiosis Data
Tests on Individual Model Coefficients
Example 13.4 The Pneumoconiosis Data
Confidence Intervals
Example 13.5 The Pneumoconiosis Data
Example 13.6 The Pneumoconiosis Data
Example 13.7 The Pneumoconiosis Data
13.2.5 Diagnostic Checking in Logistic Regression
13.2.6 Other Models for Binary Response Data
13.2.7 More Than Two Categorical Outcomes
13.3 POISSON REGRESSION
Example 13.8 The Aircraft Damage Data
13.4 THE GENERALIZED LINEAR MODEL
13.4.1 Link Functions and Linear Predictors
13.4.2 Parameter Estimation and Inference in the GLM
Example 13.9 The Worsted Yarn Experiment
13.4.3 Prediction and Estimation with the GLM
Example 13.10 The Worsted Yarn Experiment
13.4.4 Residual Analysis in the GLM
Example 13.11 The Worsted Yarn Experiment
13.4.5 Using R to Perform GLM Analysis
13.4.6 Overdispersion
PROBLEMS
CHAPTER 14. REGRESSION ANALYSIS OF TIME SERIES DATA. 14.1 INTRODUCTION TO REGRESSION MODELS FOR TIME SERIES DATA
14.2 DETECTING AUTOCORRELATION: THE DURBIN–WATSON TEST
Example 14.1
14.3 ESTIMATING THE PARAMETERS IN TIME SERIES REGRESSION MODELS
Example 14.2
The Cochrane–Orcutt Method
Example 14.3
The Maximum Likelihood Approach
Example 14.4
Prediction of New Observations and Prediction Intervals
The Case Where the Predictor Variable Must Also Be Forecast
Alternate Forms of the Model
Example 14.5
PROBLEMS
CHAPTER 15. OTHER TOPICS IN THE USE OF REGRESSION ANALYSIS
15.1 ROBUST REGRESSION. 15.1.1 Need for Robust Regression
15.1.2 M-Estimators
Example 15.1 The Stack Loss Data
Computing M-Estimates
15.1.3 Properties of Robust Estimators
Breakdown Point
Efficiency
15.2 EFFECT OF MEASUREMENT ERRORS IN THE REGRESSORS
15.2.1 Simple Linear Regression
15.2.2 The Berkson Model
15.3 INVERSE ESTIMATION—THE CALIBRATION PROBLEM
Example 15.2 Thermocouple Calibration
Other Approaches
15.4 BOOTSTRAPPING IN REGRESSION
15.4.1 Bootstrap Sampling in Regression
15.4.2 Bootstrap Confidence Intervals
Example 15.3 The Delivery Time Data
Example 15.4 The Puromycin Data
15.5 CLASSIFICATION AND REGRESSION TREES (CART)
Example 15.5 The Gasoline Mileage Data
15.6 NEURAL NETWORKS
15.7 DESIGNED EXPERIMENTS FOR REGRESSION
PROBLEMS
APPENDIX A. STATISTICAL TABLES
APPENDIX B. DATA SETS FOR EXERCISES
APPENDIX C. SUPPLEMENTAL TECHNICAL MATERIAL
C.1 BACKGROUND ON BASIC TEST STATISTICS
C.1.1 Central Distributions
C.1.2 Noncentral Distributions
C.2 BACKGROUND FROM THE THEORY OF LINEAR MODELS. C.2.1 Basic Definitions
C.2.2 Matrix Derivatives
C.2.3 Expectations
C.2.4 Distribution Theory
C.3 IMPORTANT RESULTS ON SSR AND SSRES. C.3.1 SSR
C.3.2 SSRes
C.3.3 Global or Overall F Test
C.3.4 Extra-Sum-of-Squares Principle
C.3.5 Relationship of the t Test for an Individual Coefficient and the Extra-Sum-of-Squares Principle
C.4 GAUSS–MARKOV THEOREM, VAR(ε) = σ2I
C.5 COMPUTATIONAL ASPECTS OF MULTIPLE REGRESSION
C.6 RESULT ON THE INVERSE OF A MATRIX
C.7 DEVELOPMENT OF THE PRESS STATISTIC
C.8 DEVELOPMENT OF S2(i)
C.9 OUTLIER TEST BASED ON R-STUDENT
C.10 INDEPENDENCE OF RESIDUALS AND FITTED VALUES
C.11 GAUSS–MARKOV THEOREM, VAR(ε) = V
C.12 BIAS IN MSRES WHEN THE MODEL IS UNDERSPECIFIED
C.13 COMPUTATION OF INFLUENCE DIAGNOSTICS
C.13.1 DFFITSi
C.13.2 Cook’s Di
C.13.3 DFBETASj,i
C.14 GENERALIZED LINEAR MODELS. C.14.1 Parameter Estimation in Logistic Regression
C.14.2 Exponential Family
C.14.3 Parameter Estimation in the Generalized Linear Model
APPENDIX D. INTRODUCTION TO SAS
D.1 BASIC DATA ENTRY
Step 1: Open the SAS Editor Window
Step 2: The Data Command
Step 3: The Input Command
Step 4: Give the Actual Data
Step 5: Using PROC PRINT to Check Data Entry
D.2 CREATING PERMANENT SAS DATA SETS
Step 1: Specify the Directory for the Permanent Data Set
Step 2: Use the Data Statement to Create the Data Set
D.3 IMPORTING DATA FROM AN EXCEL FILE
Step 1: Export the EXCEL Spreadsheet
Step 2: Get the dbf File into UNIX Format
Step 4: When in Doubt, Contact Your System’s Administrator!
D.4 OUTPUT COMMAND
D.5 LOG FILE
D.6 ADDING VARIABLES TO AN EXISTING SAS DATA SET
APPENDIX E. INTRODUCTION TO R TO PERFORM LINEAR REGRESSION ANALYSIS
E.1 BASIC BACKGROUND ON R
E.2 BASIC DATA ENTRY
E.3 BRIEF COMMENTS ON OTHER FUNCTIONALITY IN R
E.4 R COMMANDER
REFERENCES
INDEX
WILEY SERIES IN PROBABILITY AND STATISTICS
WILEY END USER LICENSE AGREEMENT
Excerpt from the Book
WILEY SERIES IN PROBABILITY AND STATISTICS
.....
Equation (2.43) shows that the issue of extrapolation is more subtle than it may appear: the further the value of x lies from the center of the data, the more variable our estimate of E(y|x0) becomes. Note, however, that nothing “magical” occurs at the boundary of the x space. It is not reasonable to think that prediction is excellent at the observed data value most remote from the center of the data and completely unreliable just beyond it. Rather, Eq. (2.43) indicates that we should be concerned about prediction quality as we approach the boundary, and that as we move beyond it the prediction may deteriorate rapidly. Furthermore, the farther we move from the original region of x space, the more likely it is that equation or model error will play a role in the process.
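For reference, the excerpt cites Eq. (2.43) without restating it. In the standard simple-linear-regression development (the book's Section 2.4.2), the confidence interval on the mean response at a point x0 rests on the variance

Var[ŷ(x0)] = σ2 [ 1/n + (x0 − x̄)2 / Sxx ],

so the interval half-width grows with the distance of x0 from x̄; the exact form and numbering of Eq. (2.43) should be checked against the book. The following is a minimal R sketch of this behavior; the data are made up for illustration and are not the book's rocket propellant data.

# Made-up data: a true straight line plus noise.
set.seed(1)
x <- runif(20, 5, 25)
y <- 2600 - 37 * x + rnorm(20, sd = 90)
fit <- lm(y ~ x)

# Evaluate 95% CI widths for the mean response on a grid that
# extends beyond the observed range of x (i.e., into extrapolation).
x0 <- seq(min(x) - 10, max(x) + 10, length.out = 9)
ci <- predict(fit, newdata = data.frame(x = x0),
              interval = "confidence", level = 0.95)

# Width is smallest near mean(x) and widens smoothly with distance
# from it; there is no jump at the boundary of the observed x range.
print(cbind(x0, dist = abs(x0 - mean(x)), width = ci[, "upr"] - ci[, "lwr"]))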
This is not the same thing as saying “never extrapolate.” Engineers and economists routinely use prediction equations to forecast a variable of interest one or more time periods into the future. Strictly speaking, such a forecast is an extrapolation, and Eq. (2.43) supports this use of the prediction equation. However, Eq. (2.43) does not support using the regression model to forecast many periods into the future. Generally, the greater the extrapolation, the higher the chance that equation error or model error will impact the results.
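The same calculation for a new observation (a prediction interval, the subject of the book's Section 2.5, rather than a confidence interval on the mean) makes the forecasting point concrete. Continuing the sketch above, with the same caveat that the data are made up:

# Prediction intervals just beyond and far beyond the observed data.
x_new <- data.frame(x = max(x) + c(1, 5, 20))
pi95  <- predict(fit, newdata = x_new, interval = "prediction", level = 0.95)
print(cbind(x_new, width = pi95[, "upr"] - pi95[, "lwr"]))

# The widths keep growing as x moves past the data, and even these
# widths assume the fitted straight line still holds out there;
# equation or model error in the extrapolated region only adds risk.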
.....