Introduction to Linear Regression Analysis - Douglas C. Montgomery

PROBLEMS

1 2.1 Table B.1 gives data concerning the performance of the 26 National Football League teams in 1976. It is suspected that the number of yards gained rushing by opponents (x8) has an effect on the number of games won by a team (y).
a. Fit a simple linear regression model relating games won y to yards gained rushing by opponents x8.
b. Construct the analysis-of-variance table and test for significance of regression.
c. Find a 95% CI on the slope.
d. What percent of the total variability in y is explained by this model?
e. Find a 95% CI on the mean number of games won if opponents’ yards rushing is limited to 2000 yards.

2 2.2 Suppose we would like to use the model developed in Problem 2.1 to predict the number of games a team will win if it can limit opponents’ yards rushing to 1800 yards. Find a point estimate of the number of games won when x8 = 1800. Find a 90% prediction interval on the number of games won.

3 2.3 Table B.2 presents data collected during a solar energy project at Georgia Tech.
a. Fit a simple linear regression model relating total heat flux y (kilowatts) to the radial deflection of the deflected rays x4 (milliradians).
b. Construct the analysis-of-variance table and test for significance of regression.
c. Find a 99% CI on the slope.
d. Calculate R².
e. Find a 95% CI on the mean heat flux when the radial deflection is 16.5 milliradians.

4 2.4 Table B.3 presents data on the gasoline mileage performance of 32 different automobiles.
a. Fit a simple linear regression model relating gasoline mileage y (miles per gallon) to engine displacement x1 (cubic inches).
b. Construct the analysis-of-variance table and test for significance of regression.
c. What percent of the total variability in gasoline mileage is accounted for by the linear relationship with engine displacement?
d. Find a 95% CI on the mean gasoline mileage if the engine displacement is 275 in.³
e. Suppose that we wish to predict the gasoline mileage obtained from a car with a 275-in.³ engine. Give a point estimate of mileage. Find a 95% prediction interval on the mileage.
f. Compare the two intervals obtained in parts d and e. Explain the difference between them. Which one is wider, and why?

5 2.5 Consider the gasoline mileage data in Table B.3. Repeat Problem 2.4 (parts a, b, and c) using vehicle weight x10 as the regressor variable. Based on a comparison of the two models, can you conclude that x1 is a better choice of regressor than x10?

6 2.6 Table B.4 presents data for 27 houses sold in Erie, Pennsylvania.
a. Fit a simple linear regression model relating selling price of the house to the current taxes (x1).
b. Test for significance of regression.
c. What percent of the total variability in selling price is explained by this model?
d. Find a 95% CI on β1.
e. Find a 95% CI on the mean selling price of a house for which the current taxes are $750.

7 2.7 The purity of oxygen produced by a fractional distillation process is thought to be related to the percentage of hydrocarbons in the main condenser of the processing unit. Twenty samples are shown below.

Purity (%)   Hydrocarbon (%)
86.91        1.02
89.85        1.11
90.28        1.43
86.34        1.11
92.58        1.01
87.33        0.95
86.29        1.11
91.86        0.87
95.61        1.43
89.86        1.02
96.73        1.46
99.42        1.55
98.66        1.55
96.07        1.55
93.65        1.40
87.31        1.15
95.00        1.01
96.85        0.99
85.20        0.95
90.56        0.98

a. Fit a simple linear regression model to the data.
b. Test the hypothesis H0: β1 = 0.
c. Calculate R².
d. Find a 95% CI on the slope.
e. Find a 95% CI on the mean purity when the hydrocarbon percentage is 1.00.
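The computations requested in Problem 2.7 can be sketched in a few lines of Python. This is only a minimal sketch using the standard library: the function name `fit_line` and the variable names are illustrative, and the critical value `T_CRIT = 2.101` is the tabled t(0.025, 18) value, entered by hand rather than computed.

```python
import math

# Data from Problem 2.7: hydrocarbon level (%) and oxygen purity (%).
hydrocarbon = [1.02, 1.11, 1.43, 1.11, 1.01, 0.95, 1.11, 0.87, 1.43, 1.02,
               1.46, 1.55, 1.55, 1.55, 1.40, 1.15, 1.01, 0.99, 0.95, 0.98]
purity = [86.91, 89.85, 90.28, 86.34, 92.58, 87.33, 86.29, 91.86, 95.61, 89.86,
          96.73, 99.42, 98.66, 96.07, 93.65, 87.31, 95.00, 96.85, 85.20, 90.56]

T_CRIT = 2.101  # assumed table value t(0.025, 18), since n - 2 = 18


def fit_line(x, y):
    """Return least-squares summary quantities for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_res = ss_total - b1 * s_xy
    return {"n": n, "x_bar": x_bar, "s_xx": s_xx, "b0": b0, "b1": b1,
            "ss_total": ss_total, "ss_res": ss_res, "ms_res": ss_res / (n - 2)}


fit = fit_line(hydrocarbon, purity)

# Part b: t statistic for H0: beta1 = 0.
se_b1 = math.sqrt(fit["ms_res"] / fit["s_xx"])
t0 = fit["b1"] / se_b1

# Part c: coefficient of determination.
r_squared = 1 - fit["ss_res"] / fit["ss_total"]

# Part d: 95% CI on the slope.
slope_ci = (fit["b1"] - T_CRIT * se_b1, fit["b1"] + T_CRIT * se_b1)

# Part e: 95% CI on the mean purity at x0 = 1.00.
x0 = 1.00
y0_hat = fit["b0"] + fit["b1"] * x0
se_mean = math.sqrt(
    fit["ms_res"] * (1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["s_xx"])
)
mean_ci = (y0_hat - T_CRIT * se_mean, y0_hat + T_CRIT * se_mean)

print(f"b0 = {fit['b0']:.3f}, b1 = {fit['b1']:.3f}, t0 = {t0:.2f}, R^2 = {r_squared:.3f}")
print(f"95% CI on slope: ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"95% CI on mean purity at x = {x0}: ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
```

The same function can be reused for the other problems that supply their data inline, provided the critical value is changed to match the residual degrees of freedom.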

8 2.8 Consider the oxygen plant data in Problem 2.7 and assume that purity and hydrocarbon percentage are jointly normally distributed random variables.
a. What is the correlation between oxygen purity and hydrocarbon percentage?
b. Test the hypothesis that ρ = 0.
c. Construct a 95% CI for ρ.

9 2.9 Consider the soft drink delivery time data in Table 2.10. After examining the original regression model (Example 2.9), one analyst claimed that the model was invalid because the intercept was not zero. He argued that if zero cases were delivered, the time to stock and service the machine would be zero, and the straight-line model should go through the origin. What would you say in response to his comments? Fit a no-intercept model to these data and determine which model is superior.

10 2.10 The weight and systolic blood pressure of 26 randomly selected males in the age group 25–30 are shown below. Assume that weight and blood pressure (BP) are jointly normally distributed.
a. Find a regression line relating systolic blood pressure to weight.
b. Estimate the correlation coefficient.
c. Test the hypothesis that ρ = 0.
d. Test the hypothesis that ρ = 0.6.
e. Find a 95% CI for ρ.

Subject   Weight   Systolic BP
1         165      130
2         167      133
3         180      150
4         155      128
5         212      151
6         175      146
7         190      150
8         210      140
9         200      148
10        149      125
11        158      133
12        169      135
13        170      150
14        172      153
15        159      128
16        168      132
17        174      149
18        183      158
19        215      150
20        195      163
21        180      156
22        143      124
23        240      170
24        235      165
25        192      160
26        187      159
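For the correlation parts of Problem 2.10, the usual statistics are the t statistic r√(n − 2)/√(1 − r²) for ρ = 0 and the large-sample statistic based on arctanh r for ρ = ρ0. The Python sketch below applies them to the data above. It is a minimal illustration, not a prescribed solution: the variable names are made up, the critical value 2.064 is the tabled t(0.025, 24) entered by hand, and the normal quantile comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

# Data from Problem 2.10.
weight = [165, 167, 180, 155, 212, 175, 190, 210, 200, 149, 158, 169, 170,
          172, 159, 168, 174, 183, 215, 195, 180, 143, 240, 235, 192, 187]
systolic_bp = [130, 133, 150, 128, 151, 146, 150, 140, 148, 125, 133, 135, 150,
               153, 128, 132, 149, 158, 150, 163, 156, 124, 170, 165, 160, 159]


def sample_correlation(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * SST)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


n = len(weight)
r = sample_correlation(weight, systolic_bp)

# Part c: H0: rho = 0, t statistic with n - 2 degrees of freedom.
T_CRIT = 2.064  # assumed table value t(0.025, 24)
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_rho_zero = abs(t0) > T_CRIT

# Part d: H0: rho = 0.6, large-sample statistic on the arctanh scale.
rho0 = 0.6
z0 = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

# Part e: approximate 95% CI for rho, back-transformed with tanh.
half_width = NormalDist().inv_cdf(0.975) / math.sqrt(n - 3)
rho_ci = (math.tanh(math.atanh(r) - half_width),
          math.tanh(math.atanh(r) + half_width))

print(f"r = {r:.4f}, t0 = {t0:.2f}, reject rho = 0: {reject_rho_zero}")
print(f"z0 = {z0:.3f}, two-sided P-value = {p_value:.4f}")
print(f"approximate 95% CI for rho: ({rho_ci[0]:.4f}, {rho_ci[1]:.4f})")
```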

11 2.11 Consider the weight and blood pressure data in Problem 2.10. Fit a no-intercept model to the data and compare it to the model obtained in Problem 2.10. Which model would you conclude is superior?
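One way to set up the comparison in Problem 2.11 is to fit both models and compare their residual mean squares, since R² is not directly comparable between intercept and no-intercept fits. The Python sketch below does this for the Problem 2.10 data. The variable names are illustrative, and the choice of the residual mean square as the yardstick is an assumption about how the comparison should be made.

```python
# Data from Problem 2.10.
weight = [165, 167, 180, 155, 212, 175, 190, 210, 200, 149, 158, 169, 170,
          172, 159, 168, 174, 183, 215, 195, 180, 143, 240, 235, 192, 187]
systolic_bp = [130, 133, 150, 128, 151, 146, 150, 140, 148, 125, 133, 135, 150,
               153, 128, 132, 149, 158, 150, 163, 156, 124, 170, 165, 160, 159]
n = len(weight)

# No-intercept model y = beta1 * x + error: slope from uncorrected sums.
b1_origin = (sum(x * y for x, y in zip(weight, systolic_bp))
             / sum(x * x for x in weight))
ss_res_origin = sum((y - b1_origin * x) ** 2 for x, y in zip(weight, systolic_bp))
ms_res_origin = ss_res_origin / (n - 1)  # one parameter estimated

# Intercept model y = beta0 + beta1 * x + error.
x_bar = sum(weight) / n
y_bar = sum(systolic_bp) / n
s_xx = sum((x - x_bar) ** 2 for x in weight)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, systolic_bp))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ss_res_full = sum((y - b0 - b1 * x) ** 2 for x, y in zip(weight, systolic_bp))
ms_res_full = ss_res_full / (n - 2)  # two parameters estimated

print(f"no-intercept:   slope = {b1_origin:.4f}, MSRes = {ms_res_origin:.2f}")
print(f"with intercept: b0 = {b0:.2f}, b1 = {b1:.4f}, MSRes = {ms_res_full:.2f}")
```

A scatterplot with both fitted lines overlaid is a useful companion to the numerical comparison, particularly because the data lie far from the origin.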

12 2.12 The number of pounds of steam used per month at a plant is thought to be related to the average monthly ambient temperature. The past year’s usages and temperatures follow.

Month   Temperature   Usage/1000
Jan.    21            185.79
Feb.    24            214.47
Mar.    32            288.03
Apr.    47            424.84
May     50            454.68
Jun.    59            539.03
Jul.    68            621.55
Aug.    74            675.06
Sep.    62            562.03
Oct.    50            452.93
Nov.    41            369.95
Dec.    30            273.98

a. Fit a simple linear regression model to the data.
b. Test for significance of regression.
c. Plant management believes that an increase in average ambient temperature of 1 degree will increase average monthly steam consumption by 10,000 lb. Do the data support this statement?
d. Construct a 99% prediction interval on steam usage in a month with average ambient temperature of 58°.
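Parts c and d of Problem 2.12 differ from the earlier problems in two ways: the slope is tested against a nonzero value, and the interval is for a new observation rather than a mean. Because usage is recorded in thousands of pounds, a 10,000-lb increase per degree corresponds to a slope of 10. The Python sketch below sets up both calculations. The names are illustrative, and `T_CRIT_99 = 3.169` is the tabled t(0.005, 10) value entered by hand.

```python
import math

# Data from Problem 2.12 (usage is in thousands of pounds).
temperature = [21, 24, 32, 47, 50, 59, 68, 74, 62, 50, 41, 30]
usage = [185.79, 214.47, 288.03, 424.84, 454.68, 539.03,
         621.55, 675.06, 562.03, 452.93, 369.95, 273.98]

T_CRIT_99 = 3.169  # assumed table value t(0.005, 10), since n - 2 = 10

n = len(temperature)
x_bar = sum(temperature) / n
y_bar = sum(usage) / n
s_xx = sum((x - x_bar) ** 2 for x in temperature)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, usage))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(temperature, usage))
ms_res = ss_res / (n - 2)
se_b1 = math.sqrt(ms_res / s_xx)

# Part c: t statistic for H0: beta1 = 10 (10,000 lb per degree).
claimed_slope = 10.0
t0 = (b1 - claimed_slope) / se_b1

# Part d: 99% prediction interval for a new observation at x0 = 58.
x0 = 58
y0_hat = b0 + b1 * x0
pred_half = T_CRIT_99 * math.sqrt(ms_res * (1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))
pred_interval = (y0_hat - pred_half, y0_hat + pred_half)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, se(b1) = {se_b1:.4f}")
print(f"t0 for H0: beta1 = 10 is {t0:.2f} (compare with a t table, 10 df)")
print(f"99% PI at x = {x0}: ({pred_interval[0]:.2f}, {pred_interval[1]:.2f})")
```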

13 2.13 Davidson (“Update on Ozone Trends in California’s South Coast Air Basin,” Air and Waste, 43, 226, 1993) studied the ozone levels in the South Coast Air Basin of California for the years 1976–1991. He believes that the number of days the ozone levels exceeded 0.20 ppm (the response) depends on the seasonal meteorological index, which is the seasonal average 850-millibar temperature (the regressor). The following table gives the data.

Year   Days   Index
1976   91     16.7
1977   105    17.1
1978   106    18.2
1979   108    18.1
1980   88     17.2
1981   91     18.2
1982   58     16.0
1983   82     17.2
1984   81     18.0
1985   65     17.2
1986   61     16.9
1987   48     17.1
1988   61     18.2
1989   43     17.3
1990   33     17.5
1991   36     16.6

a. Make a scatterplot of the data.
b. Estimate the prediction equation.
c. Test for significance of regression.
d. Calculate and plot the 95% confidence and prediction bands.
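The confidence and prediction bands requested in Problem 2.13 (and again in Problems 2.14, 2.15, and 2.18) can be traced by evaluating the pointwise intervals over a grid of regressor values. The Python sketch below computes the half-widths for the ozone data. It is a minimal sketch: the helper name `bands`, the grid spacing, and the hand-entered critical value 2.145 for t(0.025, 14) are all assumptions, and plotting is left to whatever graphics package is available.

```python
import math

# Data from Problem 2.13.
index = [16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2,
         18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6]
days = [91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36]

T_CRIT = 2.145  # assumed table value t(0.025, 14), since n - 2 = 14

n = len(index)
x_bar = sum(index) / n
y_bar = sum(days) / n
s_xx = sum((x - x_bar) ** 2 for x in index)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(index, days))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
ms_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(index, days)) / (n - 2)


def bands(x0):
    """Return (fitted value, confidence half-width, prediction half-width) at x0."""
    y_hat = b0 + b1 * x0
    spread = 1 / n + (x0 - x_bar) ** 2 / s_xx
    conf_half = T_CRIT * math.sqrt(ms_res * spread)
    pred_half = T_CRIT * math.sqrt(ms_res * (1 + spread))
    return y_hat, conf_half, pred_half


# Evaluate the pointwise intervals on a grid spanning the observed index values.
grid = [16.0 + 0.2 * k for k in range(12)]
for x0 in grid:
    y_hat, conf_half, pred_half = bands(x0)
    print(f"x = {x0:4.1f}  fit = {y_hat:6.1f}  "
          f"CI = ({y_hat - conf_half:6.1f}, {y_hat + conf_half:6.1f})  "
          f"PI = ({y_hat - pred_half:6.1f}, {y_hat + pred_half:6.1f})")
```

Joining the lower and upper limits across the grid gives the two pairs of curves to overlay on the scatterplot.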

14 2.14 Hsuie, Ma, and Tsai (“Separation and Characterizations of Thermotropic Copolyesters of p-Hydroxybenzoic Acid, Sebacic Acid, and Hydroquinone,” Journal of Applied Polymer Science, 56, 471–476, 1995) study the effect of the molar ratio of sebacic acid (the regressor) on the intrinsic viscosity of copolyesters (the response). The following table gives the data.

Ratio   Viscosity
1.0     0.45
0.9     0.20
0.8     0.34
0.7     0.58
0.6     0.70
0.5     0.57
0.4     0.55
0.3     0.44

a. Make a scatterplot of the data.
b. Estimate the prediction equation.
c. Perform a complete, appropriate analysis (statistical tests, calculation of R², and so forth).
d. Calculate and plot the 95% confidence and prediction bands.

15 2.15 Byers and Williams (“Viscosities of Binary and Ternary Mixtures of Polyaromatic Hydrocarbons,” Journal of Chemical and Engineering Data, 32, 349–354, 1987) studied the impact of temperature on the viscosity of toluene–tetralin blends. The following table gives the data for blends with a 0.4 molar fraction of toluene.

Temperature (°C)   Viscosity (mPa · s)
24.9               1.1330
35.0               0.9772
44.9               0.8532
55.1               0.7550
65.2               0.6723
75.2               0.6021
85.2               0.5420
95.2               0.5074

a. Estimate the prediction equation.
b. Perform a complete analysis of the model.
c. Calculate and plot the 95% confidence and prediction bands.

16 2.16 Carroll and Spiegelman (“The Effects of Ignoring Small Measurement Errors in Precision Instrument Calibration,” Journal of Quality Technology, 18, 170–173, 1986) look at the relationship between the pressure in a tank and the volume of liquid. The following table gives the data. Use an appropriate statistical software package to perform an analysis of these data. Comment on the output produced by the software routine.

Volume   Pressure     Volume   Pressure     Volume   Pressure
2084     4599         2842     6380         3789     8599
2084     4600         3030     6818         3789     8600
2273     5044         3031     6817         3979     9048
2273     5043         3031     6818         3979     9048
2273     5044         3221     7266         4167     9484
2463     5488         3221     7268         4168     9487
2463     5487         3409     7709         4168     9487
2651     5931         3410     7710         4358     9936
2652     5932         3600     8156         4358     9938
2652     5932         3600     8158         4546     10377
2842     6380         3788     8597         4547     10379

17 2.17 Atkinson (Plots, Transformations, and Regression, Clarendon Press, Oxford, 1985) presents the following data on the boiling point of water (°F) and barometric pressure (inches of mercury). Construct a scatterplot of the data and propose a model that relates boiling point to barometric pressure. Fit the model to the data and perform a complete analysis of the model using the techniques we have discussed in this chapter.

Boiling Point   Barometric Pressure
199.5           20.79
199.3           20.79
197.9           22.40
198.4           22.67
199.4           23.15
199.9           23.35
200.9           23.89
201.1           23.99
201.9           24.02
201.3           24.01
203.6           25.14
204.6           26.57
209.5           28.49
208.6           27.76
210.7           29.64
211.9           29.88
212.2           30.06

18 2.18 On March 1, 1984, the Wall Street Journal published a survey of television advertisements conducted by Video Board Tests, Inc., a New York ad-testing company that interviewed 4000 adults. These people were regular product users who were asked to cite a commercial they had seen for that product category in the past week. In this case, the response is the number of millions of retained impressions per week. The regressor is the amount of money spent by the firm on advertising. The data follow.

Firm              Amount Spent (millions)   Retained Impressions per Week (millions)
Miller Lite       50.1                      32.1
Pepsi             74.1                      99.6
Stroh’s           19.3                      11.7
Federal Express   22.9                      21.9
Burger King       82.4                      60.8
Coca-Cola         40.1                      78.6
McDonald’s        185.9                     92.4
MCI               26.9                      50.7
Diet Cola         20.4                      21.4
Ford              166.2                     40.1
Levi’s            27                        40.8
Bud Lite          45.6                      10.4
ATT Bell          154.9                     88.9
Calvin Klein      5                         12
Wendy’s           49.7                      29.2
Polaroid          26.9                      38
Shasta            5.7                       10
Meow Mix          7.6                       12.3
Oscar Meyer       9.2                       23.4
Crest             32.4                      71.1
Kibbles N Bits    6.1                       4.4

a. Fit the simple linear regression model to these data.
b. Is there a significant relationship between the amount a company spends on advertising and retained impressions? Justify your answer statistically.
c. Construct the 95% confidence and prediction bands for these data.
d. Give the 95% confidence and prediction intervals for the number of retained impressions for MCI.

19 2.19 Table B.17 contains the Patient Satisfaction data used in Section 2.7.
a. Fit a simple linear regression model relating satisfaction to age.
b. Compare this model to the fit in Section 2.7 relating patient satisfaction to severity.

20 2.20 Consider the fuel consumption data given in Table B.18. The automotive engineer believes that the initial boiling point of the fuel controls the fuel consumption. Perform a thorough analysis of these data. Do the data support the engineer’s belief?

21 2.21 Consider the wine quality of young red wines data in Table B.19. The winemakers believe that the sulfur content has a negative impact on the taste (thus, the overall quality) of the wine. Perform a thorough analysis of these data. Do the data support the winemakers’ belief?

22 2.22 Consider the methanol oxidation data in Table B.20. The chemist believes that the ratio of inlet oxygen to inlet methanol controls the conversion process. Perform a thorough analysis of these data. Do the data support the chemist’s belief?

23 2.23 Consider the simple linear regression model y = 50 + 10x + ε where ε is NID (0, 16). Suppose that n = 20 pairs of observations are used to fit this model. Generate 500 samples of 20 observations, drawing one observation for each level of x = 1, 1.5, 2, …, 10 for each sample.
a. For each sample compute the least-squares estimates of the slope and intercept. Construct histograms of the sample values of β̂0 and β̂1. Discuss the shape of these histograms.
b. For each sample, compute an estimate of E(y|x = 5). Construct a histogram of the estimates you obtained. Discuss the shape of the histogram.
c. For each sample, compute a 95% CI on the slope. How many of these intervals contain the true value β1 = 10? Is this what you would expect?
d. For each estimate of E(y|x = 5) in part b, compute the 95% CI. How many of these intervals contain the true value of E(y|x = 5) = 100? Is this what you would expect?
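The simulation in Problem 2.23 can be organized as a loop that generates a sample, fits the line, and records whether each interval covers its target. The Python sketch below is one possible layout using only the standard library. Note that the grid 1, 1.5, 2, …, 10 as written contains 19 levels, so the sketch takes the sample size from the grid rather than fixing it at 20. The seed is arbitrary, the names are illustrative, and `T_CRIT = 2.110` is the tabled t(0.025, 17) value entered by hand.

```python
import math
import random

random.seed(12345)  # arbitrary seed so that the run is reproducible

BETA0, BETA1, SIGMA = 50.0, 10.0, 4.0        # Var(eps) = 16, so sigma = 4
X_LEVELS = [1 + 0.5 * k for k in range(19)]  # 1, 1.5, 2, ..., 10
N = len(X_LEVELS)
T_CRIT = 2.110   # assumed table value t(0.025, N - 2) with N - 2 = 17
X0 = 5.0         # point at which E(y | x) is estimated
N_SAMPLES = 500

X_BAR = sum(X_LEVELS) / N
S_XX = sum((x - X_BAR) ** 2 for x in X_LEVELS)


def fit_one_sample():
    """Simulate one sample from the true model and return (b0, b1, ms_res)."""
    y = [BETA0 + BETA1 * x + random.gauss(0.0, SIGMA) for x in X_LEVELS]
    y_bar = sum(y) / N
    s_xy = sum((x - X_BAR) * (yi - y_bar) for x, yi in zip(X_LEVELS, y))
    b1 = s_xy / S_XX
    b0 = y_bar - b1 * X_BAR
    ss_res = sum((yi - b0 - b1 * x) ** 2 for x, yi in zip(X_LEVELS, y))
    return b0, b1, ss_res / (N - 2)


intercepts, slopes, mean_estimates = [], [], []
slope_hits = 0
mean_hits = 0
true_mean = BETA0 + BETA1 * X0
for _ in range(N_SAMPLES):
    b0, b1, ms_res = fit_one_sample()
    y0_hat = b0 + b1 * X0
    intercepts.append(b0)
    slopes.append(b1)
    mean_estimates.append(y0_hat)
    se_b1 = math.sqrt(ms_res / S_XX)
    se_mean = math.sqrt(ms_res * (1 / N + (X0 - X_BAR) ** 2 / S_XX))
    # Count the intervals that cover the true slope and the true mean response.
    slope_hits += abs(b1 - BETA1) <= T_CRIT * se_b1
    mean_hits += abs(y0_hat - true_mean) <= T_CRIT * se_mean

print(f"average slope estimate:     {sum(slopes) / N_SAMPLES:.3f}")
print(f"average intercept estimate: {sum(intercepts) / N_SAMPLES:.3f}")
print(f"slope CIs covering 10:      {slope_hits} of {N_SAMPLES}")
print(f"mean CIs covering 100:      {mean_hits} of {N_SAMPLES}")
```

The lists `intercepts`, `slopes`, and `mean_estimates` hold the values to be passed to a histogram routine. For Problem 2.24, only `X_LEVELS` and `T_CRIT` need to change.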

24 2.24 Repeat Problem 2.23 using only 10 observations in each sample, drawing one observation from each level x = 1, 2, 3, …, 10. What impact does using n = 10 have on the questions asked in Problem 2.23? Compare the lengths of the CIs and the appearance of the histograms.

25 2.25 Consider the simple linear regression model y = β0 + β1x + ε, with E(ε) = 0, Var(ε) = σ², and ε uncorrelated.
a. Show that Cov(β̂0, β̂1) = −x̄σ²/Sxx.
b. Show that Cov(ȳ, β̂1) = 0.

26 2.26 Consider the simple linear regression model y = β0 + β1x + ε, with E(ε) = 0, Var(ε) = σ², and ε uncorrelated.
a. Show that E(MSR) = σ² + β1²Sxx.
b. Show that E(MSRes) = σ².

27 2.27 Suppose that we have fit the straight-line regression model ŷ = β̂0 + β̂1x1 but the response is affected by a second variable x2 such that the true regression function is E(y) = β0 + β1x1 + β2x2.
a. Is the least-squares estimator of the slope in the original simple linear regression model unbiased?
b. Show the bias in β̂1.

28 2.28 Consider the maximum-likelihood estimator σ̃² of σ² in the simple linear regression model. We know that σ̃² is a biased estimator for σ².
a. Show the amount of bias in σ̃².
b. What happens to the bias as the sample size n becomes large?

29 2.29 Suppose that we are fitting a straight line and wish to make the standard error of the slope as small as possible. Suppose that the “region of interest” for x is −1 ≤ x ≤ 1. Where should the observations x1, x2, …, xn be taken? Discuss the practical aspects of this data collection plan.
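Since the standard error of the slope is √(MSRes/Sxx), the design enters Problem 2.29 only through Sxx. A quick numerical comparison of candidate designs can make the question concrete. The Python sketch below compares two designs on −1 ≤ x ≤ 1. Both designs, the sample size n = 10, and the helper name `s_xx` are illustrative choices, not part of the problem.

```python
def s_xx(xs):
    """Corrected sum of squares of the regressor values."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


n = 10  # illustrative sample size

# Candidate 1: observations spread evenly over the region of interest.
even_design = [-1 + 2 * k / (n - 1) for k in range(n)]

# Candidate 2: half of the observations at each end of the region.
endpoint_design = [-1.0] * (n // 2) + [1.0] * (n - n // 2)

sxx_even = s_xx(even_design)
sxx_endpoint = s_xx(endpoint_design)

# For a common error variance, the slope standard errors are in the ratio
# sqrt(Sxx of one design / Sxx of the other).
se_ratio = (sxx_endpoint / sxx_even) ** 0.5

print(f"Sxx, evenly spaced design: {sxx_even:.3f}")
print(f"Sxx, endpoint design:      {sxx_endpoint:.3f}")
print(f"se(slope), even design relative to endpoint design: {se_ratio:.3f}")
```

The comparison addresses only the variance of the slope. The practical part of the problem, such as whether a two-point design allows any check on the straight-line assumption, still has to be argued separately.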

30 2.30 Consider the data in Problem 2.12 and assume that steam usage and average temperature are jointly normally distributed.
a. Find the correlation between steam usage and monthly average ambient temperature.
b. Test the hypothesis that ρ = 0.
c. Test the hypothesis that ρ = 0.5.
d. Find a 99% CI for ρ.

31 2.31 Prove that the maximum value of R² is less than 1 if the data contain repeated (different) observations on y at the same value of x.

32 2.32 Consider the simple linear regression model y = β0 + β1x + ε where the intercept β0 is known.
a. Find the least-squares estimator of β1 for this model. Does this answer seem reasonable?
b. What is the variance of the slope for the least-squares estimator found in part a?
c. Find a 100(1 − α) percent CI for β1. Is this interval narrower than the interval for the case where both slope and intercept are unknown?

33 2.33 Consider the least-squares residuals ei = yi − ŷi, i = 1, 2, …, n, from the simple linear regression model. Find the variance of the residuals Var(ei). Is the variance of the residuals a constant? Discuss.

34 2.34 Consider the baseball regression model from Section 2.8 and assume that wins and ERA are jointly normally distributed.
a. Find the correlation between wins and team ERA.
b. Test the hypothesis that ρ = 0.
c. Test the hypothesis that ρ = 0.5.
d. Find a 95% CI for ρ.

35 2.35 Consider the baseball data in Table B.22. Fit a regression model to team wins using total runs scored as the predictor. How does that model compare to the one developed in Section 2.8 using team ERA as the predictor?

36 2.36 Table B.24 contains data on median family home rental price and other data for 51 US cities. Fit a linear regression model using the median home rental price as the response variable and median price per square foot as the predictor variable.
a. Test for significance of regression.
b. Find a 95% CI on the slope in this model.
c. Does this predictor do an adequate job of explaining the variability in home rental prices?

37 2.37 Consider the rental price data in Table B.24. Assume that median home rental price and median price per square foot are jointly normally distributed.
a. Find the correlation between home rental price and home price per square foot.
b. Test the hypothesis that ρ = 0.
c. Test the hypothesis that ρ = 0.5.
d. Find a 95% CI for ρ.

38 2.38 You have fit a linear regression model to a sample of 20 observations. The total sum of squares is 100 and the regression sum of squares is 80. The estimate of the error variance is
a. 1.5
b. 1.2
c. 2.0
d. 1.88
e. None of the above.

39 2.39 You have fit a simple linear regression model to a sample of 25 observations. The value of the t-statistic for testing that the slope is zero is 2.75. An upper bound on the P-value for this test is
a. 0.05
b. 0.025
c. 0.01
d. None of the above.

40 2.40 A linear regression model with an intercept term will always pass through the centroid of the data.
a. True
b. False

41 2.41 The variance of the predicted response in a linear regression model is a minimum at the average value of the predictor variable.
a. True
b. False

42 2.42 The confidence interval on the mean response at a particular value of the predictor variable is always wider than the prediction interval on a new observation at the same point.
a. True
b. False

43 2.43 The method of least squares ensures that the estimators of the slope and intercept in a linear regression model are best linear unbiased estimators.
a. True
b. False

44 2.44 For any simple linear regression model that has an intercept, the sum of the residuals is always zero.
a. True
b. False
