Applied Regression Modeling - Iain Pardoe

Problems

“Computer help” refers to the numbered items in the software information files available from the book website. There are brief answers to the even‐numbered problems in Appendix F (www.wiley.com/go/pardoe/AppliedRegressionModeling3e).

1.1 Assume that weekly orders of a popular mobile phone at a local store follow a normal distribution with mean and standard deviation .
(a) Find the values of weekly orders, Y, that correspond to the:
(i) 95th percentile (i.e., find y such that Pr(Y < y) = 0.95);
(ii) 50th percentile (i.e., find y such that Pr(Y < y) = 0.50);
(iii) 2.5th percentile (i.e., find y such that Pr(Y < y) = 0.025).
(b) Suppose m_Y represents potential values of repeated sample means from this population for samples of size . Use the normal version of the central limit theorem to find the mean values, m_Y, that correspond to the:
(i) 95th percentile (i.e., find m such that Pr(m_Y < m) = 0.95);
(ii) 50th percentile (i.e., find m such that Pr(m_Y < m) = 0.50);
(iii) 2.5th percentile (i.e., find m such that Pr(m_Y < m) = 0.025).
(c) How many phones should the store order to be 95% confident they can meet demand for a particular week?
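The percentile calculations in Problem 1.1 can be sketched with Python's standard-library `statistics.NormalDist`. This is a minimal sketch, not the book's procedure (the book uses statistical software or tables). The mean, standard deviation, and sample size below are hypothetical placeholders, because their values are missing from this copy of the problem; substitute the book's values before reading anything into the output. The helper name `percentiles` is illustrative.

```python
import math
from statistics import NormalDist

# HYPOTHETICAL placeholders: the mean, standard deviation, and sample size for
# this problem are missing from this copy, so these values are illustrative only.
MEAN = 10.0
SD = 3.0
N = 9

PROBS = (0.95, 0.50, 0.025)  # 95th, 50th, and 2.5th percentiles


def percentiles(dist, probs):
    """Return {p: y} where y satisfies Pr(X < y) = p for the normal dist."""
    return {p: dist.inv_cdf(p) for p in probs}


# Individual weekly orders: Y ~ N(MEAN, SD^2).
y_dist = NormalDist(mu=MEAN, sigma=SD)

# Central limit theorem: sample means m_Y ~ N(MEAN, SD^2 / N),
# i.e., the standard deviation shrinks to SD / sqrt(N).
m_dist = NormalDist(mu=MEAN, sigma=SD / math.sqrt(N))

for label, dist in (("Y", y_dist), ("m_Y", m_dist)):
    for p, value in percentiles(dist, PROBS).items():
        print(f"{label}: Pr({label} < {value:.3f}) = {p}")

# Meeting demand in a single week depends on Y (one week's orders), not on m_Y;
# round the 95th percentile up to a whole number of phones.
print("Order", math.ceil(y_dist.inv_cdf(0.95)), "phones")
```

The same pattern applies to Problem 1.2 by swapping in its mean, standard deviation, sample size, and percentiles.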

1.2 Assume that final scores in a statistics course follow a normal distribution with mean and standard deviation .
(a) Find the scores, Y, that correspond to the:
(i) 90th percentile (i.e., find y such that Pr(Y < y) = 0.90);
(ii) 99th percentile (i.e., find y such that Pr(Y < y) = 0.99);
(iii) 5th percentile (i.e., find y such that Pr(Y < y) = 0.05).
(b) Suppose m_Y represents potential values of repeated sample means from this population for samples of size (e.g., average class scores). Use the normal version of the central limit theorem to find the mean scores, m_Y, that correspond to the:
(i) 90th percentile (i.e., find m such that Pr(m_Y < m) = 0.90);
(ii) 99th percentile (i.e., find m such that Pr(m_Y < m) = 0.99);
(iii) 5th percentile (i.e., find m such that Pr(m_Y < m) = 0.05).
(c) If the bottom 5% of the class fail, what is the cut‐off percentage to pass the class?
(d) The university requires the long‐term average class score for this course to be no higher than 75%. Does this requirement seem feasible?

1.3 The NBASALARY data file contains salary information for 214 guards in the National Basketball Association (NBA) for 2009–2010 (obtained from the online USA Today NBA Salaries Database).
(a) Construct a histogram of the salary variable, representing 2009–2010 salaries in thousands of dollars [computer help #14].
(b) What would we expect the histogram to look like if the data were normal?
(c) Construct a QQ‐plot of the salary variable [computer help #22].
(d) What would we expect the QQ‐plot to look like if the data were normal?
(e) Compute the natural logarithm of guard salaries (call this the log‐salary variable) [computer help #6], and construct a histogram of this variable [computer help #14]. Hint: The “natural logarithm” transformation (also known as “log to base‐e,” or by the symbols log_e or ln) is a way to transform (rescale) skewed data to make them more symmetric and normal.
(f) Construct a QQ‐plot of the log‐salary variable [computer help #22].
(g) Based on the plots in parts (a), (c), (e), and (f), say whether salaries or log‐salaries more closely follow a normal curve, and justify your response.
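What a QQ‐plot compares can be sketched in a few lines: each sorted observation is paired with a standard normal quantile, and the points fall near a straight line when the data are close to normal. This is a minimal sketch, assuming a small hypothetical list of right‐skewed salaries (not the NBASALARY data) and the (i − 0.5)/n plotting positions, which are one convention among several that software packages use. The correlation of the plotted coordinates is an informal straightness summary added here for illustration; the problem itself only asks you to look at the plots. Function names are illustrative.

```python
import math
from statistics import NormalDist, mean


def qq_points(values):
    """Pair each sorted observation with a standard normal quantile.

    Uses (i - 0.5) / n plotting positions; other conventions shift the
    theoretical quantiles only slightly.
    """
    ordered = sorted(values)
    n = len(ordered)
    z = NormalDist()  # standard normal
    return [(z.inv_cdf((i - 0.5) / n), y) for i, y in enumerate(ordered, start=1)]


def straightness(points):
    """Pearson correlation of QQ-plot coordinates (1.0 = perfectly straight)."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL right-skewed salaries in thousands of dollars (not the real data).
salaries = [450, 750, 900, 1100, 1400, 2000, 2600, 3500, 5200, 8000, 12000, 17000]
log_salaries = [math.log(s) for s in salaries]  # natural logarithm

print("raw salaries:", round(straightness(qq_points(salaries)), 3))
print("log-salaries:", round(straightness(qq_points(log_salaries)), 3))
```

For strongly right‐skewed data like these, the raw QQ‐plot bends upward at the right end, while the log scale pulls the large values in and straightens the pattern.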

1.4 A company's pension plan includes 50 mutual funds, with each fund expected to earn a mean of 3% over the risk‐free rate with a standard deviation of %. Assume that the funds are randomly selected from a population of funds with normally distributed returns in excess of the risk‐free rate.
(a) Find the probability that an individual fund's return in excess of the risk‐free rate is, respectively, greater than 34.1%, greater than 15.7%, or less than %. In other words, if Y represents potential values of individual fund returns, find:
(i) Pr(Y > 34.1);
(ii) Pr(Y > 15.7);
(iii) Pr(Y < ).
(b) Use the normal version of the central limit theorem to approximate the probability that the pension plan's overall mean return in excess of the risk‐free rate is, respectively, greater than 7.4%, greater than 4.8%, or less than 0.7%. In other words, if m_Y represents potential values of repeated sample means, find:
(i) Pr(m_Y > 7.4);
(ii) Pr(m_Y > 4.8);
(iii) Pr(m_Y < 0.7).
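The tail probabilities in Problem 1.4 follow from the normal cumulative distribution function: Pr(Y > y) = 1 − Pr(Y < y), and by the central limit theorem the plan's mean return over 50 funds has standard deviation SD/√50. This is a minimal sketch using `statistics.NormalDist`. The mean (3%) and the 50 funds come from the problem; the standard deviation is a hypothetical placeholder because its value is missing from this copy, and the individual "less than" threshold is omitted for the same reason. The helpers `greater` and `less` are illustrative names.

```python
import math
from statistics import NormalDist

MEAN = 3.0  # mean excess return (%), given in the problem
N = 50      # number of funds in the plan, given in the problem
SD = 12.7   # HYPOTHETICAL placeholder: the standard deviation is missing here

fund = NormalDist(MEAN, SD)                  # individual fund excess return, Y
plan = NormalDist(MEAN, SD / math.sqrt(N))   # plan mean m_Y, via the CLT


def greater(dist, x):
    """Pr(X > x) for a normal random variable X."""
    return 1.0 - dist.cdf(x)


def less(dist, x):
    """Pr(X < x) for a normal random variable X."""
    return dist.cdf(x)


print(f"Pr(Y > 34.1)  = {greater(fund, 34.1):.4f}")
print(f"Pr(Y > 15.7)  = {greater(fund, 15.7):.4f}")
print(f"Pr(m_Y > 7.4) = {greater(plan, 7.4):.4f}")
print(f"Pr(m_Y > 4.8) = {greater(plan, 4.8):.4f}")
print(f"Pr(m_Y < 0.7) = {less(plan, 0.7):.4f}")
```

Because the plan mean has a much smaller standard deviation than a single fund, the same distance above 3% is far less likely for the plan average than for one fund.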

1.5 Consider the data on 2009–2010 salaries of 214 NBA guards from Problem 1.3.
(a) Calculate a 95% confidence interval for the population mean salary in thousands of dollars [computer help #23]. Hint: Calculate by hand (using the fact that the sample mean of salary is 3980.318, the sample standard deviation is 4525.378, and the 97.5th percentile of the t‐distribution with 213 degrees of freedom is approximately 1.971) and check your answer using statistical software.
(b) Consider the log‐salary variable, the natural logarithms of the salaries. The sample mean of log‐salary is 7.664386. Re‐express this number in thousands of dollars (the original units of salary). Hint: To back‐transform a number in natural logarithms to its original scale, use the “exponentiation” function on a calculator [denoted exp(X) or e^X, where X is the variable expressed in natural logarithms]. This is because exp(log_e(Y)) = Y.
(c) Compute a 95% confidence interval for the population mean log‐salary in natural logarithms of thousands of dollars [computer help #23]. Hint: Calculate by hand (using the fact that the sample mean of log‐salary is 7.664386, the sample standard deviation of log‐salary is 1.197118, and the 97.5th percentile of the t‐distribution with 213 degrees of freedom is approximately 1.971) and check your answer using statistical software.
(d) Re‐express each interval endpoint of your 95% confidence interval computed in part (c) in thousands of dollars and say what this interval means in words.
(e) The confidence interval computed in part (a) is exactly symmetric about the sample mean of salary. Is the confidence interval computed in part (d) exactly symmetric about the back‐transformed sample mean of log‐salary (in thousands of dollars) that you computed in part (b)? How does this relate to quantifying our uncertainty about the population mean salary? Hint: Looking at the histogram from Problem 1.3 part (a), if someone asked you to give lower and upper bounds on the population mean salary using your intuition rather than statistics, would you give a symmetric or an asymmetric interval?
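The hand calculations in Problem 1.5 use the interval m_Y ± t × s_Y/√n and then exponentiate. A minimal sketch follows, using only the summary statistics and the t‐percentile quoted in the problem (the helper name `t_interval` is illustrative). Because exp is increasing, back‐transforming keeps the endpoints in order, but it stretches the upper side more than the lower side, which is why the back‐transformed interval is not symmetric.

```python
import math


def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean ± t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


N = 214
T_975_DF213 = 1.971  # 97.5th percentile of t with 213 df, as given in the problem

# Raw salaries, in thousands of dollars.
raw_lo, raw_hi = t_interval(3980.318, 4525.378, N, T_975_DF213)

# Back-transform the sample mean of log-salary to thousands of dollars.
center = math.exp(7.664386)

# Log-salaries, in natural logarithms of thousands of dollars.
log_lo, log_hi = t_interval(7.664386, 1.197118, N, T_975_DF213)

# exp is increasing, so the back-transformed endpoints stay in order.
back_lo, back_hi = math.exp(log_lo), math.exp(log_hi)

print(f"raw-scale 95% CI: ({raw_lo:.1f}, {raw_hi:.1f})")
print(f"back-transformed mean of logs: {center:.1f}")
print(f"back-transformed 95% CI: ({back_lo:.1f}, {back_hi:.1f})")
print(f"distance below: {center - back_lo:.1f}, above: {back_hi - center:.1f}")
```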

1.6 The FINALSCORES data file contains values of a variable that measures final scores in a statistics course.
(a) Calculate the sample mean and sample standard deviation of the final scores [computer help #10].
(b) Calculate a 90% confidence interval for the population mean of the final scores [computer help #23]. Hint: Calculate by hand (using the sample mean and sample standard deviation from part (a), and the 95th percentile of the t‐distribution with 99 degrees of freedom, which is approximately 1.660) and check your answer using statistical software.

1.7 Gapminder is a “non‐profit venture promoting sustainable global development and achievement of the United Nations Millennium Development Goals.” It provides related time series data for all countries in the world at the website www.gapminder.org. For example, the COUNTRIES data file contains the 2010 population count (in millions) of the 55 most populous countries together with 2010 life expectancy at birth (in years).
(a) Calculate the sample mean and sample standard deviation of the population counts [computer help #10].
(b) Briefly say why calculating a confidence interval for the population mean would not be useful for understanding mean population counts for all countries in the world.
(c) Consider the life expectancy variable, which represents the average number of years a newborn child would live if current mortality patterns were to stay the same. Suppose that for this variable, these 55 countries could be considered a random sample from the population of all countries in the world. Calculate a 95% confidence interval for the population mean life expectancy [computer help #23]. Hint: Calculate by hand (using the fact that the sample mean of life expectancy is 69.787, the sample standard deviation is 9.2504, and the 97.5th percentile of the t‐distribution with 54 degrees of freedom is approximately 2.005) and check your answer using statistical software.

1.8 Consider the FINALSCORES data file from Problem 1.6.
(a) Do a hypothesis test to determine whether there is sufficient evidence at a significance level of 5% to conclude that the population mean of the final scores is greater than 66 [computer help #24].
(b) Repeat part (a) but test whether the population mean of the final scores is less than 73.
(c) Repeat part (a) but test whether the population mean of the final scores is not equal to 66.

1.9 Consider the COUNTRIES data file from Problem 1.7. A journalist speculates that the population mean of life expectancy is greater than 68 years. Based on the sample of 55 countries, a smart statistics student thinks that there is insufficient evidence to conclude this. Do a hypothesis test to show who is correct based on a significance level of 5% [computer help #24]. Hint: Make sure that you lay out all the steps involved (as in Section 1.6.1) and include a short sentence summarizing your conclusion; that is, who do you think is correct, the journalist or the student?

1.10 Consider the housing market represented by the sale prices in the HOMES1 data file.
(a) As suggested in Section 1.3, calculate the probability of finding an affordable home (less than ) in this housing market. Assume that the population of sale prices (Y) is normal, with mean and standard deviation .
(b) As suggested in Section 1.5, calculate a 90% confidence interval for the population mean in this housing market. Recall that the sample mean , the sample standard deviation , and the sample size . Check your answer using statistical software [computer help #23].
(c) Practice the mechanics of hypothesis tests by conducting the following tests using a significance level of 5%: : versus : ; : versus : ; : versus : ; : versus : .
(d) As suggested in Section 1.7, calculate a 90% prediction interval for an individual sale price in this market.

1.11 Consider the COUNTRIES data file from Problem 1.7. Calculate a 95% prediction interval for life expectancy. Discuss why this interval is so much wider than the confidence interval calculated in Problem 1.7 part (c). Hint: Calculate by hand (using the fact that the sample mean of life expectancy is 69.787, the sample standard deviation is 9.2504, and the 97.5th percentile of the t‐distribution with 54 degrees of freedom is approximately 2.005) and check your answer using statistical software (if possible; see the discussion of the “ones trick” in Section 1.7).
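The contrast in Problem 1.11 comes down to one factor: a confidence interval for the mean uses s_Y × √(1/n), while a prediction interval for a single new observation uses s_Y × √(1 + 1/n), adding the country‐to‐country variation to the uncertainty about the mean. A minimal sketch with the summary statistics quoted in the problem (the helper name `interval` is illustrative):

```python
import math

# Life expectancy summary statistics (COUNTRIES data), as quoted in the problem.
MEAN = 69.787
SD = 9.2504
N = 55
T_CRIT = 2.005  # 97.5th percentile of the t-distribution with 54 df


def interval(center, half_width):
    """Return (lower, upper) for center ± half_width."""
    return center - half_width, center + half_width


# Confidence interval for the population mean: standard error SD * sqrt(1/n).
ci = interval(MEAN, T_CRIT * SD * math.sqrt(1 / N))

# Prediction interval for one new country: SD * sqrt(1 + 1/n).
pi = interval(MEAN, T_CRIT * SD * math.sqrt(1 + 1 / N))

ratio = (pi[1] - pi[0]) / (ci[1] - ci[0])
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
print(f"PI width / CI width = {ratio:.2f}")  # algebraically sqrt(n + 1)
```

The width ratio simplifies to √(n + 1), so with 55 countries the prediction interval is about 7.5 times as wide, and collecting more data would narrow the confidence interval but could never shrink the prediction interval below roughly ±2 standard deviations.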

1.12 This problem is adapted from one in Frees (1995). The HOSP data file contains data on charges for patients at a Wisconsin hospital in 1989, as analyzed by Frees (1994). Managers wish to estimate health care costs and to measure how reliable their estimates are. Suppose that a risk manager for a large corporation is trying to understand the cost of one aspect of health care, hospital costs for a small, homogeneous group of claims: the charges (in thousands of dollars) for female patients aged 30–49 who were admitted to the hospital for circulatory disorders.
(a) Calculate a 95% confidence interval for the population mean claim. Use the following in your calculation: the sample mean is 2.9554, the sample standard deviation is 1.48104, and the 97.5th percentile of the t‐distribution with 32 degrees of freedom is 2.037. Check your answer using statistical software [computer help #23].
(b) Also calculate a 95% prediction interval for an individual claim. Does this interval seem reasonable given the range of values in the data?
(c) Transform the data by taking the reciprocal of the claim values (i.e., 1/claim). Calculate a 95% confidence interval for the population mean of the reciprocal‐transformed claims. Use the following sample statistics: the sample mean of the reciprocals is 0.3956 and the sample standard deviation of the reciprocals is 0.12764. Check your answer using statistical software [computer help #23].
(d) Back‐transform the endpoints of the interval you just calculated into the original units of the claims (thousands of dollars).
(e) Do the same for a 95% prediction interval; that is, calculate the reciprocal‐transformed interval and back‐transform it to the original units. Does this interval seem reasonable given the range of values in the data? If so, why did transforming the data help here?
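The reciprocal transformation in Problem 1.12 has one mechanical trap: 1/x is decreasing for positive x, so back‐transforming an interval swaps its endpoints. A minimal sketch using the summary statistics quoted in the problem follows. The sample size n = 33 is an assumption inferred from the 32 degrees of freedom (a one‐sample t‐procedure uses n − 1 df); the helper names are illustrative.

```python
import math

N = 33          # ASSUMPTION: inferred from the 32 degrees of freedom quoted
T_CRIT = 2.037  # 97.5th percentile of t with 32 df, as given in the problem


def t_intervals(mean, sd, n, t_crit):
    """Return (confidence interval, prediction interval) as (lower, upper)."""
    ci_half = t_crit * sd / math.sqrt(n)
    pi_half = t_crit * sd * math.sqrt(1 + 1 / n)
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)


def back_from_reciprocal(interval):
    """Map an interval for 1/Y back to Y.

    1/x is decreasing for positive x, so the endpoints swap; this assumes
    both endpoints are positive.
    """
    lo, hi = interval
    return 1 / hi, 1 / lo


raw_ci, raw_pi = t_intervals(2.9554, 1.48104, N, T_CRIT)  # thousands of dollars
rec_ci, rec_pi = t_intervals(0.3956, 0.12764, N, T_CRIT)  # reciprocal scale

back_ci = back_from_reciprocal(rec_ci)
back_pi = back_from_reciprocal(rec_pi)

for name, (lo, hi) in [("raw CI", raw_ci), ("raw PI", raw_pi),
                       ("back-transformed CI", back_ci),
                       ("back-transformed PI", back_pi)]:
    print(f"{name}: ({lo:.4f}, {hi:.4f})")
```

On the raw scale the lower prediction limit comes out negative, which is impossible for a hospital charge; the back‐transformed prediction interval stays positive and is asymmetric, stretching further above the center than below it.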

1.13 The following questions allow you to practice important concepts from Chapter 1 without having to use a computer.
(a) In the construction of confidence intervals, will an increase in the sample size lead to a wider or narrower interval (if all other quantities are unchanged)?
(b) Suppose that a 95% confidence interval for the population mean turns out to be . Give a definition of what it means to be “95% confident” here.
(c) A government department is supposed to respond to requests for information within 5 days of receiving the request. Studies show a mean time to respond of 5.28 days and a standard deviation of 0.40 day for a sample of requests. Construct a 90% confidence interval for the mean time to respond. Then do an appropriate hypothesis test at significance level 5% to determine if the mean time to respond exceeds 5 days. (You may find some of the following information useful in answering these questions: The 90th percentile of the t‐distribution with 8 degrees of freedom is 1.397; the 95th percentile of the t‐distribution with 8 degrees of freedom is 1.860.)
(d) Students have claimed that the average number of classes missed per student during a quarter is 2. College professors dispute this claim and believe that the average is more than this. They sample students and find that the sample mean is 2.3 and the sample standard deviation is 0.6. State the null and alternative hypotheses that the professors wish to test. Then calculate the test statistic for this test and, using a 5% significance level, determine who appears to be correct, the students or the professors. (You may find some of the following information useful: The 95th percentile of the t‐distribution with 15 degrees of freedom is 1.753; the 97.5th percentile of the t‐distribution with 15 degrees of freedom is 2.131.)
(e) Consider the following computer output:
Sample size: 150
Mean: 2.94
Standard deviation: 0.50
Suppose that we desire a two‐tail test of the null hypothesis that the population mean is equal to 3 versus the alternative hypothesis that the population mean is not equal to 3. Find upper and lower limits for the p‐value for the test. (You may find some of the following information useful: The 90th percentile of the t‐distribution with 149 degrees of freedom is 1.287; the 95th percentile of the t‐distribution with 149 degrees of freedom is 1.655.)
(f) A developer would like to see if the average sale price of condominiums in a particular locality has changed in the last 12 months. A study conducted 12 months ago indicated that the average sale price of condominiums in this locality was $. Data on recent sales were as follows:
Sample size: 28
Mean:
Standard deviation:
Write down the null and alternative hypotheses for this problem. Then specify the rejection region for conducting a two‐tail test at significance level 5%. Based on the computer output, would you reject or fail to reject the null hypothesis for this test? (You may find some of the following information useful: The 95th percentile of the t‐distribution with 27 degrees of freedom is 1.703; the 97.5th percentile of the t‐distribution with 27 degrees of freedom is 2.052.)
(g) In a hypothesis test, is it true that the smaller the p‐value, the less likely you are to reject the null hypothesis? Explain.
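Several of the no‐computer questions in Problem 1.13 reduce to the one‐sample t‐statistic, (m_Y − μ0)/(s_Y/√n), compared against the quoted t‐percentiles. The sketch below reproduces that arithmetic for the response‐time, classes‐missed, and 150‐observation computer‐output questions; the condominium question is omitted because its summary values are missing from this copy. The sample sizes for the first two (n = 9 and n = 16) are not stated in the text and are assumptions inferred from the quoted degrees of freedom. The helpers `t_stat` and `two_tail_p_bounds` are illustrative names, not functions from any library.

```python
import math


def t_stat(mean, sd, n, mu0):
    """One-sample t-statistic: (mean - mu0) / (sd / sqrt(n))."""
    return (mean - mu0) / (sd / math.sqrt(n))


def two_tail_p_bounds(t, table):
    """Bracket a two-tail p-value from (upper-tail area, percentile) pairs.

    `table` must be ordered by increasing percentile, e.g. 90th then 95th.
    """
    lower_p, upper_p = 0.0, 1.0
    for tail_area, crit in table:
        if abs(t) > crit:
            upper_p = 2 * tail_area   # beyond this percentile: p < 2 * area
        else:
            lower_p = 2 * tail_area   # short of this percentile: p > 2 * area
            break
    return lower_p, upper_p


# Response times. ASSUMPTION: n = 9, inferred from the 8 degrees of freedom.
n_c = 9
half = 1.860 * 0.40 / math.sqrt(n_c)   # 95th percentile of t_8 gives a 90% CI
ci_c = (5.28 - half, 5.28 + half)
t_c = t_stat(5.28, 0.40, n_c, mu0=5)
reject_c = t_c > 1.860                 # upper-tail test at the 5% level

# Classes missed. ASSUMPTION: n = 16, inferred from the 15 degrees of freedom.
t_d = t_stat(2.3, 0.6, 16, mu0=2)
reject_d = t_d > 1.753                 # upper-tail test at the 5% level

# Computer output with n = 150: two-tail test of a population mean of 3.
t_e = t_stat(2.94, 0.50, 150, mu0=3)
p_range = two_tail_p_bounds(t_e, [(0.10, 1.287), (0.05, 1.655)])

print(f"response times: 90% CI ({ci_c[0]:.3f}, {ci_c[1]:.3f}); "
      f"t = {t_c:.3f}; reject: {reject_c}")
print(f"classes missed: t = {t_d:.3f}; reject: {reject_d}")
print(f"computer output: t = {t_e:.3f}; "
      f"{p_range[0]:.2f} < p-value < {p_range[1]:.2f}")
```

The bracketing helper mirrors the table‐lookup reasoning: if |t| lies between the 90th and 95th percentiles, the one‐tail area lies between 0.05 and 0.10, so the two‐tail p‐value lies between 0.10 and 0.20.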
