Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen
Exercises
1. Consider the data in the following table with two numerical variables x1 and x2 and two categorical variables x3 and x4.
x1 | x2 | x3 | x4 |
9 | 1 | Yes | On |
5 | 3 | No | Off |
1 | 2 | Yes | Off |
3 | 4 | Yes | On |
6 | −1 | No | On |
3 | 3 | Yes | On |
(a) Manually sketch the scatter plot for x1 and x2.
(b) Manually sketch the mosaic plot for x3 and x4.
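A hand-drawn sketch can be compared against a plot produced in base R. The block below is only a minimal sketch for checking Exercise 1: the data frame name `ex1` and the plot titles are choices made here, not part of the exercise, and the values are typed in from the table above.

```r
# Exercise 1 data, typed in from the table (ex1 is a name chosen for this sketch)
ex1 <- data.frame(
  x1 = c(9, 5, 1, 3, 6, 3),
  x2 = c(1, 3, 2, 4, -1, 3),
  x3 = c("Yes", "No", "Yes", "Yes", "No", "Yes"),
  x4 = c("On", "Off", "Off", "On", "On", "On")
)

# Scatter plot of the two numerical variables
plot(ex1$x1, ex1$x2, xlab = "x1", ylab = "x2",
     main = "Scatter plot of x1 and x2")

# Two-way table of counts, then a mosaic plot whose tile areas
# are proportional to those counts
counts <- table(x3 = ex1$x3, x4 = ex1$x4)
print(counts)
mosaicplot(counts, main = "Mosaic plot of x3 and x4")
```

Printing `counts` first is useful when sketching by hand, because the tile widths and heights of the mosaic plot follow directly from the row and column proportions of that table.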
2. Consider the data set in Exercise 1. Manually calculate the sample mean vector, the sample covariance matrix, and the sample correlation matrix of x = (x1 x2)^T.
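One way to check a manual calculation is to repeat the same steps in R and compare them with the built-in functions. The following is a sketch under that assumption: it uses the usual n - 1 divisor for the sample covariance, and the variable names (`X`, `Xc`, `D_inv_sqrt`) are chosen here for illustration.

```r
# Numerical columns of the Exercise 1 table
X <- cbind(x1 = c(9, 5, 1, 3, 6, 3),
           x2 = c(1, 3, 2, 4, -1, 3))
n <- nrow(X)

# Sample mean vector
xbar <- colMeans(X)

# Centre each column, then form the sample covariance matrix
# with the n - 1 divisor
Xc <- sweep(X, 2, xbar)
S <- crossprod(Xc) / (n - 1)

# Sample correlation matrix: rescale S by the standard deviations
D_inv_sqrt <- diag(1 / sqrt(diag(S)))
R <- D_inv_sqrt %*% S %*% D_inv_sqrt
dimnames(R) <- dimnames(S)

# The step-by-step results should agree with the built-in functions
stopifnot(isTRUE(all.equal(S, cov(X))),
          isTRUE(all.equal(R, cor(X))))

print(xbar)
print(S)
print(R)
```

The same steps apply to Exercise 4 once `X` is replaced by the x1 and x2 columns of the second table.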
3. Consider the data in the following table with two numerical variables x1 and x2 and two categorical variables x3 and x4.
x1 | x2 | x3 | x4 |
1 | 0 | Yes | Working |
4 | 6 | No | Fail |
2 | 2 | Yes | Fail |
0 | 3 | No | Fail |
3 | 4 | No | Working |
5 | 7 | Yes | Working |
(a) Manually sketch the scatter plot for x1 and x2.
(b) Manually sketch the mosaic plot for x3 and x4.
4. Consider the data set in Exercise 3. Manually calculate the sample mean vector, the sample covariance matrix, and the sample correlation matrix of x = (x1 x2)^T.
5. Consider the auto_spec data set in the file auto_spec.csv. Use R to draw appropriate plots to display the following information and comment on any patterns that can be found from the plots.
(a) Distribution of the variables fuel.type and aspiration.
(b) Distribution of each of the following three variables: width, height, and highway.mpg. Use two types of plots for each variable.
(c) How does the horsepower affect the city.mpg?
(d) The relationship between horsepower and body.style.
(e) The relationship between body.style and fuel.type.
6. For the auto_spec data, use R to create a new variable named cat.mpg, which is equal to "high" if highway.mpg is at least 30, and "low" otherwise.
(a) Using R, create a scatter plot of horsepower versus curb.weight, color-coded by the variable cat.mpg. Format the plot with appropriate labels and a legend.
(b) Use R to find the sample mean vector, the sample covariance matrix, and the sample correlation matrix of highway.mpg and city.mpg.
(c) Assume that 75% of the mileage of a car is on highways and 25% is on local roads. Using the results from part (b), manually calculate the sample mean and sample variance of the overall average MPG of the cars in this data set.
(d) Use R to calculate the overall average MPG of each car in the data set based on the assumption in part (c). Then use R to find the sample mean and sample variance of the overall average MPG. Compare with the results in part (c).
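The comparison between the manual and the computed results rests on the rule for a linear combination: with weights w = (0.75, 0.25)^T, the sample mean of w^T x is w^T x̄ and its sample variance is w^T S w. The sketch below illustrates this on a small made-up data frame, since auto_spec.csv is not reproduced here. The numbers in `auto` are hypothetical and are not taken from the real file; only the column names come from the exercise, and the colors are arbitrary choices.

```r
# Hypothetical toy values standing in for auto_spec.csv (NOT the real data).
# With the real file one would start from: auto <- read.csv("auto_spec.csv")
auto <- data.frame(
  highway.mpg = c(27, 34, 22, 30, 41, 25),
  city.mpg    = c(21, 28, 17, 24, 35, 19),
  horsepower  = c(111, 70, 160, 95, 60, 140),
  curb.weight = c(2548, 2010, 3100, 2300, 1900, 2900)
)

# New categorical variable: "high" if highway.mpg is at least 30
auto$cat.mpg <- ifelse(auto$highway.mpg >= 30, "high", "low")

# Scatter plot of horsepower versus curb.weight, color-coded by cat.mpg
point_col <- ifelse(auto$cat.mpg == "high", "blue", "red")
plot(auto$curb.weight, auto$horsepower, col = point_col, pch = 19,
     xlab = "curb.weight", ylab = "horsepower",
     main = "horsepower versus curb.weight by cat.mpg")
legend("topright", legend = c("high", "low"),
       col = c("blue", "red"), pch = 19, title = "cat.mpg")

# Mean vector and covariance matrix of (highway.mpg, city.mpg)
mpg  <- auto[, c("highway.mpg", "city.mpg")]
xbar <- colMeans(mpg)
S    <- cov(mpg)

# Linear-combination rule with weights 75% highway, 25% local roads
w <- c(0.75, 0.25)
mean_rule <- sum(w * xbar)
var_rule  <- as.numeric(t(w) %*% S %*% w)

# Direct calculation car by car
overall <- 0.75 * auto$highway.mpg + 0.25 * auto$city.mpg
print(c(mean_rule = mean_rule, mean_direct = mean(overall)))
print(c(var_rule = var_rule, var_direct = var(overall)))
```

The two routes should agree up to floating-point rounding, which is the comparison the last part of the exercise asks for.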
7. Hot rolling is among the key steel-making processes that convert cast or semi-finished steel into finished products. A typical hot rolling process usually includes a melting division and a rolling division. The melting division is a continuous casting process that melts scrap metal and solidifies the molten steel into semi-finished steel billets; the rolling division further squeezes each steel billet through a sequence of stands. Each stand is composed of several rolls. The final long, thin steel billet is coiled for convenient transportation and is therefore often called a coil. Owing to recent developments in computer and sensor technology, the whole hot rolling process is highly automated and monitored by a large number of sensors. Various types of sensors (optical sensors, temperature sensors, force sensors, etc.) are installed in the hot rolling process. The last rolling stands are equipped with infrared sensors. These sensors take photos of the steel billets, and the photos are then processed to detect any defects. We focus on two types of defect: checkings and seams.
(a) The file hotrolling_defects.csv contains the numbers of checkings and seams of 754 billets. Use R to generate two new variables corresponding to whether a billet has at least one checking defect and whether it has at least one seam defect, respectively. Use appropriate plots to visualize the distribution of each of these two new variables and the relationship between them.
(b) The file stand_5_side_temp.csv contains side temperature measurements taken while a steel billet is passing stand 5 of the rolling division. The side temperature is measured at 79 evenly spaced locations along the stand. Use R to draw a scatter plot matrix for the side temperature measurements at the first five locations of stand 5. Comment on noticeable patterns in the relationships among the first five temperature variables.
(c) Use R to find the sample mean vector, the sample covariance matrix, and the sample correlation matrix of the side temperature measurements at the first five locations of stand 5.
(d) Use R to draw a heatmap for the correlation of the side temperature measurements at the first 20 locations of stand 5. Which locations have the highest correlation in side temperature measurements?
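The scatter plot matrix and correlation heatmap for the temperature data can be sketched with base R functions. Because stand_5_side_temp.csv and its column layout are not shown here, the block below runs on simulated values: the matrix `temp`, the column names `loc1` to `loc79`, and the random-walk construction are all assumptions made for illustration, and the resulting patterns say nothing about the real measurements.

```r
# Simulated stand-in for stand_5_side_temp.csv (hypothetical, NOT the real data).
# Each row is one billet, each column one of 79 locations; a random walk
# across locations makes neighbouring columns correlated.
set.seed(1)
n <- 50
p <- 79
temp <- t(apply(matrix(rnorm(n * p), n, p), 1, cumsum)) + 900
colnames(temp) <- paste0("loc", 1:p)

# Scatter plot matrix for the first five locations
pairs(temp[, 1:5], main = "Side temperature, first five locations")

# Mean vector, covariance and correlation for the first five locations
first5 <- temp[, 1:5]
print(colMeans(first5))
print(cov(first5))
print(cor(first5))

# Correlation heatmap for the first 20 locations; Rowv = NA and Colv = NA
# keep the columns in their physical order instead of clustering them
R20 <- cor(temp[, 1:20])
heatmap(R20, Rowv = NA, Colv = NA, symm = TRUE, scale = "none",
        main = "Correlation, first 20 locations")

# Pair(s) of distinct locations with the highest correlation
R_off <- R20
diag(R_off) <- NA
best <- which(R_off == max(R_off, na.rm = TRUE), arr.ind = TRUE)
print(best)
```

Blanking the diagonal before taking the maximum matters, since every variable has correlation 1 with itself. Because the matrix is symmetric, each top pair appears twice in `best`, once as (i, j) and once as (j, i).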