Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen

Color Coded Scatter Plot

We have seen that a scatter plot can effectively show the relationship between two numerical variables. By color coding the points on such a plot, we can also study how the two numerical variables relate to a third variable. Typically, the third variable is categorical, and each category is represented by a different color. The color-coded scatter plot is therefore useful for visualizing how numerical variables can be used to predict a categorical variable. For the auto_spec data, a color-coded scatter plot can show how fuel.type is related to two of the numerical variables, horsepower and peak.rpm. The plot is shown in Figure 2.7 and is created by the following R code.

Figure 2.7 Scatter plot color coded by fuel type.

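# Allow drawing outside the plot region so the legend can sit above it
oldpar <- par(xpd = TRUE)
# Peak RPM vs. horsepower: gas cars in black, diesel cars in gray
plot(auto.spec.df$peak.rpm ~ auto.spec.df$horsepower,
     xlab = "Horsepower", ylab = "Peak RPM",
     col = ifelse(auto.spec.df$fuel.type == "gas", "black", "gray"))
# A negative vertical inset moves the legend above the top-left corner
legend("topleft", inset = c(0, -0.2),
       legend = c("gas", "diesel"),
       col = c("black", "gray"), pch = 1, cex = 0.8)
# Restore the original graphics parameters
par(oldpar)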
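The ifelse() call above handles exactly two fuel types. As a sketch of how the same idea extends to a categorical variable with any number of levels, the helper below maps each factor level to one entry of a color palette through the factor's integer codes. The function name color_coded_scatter() and its palette argument are hypothetical (not from the book); the default grayscale palette from grDevices::gray.colors() is an assumption chosen to match the black-and-gray style of Figure 2.7.

```r
# Hypothetical helper (not from the book): scatter plot color coded by a
# categorical variable with any number of levels.
color_coded_scatter <- function(x, y, group, xlab = "", ylab = "",
                                palette = NULL) {
  group <- as.factor(group)
  lev <- levels(group)
  # Assumed default: one gray shade per level, from black (0) to light gray (0.7)
  if (is.null(palette)) {
    palette <- gray.colors(length(lev), start = 0, end = 0.7)
  }
  stopifnot(length(palette) >= length(lev))
  palette <- palette[seq_along(lev)]
  # The integer code of each observation's level indexes into the palette
  point_cols <- palette[as.integer(group)]
  # Allow the legend to be drawn above the plot region; restore on exit
  oldpar <- par(xpd = TRUE)
  on.exit(par(oldpar))
  plot(x, y, xlab = xlab, ylab = ylab, col = point_cols)
  legend("topleft", inset = c(0, -0.2), legend = lev,
         col = palette, pch = 1, cex = 0.8)
  # Return the per-point colors invisibly so they can be inspected
  invisible(point_cols)
}

# Possible use on the book's data (assumes auto.spec.df is already loaded):
# color_coded_scatter(auto.spec.df$horsepower, auto.spec.df$peak.rpm,
#                     auto.spec.df$fuel.type,
#                     xlab = "Horsepower", ylab = "Peak RPM")
```

Colors follow the factor's level order, which is alphabetical by default. If fuel.type has the levels "diesel" and "gas" in that order, passing palette = c("gray", "black") reproduces the coloring of Figure 2.7.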

Although the scatter plot in Figure 2.7 shows no clear relationship between the peak RPM and the horsepower of a car, the color coding makes it evident that diesel cars tend to have both low peak RPM and low horsepower.
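A visual impression like this can be cross-checked numerically by comparing a summary statistic of each numerical variable within each category. The sketch below uses base R's aggregate() with a formula interface to compute per-group medians; the helper name group_medians() is hypothetical (not from the book), and the choice of the median rather than the mean is an assumption made to reduce the influence of a few extreme cars. With the formula interface, rows that have missing values in any of the selected columns are dropped by aggregate()'s default na.action.

```r
# Hypothetical helper (not from the book): per-group medians of several
# numerical variables, as a numerical check of a color-coded scatter plot.
group_medians <- function(df, vars, group) {
  # Build a formula such as cbind(horsepower, peak.rpm) ~ fuel.type
  fml <- as.formula(
    paste0("cbind(", paste(vars, collapse = ", "), ") ~ ", group)
  )
  # One row per category, one column per numerical variable
  aggregate(fml, data = df, FUN = median)
}

# Possible use on the book's data (assumes auto.spec.df is already loaded):
# group_medians(auto.spec.df, c("horsepower", "peak.rpm"), "fuel.type")
```

If the pattern seen in Figure 2.7 holds, the diesel row of the result would be expected to show lower medians for both horsepower and peak.rpm than the gas row.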
