Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen
Color Coded Scatter Plot
We have seen that a scatter plot can effectively show the relationship between two numerical variables. By color coding the points on a scatter plot of two numerical variables, we can also study their relationship with a third variable. Typically, the third variable is categorical, with each category represented by a different color. The color coded scatter plot is very useful for visualizing how numerical variables can be used to predict a categorical variable. For the auto_spec data, we can use a color coded scatter plot to show how fuel.type is related to two of the numerical variables, horsepower and peak.rpm. The color coded scatter plot is shown in Figure 2.7, which is created by the following R code.
Figure 2.7 Scatter plot color coded by fuel type.
oldpar <- par(xpd = TRUE)  # allow the legend to be drawn outside the plot region
plot(auto.spec.df$peak.rpm ~ auto.spec.df$horsepower,
     xlab = "Horsepower", ylab = "Peak RPM",
     col = ifelse(auto.spec.df$fuel.type == "gas",
                  "black", "gray"))  # gas in black, diesel in gray
legend("topleft", inset = c(0, -0.2),
       legend = c("gas", "diesel"),
       col = c("black", "gray"), pch = 1, cex = 0.8)
par(oldpar)  # restore the previous graphical parameters
Although the scatter plot in Figure 2.7 shows no clear relationship between peak RPM and horsepower, the color coding makes it apparent that diesel cars tend to have low peak RPM and low horsepower.
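The ifelse() mapping used for Figure 2.7 handles two categories. As a minimal sketch of one alternative, the colors can be looked up in a named vector, which keeps the points and the legend in sync and extends to more than two categories. The data frame toy.df and its values are invented here purely for illustration and are not taken from the auto_spec data; the names fuel.cols and point.cols are likewise hypothetical.

```r
# Toy data, invented for illustration only (NOT the auto_spec data).
toy.df <- data.frame(
  horsepower = c(70, 95, 110, 60, 72, 150),
  peak.rpm   = c(5000, 5500, 5200, 4200, 4400, 6000),
  fuel.type  = c("gas", "gas", "gas", "diesel", "diesel", "gas")
)

# One color per category; the names are the category labels.
fuel.cols <- c(gas = "black", diesel = "gray")

# Look up each point's color by its category label.
# as.character() guards against fuel.type being stored as a factor,
# in which case indexing would use integer codes instead of names.
point.cols <- fuel.cols[as.character(toy.df$fuel.type)]

plot(toy.df$horsepower, toy.df$peak.rpm,
     xlab = "Horsepower", ylab = "Peak RPM",
     col = point.cols)

# The legend reuses the same named vector, so labels and colors cannot drift apart.
legend("topright", legend = names(fuel.cols),
       col = fuel.cols, pch = 1, cex = 0.8)
```

Adding a further category under this approach only requires one more named entry in fuel.cols; the plot and legend calls stay unchanged.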