Industrial Data Analytics for Diagnosis and Prognosis - Yong Chen
Relationship Between A Numerical Variable and A Categorical Variable – Side-by-Side Box Plot
Side-by-side box plots can be used to show how the distribution of a numerical variable changes over different values of a categorical variable. The idea is to use a box plot to represent the distribution of the numerical variable at each value of the categorical variable. In Figure 2.5, we draw two side-by-side box plots for the auto_spec data set using the following R code:
Figure 2.5 Side-by-side box plots.
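    # Save the current graphics settings and split the device into 1 row x 2 columns
    oldpar <- par(mfrow = c(1, 2))
    # Left panel: compression ratio by fuel type
    boxplot(auto.spec.df$compression.ratio ~ auto.spec.df$fuel.type,
            xlab = "Fuel Type", ylab = "Compression Ratio")
    # Right panel: highway MPG by body style; las = 2 draws tick labels perpendicular to the axis
    boxplot(auto.spec.df$highway.mpg ~ auto.spec.df$body.style,
            las = 2, xlab = "", ylab = "Highway MPG")
    # side = 3 places the "Body Style" label in the top margin
    mtext("Body Style", side = 3, line = 1)
    # Restore the original graphics settings
    par(oldpar)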
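The same kind of plot can also be written with the `data` argument of the formula interface, which avoids repeating the data frame name. The sketch below is only an illustration: `toy.df` and its values are invented for this example and are not the auto_spec data. It also shows that `boxplot()` invisibly returns the statistics it draws, which may be handy for checking what a panel displays.

```r
# Toy data, invented for illustration only; NOT the auto_spec values.
toy.df <- data.frame(
  fuel.type = factor(c("gas", "gas", "gas", "gas", "gas", "gas",
                       "diesel", "diesel", "diesel")),
  compression.ratio = c(8.5, 9.0, 9.2, 9.4, 10.0, 8.8, 21.0, 22.5, 23.0)
)

# Formula interface with data =: variables are looked up inside toy.df,
# equivalent to toy.df$compression.ratio ~ toy.df$fuel.type
bp <- boxplot(compression.ratio ~ fuel.type, data = toy.df,
              xlab = "Fuel Type", ylab = "Compression Ratio")

# boxplot() invisibly returns a list describing what was drawn
bp$names  # group labels in plotting order (factor levels)
bp$stats  # one column per group: lower whisker, lower hinge, median,
          # upper hinge, upper whisker
```

Row 3 of `bp$stats` holds the group medians, so the heavy line inside each box can be read off numerically as well as visually.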
The left panel of Figure 2.5 shows how the numerical variable compression.ratio is related to the two values (diesel and gas) of fuel.type. It is clear from the side-by-side box plot that a car with diesel fuel has a much higher compression ratio than a car with gas fuel. This also explains the separate cluster of outliers in the histogram and box plot of compression.ratio that is observed in Figure 2.3. The right panel of Figure 2.5 shows how highway.mpg is related to the five values of body.style. It can be seen that a hatchback car is more likely to have higher highway MPG while a convertible tends to have lower highway MPG.
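The visual comparison can be backed up numerically. The following is a minimal sketch on made-up data (`toy.df` and its values are hypothetical, not the auto_spec data): it computes the median of the numerical variable within each category, and then checks which observations a single pooled box plot would flag as outliers. With these toy values, the flagged points all come from the small diesel group, which mirrors the explanation given above for the outlier cluster.

```r
# Toy data, invented for illustration only; NOT the auto_spec values.
toy.df <- data.frame(
  fuel.type = c(rep("gas", 12), rep("diesel", 2)),
  compression.ratio = c(8.0, 8.5, 8.7, 9.0, 9.0, 9.2, 9.3, 9.4,
                        9.5, 9.6, 10.0, 10.1, 21.5, 23.0)
)

# Median of the numerical variable within each level of the categorical variable
group.medians <- tapply(toy.df$compression.ratio, toy.df$fuel.type, median)
group.medians

# Points a single (pooled) box plot would draw as outliers:
# values more than 1.5 * IQR beyond the hinges
pooled.out <- boxplot.stats(toy.df$compression.ratio)$out
pooled.out

# Which fuel types do the pooled outliers belong to?
table(toy.df$fuel.type[toy.df$compression.ratio %in% pooled.out])
```

The same `tapply()` call with a different pair of columns (for example a mileage column and a body style column) would give per-category medians for the right-hand panel.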