End-to-end Data Analytics for Product Development - Chris Jones

Stat Tool 1.11 Boxplots

So far, we have looked at three different aspects of numerical data analysis: shape of the data, central and non‐central tendency, and variability.

Boxplots can be used to assess and compare these three aspects of quantitative data distributions, and to look for outliers.

Like histograms, boxplots work best with moderate to large sample sizes (at least 20 values).

Let's look at how a boxplot is constructed. It can be displayed horizontally or vertically:

1 Start by drawing a horizontal or vertical axis in the units of the data values.

2 Draw a box that encloses the middle 50% of the data values. For a horizontal boxplot, the left edge of the box is the first quartile Q1 and the right edge is the third quartile Q3, so the width of the box is the interquartile range, IQR = Q3 − Q1. Draw a line inside the box to mark the median.

3 Draw lines, called whiskers, from the left edge of the box to the minimum and from the right edge to the maximum, to show the spread of the remaining data (25% of data points are below Q1 and 25% are above Q3). Many statistical software packages do not allow the whiskers to extend more than one and a half times the interquartile range (1.5 × IQR) beyond the edges of the box, that is, below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. Each whisker then stops at the most extreme data value inside that limit. Any points outside this range are treated as outliers and are displayed individually, for example by asterisks.
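
The construction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name boxplot_stats and the sample values are hypothetical, and the quartiles use the "inclusive" interpolation method from Python's standard library, which is one of several conventions, so other statistical software may report slightly different values for Q1 and Q3.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Return the quantities needed to draw a boxplot with 1.5 x IQR whiskers.

    Hypothetical helper for illustration; not taken from the book.
    """
    data = sorted(values)
    # Q1, median and Q3. The 'inclusive' method is an assumption here;
    # different quartile conventions give slightly different results.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Limits beyond which a point is displayed individually as an outlier.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Each whisker ends at the most extreme value inside the limits.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up sample with one unusually large value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
stats = boxplot_stats(sample)
print(stats)
```

For this sample the box runs from 4.25 to 8.75 with the median at 6.5, the whiskers end at 2 and 10, and the value 30 lies above Q3 + 1.5 × IQR = 15.5, so it would be drawn as an individual point.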


Boxplots help to summarize:

1 Central tendency. Look at the value of the median.

2 Non‐central tendency. Look at the values of the first quartile Q1 and the third quartile Q3.

3 Variability. Look at the length of the boxplot (range) and the width of the box (IQR).

4 Shape of data. Look at the position of the median line within the box and the position of the box between the two whiskers. In a symmetric distribution, the median is in the middle of the box and the two whiskers have the same length. In a skewed distribution, the median is closer to Q1 (skewed to the right) or to Q3 (skewed to the left) and the two whiskers do not have the same length, the longer whisker lying on the side of the longer tail (Figure 1.10).

Figure 1.10 Histograms and boxplots.
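
The shape reading described in item 4 can also be expressed numerically. The sketch below is an assumption-laden illustration, not a method from the book: the function name describe_shape, the tolerance of 0.1, and the sample values are all hypothetical, the whiskers are taken to the minimum and maximum, and the label is based only on where the median sits in the box.

```python
from statistics import quantiles


def describe_shape(values, tolerance=0.1):
    """Summarize what a boxplot suggests about the shape of the data.

    Hypothetical helper; the tolerance is an arbitrary illustrative choice.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    report = {
        # Whisker lengths when the whiskers run to the minimum and maximum.
        "lower_whisker": q1 - data[0],
        "upper_whisker": data[-1] - q3,
    }
    if iqr == 0:
        # The box has no width, so the median position is not informative.
        report["label"] = "undetermined"
        return report
    # Position of the median in the box: 0 at Q1, 0.5 in the middle, 1 at Q3.
    position = (median - q1) / iqr
    report["median_position"] = position
    if position < 0.5 - tolerance:
        report["label"] = "skewed to the right"
    elif position > 0.5 + tolerance:
        report["label"] = "skewed to the left"
    else:
        report["label"] = "roughly symmetric"
    return report


# Made-up samples for illustration.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15]
print(describe_shape(symmetric))
print(describe_shape(right_skewed))
```

In the second sample the median lies a quarter of the way into the box and the upper whisker (9) is much longer than the lower one (1), which matches the description of a distribution skewed to the right. With small samples such labels are only suggestive and should be checked against a histogram.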
