End-to-end Data Analytics for Product Development - Chris Jones
Stat Tool 1.8 Measures of Variability: Range and Interquartile Range
Variability refers to how spread out a set of data values is.
Consider the following graphs (see Figure 1.8):
The two data distributions are quite different in terms of variability: the graph on the left shows more densely packed values (less variability), while the graph on the right reveals more spread out data (higher variability).
The terms variability, spread, variation, and dispersion are synonyms, and refer to how spread out a distribution is.
Figure 1.8 Frequency distributions and variability.
How can the spread of a set of numeric values be quantified?
The range, commonly represented as R, is a simple way to describe the spread of data values. It is the difference between the maximum value and the minimum value in a data set. The range can also be represented as the interval: (minimum value; maximum value).
A large range value (or a wide interval) indicates greater dispersion in the data. A small range value (or a narrow interval) indicates that there is less dispersion in the data.
Note that the range uses only two data values, the minimum and the maximum. For this reason, it is most useful as a measure of dispersion when the data do not include outliers, since a single extreme value can inflate it considerably.
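As a rough illustration of the range and its sensitivity to extreme values, the following Python sketch computes R and the corresponding interval for a small data set, with and without one outlier. The helper name `value_range` and the numbers are made up for this example and do not come from the book.

```python
def value_range(values):
    """Return (R, (minimum, maximum)) for a non-empty sequence of numbers."""
    lo, hi = min(values), max(values)
    return hi - lo, (lo, hi)


# Hypothetical measurements, invented for illustration only.
data = [12, 15, 14, 13, 16, 15, 14]
with_outlier = data + [40]  # the same values plus one extreme value

print(value_range(data))          # (4, (12, 16))
print(value_range(with_outlier))  # (28, (12, 40))
```

A single added value moves the range from 4 to 28, even though the other seven values are unchanged.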
A second measure of variation is the interquartile range, commonly represented as IQR. It is the difference between the third quartile Q3 and the first quartile Q1 in a data set. IQR can also be represented as the interval: (Q1; Q3). Fifty percent of the data are within this range: as the spread of these data increases, the IQR becomes larger.
Because it depends only on the middle 50% of the data, the IQR is far less affected by the presence of outliers than the range is.
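The IQR can be sketched in the same way. The example below uses `statistics.quantiles` from the Python standard library with its default settings. Quartiles can be computed under several conventions, so other software may return slightly different values for Q1 and Q3. The helper name `iqr` and the data are assumptions made for this illustration.

```python
import statistics


def iqr(values):
    """Return (IQR, (Q1, Q3)) using the default quartile convention
    of statistics.quantiles (other conventions may differ slightly)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1, (q1, q3)


# Hypothetical measurements, invented for illustration only.
data = [12, 15, 14, 13, 16, 15, 14]
with_outlier = data + [40]  # the same values plus one extreme value

print(iqr(data))          # (2.0, (13.0, 15.0))
print(iqr(with_outlier))  # (2.5, (13.25, 15.75))
```

Here the extreme value shifts the IQR only from 2.0 to 2.5, a small change compared with the effect the same value has on the maximum.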