Biostatistics Decoded - A. Gouveia Oliveira
1.6 Measures of Location and Dispersion
As with the central tendency measures, there are a number of available measures of dispersion, each one having some useful properties and some shortcomings. One possible way of expressing the degree of heterogeneity of the values of an attribute could be to write down the limits, that is, the minimum and maximum values (Figure 1.11). The limits are actually measures of location. These measures indicate the values on defined positions in an ordered set of values. One good thing about this approach is that it is easy to interpret. If the two values are similar then the dispersion of the values is small, otherwise it is large. There are some problems with the limits as measures of variation, though. First, we will have to deal with two quantities, which is not practical. Second, the limits are rather unstable, in the sense that if one adds a dozen observations to a study, most likely it will be necessary to redefine the limits. This is because, as one adds more observations, the more extreme values of an attribute will have a greater chance of appearing.
The first problem can be solved by using the difference between the maximum and minimum values, a quantity commonly called the range, but this will not solve the problem of instability.
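As a minimal sketch of the limits and the range, the following Python fragment computes both for a small sample and then again after a few observations are added. The data values and the function name `limits_and_range` are invented for illustration and do not come from the book.

```python
def limits_and_range(values):
    """Return (minimum, maximum, range) for a non-empty sequence of numbers."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest

# Hypothetical systolic blood pressure readings (mmHg), made up for this example.
sample = [118, 122, 125, 130, 131, 135, 140]
print(limits_and_range(sample))         # (118, 140, 22)

# Three more observations, one of them extreme, redefine both limits.
larger_sample = sample + [110, 128, 165]
print(limits_and_range(larger_sample))  # (110, 165, 55)
```

The range condenses the two limits into one number, but it moves as soon as a more extreme observation appears, which is the instability described above.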
The second problem can be minimized if, instead of using the minimum and maximum to describe the dispersion of values, we use the other measures of location, the lower and upper quartiles. The lower quartile (also called the 25th percentile) is the value below which one‐quarter, or 25%, of all the values in the dataset lie. The upper quartile (or 75th percentile) is the value below which three‐quarters, or 75%, of all the values in the dataset lie (note, incidentally, that the median is the same as the 50th percentile). The advantage of the quartiles over the limits is that they are more stable because the addition of one or two extreme values to the dataset will probably not change the quartiles.
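The quartiles can be sketched with the `statistics.quantiles` function from the Python standard library. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: several conventions exist for interpolating quartiles, and they can give slightly different values in small samples.

```python
import statistics

def quartiles(values):
    """Return (lower quartile, median, upper quartile) of a sample.

    Uses the 'inclusive' interpolation rule; other rules may differ slightly.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

# Hypothetical systolic blood pressure readings (mmHg), made up for this example.
sample = [118, 122, 125, 130, 131, 135, 140]
print(quartiles(sample))               # (123.5, 130.0, 133.0)

# One extreme value moves the maximum from 140 to 190,
# while the quartiles shift by only a few units.
with_extreme = sample + [190]
print(max(sample), max(with_extreme))  # 140 190
print(quartiles(with_extreme))         # (124.25, 130.5, 136.25)
```

In a sample this small the quartiles still move a little when a value is added; with larger datasets the shift is typically smaller, whereas the maximum always jumps to the new extreme.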
Figure 1.11 Measures of dispersion derived from measures of location.
However, we still have the problem of having to deal with two values, which is certainly not as practical, or as easy to remember and reason with, as a single value. One way around this could be to use the difference between the upper quartile and the lower quartile to describe the dispersion. This is called the interquartile range, but the interpretation of this value is not straightforward, and it is not amenable to mathematical treatment; therefore it is not a very popular measure, except perhaps in epidemiology.
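As an illustration, the interquartile range can be computed as the upper quartile minus the lower quartile and compared with the range. This is a sketch under stated assumptions: the data are invented, the function names are hypothetical, and the quartiles use the `"inclusive"` rule of `statistics.quantiles`, which is only one of several conventions.

```python
import statistics

def value_range(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)

def interquartile_range(values):
    """Upper quartile minus lower quartile ('inclusive' interpolation rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical systolic blood pressure readings (mmHg), made up for this example.
sample = [118, 122, 125, 130, 131, 135, 140]
with_extreme = sample + [190]

print(value_range(sample), interquartile_range(sample))              # 22 9.5
print(value_range(with_extreme), interquartile_range(with_extreme))  # 72 12.0
```

Here a single extreme observation more than triples the range, while the interquartile range changes much less.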
So what are we looking for in a measure of dispersion? The ideal measure should have the following properties: unicity, that is, it should be a single value; stability, that is, it should not change much if more observations are added; interpretability, that is, its value should be meaningful and easy to understand.