Читать книгу End-to-end Data Analytics for Product Development - Chris Jones - Страница 22

Stat Tool 1.9 Measures of Variability: Variance and Standard Deviation

Оглавление

For numeric data, spread can also be measured by the variance. It accounts for all the data by measuring the distance or difference between each value and the mean. These differences are called deviations. The variance is the sum of squared deviations, divided by the number of values minus one. Roughly speaking, the variance (usually denoted by S2) is the average of the squared deviations from the mean.


1 Example 1.2. Suppose you observed the following numeric data with their dotplot (Figure 1.9):

8.1 8.2 7.6 9.0 7.5 6.9 8.1 9.0 8.3 8.1 8.2 7.6

Figure 1.9 Dotplot.

The mean is equal to 8.05. Let's calculate the deviations from the mean and their squares:

Deviations Squared deviations
8.1 − 8.05 = 0.05 (0.05)2 = 0.0025
8.2 − 8.05 = 0.15 (0.15)2 = 0.0225
7.6 − 8.05 = −0.45 (−0.45)2 = 0.2025
9.0 − 8.05 = 0.95 (0.95)2 = 0.9025
7.5 − 8.05 = −0.55 (−0.55)2 = 0.3025
6.9 − 8.05 = −1.15 (−1.15)2 = 1.3225
8.1 − 8.05 = 0.05 (0.05)2 = 0.0025
9.0 − 8.05 = 0.95 (0.95)2 = 0.9025
8.3 − 8.05 = 0.25 (0.25)2 = 0.0625
8.1 − 8.05 = 0.05 (0.05)2 = 0.0025
8.2 − 8.05 = 0.15 (0.15)2 = 0.0225
7.6 − 8.05 = −0.45 (−0.45)2 = 0.2025

Now, by adding up the squared deviations to get the sum and dividing it by the number of values minus one, you obtain the variance:


The variance measures how spread out the data are around their mean. The greater the variance, the greater the spread in the data.

The variance is not in the same units as the data, but in squared units. If the data are in grams, the variance is expressed in squared grams, and so on. Thus, for descriptive purposes, its square root, called standard deviation, is used instead.

The standard deviation (usually denoted by S) quantifies variability in the same units of measurement as we measure our data.

Considering the previous example, the standard deviation is:


The greater the standard deviation, the greater the spread of data values around the mean.

Considering the mean and the standard deviation together and computing the range: mean ± S, we can say that data values vary on average from (mean − S) to (mean + S).

From the previous example the average range is:


The observed data vary on average from 7.5 to 8.6.

End-to-end Data Analytics for Product Development

Подняться наверх