Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP - Bhisham C. Gupta, Irwin Guttman - Page 79

Variance


One of the most interesting pieces of information associated with any data set is how its values vary from one another. The range gives some idea of this variability, but it depends only on the two extreme values and so says nothing about how the observations are spread around the center of the data. To better understand variability, we rely on more powerful indicators such as the variance, which measures how far the observations in a data set deviate from their mean.

For example, if the values in a data set are $x_1, x_2, \ldots, x_n$ and the sample average is $\bar{x}$, then $x_1 - \bar{x}, x_2 - \bar{x}, \ldots, x_n - \bar{x}$ are the deviations from the sample average. It is then natural to find the sum of these deviations and to argue that if this sum is large, the values differ too much from each other, but if this sum is small, they do not differ much. Unfortunately, this argument does not hold, since, as is easily proved, the sum of these deviations is always zero, no matter how much the values in the data set differ: $\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$. In other words, the positive and negative deviations cancel exactly. To avoid this cancellation, we can square the deviations and then take their sum. The variance is then, in essence, the average of the squared deviations from $\bar{x}$. If the data set represents a population, then the deviations are taken from the population mean $\mu$. Thus, for a population of $N$ values, the population variance, denoted by $\sigma^2$ (read as sigma squared), is defined as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \tag{2.5.6}$$
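As a quick numerical check of the argument above, the following Python sketch uses a small made-up data set (the values are illustrative and do not come from the text) to show that the raw deviations sum to zero while the squared deviations do not. Treating the same values as a complete population, it then averages the squared deviations to obtain the population variance and compares the result with `pvariance` from Python's standard `statistics` module.

```python
from statistics import mean, pvariance

# Hypothetical data values, chosen only for illustration.
data = [4, 7, 9, 10, 15]

avg = mean(data)                          # average of the values
deviations = [x - avg for x in data]      # each value minus the average

sum_dev = sum(deviations)                 # positive and negative parts cancel
sum_sq_dev = sum(d ** 2 for d in deviations)

# Population variance: average squared deviation (divide by N).
pop_var = sum_sq_dev / len(data)

print("sum of deviations:        ", sum_dev)
print("sum of squared deviations:", sum_sq_dev)
print("population variance:      ", pop_var)
print("statistics.pvariance:     ", pvariance(data))
```

With integer data whose average is also an integer, as here, the sum of deviations is exactly zero; with floating-point data it may differ from zero by a tiny rounding error.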

Further, the sample variance, denoted by $S^2$, is defined as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{2.5.7}$$
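The sample variance can be sketched in a few lines of Python. The helper name `sample_variance` and the data values are assumptions made for illustration; the divisor $n - 1$ is the usual convention for a sample, and the result is cross-checked against `variance` from the standard `statistics` module, which uses the same divisor.

```python
from statistics import mean, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    avg = mean(values)
    return sum((x - avg) ** 2 for x in values) / (n - 1)

# Hypothetical sample, chosen only for illustration.
sample = [4, 7, 9, 10, 15]

s2 = sample_variance(sample)
print("sample variance:    ", s2)
print("statistics.variance:", variance(sample))
```

For the same five values, dividing by $n - 1 = 4$ instead of $N = 5$ gives a slightly larger number than the population formula would.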

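For computational purposes, we give below the simplified forms for the population variance and the sample variance.

$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}\right] \tag{2.5.8}$$

$$S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] \tag{2.5.9}$$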
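The simplified forms need only two running totals, the sum of the values and the sum of their squares, so the mean does not have to be computed first. The sketch below assumes the standard shortcut (sum of squares minus the squared sum divided by the number of values); the function name `variance_shortcut`, its `population` flag, and the data are hypothetical.

```python
from statistics import pvariance, variance

def variance_shortcut(values, population=False):
    """Variance from the sum of the values and the sum of their squares."""
    n = len(values)
    total = sum(values)                     # sum of x_i
    total_sq = sum(x * x for x in values)   # sum of x_i squared
    # Sum of squared deviations, without computing the mean explicitly.
    ss = total_sq - total ** 2 / n
    return ss / (n if population else n - 1)

# Hypothetical data, chosen only for illustration.
data = [4, 7, 9, 10, 15]

pop_short = variance_shortcut(data, population=True)
samp_short = variance_shortcut(data)

print("population variance (shortcut):", pop_short, "vs", pvariance(data))
print("sample variance (shortcut):    ", samp_short, "vs", variance(data))
```

The shortcut is convenient for hand calculation, but in floating-point arithmetic it can lose accuracy when the values are large relative to their spread, which is one reason library routines generally work from deviations instead.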

Note that one difficulty in using the variance as the measure of dispersion is that the units for measuring the variance are not the same as those for data values. Rather, variance is expressed as a square of the units used for the data values. For example, if the data values are dollar amounts, then the variance will be expressed in squared dollars. Therefore, for application purposes, we define another measure of dispersion, called the standard deviation, that is directly related to the variance. We note that the standard deviation is measured in the same units as used for the data values (see (2.5.10) and (2.5.11) given below).
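To make the point about units concrete, the sketch below assumes the usual relationship in which the standard deviation is the positive square root of the variance; the formal definitions cited above as (2.5.10) and (2.5.11) are not reproduced here. The dollar amounts are hypothetical.

```python
import math
from statistics import stdev, variance

# Hypothetical dollar amounts, chosen only for illustration.
prices = [12.0, 15.0, 11.0, 18.0]

var_sq_dollars = variance(prices)        # sample variance, in squared dollars
sd_dollars = math.sqrt(var_sq_dollars)   # back in dollars, like the data

print("variance (squared dollars):  ", var_sq_dollars)
print("standard deviation (dollars):", round(sd_dollars, 4))
print("statistics.stdev:            ", round(stdev(prices), 4))
```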

