Probability with R - Jane M. Horgan - Page 42

Example 2.1 Apps Usage

Examining the nine apps with greatest usage on your smartphone, you may find the usage statistics (in MB) are

App	Usage (MB)
Facebook	39.72
Chrome	35.27
WhatsApp	5.73
Google	5.60
System Account	3.30
Instagram	3.22
Gmail	2.52
Messenger	1.71
Maps	1.55

To enter the data, write

usage <- c(39.72, 35.27, 5.73, 5.6, 3.3, 3.22, 2.52, 1.71, 1.55)

The mean is

mean(usage)
[1] 10.95778

while the median is

median(usage)
[1] 3.3

Unlike the previous examples, where the mean and median were similar, here the mean is more than three times the median. Looking at the data again, you will notice that the usage of the first two apps, Facebook and Chrome, is much larger than that of the other apps in the data set. These two values are what make the mean so high. Such values are often designated as outliers and are analyzed separately. Omitting them and calculating the mean and median once more, we get

mean(usage[3:9])
[1] 3.375714
median(usage[3:9])
[1] 3.22

Now, we see that there is not much difference between the mean and median.
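The two comparisons can also be collected side by side. The following is a minimal sketch, assuming the same usage vector as entered in this example; the names `full`, `reduced`, `ratio_full` and `ratio_reduced` are illustrative and do not appear in the text.

```r
# Usage figures (MB) as entered in this example
usage <- c(39.72, 35.27, 5.73, 5.6, 3.3, 3.22, 2.52, 1.71, 1.55)

# Mean and median with all nine apps
full <- c(mean = mean(usage), median = median(usage))

# Mean and median after dropping the first two values (Facebook and Chrome)
reduced <- c(mean = mean(usage[-(1:2)]), median = median(usage[-(1:2)]))

# Ratio of mean to median: a value far from 1 suggests that a few
# extreme observations dominate the mean
ratio_full <- unname(full["mean"] / full["median"])
ratio_reduced <- unname(reduced["mean"] / reduced["median"])

# Show both summaries in one small table, then the two ratios
print(round(rbind(full, reduced), 3))
print(round(c(full = ratio_full, reduced = ratio_reduced), 2))
```

Here `usage[-(1:2)]` selects the same elements as `usage[3:9]`; the negative index simply states the intent (drop the first two) rather than the positions kept.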

When there are extremely high values in the data, using the mean as a measure of central tendency can give a misleading impression. A classic example is wage statistics, where a few very high salaries can grossly inflate the average, giving the impression that salaries are higher than they actually are.
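The wage effect can be illustrated with a small sketch. The `salaries` vector below is entirely hypothetical (figures in thousands, invented for illustration and not taken from the book), as are the names `salary_mean`, `salary_median` and `share_below_mean`.

```r
# Hypothetical annual salaries in thousands; invented for illustration only
salaries <- c(28, 30, 31, 33, 35, 36, 38, 40, 250, 400)

# The mean is pulled upward by the two very high salaries
salary_mean <- mean(salaries)

# The median stays close to what a typical employee earns
salary_median <- median(salaries)

# Proportion of employees who earn less than the mean
share_below_mean <- mean(salaries < salary_mean)

print(c(mean = salary_mean, median = salary_median,
        share_below_mean = share_below_mean))
```

With these made-up figures the mean is 92.1 while the median is 35.5, and eight of the ten salaries fall below the mean, so quoting the mean alone would overstate what most employees earn.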
