Probability with R - Jane M. Horgan
Example 2.1 Apps Usage
Examining the nine apps with greatest usage on your smartphone, you may find the usage statistics (in MB) are:
App | Usage (MB) |
Facebook | 39.72 |
Chrome | 35.27 |
 | 5.73 |
 | 5.60 |
System Account | 3.30 |
 | 3.22 |
Gmail | 2.52 |
Messenger | 1.71 |
Maps | 1.55 |
To enter the data, write
usage <- c(39.72, 35.27, 5.73, 5.6, 3.3, 3.22, 2.52, 1.71, 1.55)
The mean is
mean(usage)
[1] 10.95778
while the median is
median(usage)
[1] 3.3
Unlike the previous examples, where the mean and median were similar, here the mean is more than three times the median. Looking at the data again, you will notice that the usage of the first two apps, Facebook and Chrome, is much larger than the usages of the other apps in the data set. These values are the cause of the mean being so high. Such values are often designated as outliers and are analyzed separately. Omitting them and calculating the mean and median once more, we get
mean(usage[3:9])
[1] 3.375714
median(usage[3:9])
[1] 3.22
Now, we see that there is not much difference between the mean and median.
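The comparison above can also be collected into one step. The following is a minimal sketch rather than the book's own code: the helper name centre_summary is hypothetical, and selecting the values to omit by sorting and dropping the two largest is an assumption made here for illustration.

```r
# Usage figures (MB) as entered in the example
usage <- c(39.72, 35.27, 5.73, 5.6, 3.3, 3.22, 2.52, 1.71, 1.55)

# Hypothetical helper: mean, median, and their ratio for a numeric vector
centre_summary <- function(x) {
  c(mean = mean(x), median = median(x), ratio = mean(x) / median(x))
}

# Summary for all nine apps
all_apps <- centre_summary(usage)

# Summary after omitting the two largest values
# (assumed here to be the outliers, as in the example)
trimmed <- centre_summary(sort(usage, decreasing = TRUE)[-(1:2)])

print(round(all_apps, 3))
print(round(trimmed, 3))
```

A ratio well above 1 signals that a few large values are pulling the mean upward; for these data it is above 3 with all nine apps and close to 1 once the two largest are omitted.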
When the data contain extremely high values, the mean can give a misleading impression of central tendency. A classic example is wage statistics, where a few very high salaries can grossly inflate the average and suggest that salaries are higher than they typically are.
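The wage effect can be illustrated with a small sketch. The salary figures below are hypothetical, invented purely for illustration, and the variable names are not taken from the book.

```r
# Hypothetical annual salaries (in thousands); invented for illustration only
salaries <- c(28, 30, 31, 33, 35, 36, 38, 40, 250, 400)

mean_salary   <- mean(salaries)    # pulled upward by the two very high salaries
median_salary <- median(salaries)  # middle of the ordered values

# Count how many salaries fall below the mean
below_mean <- sum(salaries < mean_salary)

print(c(mean = mean_salary, median = median_salary, below_mean = below_mean))
```

In this made-up data set, eight of the ten salaries lie below the mean of 92.1, whereas the median of 35.5 sits among the typical values.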