Becoming a Data Head - Alex J. Gutman

BASIC SUMMARY STATISTICS

Data does not always look like a dataset or spreadsheet. It often arrives as summary statistics: single numbers that condense a set of data so we can describe it without listing every value.

The three most common summary statistics are mean, median, and mode, and you're probably quite familiar with them. However, we wanted to spend a few minutes discussing these statistics because we frequently see the colloquial terms “normal,” “usual,” “typical,” or “average” used as synonyms for each of the terms. To avoid confusion, let's be clear on what each term means:

 The mean is the sum of all the numbers you have divided by the count of all the numbers. In effect, it tells you what each observation in your series would contribute to the total if every observation contributed the same amount. The mean is also called the average.

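 The median is the middle value of the data once you sort it in order: half of the values fall at or below it and half at or above it. With an even count of values, the median is the mean of the two middle values.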

 The mode is the most common number in the dataset.
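The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the `sample` list are made up for the example, and the tie-breaking rule for the mode (first value seen wins) is an assumption, since the text does not say what to do when several values are equally common.

```python
from collections import Counter


def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value after sorting; mean of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequent value; on a tie, the value encountered first (an assumption)."""
    return Counter(values).most_common(1)[0][0]


sample = [3, 1, 4, 1, 5]  # made-up illustrative data, not from the book
print(mean(sample), median(sample), mode(sample))  # 2.8 3 1
```

In practice, Python's standard `statistics` module provides `mean`, `median`, and `mode` directly; writing them out by hand is only meant to make the definitions concrete.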

Mean, median, and mode are called measures of location or measures of central tendency. Measures of variation, such as variance, range, and standard deviation, are also called measures of spread. A measure of location tells you where on the number line a typical value falls, and a measure of spread tells you how far the other numbers lie from that value.
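To see why location and spread are separate ideas, here is a small sketch comparing two made-up lists that share the same mean but differ in spread. The lists `tight` and `wide` and the helper `value_range` are hypothetical names for illustration. The sketch uses the population versions of variance and standard deviation (`pvariance`, `pstdev`); the text does not say whether population or sample formulas are intended, so treat that choice as an assumption.

```python
import statistics


def value_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)


tight = [4, 5, 6]   # made-up data clustered near 5
wide = [0, 5, 10]   # made-up data with the same mean, but more spread out

for name, values in (("tight", tight), ("wide", wide)):
    print(
        name,
        "mean:", statistics.mean(values),
        "range:", value_range(values),
        "variance:", round(statistics.pvariance(values), 2),
        "std dev:", round(statistics.pstdev(values), 2),
    )
```

Both lists report a mean of 5, so the location is identical; only the range, variance, and standard deviation reveal that the second list is far more spread out.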

As a trivial example, the numbers 7, 5, 4, 8, 4, 2, 9, 4, and 100 have mean 15.89, median 5, and mode 4. Notice the mean (average), 15.89, is a number that doesn't appear in the data. This happens a lot: the average number of people in a household in the United States in 2018 was 2.63; basketball star LeBron James scores an average of 27.1 points per game.
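The figures in this example can be checked with Python's standard `statistics` module. The variable names are chosen for illustration only.

```python
import statistics

data = [7, 5, 4, 8, 4, 2, 9, 4, 100]

mean_value = statistics.mean(data)      # 143 / 9
median_value = statistics.median(data)  # middle of the 9 sorted values
mode_value = statistics.mode(data)      # 4 appears three times

print(sorted(data))                                    # [2, 4, 4, 4, 5, 7, 8, 9, 100]
print(round(mean_value, 2), median_value, mode_value)  # 15.89 5 4
print(mean_value in data)                              # False: the mean is not one of the values
```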

It's a common mistake for people to use the average (mean) to represent the midpoint of the data, which is the median. They assume half the numbers must be above average, and half below. This isn't true. In fact, it's common for most of the data to be below (or above) the average. For example, the vast majority of people have greater than the average number of fingers (likely 9.something).
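A quick sketch makes the point concrete by counting how many values actually fall below or above the mean. The helper names are hypothetical, and the `fingers` population (990 people with 10 fingers, 10 people with 9) is invented purely for illustration; it is not real data.

```python
import statistics


def share_below_mean(values):
    """Fraction of the values that are strictly below the mean."""
    m = statistics.mean(values)
    return sum(1 for v in values if v < m) / len(values)


def share_above_mean(values):
    """Fraction of the values that are strictly above the mean."""
    m = statistics.mean(values)
    return sum(1 for v in values if v > m) / len(values)


# The nine numbers from the earlier example: one large value pulls the mean up.
data = [7, 5, 4, 8, 4, 2, 9, 4, 100]
print(share_below_mean(data))  # 8 of the 9 values are below the mean

# Invented population: the few people with 9 fingers pull the mean below 10.
fingers = [10] * 990 + [9] * 10
print(statistics.mean(fingers), share_above_mean(fingers))
```

In the first list, eight of nine values sit below the mean; in the invented finger counts, 99% of people sit above it. Neither split is close to half and half, which is what the median, not the mean, guarantees.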

To avoid confusion and misconceptions, we recommend sticking with mean or average, median, and mode for full transparency. Try not to use words like usual, typical, or normal.
