Becoming a Data Head - Alex J. Gutman

Classifying Restaurants

Imagine you're on a walk and pass by an empty store front with the sign “New Restaurant: Coming Soon.” You're tired of eating at national chains and are always on the lookout for new, locally owned restaurants, so you can't help but wonder, “Will this be a new local restaurant?”

Let's pose this question more formally: Do you predict the new restaurant will be a chain restaurant or an independent restaurant?

Take a guess. (Seriously, take a guess before moving on.)

If this scenario happened in real life, you'd have a pretty good hunch in a split second. If you're in a trendy neighborhood, surrounded by local pubs and eateries, you'd guess independent. If you're next to an interstate highway and near a shopping mall, you'd guess chain.

But when we asked the question, you hesitated. “They didn't give me enough information,” you thought. And you were right. We didn't give you any data to make a decision.

Lesson learned: Informed decisions require data.

Now look at the data in the first image on the next page. The new restaurant is marked with an X, the Cs indicate chain restaurants, and the Is indicate independent, local eateries. What would you guess this time?

Most people guess (I) because most of the surrounding restaurants are also (I). But notice not all restaurants in the neighborhood are independent. If we asked you to rate your confidence⁴ in your prediction between 0 and 100, we'd expect it to be high but not 100. It's certainly possible another chain restaurant is coming to the neighborhood.

Lesson learned: Predictions should never be 100% confident.

Over-the-Rhine neighborhood, Cincinnati, Ohio

Next, look at the data in the following image. This area includes a large shopping mall, and most restaurants in the area are chains. When asked to predict chain or independent, the majority choose (C). But we love when someone chooses (I) because it highlights several important lessons.

Kenwood Towne Centre, Cincinnati, Ohio

During this thought experiment, everyone creates a slightly different algorithm in their head. Of course, everyone looks at the markers surrounding the point of interest, X, to understand the neighborhood, but at some point, you must decide when a restaurant is too far away to influence your prediction. At one extreme (and we see it happen), someone looks at the restaurant's single closest neighbor, in this case an independent restaurant, and bases their prediction on it: “The nearest neighbor to X is an (I), so my prediction is (I).”

Most people, however, look at several neighboring restaurants. The second image shows a circle surrounding the new restaurant containing its seven nearest neighbors. You probably chose a different number, but we chose 7, and 6 out of the 7 are (C) chains, so we'd predict (C).
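The mental procedure described above (rank the surrounding restaurants by distance, keep the closest few, and go with the majority) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates and labels are invented so that the nearest neighbor is an (I) and 6 of the 7 nearest are (C), mirroring the text; they are not the data from the maps. The function name `predict` and the use of straight-line distance are also assumptions.

```python
from collections import Counter
from math import hypot

# Hypothetical (x, y) map positions and labels, NOT the data from the book's maps.
# "C" = chain, "I" = independent. The new restaurant X sits at the origin.
restaurants = [
    ((0.5, 0.0), "I"),
    ((1.0, 0.0), "C"),
    ((0.0, 1.2), "C"),
    ((-1.5, 0.0), "C"),
    ((0.0, -1.8), "C"),
    ((2.0, 0.0), "C"),
    ((0.0, 2.2), "C"),
    ((3.0, 3.0), "I"),
    ((-4.0, 1.0), "I"),
]

def predict(new_location, labeled_points, k):
    """Return (majority label, vote share) among the k nearest labeled points."""
    # Sort every known restaurant by straight-line distance to the new location.
    by_distance = sorted(
        labeled_points,
        key=lambda item: hypot(item[0][0] - new_location[0],
                               item[0][1] - new_location[1]),
    )
    neighbors = by_distance[:k]
    # Count the labels among the k closest and take the most common one.
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbors)

x = (0.0, 0.0)
print(predict(x, restaurants, k=1))  # only the single closest neighbor votes
print(predict(x, restaurants, k=7))  # the seven closest neighbors vote
```

With this made-up layout, `k=1` yields (I) while `k=7` yields (C) with a vote share of 6/7, so the choice of how many neighbors to consult changes the prediction. The vote share is one rough way to express confidence that stays below 100% whenever the neighbors disagree. An odd `k` avoids ties when there are only two labels.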
