Search Analytics for Your Site - Louis Rosenfeld

George Kingsley Zipf, Harvard Linguist and Hockey Star


Of course, we’ve just been looking at a tiny slice of a search log. And as interesting as it is, the true power of SSA comes from collectively analyzing the thousands or millions of such interactions that take place on your site during a given period of time. That’s when the patterns emerge, when trends take shape, and when there’s enough activity to merit measuring—and drawing interesting conclusions.

Nowhere is the value of statistical analysis more apparent than when viewing the Zipf Distribution, named for Harvard linguist George Kingsley Zipf, who, as you’d expect from a linguist, liked to count words.[4] He found that a few terms were used quite often, while many were hardly used at all. We find the same thing when tallying up queries from most to least frequent, as in Figure 2-4.

The Zipf distribution—which emerges when tallying just about any site’s search data—shows that the few most common queries account for a surprisingly large portion of all search activity during any given period. (Remember in Chapter 1, how John Ferrara focused exclusively on those common queries.) You can see how tall and narrow what we’ll call the “short head” is, and how quickly it drops down to the “long tail” of esoteric queries (technically, described as “twosies” and “onesies”). In fact, we’re only showing the first 500 or so queries here; in reality, this site’s long tail would extend into the tens of thousands, many meters to the right of where you sit.

http://www.flickr.com/photos/rosenfeldmedia/5690405271/

Figure 2-4. The hockey-stick-shaped Zipf Distribution shows that a few queries are very popular, while most are not. This example is from Michigan State University, but this distribution is true of just about every Web site and intranet.
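The tallying behind a chart like Figure 2-4 can be sketched in a few lines of Python. This is only a minimal sketch, not the tooling used for the Michigan State data: the function name `rank_queries`, the sample log, and the light normalization (trimming whitespace and lowercasing) are all assumptions made for illustration.

```python
from collections import Counter


def rank_queries(queries):
    """Return (query, count) pairs sorted from most to least frequent."""
    # Assumption: trim and lowercase so "Campus Map" and "campus map"
    # are counted as the same query. Blank entries are skipped.
    normalized = (q.strip().lower() for q in queries if q.strip())
    return Counter(normalized).most_common()


# Hypothetical sample log; a real search log would hold thousands of entries.
sample_log = [
    "campus map", "housing", "Campus Map", "webenroll",
    "campus map", "housing", "department of surgery",
]

ranked = rank_queries(sample_log)
for rank, (query, count) in enumerate(ranked, start=1):
    print(rank, count, query)
```

Plotting the counts in rank order is what produces the tall short head and the long tail.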

It’s equally enlightening to examine the same phenomenon when presented textually, as shown in Table 2-1.

The most common query, campus map, accounts for 1.4% of all the search activity during this time period. That number, 1.4%, doesn’t sound like much, but those top queries add up very quickly—the top 14 most common queries account for 10% of all search activity. (Note to MSU.edu webmaster: better make sure that relevant results come up when users search campus map!)

Table 2-1. The Zipf Distribution Shown Textually
http://www.flickr.com/photos/rosenfeldmedia/5825543717/

Rank     Cumulative %    Count    Query Terms
1         1.40%          7,218    campus map
14       10.53%          2,464    housing
42       20.18%          1,351    webenroll
98       30.01%            650    computer center
221      40.05%            295    msu union
500      50.02%            124    hotels
7,877    80.00%              7    department of surgery

Note how few queries are required to account for 10% of all search activity. (This data is also from Michigan State University.)
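A "Cumulative %" column like the one in Table 2-1 is a running total of the query counts divided by the total number of searches. The sketch below shows one way to compute it. The function name `cumulative_share` and the sample counts are hypothetical and are not the Michigan State figures.

```python
def cumulative_share(counts):
    """Running percentage of all searches covered at each rank.

    `counts` must already be sorted from most to least frequent.
    """
    total = sum(counts)
    if total == 0:
        return []
    running = 0
    shares = []
    for count in counts:
        running += count
        shares.append(100.0 * running / total)
    return shares


# Hypothetical counts for six distinct queries (100 searches in total).
sample_counts = [50, 20, 10, 10, 5, 5]
shares = cumulative_share(sample_counts)
for rank, (count, share) in enumerate(zip(sample_counts, shares), start=1):
    print(f"{rank:>4}  {share:6.2f}%  {count:>5}")
```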

That’s incredible—it means that if you invested the small amount of effort needed to ensure that the top 14 queries performed well, you’d improve the search experience for 10% of all users. And if, say, half of your site’s users were search dominant,[5] then you’ve just improved the overall user experience by 5% (10% × 50%). Numbers like this can and should be challenged, and 5% may not sound like much. But 5% here, 3% there... these quickly add up.
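The back-of-the-envelope arithmetic above is a single multiplication: the share of search activity you have tuned, times the share of users who rely on search. As a small sketch, with a hypothetical function name and both inputs expressed as fractions:

```python
def overall_improvement(search_coverage, search_dominant_share):
    """Estimated share of the overall user experience that is improved.

    search_coverage: fraction of search activity covered by tuned queries.
    search_dominant_share: fraction of users who are search dominant.
    """
    return search_coverage * search_dominant_share


# The chapter's example: top 14 queries cover 10% of searches,
# and half of the site's users are assumed to be search dominant.
estimate = overall_improvement(0.10, 0.50)
print(f"{estimate:.0%}")
```

Like the 5% figure itself, this is a rough estimate that depends entirely on the two inputs you choose.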

It bears noting that we just started with a simple report—presented both visually and as a table—and quickly drew some useful conclusions based on the data presented. That there, folks, is analysis. And that’s why reports are only means, not goals.

And equally important, this analysis scales beautifully. Have the time and resources to go beyond the top 14 queries? No problem: tuning the top 42 queries will get you to the 20% mark. About 100 gets you to 30%, and so on.
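To find out how far down the list you would need to go for a given coverage target on your own site, you can walk the ranked counts until the running total reaches that target. The sketch below assumes the counts are already sorted from most to least frequent. The function name `queries_needed` and the sample counts are hypothetical.

```python
def queries_needed(counts, target_percent):
    """Smallest number of top queries covering target_percent of searches.

    `counts` must be sorted from most to least frequent.
    Returns None if the target cannot be reached (for example, an empty
    log or a target above 100).
    """
    total = sum(counts)
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if 100.0 * running >= target_percent * total:
            return rank
    return None


# Hypothetical counts for six distinct queries (100 searches in total).
sample_counts = [50, 20, 10, 10, 5, 5]
for target in (10, 60, 80, 100):
    print(target, queries_needed(sample_counts, target))
```

Run against a real log, the same loop would show whether your own thresholds fall near the 14, 42, and roughly 100 queries seen in the Michigan State data.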

[4] You may not have heard of Zipf, but you’ve probably heard of the 80/20 Rule, the Pareto Principle, or Power Laws. All relate to the hockey-stick curve’s dramatic dropoff from “short head” to long tail.

[5] Usability expert Jakob Nielsen suggests that this is the case; see www.useit.com/alertbox/9707b.html
