Algorithms to Live By: The Computer Science of Human Decisions - Brian Christian, Tom Griffiths
Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end. Againe, if thy word beginne with (ca) looke in the beginning of the letter (c) but if with (cu) then looke toward the end of that letter. And so of all the rest. &c.
—ROBERT CAWDREY, A TABLE ALPHABETICALL (1604)
Before Danny Hillis founded the Thinking Machines corporation, before he invented the famous Connection Machine parallel supercomputer, he was an MIT undergraduate, living in the student dormitory, and horrified by his roommate’s socks.
What horrified Hillis, unlike many a college undergraduate, wasn’t his roommate’s hygiene. It wasn’t that the roommate didn’t wash the socks; he did. The problem was what came next.
The roommate pulled a sock out of the clean laundry hamper. Next he pulled another sock out at random. If it didn’t match the first one, he tossed it back in. Then he continued this process, pulling out socks one by one and tossing them back until he found a match for the first.
With just 10 different pairs of socks, following this method will take on average 19 pulls merely to complete the first pair, and 17 more pulls to complete the second. In total, the roommate can expect to go fishing in the hamper 110 times just to pair 20 socks.
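The arithmetic behind those figures can be sketched in a few lines of Python. This is an illustrative model, not something taken from the text: it assumes every sock has exactly one partner, that each pull is uniformly random, and that mismatches go straight back into the hamper. The names `expected_pulls` and `simulate_pulls` are hypothetical.

```python
import random

def expected_pulls(pairs):
    """Expected total pulls to pair up every sock with the roommate's method."""
    total = 0
    for remaining_pairs in range(pairs, 0, -1):
        socks_left = 2 * remaining_pairs
        total += 1               # pull the first sock of the pair
        total += socks_left - 1  # expected random pulls until its mate turns up
    return total

def simulate_pulls(pairs, rng):
    """Act out the method once and count the pulls actually made."""
    hamper = [pair_id for pair_id in range(pairs) for _ in range(2)]
    pulls = 0
    while hamper:
        first = hamper.pop(rng.randrange(len(hamper)))
        pulls += 1
        while True:
            index = rng.randrange(len(hamper))
            pulls += 1
            if hamper[index] == first:
                hamper.pop(index)  # match found, the pair leaves the hamper
                break              # otherwise the sock is simply tossed back
    return pulls

rng = random.Random(0)
trials = 2000
average = sum(simulate_pulls(10, rng) for _ in range(trials)) / trials
print(expected_pulls(10), round(average, 1))
```

Under these assumptions `expected_pulls(10)` comes to 110, matching the total above, and the simulated average should land close to it.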
It was enough to make any budding computer scientist request a room transfer.
Now, just how socks should be sorted is a good way to get computer scientists talking at surprising length. A question about socks posted to the programming website Stack Overflow in 2013 prompted some twelve thousand words of debate.
“Socks confound me!” confessed legendary cryptographer and Turing Award–winning computer scientist Ron Rivest to the two of us when we brought up the topic.
He was wearing sandals at the time.
The Ecstasy of Sorting
Sorting is at the very heart of what computers do. In fact, in many ways it was sorting that brought the computer into being.
In the late nineteenth century, the American population was growing by 30% every decade, and the number of “subjects of inquiry” in the US Census had gone from just five in 1870 to more than two hundred in 1880. The tabulation of the 1880 census took eight years—just barely finishing by the time the 1890 census began. As a writer at the time put it, it was a wonder “the clerks who toiled at the irritating slips of tally paper … did not go blind and crazy.” The whole enterprise was threatening to collapse under its own weight. Something had to be done.
Inspired by the punched railway tickets of the time, an inventor by the name of Herman Hollerith devised a system of punched manila cards to store information, and a machine, which he called the Hollerith Machine, to count and sort them. Hollerith was awarded a patent in 1889, and the government adopted the Hollerith Machine for the 1890 census. No one had ever seen anything like it. Wrote one awestruck observer, “The apparatus works as unerringly as the mills of the Gods, but beats them hollow as to speed.” Another, however, reasoned that the invention was of limited use: “As no one will ever use it but governments, the inventor will not likely get very rich.” This prediction, which Hollerith clipped and saved, would not prove entirely correct. Hollerith’s firm merged with several others in 1911 to become the Computing-Tabulating-Recording Company. A few years later it was renamed—to International Business Machines, or IBM.
Sorting continued to spur the development of the computer through the next century. The first code ever written for a “stored program” computer was a program for efficient sorting. In fact, it was the computer’s ability to outsort IBM’s dedicated card-sorting machines that convinced the US government their enormous financial investment in a general-purpose machine was justified. By the 1960s, one study estimated that more than a quarter of the computing resources of the world were being spent on sorting. And no wonder—sorting is essential to working with almost any kind of information. Whether it’s finding the largest or the smallest, the most common or the rarest, tallying, indexing, flagging duplicates, or just plain looking for the thing you want, they all generally begin under the hood with a sort.
But sorting is more pervasive, even, than this. After all, one of the main reasons things get sorted is to be shown in useful form to human eyes, which means that sorting is also key to the human experience of information. Sorted lists are so ubiquitous that—like the fish who asks, “What is water?”—we must consciously work to perceive them at all. And then we perceive them everywhere.
Our email inbox typically displays the top fifty messages of potentially thousands, sorted by time of receipt. When we look for restaurants on Yelp we’re shown the top dozen or so of hundreds, sorted by proximity or by rating. A blog shows a cropped list of articles, sorted by date. The Facebook news feed, Twitter stream, and Reddit homepage all present themselves as lists, sorted by some proprietary measure. We refer to things like Google and Bing as “search engines,” but that is something of a misnomer: they’re really sort engines. What makes Google so dominant as a means of accessing the world’s information is less that it finds our text within hundreds of millions of webpages—its 1990s competitors could generally do that part well enough—but that it sorts those webpages so well, and only shows us the most relevant ten.
The truncated top of an immense, sorted list is in many ways the universal user interface.
Computer science gives us a way to understand what’s going on behind the scenes in all of these cases, which in turn can offer us some insight for those times when we are the one stuck making order—with our bills, our papers, our books, our socks, probably more times each day than we realize. By quantifying the vice (and the virtue) of mess, it also shows us the cases where we actually shouldn’t make order at all.
What’s more, when we begin looking, we see that sorting isn’t just something we do with information. It’s something we do with people. Perhaps the place where the computer science of establishing rank is most unexpectedly useful is on the sporting field and in the boxing ring—which is why knowing a little about sorting might help explain how human beings are able to live together while only occasionally coming to blows. That is to say, sorting offers some surprising clues about the nature of society—that other, larger, and more important kind of order that we make.
The Agony of Sorting
“To lower costs per unit of output, people usually increase the size of their operations,” wrote J. C. Hosken in 1955, in the first scientific article published on sorting. This is the economy of scale familiar to any business student. But with sorting, size is a recipe for disaster: perversely, as a sort grows larger, “the unit cost of sorting, instead of falling, rises.” Sorting involves steep diseconomies of scale, violating our normal intuitions about the virtues of doing things in bulk. Cooking for two is typically no harder than cooking for one, and it’s certainly easier than cooking for one person twice. But sorting, say, a shelf of a hundred books will take you longer than sorting two bookshelves of fifty apiece: you have twice as many things to organize, and there are twice as many places each of them could go. The more you take on, the worse it gets.
This is the first and most fundamental insight of sorting theory. Scale hurts.
From this we might infer that minimizing our pain and suffering when it comes to sorting is all about minimizing the number of things we have to sort. It’s true: one of the best preventives against the computational difficulty of sock sorting is just doing your laundry more often. Doing laundry three times as frequently, say, could reduce your sorting overhead by a factor of nine. Indeed, if Hillis’s roommate stuck with his peculiar procedure but went thirteen days between washes instead of fourteen, that alone would save him twenty-eight pulls from the hamper. (And going just a single day longer between washes would cost him thirty pulls more.)
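Those savings follow from simple arithmetic. As a sketch, assume one pair of socks per day (an assumption made here because it reproduces the figures in the text), so that n days between washes means n pairs in the hamper. Completing a pair when k pairs remain costs one pull for the first sock plus an expected 2k - 1 random pulls to find its mate:

```latex
\begin{aligned}
T(n) &= \sum_{k=1}^{n} \bigl(1 + (2k - 1)\bigr) = \sum_{k=1}^{n} 2k = n^2 + n \\
T(14) - T(13) &= 210 - 182 = 28 \\
T(15) - T(14) &= 240 - 210 = 30
\end{aligned}
```

For large n the squared term dominates, so washing three times as often shrinks the cost of each load to roughly (n/3)^2 = n^2/9.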
Even at such a modest, fortnightly scope we can see the scale of sorting beginning to grow untenable. Computers, though, must routinely sort millions of items in a single go. For that, as the line from Jaws puts it, we’re going to need a bigger boat—and a better algorithm.
But to answer the question of just how we ought to be sorting, and which methods come out on top, we need to figure out something else first: how we’re going to keep score.
Big-O: A Yardstick for the Worst Case
The Guinness Book of World Records attributes the record for sorting a deck of cards to the Czech magician Zdeněk Bradáč. On May 15, 2008, Bradáč sorted a 52-card deck in just 36.16 seconds.* How did he do it? What sorting technique delivered him the title? Though the answer would shed interesting light on sorting theory, Bradáč declined to comment.
While we have nothing but respect for Bradáč’s skill and dexterity, we are 100% certain of the following: we can personally break his record. In fact, we are 100% certain that we can attain an unbreakable record. All we need are about 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000 attempts at the title. This number, a bit over 80 unvigintillion, is 52 factorial, or “52!” in mathematical notation—the number of ways that a deck of 52 cards can possibly be ordered. By taking roughly that many attempts, sooner or later we are bound to start with a shuffled deck that is in fact completely sorted by chance. At that point we can proudly enter Christian-Griffiths into The Guinness Book alongside a not-too-shabby sort time of 0m00s.
To be fair, we’d almost certainly be trying until the heat death of the universe before we got our perfect record attempt. Nonetheless, this highlights the biggest fundamental difference between record keepers and computer scientists. The fine folks at Guinness care only about best-case performance (and beer). They’re hardly blameworthy, of course: all records in sports reflect the single best performance. Computer science, however, almost never cares about the best case. Instead, computer scientists might want to know the average sort time of someone like Bradáč: get him to sort all of the 80 unvigintillion deck orders, or a reasonably sized sample, and score him on his average speed across all attempts. (You can see why they don’t let computer scientists run these things.)
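For the curious, the shuffle-until-sorted strategy is easy to write down, even if it is hopeless to run on a full deck. The sketch below is illustrative only: `shuffle_until_sorted` is a hypothetical name, and the demonstration uses a five-card hand so that it finishes quickly.

```python
import math
import random

def shuffle_until_sorted(cards, rng):
    """Shuffle repeatedly until the cards happen to land in order.

    Returns the sorted cards and the number of shuffles it took.
    """
    deck = list(cards)
    shuffles = 0
    # Keep going while any adjacent pair is out of order.
    while any(deck[i] > deck[i + 1] for i in range(len(deck) - 1)):
        rng.shuffle(deck)
        shuffles += 1
    return deck, shuffles

# Number of possible orderings of a 52-card deck.
orderings = math.factorial(52)

hand, shuffles = shuffle_until_sorted([3, 1, 5, 2, 4], random.Random(1))
print(hand, shuffles)
print(f"{orderings:,}")
```

Five cards have only 120 possible orderings, so the loop ends after a modest number of shuffles; each added card multiplies the expected wait by the new size of the deck.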
Moreover, a computer scientist would want to know the worst sort time. Worst-case analysis lets us make hard guarantees: that a critical process will finish in time, that deadlines won’t be blown. So for the rest of this chapter—and the rest of this book, actually—we will be discussing only algorithms’ worst-case performance, unless noted otherwise.
Computer science has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation. Big-O notation has a particular quirk, which is that it’s inexact by design. That is, rather than expressing an algorithm’s performance in minutes and seconds, Big-O notation provides a way to talk about the kind of relationship that holds between the size of the problem and the program’s running time. Because Big-O notation deliberately sheds fine details, what emerges is a schema for dividing problems into different broad classes.
Imagine you’re hosting a dinner party with n guests. The time required to clean the house for their arrival doesn’t depend on the number of guests at all. This is the rosiest class of problems there is: called “Big-O of one,” written O(1), it is also known as “constant time.” Notably, Big-O notation doesn’t care a whit how long the cleaning actually takes—just that it’s always the same, totally invariant of the guest list. You’ve got the same work to do if you have one guest as if you have ten, a hundred, or any other n.
Now, the time required to pass the roast around the table will be “Big-O of n,” written O(n), also known as “linear time”—with twice the guests, you’ll wait twice as long for the dish to come around. And again, Big-O notation couldn’t care less about the number of courses that get served, or whether they go around for second helpings. In each case, the time still depends linearly on the guest list size—if you drew a graph of the number of guests vs. the time taken, it would be a straight line. What’s more, the existence of any linear-time factors will, in Big-O notation, swamp all constant-time factors. That is to say, passing the roast once around the table, or remodeling your dining room for three months and then passing the roast once around the table, are both, to a computer scientist, effectively equivalent. If that seems crazy, remember that computers deal with values of n that could easily be in the thousands, millions, or billions. In other words, computer scientists are thinking about very, very big parties. With a guest list in the millions, passing the roast once around would indeed make the home remodel seem dwarfed to the point of insignificance.
What if, as the guests arrived, each one hugged the others in greeting? Your first guest hugs you; your second guest has two hugs to give; your third guest, three. How many hugs will there be in total? This turns out to be “Big-O of n-squared,” written O(n²) and also known as “quadratic time.” Here again, we only care about the basic contours of the relationship between n and time. There’s no O(2n²) for two hugs apiece, or O(n² + n) for hugs plus passing the food around, or O(n² + 1) for hugs plus home cleaning. It’s all quadratic time, so O(n²) covers everything.
Constant time, written O(1); linear time, written O(n); and quadratic time, written O(n²).
It gets worse from there. There’s “exponential time,” O(2ⁿ), where each additional guest doubles your work. Even worse is “factorial time,” O(n!), a class of problems so truly hellish that computer scientists only talk about it when they’re joking—as we were in imagining shuffling a deck until it’s sorted—or when they really, really wish they were.
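To get a feel for how quickly these classes pull apart, here is a small sketch that tallies rough step counts for the dinner party. The function name and the choice of one "step" per unit of work are assumptions made for illustration. The hug count works out to n(n + 1)/2, which is half of n squared plus a linear term, and Big-O rounds that to quadratic.

```python
import math

def dinner_party_costs(n):
    """Rough step counts for a party of n guests, one entry per growth class."""
    return {
        "cleaning, O(1)": 1,
        "passing the roast, O(n)": n,
        "hugs, O(n^2)": sum(range(1, n + 1)),  # 1 + 2 + ... + n
        "doubling, O(2^n)": 2 ** n,
        "shuffle-sort, O(n!)": math.factorial(n),
    }

for guests in (5, 10, 20):
    print(guests, dinner_party_costs(guests))
```

At ten guests the quadratic entry is still only 55 hugs, while the exponential entry has passed a thousand and the factorial entry has passed three million.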
The Squares: Bubble Sort and Insertion Sort
When then senator Obama visited Google in 2007, CEO Eric Schmidt jokingly began the Q&A like a job interview, asking him, “What’s the best way to sort a million thirty-two-bit integers?” Without missing a beat, Obama cracked a wry smile and replied, “I think the Bubble Sort would be the wrong way to go.” The crowd of Google engineers erupted in cheers. “He had me at Bubble Sort,” one later recalled.
Obama was right to eschew Bubble Sort, an algorithm which has become something of a punching bag for computer science students: it’s simple, it’s intuitive, and it’s extremely inefficient.
Imagine you want to alphabetize your unsorted collection of books. A natural approach would be just to scan across the shelf looking for out-of-order pairs—Wallace followed by Pynchon, for instance—and flipping them around. Put Pynchon ahead of Wallace, then continue your scan, looping around to the beginning of the shelf each time you reach the end. When you make a complete pass without finding any out-of-order pairs on the entire shelf, then you know the job is done.
This is Bubble Sort, and it lands us in quadratic time. There are n books out of order, and each scan through the shelf can move each one at most one position. (We spot a tiny problem, make a tiny fix.) So in the worst case, where the shelf is perfectly backward, at least one book will need to be moved n positions. Thus a maximum of n passes through n books, which gives us O(n²) in the worst case.* It’s not terrible—for one thing, it’s worlds better than our O(n!) shuffle-till-it’s-sorted idea from earlier (in case you needed computer science to confirm that). But all the same, that squared term can get daunting quickly. For instance, it means that sorting five shelves of books will take not five times as long as sorting a single shelf, but twenty-five times as long.
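The shelf-scanning procedure translates almost directly into code. What follows is a minimal sketch, with `bubble_sort` as a hypothetical name; it also counts passes so the worst case is visible.

```python
def bubble_sort(books):
    """Swap out-of-order neighbors until a full pass finds nothing to fix.

    Returns the sorted shelf and the number of passes made over it.
    """
    shelf = list(books)
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(len(shelf) - 1):
            if shelf[i] > shelf[i + 1]:
                # A tiny problem gets a tiny fix: flip the pair around.
                shelf[i], shelf[i + 1] = shelf[i + 1], shelf[i]
                swapped = True
    return shelf, passes

print(bubble_sort(["Wallace", "Pynchon", "Austen"]))
print(bubble_sort([5, 4, 3, 2, 1]))
```

On the perfectly backward five-item shelf the loop makes five passes, the last of which only confirms that nothing is left to swap. That is the n passes over n books described above.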
You might take a different tack—pulling all the books off the shelf and putting them back in place one by one. You’d put the first book in the middle of the shelf, then take the second and compare it to the first, inserting it either to the right or to the left. Picking up the third book, you’d run through the books on the shelf from left to right until you found the right spot to tuck it in. Repeating this process, gradually all of the books would end up sorted on the shelf and you’d be done.
Computer scientists call this, appropriately enough, Insertion Sort. The good news is that it’s arguably even more intuitive than Bubble Sort and doesn’t have quite the bad reputation. The bad news is that it’s not actually that much faster. You still have to do one insertion for each book. And each insertion still involves moving past about half the books on the shelf, on average, to find the correct place. Although in practice Insertion Sort does run a bit faster than Bubble Sort, again we land squarely, if you will, in quadratic time. Sorting anything more than a single bookshelf is still an unwieldy prospect.
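A sketch of the same idea in code, assuming the shelf starts empty and each new book is walked in from the left until it meets a book that belongs after it. The name `insertion_sort` and the comparison counter are illustrative additions.

```python
def insertion_sort(books):
    """Place books one at a time into their spot on a growing sorted shelf.

    Returns the sorted shelf and the number of comparisons made.
    """
    shelf = []
    comparisons = 0
    for book in books:
        position = 0
        # Walk left to right until we meet a book that belongs after this one.
        while position < len(shelf):
            comparisons += 1
            if shelf[position] > book:
                break
            position += 1
        shelf.insert(position, book)
    return shelf, comparisons

print(insertion_sort(["Wallace", "Pynchon", "Austen"]))
print(insertion_sort([1, 2, 3, 4]))
```

With this left-to-right walk, an already sorted input is the expensive case: every new book is compared against the whole shelf, giving 0 + 1 + 2 + 3 = 6 comparisons for four books. Walking in from the right instead would make reversed input the expensive one; either way the worst case grows quadratically.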
Breaking the Quadratic Barrier: Divide and Conquer
At this point, having seen two entirely sensible approaches fall into unsustainable quadratic time, it’s natural to wonder whether faster sorting is even possible.
The question sounds like it’s about productivity. But talk to a computer scientist and it turns out to be closer to metaphysics—akin to thinking about the speed of light, time travel, superconductors, or thermodynamic entropy. What are the universe’s fundamental rules and limits? What is possible? What is allowed? In this way computer scientists are glimpsing God’s blueprints every bit as much as the particle physicists and cosmologists. What is the minimum effort required to make order?
Could we find a constant-time sort, O(1), one that (like cleaning the house before the bevy of guests arrive) can sort a list of any size in the same amount of time? Well, even just confirming that a shelf of n books is sorted cannot be done in constant time, since it requires checking all n of them. So actually sorting the books in constant time seems out of the question.
What about a linear-time sort, O(n), as efficient as passing a dish around a table, where doubling the number of items to sort merely doubles the work? Thinking about the examples above, it’s tough to imagine how that might work either. The n² in each case comes from the fact that you need to move n books, and the work required in each move scales with n as well. How would we get from n moves of size n down to just n by itself? In Bubble Sort, our O(n²) running time came from handling each of the n books and moving them as many as n places each. In Insertion Sort, quadratic running time came from handling each of the n books and comparing them to as many as n others before inserting them. A linear-time sort means handling each book for constant time regardless of how many others it needs to find its place among. Doesn’t seem likely.
So we know that we can do at least as well as quadratic time, but probably not as well as linear time. Perhaps our limit lies somewhere between linear time and quadratic time. Are there any algorithms between linear and quadratic, between n and n × n?
There are—and they were hiding in plain sight.
As we mentioned earlier, information processing began in the US censuses of the nineteenth century, with the development, by Herman Hollerith and later by IBM, of physical punch-card sorting devices. In 1936, IBM began producing a line of machines called “collators” that could merge two separately ordered stacks of cards into one. As long as the two stacks were themselves sorted, the procedure of merging them into a single sorted stack was incredibly straightforward and took linear time: simply compare the two top cards to each other, move the smaller of them to the new stack you’re creating, and repeat until finished.
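The collating step can be sketched as follows. This is a software analogy for the procedure just described, not a model of the IBM hardware, and `collate` is a hypothetical name. Each card is looked at once, which is where the linear running time comes from.

```python
def collate(left, right):
    """Merge two already-sorted stacks into one sorted stack."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Compare the two top cards and move the smaller one across.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One stack has run out; the rest of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(collate([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```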
The program that John von Neumann wrote in 1945 to demonstrate the power of the stored-program computer took the idea of collating to its beautiful and ultimate conclusion. Sorting two cards is simple: just put the smaller one on top. And given a pair of two-card stacks, both of them sorted, you can easily collate them into an ordered stack of four. Repeating this trick a few times, you’d build bigger and bigger stacks, each one of them already sorted. Soon enough, you could collate yourself a perfectly sorted full deck—with a final climactic merge, like a riffle shuffle’s order-creating twin, producing the desired result.
This approach is known today as Mergesort, one of the legendary algorithms in computer science. As a 1997 paper put it, “Mergesort is as important in the history of sorting as sorting in the history of computing.”
The power of Mergesort comes from the fact that it indeed ends up with a complexity between linear and quadratic time—specifically, O(n log n), known as “linearithmic” time. Each pass through the cards doubles the size of the sorted stacks, so to completely sort n cards you’ll need to make as many passes as it takes for the number 2, multiplied by itself, to equal n: the base-two logarithm, in other words. You can sort up to four cards in two collation passes, up to eight cards with a third pass, and up to sixteen cards with a fourth. Mergesort’s divide-and-conquer approach inspired a host of other linearithmic sorting algorithms that quickly followed on its heels. And to say that linearithmic complexity is an improvement on quadratic complexity is a titanic understatement. In the case of sorting, say, a census-level number of items, it’s the difference between making twenty-nine passes through your data set … and three hundred million. No wonder it’s the method of choice for large-scale industrial sorting problems.
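Here is a minimal bottom-up sketch of that process, counting collation passes along the way. It is one of several ways to write Mergesort and is not claimed to be von Neumann's program. `merge_sort` is a hypothetical name, and the collation step leans on the standard library's `heapq.merge`, which lazily merges sorted inputs.

```python
import heapq
import math

def merge_sort(cards):
    """Sort by repeatedly collating neighboring sorted stacks.

    Returns the sorted cards and the number of passes over the data.
    """
    stacks = [[card] for card in cards]  # every single card is a sorted stack
    passes = 0
    while len(stacks) > 1:
        merged = []
        for k in range(0, len(stacks), 2):
            # Collate stacks two at a time; a leftover odd stack passes through.
            merged.append(list(heapq.merge(*stacks[k:k + 2])))
        stacks = merged
        passes += 1
    return (stacks[0] if stacks else []), passes

print(merge_sort([5, 2, 7, 1, 8, 3, 6, 4]))

# Doublings needed to cover three hundred million items.
census_passes = math.ceil(math.log2(300_000_000))
print(census_passes)
```

Eight cards take three passes and a 52-card deck takes six, while three hundred million items need twenty-nine, the figure quoted above.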
Mergesort also has real applications in small-scale domestic sorting problems. Part of the reason why it’s so widely used is that it can easily be parallelized. If you’re still strategizing about that bookshelf, the Mergesort solution would be to order a pizza and invite over a few friends. Divide the books evenly, and have each person sort their own stack. Then pair people up and have them merge their stacks. Repeat this process until there are just two stacks left, and merge them one last time onto the shelf. Just try to avoid getting pizza stains on the books.
Beyond Comparison: Outsmarting the Logarithm
In an inconspicuous industrial park near the town of Preston, Washington, tucked behind one nondescript gray entrance of many, lies the 2011 and 2013 National Library Sorting Champion. A long, segmented conveyor belt moves 167 books a minute—85,000 a day—through a bar code scanner, where they are automatically diverted into bomb-bay doors that drop into one of 96 bins.
A Mergesort in action. Given a shelf of eight unsorted books, start by putting adjacent books into sorted pairs. Then collate the pairs into ordered sets of four, and finally collate those sets to get a fully sorted shelf.
The Preston Sort Center is one of the biggest and most efficient book-sorting facilities in the world. It’s run by the King County Library System, which has begun a healthy rivalry with the similarly equipped New York Public Library, with the title going back and forth over four closely contested years. “King County Library beating us this year?” said the NYPL’s deputy director of BookOps, Salvatore Magaddino, before the 2014 showdown. “Fuhgeddaboutit.”
There’s something particularly impressive about the Preston Sort Center from a theoretician’s point of view, too. The books going through its system are sorted in O(n)—linear time.
In an important sense, the O(n log n) linearithmic time offered by Mergesort is truly the best we can hope to achieve. It’s been proven that if we want to fully sort n items via a series of head-to-head comparisons, there’s just no way to compare them any fewer than O(n log n) times. It’s a fundamental law of the universe, and there are no two ways around it.
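The usual argument behind this bound, sketched here as background rather than taken from the text, is a counting one. There are n! possible orderings, each comparison has two outcomes, and so k comparisons can tell apart at most 2^k orderings. Any comparison-based sort therefore needs at least log2(n!) comparisons in its worst case, a quantity that grows in proportion to n log n. The function name below is hypothetical.

```python
import math

def min_comparisons(n):
    """Fewest yes/no comparisons able to distinguish all n! orderings."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (3, 4, 52):
    print(n, min_comparisons(n))
```

For three items the floor is three comparisons, for four items it is five, and for a 52-card deck it is 226.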
But this doesn’t, strictly speaking, close the book on sorting. Because sometimes you don’t need a fully ordered set—and sometimes sorting can be done without any item-to-item comparisons at all. These two principles, taken together, allow for rough practical sorts in faster than linearithmic time. This is beautifully demonstrated by an algorithm known as Bucket Sort—of which the Preston Sort Center is a perfect illustration.
In Bucket Sort, items are grouped together into a number of sorted categories, with no regard for finer, intracategory sorting; that can come later. (In computer science the term “bucket” simply refers to a chunk of unsorted data, but some of the most powerful real-world uses of Bucket Sort, as at the KCLS, take the name entirely literally.) Here’s the kicker: if you want to group n items into m buckets, the grouping can be done in O(nm) time—that is, the time is simply proportional to the number of items times the number of buckets. And as long as the number of buckets is relatively small compared to the number of items, Big-O notation will round that to O(n), or linear time.
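A rough sketch of the grouping step, assuming each book arrives tagged with a destination. The branch labels and titles are made up for illustration, and `bucket_by_branch` is a hypothetical name. Each book is checked against the bins one at a time, which matches the items-times-buckets accounting above.

```python
def bucket_by_branch(books, branches):
    """Drop each (title, destination) book into the bin for its branch.

    Work is at most len(books) * len(branches) checks, and no two books
    are ever compared with each other.
    """
    bins = [[] for _ in branches]
    for title, destination in books:
        for index, branch in enumerate(branches):
            if branch == destination:
                bins[index].append(title)
                break
        else:
            raise ValueError(f"no bin for destination {destination!r}")
    return dict(zip(branches, bins))

returns = [("Book 1", "Branch B"), ("Book 2", "Branch A"), ("Book 3", "Branch B")]
print(bucket_by_branch(returns, ["Branch A", "Branch B", "Branch C"]))
```

The books inside each bin stay in arrival order; any finer sorting is left for later.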
The key to actually breaking the linearithmic barrier is knowing the distribution from which the items you’re sorting are drawn. Poorly chosen buckets will leave you little better than when you started; if all the books end up in the same bin, for instance, you haven’t made any progress at all. Well-chosen buckets, however, will divide your items into roughly equal-sized groups, which—given sorting’s fundamental “scale hurts” nature—is a huge step toward a complete sort. At the Preston Sort Center, whose job is to sort books by their destination branch, rather than alphabetically, the choice of buckets is driven by circulation statistics. Some branches have a greater circulation volume than others, so they may have two bins allocated to them, or even three.
A similar knowledge of the material is useful to human sorters too. To see sorting experts in action, we took a field trip to UC Berkeley’s Doe and Moffitt Libraries, where there are no less than fifty-two miles of bookshelves to keep in order—and it’s all done by hand. Books returned to the library are first placed in a behind-the-scenes area, allocated to shelves designated by Library of Congress call numbers. For example, one set of shelves there contains a jumble of all the recently returned books with call numbers PS3000–PS9999. Then student assistants load those books onto carts, putting up to 150 books in proper order so they can be returned to the library shelves. The students get some basic training in sorting, but develop their own strategies over time. After a bit of experience, they can sort a full cart of 150 books in less than 40 minutes. And a big part of that experience involves knowing what to expect.
Berkeley undergraduate Jordan Ho, a chemistry major and star sorter, talked us through his process as he went through an impressive pile of books on the PS3000–PS9999 shelves:
I know from experience that there’s a lot of 3500s, so I want to look for any books that are below 3500 and rough-sort those out. And once I do that, then I sort those more finely. After I sort the ones under 3500, I know 3500 itself is a big section—3500–3599—so I want to make that a section itself. If there are a lot of those I might want to fine-tune it even more: 3510s, 3520s, 3530s.
Jordan aims to get a group of 25 or so books onto his cart before putting them in final order, which he does using an Insertion Sort. And his carefully developed strategy is exactly the right way to get there: a Bucket Sort, with his well-informed forecast of how many books he’ll have with various call numbers telling him what his buckets should be.
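Jordan's two-stage approach might be sketched like this. The boundaries loosely follow the ranges he mentions, the sample call numbers are invented, and `bucket_then_insert` is a hypothetical name. The fine-sorting stage uses the standard library's `bisect.insort`, which finds each insertion point by binary search rather than by a left-to-right scan.

```python
import bisect

def bucket_then_insert(call_numbers, boundaries):
    """Rough-sort into ranges, then insertion-sort each small pile."""
    # boundaries must be ascending; bucket i holds numbers below boundaries[i].
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for number in call_numbers:
        buckets[bisect.bisect_right(boundaries, number)].append(number)

    ordered = []
    for bucket in buckets:
        pile = []
        for number in bucket:
            bisect.insort(pile, number)  # insert into its place in the pile
        ordered.extend(pile)
    return ordered

boundaries = [3500, 3510, 3520, 3530, 3600]
cart = [3552, 3101, 3515, 3507, 3999, 3523, 3500, 3611, 3499]
print(bucket_then_insert(cart, boundaries))
```

Because the buckets are themselves in order, laying the sorted piles end to end gives a fully sorted cart.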
Sort Is Prophylaxis for Search
Knowing all these sorting algorithms should come in handy next time you decide to alphabetize your bookshelf. Like President Obama, you’ll know not to use Bubble Sort. Instead, a good strategy—ratified by human and machine librarians alike—is to Bucket Sort until you get down to small enough piles that Insertion Sort is reasonable, or to have a Mergesort pizza party.
But if you actually asked a computer scientist to help implement this process, the first question they would ask is whether you should be sorting at all.
Computer science, as undergraduates are taught, is all about tradeoffs. We’ve already seen this in the tensions between looking and leaping, between exploring and exploiting. And one of the most central tradeoffs is between sorting and searching. The basic principle is this: the effort expended on sorting materials is just a preemptive strike against the effort it’ll take to search through them later. What the precise balance should be depends on the exact parameters of the situation, but thinking about sorting as valuable only to support future search tells us something surprising:
Err on the side of messiness.
Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.
The question, of course, becomes how to estimate ahead of time what your future usage will be.
The poster child for the advantages of sorting would be an Internet search engine like Google. It seems staggering to think that Google can take the search phrase you typed in and scour the entire Internet for it in less than half a second. Well, it can’t—but it doesn’t need to. If you’re Google, you are almost certain that (a) your data will be searched, (b) it will be searched not just once but repeatedly, and (c) the time needed to sort is somehow “less valuable” than the time needed to search. (Here, sorting is done by machines ahead of time, before the results are needed, and searching is done by users for whom time is of the essence.) All of these factors point in favor of tremendous up-front sorting, which is indeed what Google and its fellow search engines do.
So, should you alphabetize your bookshelves? For most domestic bookshelves, almost none of the conditions that make sorting worthwhile are true. It’s fairly rare that we find ourselves searching for a particular title. The costs of an unsorted search are pretty low: for every book, if we know roughly where it is we can put our hands on it quickly. And the difference between the two seconds it would take to find the book on a sorted shelf and the ten seconds it would take to scan for it on an unsorted one is hardly a deal breaker. We rarely need to find a title so urgently that it’s worth spending preparatory hours up front to shave off seconds later on. What’s more, we search with our quick eyes and sort with slow hands.
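The tradeoff can be put in rough numbers. The sketch below uses the two-second and ten-second search times from the paragraph above; the one-hour sorting cost is an assumed figure chosen purely for illustration, and `break_even_searches` is a hypothetical name.

```python
def break_even_searches(sort_seconds, unsorted_search_seconds, sorted_search_seconds):
    """Number of searches after which the up-front sort has paid for itself."""
    saved_per_search = unsorted_search_seconds - sorted_search_seconds
    if saved_per_search <= 0:
        return float("inf")  # sorting never pays off if it saves no search time
    return sort_seconds / saved_per_search

# Assumed: one hour to alphabetize the shelf.
print(break_even_searches(3600, 10, 2))  # 450.0
```

At eight seconds saved per lookup, an hour of alphabetizing takes 450 searches to earn back.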
The verdict is clear: ordering your bookshelf will take more time and energy than scanning through it ever will.
Your unsorted bookshelf might not be an everyday preoccupation, but your email inbox almost certainly is—and it’s another domain where searching beats sorting handily. Filing electronic messages by hand into folders takes about the same amount of time as filing physical papers in the real world, but emails can be searched much more efficiently than their physical counterparts. As the cost of searching drops, sorting becomes less valuable.
Steve Whittaker is one of the world’s experts on how people handle their email. A research scientist at IBM and professor at UC Santa Cruz, Whittaker, for almost two decades, has been studying how people manage personal information. (He wrote a paper on “email overload” in 1996, before many people even had email.) In 2011, Whittaker led a study of the searching and sorting habits of email users, resulting in a paper titled “Am I Wasting My Time Organizing Email?” Spoiler alert: the conclusion was an emphatic Yes. “It’s empirical, but it’s also experiential,” Whittaker points out. “When I interview people about these kinds of organizational problems, that’s something that they characteristically talk about, is that they sort of wasted a part of their life.”
Computer science shows that the hazards of mess and the hazards of order are quantifiable and that their costs can be measured in the same currency: time. Leaving something unsorted might be thought of as an act of procrastination—passing the buck to one’s future self, who’ll have to pay off with interest what we chose not to pay up front. But the whole story is subtler than that. Sometimes mess is more than just the easy choice. It’s the optimal choice.
Sorts and Sports
The search-sort tradeoff suggests that it’s often more efficient to leave a mess. Saving time isn’t the only reason we sort things, though: sometimes producing a final order is an end in itself. And nowhere is that clearer than on the sporting field.
In 1883, Charles Lutwidge Dodgson developed incredibly strong feelings about the state of British lawn tennis. As he explains:
At a Lawn Tennis Tournament, where I chanced, some while ago, to be a spectator, the present method of assigning prizes was brought to my notice by the lamentations of one of the Players, who had been beaten (and had thus lost all chance of a prize) early in the contest, and who had had the mortification of seeing the 2nd prize carried off by a Player whom he knew to be quite inferior to himself.
Normal spectators might chalk up such “lamentations” to little more than the sting of defeat, but Dodgson was no ordinary sympathetic ear. He was an Oxford lecturer in mathematics, and the sportsman’s complaints sent him on a deep investigation of the nature of sports tournaments.
Dodgson was more than just an Oxford mathematician—in fact, he’s barely remembered as having been one. Today he’s best known by his pen name, Lewis Carroll, under which he wrote Alice’s Adventures in Wonderland and many other beloved works of nineteenth-century literature. Fusing his mathematical and literary talents, Dodgson produced one of his lesser-known works: “Lawn Tennis Tournaments: The True Method of Assigning Prizes with a Proof of the Fallacy of the Present Method.”
Dodgson’s complaint was directed at the structure of the Single Elimination tournament, where players are paired off with one another and eliminated from competition as soon as they lose a single match. As Dodgson forcefully argued, the true second-best player could be any of the players eliminated by the best—not just the last-eliminated one. Ironically, in the Olympics we do hold bronze medal matches, by which we appear to acknowledge that the Single Elimination format doesn’t give us enough information to determine third place.* But in fact this format doesn’t tell us enough to determine second place either—or, indeed, anything except the winner. As Dodgson put it, “The present method of assigning prizes is, except in the case of the first prize, entirely unmeaning.” Said plainly, the silver medal is a lie.
“As a mathematical fact,” he continued, “the chance that the 2nd best Player will get the prize he deserves is only 16/31sts; while the chance that the best 4 shall get their proper prizes is so small, that the odds are 12 to 1 against its happening!”
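Dodgson's first figure is easy to check. The sketch below assumes a 32-player bracket filled by a random draw, with the better player winning every match; these assumptions are inferred from the 16/31 figure itself rather than stated in the passage, and the function name is our own. Under them, the second-best player takes second prize exactly when the draw places that player in the opposite half of the bracket from the best.

```python
from fractions import Fraction

def chance_second_best_is_runner_up(num_players: int) -> Fraction:
    """Chance that the 2nd-best player reaches the final in Single Elimination.

    Assumes a random draw, a power-of-two field, and that the better
    player always wins (illustrative assumptions, not from the text).
    """
    # Fix the best player's slot. The second-best player is equally likely
    # to land in any of the other num_players - 1 slots, and num_players // 2
    # of those slots lie in the opposite half of the bracket.
    return Fraction(num_players // 2, num_players - 1)

print(chance_second_best_is_runner_up(32))  # 16/31
```

If the second-best player lands in the same half as the best, the two meet before the final and the prize goes to someone weaker.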
Despite the powers of his pen, it appears that Dodgson had little impact on the world of lawn tennis. His solution, an awkward take on triple elimination where the defeat of someone who had defeated you could also eliminate you, never caught on. But if Dodgson’s solution was cumbersome, his critique of the problem was nevertheless spot on. (Alas, silver medals are still being handed out in Single Elimination tournaments to this day.)
But there’s also a deeper insight in Dodgson’s logic. We humans sort more than our data, more than our possessions. We sort ourselves.
The World Cup, the Olympics, the NCAA, NFL, NHL, NBA, and MLB—all of these implicitly implement sorting procedures. Their seasons, ladders, and playoffs are algorithms for producing rank order.
One of the most familiar algorithms in sports is the Round-Robin format, where each of n teams eventually plays every one of the other n − 1 teams. While arguably the most comprehensive, it’s also one of the most laborious. Having every team grapple with every other team is like having guests exchange hugs at our dinner party: the dreaded O(n²), quadratic time.
Ladder tournaments, popular in sports like badminton, squash, and racquetball, put players in a linear ranking, with each player allowed to issue a direct challenge to the player immediately above them, exchanging places if they prevail. Ladders are the Bubble Sorts of the athletic world and are thus also quadratic, requiring O(n²) games to reach a stable ranking.
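The ladder-as-Bubble-Sort comparison can be sketched in a few lines of Python. This is an illustration under our own assumptions, not a description of any real league's rules: each player has a single strength number, the stronger player always wins a challenge, and challenges proceed in sweeps from the bottom rung upward until a full sweep changes nothing.

```python
def ladder_tournament(strengths):
    """Run adjacent challenges until the ladder is stable.

    strengths[0] is the top rung. Returns the final ladder (strongest
    first) and the number of games played. Hypothetical helper.
    """
    ladder = list(strengths)
    games = 0
    swapped = True
    while swapped:
        swapped = False
        # Each player challenges the player immediately above them.
        for rung in range(len(ladder) - 1, 0, -1):
            games += 1
            if ladder[rung] > ladder[rung - 1]:
                # Challenger prevails: the two exchange places.
                ladder[rung], ladder[rung - 1] = ladder[rung - 1], ladder[rung]
                swapped = True
    return ladder, games

print(ladder_tournament([1, 2, 3, 4]))  # ([4, 3, 2, 1], 12)
```

Starting from the worst case, with the weakest player on top, four players need twelve games to settle, and the count grows roughly with the square of the field.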
Perhaps the most prevalent tournament format, however, is a bracket tournament—as in the famous NCAA basketball “March Madness,” among many others. The March Madness tournament progresses from the “Round of 64” and the “Round of 32” to the “Sweet 16,” “Elite Eight,” “Final Four,” and the finals. Each round divides the field in half: does that sound familiarly logarithmic? These tournaments are effectively Mergesort, beginning with unsorted pairs of teams and collating, collating, collating them.
We know that Mergesort operates in linearithmic time—O(n log n)—and so, given that there are 64 teams, we can expect to only need something like 6 rounds (192 games), rather than the whopping 63 rounds (2,016 games) it would take to do a Ladder or Round-Robin. That’s a huge improvement: algorithm design at work.
Six rounds of March Madness sounds about right, but wait a second: 192 games? The NCAA tournament is only 63 games long.
In fact, March Madness is not a complete Mergesort—it doesn’t produce a full ordering of all 64 teams. To truly rank the teams, we’d need an extra set of games to determine second place, another for third, and so on—taking a linearithmic number of games in sum. But March Madness doesn’t do that. Instead, just like the lawn tennis tournament that Dodgson complained about, it uses a Single Elimination format where the eliminated teams are left unsorted. The advantage is that it runs in linear time: since every game eliminates exactly one team, in order to have one team left standing you need just n − 1 games—a linear number. The disadvantage is that, well, you never really figure out the standings aside from first place.
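To make the arithmetic concrete, here is a minimal Python sketch of the game counts for a field of 64. The function names are our own, and the "full sort" figure is only the rough rounds-times-games-per-round estimate quoted above, not an exact count of Mergesort comparisons.

```python
def round_robin_games(n: int) -> int:
    # Every pair of teams meets exactly once.
    return n * (n - 1) // 2

def single_elimination_games(n: int) -> int:
    # Each game eliminates exactly one team; one team is left standing.
    return n - 1

def bracket_rounds(n: int) -> int:
    # Number of halvings needed to get from n teams down to one.
    return (n - 1).bit_length()

def full_sort_games_estimate(n: int) -> int:
    # Rough estimate: n/2 games in each of log2(n) rounds.
    return (n // 2) * bracket_rounds(n)

n = 64
print(bracket_rounds(n))            # 6
print(full_sort_games_estimate(n))  # 192
print(single_elimination_games(n))  # 63
print(round_robin_games(n))         # 2016
```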
Ironically, in Single Elimination no tournament structure is actually necessary at all. Any 63 games will yield a single undefeated champion. For instance, you could simply have a single “king of the hill” team take on challengers one by one until it is dethroned, at which point whoever defeated it takes over its spot and continues. This format would have the drawback of needing 63 separate rounds, however, as games couldn’t happen in parallel; also, one team could potentially have to play as many as 63 games in a row, which might not be ideal from a fatigue standpoint.
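A quick sketch of the "king of the hill" idea, under the simplifying assumption (ours, not a claim about real games) that each team has a fixed strength and the stronger team always wins. The helper name is hypothetical.

```python
def king_of_the_hill(strengths):
    """Let challengers take on the reigning team one at a time.

    Returns the index of the final champion and the number of games played.
    """
    king = 0
    games = 0
    for challenger in range(1, len(strengths)):
        games += 1
        if strengths[challenger] > strengths[king]:
            king = challenger  # the challenger dethrones the king
    return king, games

print(king_of_the_hill([3, 9, 1, 7]))  # (1, 3)
```

With n teams the loop always plays n − 1 games, matching the bracket's total, but each game has to wait for the one before it.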
Though born well over a century after Dodgson, perhaps no one carries forward his mathematical take on sporting into the twenty-first century as strongly as Michael Trick. We met Trick back in our discussion of optimal stopping, but in the decades since his hapless application of the 37% Rule to his love life he’s become not only a husband and a professor of operations research—he’s now also one of the principal schedulers for Major League Baseball and for NCAA conferences like the Big Ten and the ACC, using computer science to decide the year’s matchups.
As Trick points out, sports leagues aren’t concerned with determining the rankings as quickly and expeditiously as possible. Instead, sports calendars are explicitly designed to maintain tension throughout the season, something that has rarely been a concern of sorting theory.
For instance in Major League Baseball, you often have races to see who is going to win the division. Now, if we ignored the divisional setup, some of those races might get resolved fairly early in the season. But instead what we do is we make certain in the last five weeks, everybody plays everybody else within their division. The purpose of that is it doesn’t matter who’s in a divisional race: they’re going to have to play their next closest opponent at least six games in the final five weeks of the season. That allows for more interest in the schedule or interest in the season because in this case, uncertainty is delayed in its resolution.
What’s more, sports are not, of course, always designed strictly to minimize the number of games. Without remembering this, some aspects of sports scheduling would otherwise seem mysterious to a computer scientist. As Trick says of baseball’s regular season of 2,430 games, “We know that n log n is the right number of comparisons to do a full sort. That can get you everybody. Why do they do n² in order to just get, in some sense, the top, if that’s all they care about?” In other words, why do a full O(n²) Round-Robin and then some, if we know we can do a full sort in linearithmic time, and can crown an undefeated Single Elimination champion in fewer than n games? Well, minimizing the number of games isn’t actually in the league’s interest. In computer science unnecessary comparisons are always bad, a waste of time and effort. But in sports that’s far from the case. In many respects, after all, the games themselves are the point.
Griping Rights: Noise and Robustness
Another, perhaps even more important way of training an algorithmic lens on sports is to ask not what confidence we should have in the silver medal, but what confidence we should have in the gold.
As Michael Trick explains, in some sports, “for instance baseball, a team is going to lose 30% of their games and a team is going to win 30% of their games practically no matter who they are.” This has disturbing implications for the Single Elimination format. If NCAA basketball games, say, are won by the stronger team 70% of the time, and winning the tournament involves prevailing in 6 straight games, then the best team has only a 0.70 to the 6th power—less than 12%—chance of winning the tournament! Put another way, the tournament would crown the league’s truly best team just once a decade.
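The figure above is a one-line calculation. The sketch below assumes, as the passage does, that every game is independent and that the best team wins each with the same 70% probability; the function name is our own.

```python
def chance_best_team_wins(p_game: float, rounds: int) -> float:
    """Probability of winning every round of a Single Elimination bracket,
    assuming independent games each won with probability p_game."""
    return p_game ** rounds

p_title = chance_best_team_wins(0.70, 6)
print(f"{p_title:.1%}")                                   # 11.8%
print(f"about 1 title every {1 / p_title:.1f} seasons")  # about 1 title every 8.5 seasons
```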
It may be that in some sports, having even 70% confidence in a game’s outcome might be putting too much stock in the final score. UCSD physicist Tom Murphy applied numerical modeling techniques to soccer and concluded that soccer’s low scores make game outcomes much closer to random than most fans would prefer to imagine. “A 3:2 score gives the winning team only a 5-in-8 chance of actually being a better team … Personally, I don’t find this to be very impressive. Even a 6:1 blowout leaves a 7% chance that it was a statistical fluke.”
Computer scientists call this phenomenon noise. All of the sorting algorithms that we’ve considered thus far assume perfect, flawless, foolproof comparisons, ones that never mess up and mistakenly judge the lesser of two quantities to be the greater. Once you allow for a “noisy comparator,” some of computer science’s most hallowed algorithms go out the window—and some of its most maligned have their day of redemption.
Dave Ackley, professor of computer science at the University of New Mexico, works at the intersection of computer science and “artificial life”—he believes computers can stand to learn a few things from biology. For starters, organisms live in a world where few processes have anywhere near the level of reliability that computers depend on, so they are built from the ground up for what researchers call robustness. It’s time, argues Ackley, that we started recognizing the virtues of robustness in algorithms too.
Thus, while the authoritative programming tome Sorting and Searching boldly declares that “bubble sort has no apparent redeeming features,” the research of Ackley and his collaborators suggests that there may be a place for algorithms like Bubble Sort after all. Its very inefficiency—moving items only one position at a time—makes it fairly robust against noise, far more robust than faster algorithms like Mergesort, in which each comparison potentially moves an item a long way. Mergesort’s very efficiency makes it brittle. An early error in a Mergesort is like a fluke loss in the first round of a Single Elimination tournament, which can not only dash a favored team’s championship hopes but also permanently relegate them to the bottom half of the results.* In a Ladder tournament, on the other hand, as in a Bubble Sort, a fluke loss would only set a player back a single place in the standings.
But in fact it isn’t Bubble Sort that emerges as the single best algorithm in the face of a noisy comparator. The winner of that particular honor is an algorithm called Comparison Counting Sort. In this algorithm, each item is compared to all the others, generating a tally of how many items it is bigger than. This number can then be used directly as the item’s rank. Since it compares all pairs, Comparison Counting Sort is a quadratic-time algorithm, like Bubble Sort. Thus it’s not a popular choice in traditional computer science applications, but it’s exceptionally fault-tolerant.
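Here is a minimal Python sketch of the idea, not a reference implementation. The noisy comparator, its `error_rate` parameter, and the seeded random generator are our own illustrative assumptions; the final step simply orders items by their tallies, breaking ties by original position.

```python
import random

def noisy_less_than(a, b, error_rate, rng):
    """Report whether a < b, but give the wrong answer with
    probability error_rate (a stand-in for a 'noisy comparator')."""
    truth = a < b
    return (not truth) if rng.random() < error_rate else truth

def comparison_counting_sort(items, error_rate=0.0, seed=0):
    """Rank each item by how many of the others it was judged bigger than."""
    rng = random.Random(seed)
    n = len(items)
    wins = [0] * n
    # Compare every pair exactly once, like a round of head-to-head games.
    for i in range(n):
        for j in range(i + 1, n):
            if noisy_less_than(items[i], items[j], error_rate, rng):
                wins[j] += 1
            else:
                wins[i] += 1
    # An item's tally serves as its rank.
    order = sorted(range(n), key=lambda k: wins[k])
    return [items[k] for k in order]

print(comparison_counting_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```

Because a single wrong comparison shifts an item's tally by only one, an occasional error can move an item by at most a place or so, rather than sending it to the wrong half of the list.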
This algorithm’s workings should sound familiar. Comparison Counting Sort operates exactly like a Round-Robin tournament. In other words, it strongly resembles a sports team’s regular season—playing every other team in the division and building up a win-loss record by which they are ranked.
That Comparison Counting Sort is the single most robust sorting algorithm known, quadratic or better, should offer something very specific to sports fans: if your team doesn’t make the playoffs, don’t whine. The Mergesort postseason is chancy, but the Comparison Counting regular season is not; championship rings aren’t robust, but divisional standings are literally as robust as it gets. Put differently, if your team is eliminated early in the postseason, it’s tough luck. But if your team fails to get to the postseason, it’s tough truth. You may get sports-bar sympathy from your fellow disappointed fans, but you won’t get any from a computer scientist.
Blood Sort: Pecking Orders and Dominance Hierarchies
In all the examples we’ve considered so far, the sorting process in every case has been imposed from the top down: a librarian shelving books, the NCAA telling teams whom to play and when. But what if head-to-head comparisons happened only voluntarily? What does sorting look like when it emerges organically, from the bottom up?
It might look something like online poker.
Unlike most sports, which are governed by a ruling body of some kind, poker remains somewhat anarchic despite exploding in popularity over the past decade. Though some high-profile tournaments do explicitly sort their contestants (and remunerate them accordingly), a substantial portion of poker is still played in what are known as “cash games,” where two or more players spontaneously agree to play with real money on the line with every hand.
Virtually no one knows this world more deeply than Isaac Haxton, one of the world’s best cash-game poker players. In most sports it’s sufficient to be as good as possible, and the less self-conscious one is about one’s skills the better. But, Haxton explains, “In some ways the most important skill as a professional poker player is to be able to evaluate how good you are. If you’re anything short of the very best poker player in the world, you can be pretty much assured of going broke if you are endlessly willing to play people better than you.”
Haxton is a heads-up, no-limit specialist: “heads-up” meaning one-on-one poker, and “no-limit” meaning just that—the highest stakes, limited only by what they can bankroll and stomach. In multi-handed poker cash games, there will often be one weak player—a wealthy amateur, for instance—feeding a table full of professionals, who then don’t much care who among them is better than whom. In the world of heads-up, it’s different. “There has to be a disagreement between you and them about who’s better—or somebody has to be willingly losing.”
So what happens when there’s a fairly established consensus and no one’s willing to play anyone better than they are? You get something that looks a lot like players simply jockeying for seats. Most online poker sites have only a finite number of tables available. “So if you want to play heads-up no-limit, with blinds of fifty and one hundred dollars, there are only ten available tables for that,” says Haxton, “and so only the consensus ten best players who are out right now … sit and wait for someone to show up who wants to play.” And if a superior player arrives and sits down at one of these tables? If the person sitting isn’t willing to ante up, they scram.
“Imagine two monkeys,” says Christof Neumann. “One is sitting and feeding in its spot, very peacefully, and another one is coming up [to] where the other guy is sitting. And that guy would then stand up and leave.”
Neumann isn’t making a poker metaphor. He’s a behavioral biologist at the University of Neuchâtel who studies dominance in macaques. What he’s just described is known as displacement.
Displacement happens when an animal uses its knowledge of the hierarchy to determine that a particular confrontation simply isn’t worth it. In many animal societies, resources and opportunities—food, mates, preferred spaces, and so forth—are scarce, and somehow it must be decided who gets what. Establishing an order ahead of time is less violent than coming to blows every time a mating opportunity or a prime spot of grass becomes available. Though we may cringe when we see creatures turning their claws and beaks on each other, biologists tend to think of pecking orders as the violence that preempts violence.
Sound familiar? It’s the search-sort tradeoff.
The creation of a pecking order is a pugilistic solution to a fundamentally computational problem. For this reason, incidentally, debeaking chickens on farms may be a well-intentioned but counterproductive approach: it removes the authority of individual fights to resolve the order, and therefore makes it much harder for the flock to run any sorting procedure at all. So the amount of antagonism within the flock in many cases actually increases.
Looking at animal behavior from the perspective of computer science suggests several things. For one, it implies that the number of hostile confrontations encountered by each individual will grow substantially—at least logarithmically, and perhaps quadratically—as the group gets bigger. Indeed, studies of “agonistic behavior” in hens have found that “aggressive acts per hen increased as group size increased.” Sorting theory thus suggests that the ethical raising of livestock may include limiting the size of the flock or herd. (In the wild, feral chickens roam in groups of ten to twenty, far smaller than flock sizes on commercial farms.) The studies also show that aggression appears to go away after a period of some weeks, unless new members are added to the flock—corroborating the idea that the group is sorting itself.
The key to thinking about decentralized sorting in nature, argues Jessica Flack, codirector of the Center for Complexity and Collective Computation at UW–Madison, is that dominance hierarchies are ultimately information hierarchies. There’s a significant computational burden to these decentralized sorting systems, Flack points out. The number of fights in, say, a group of macaques is minimized only to the extent that every monkey has a detailed—and similar—understanding of the hierarchy. Otherwise violence will ensue.
If it comes down to how good the protagonists are at keeping track of the current order, we might expect to see fewer confrontations as animals become better able to reason and remember. And perhaps humans do come closest to optimally efficient sorting. As Haxton says of the poker world, “I’m one of the top heads-up, no-limit hold ’em players in the world, and in my head I have a fairly specific ranking of who I think the twenty or so best players are, and I think each of them has a similar ranking in their mind. I think there is a pretty high degree of consensus about what the list looks like.” Only when these rankings differ will cash games ensue.
A Race Instead of a Fight
We’ve now seen two separate downsides to the desire of any group to sort itself. You have, at minimum, a linearithmic number of confrontations, making everyone’s life more combative as the group grows—and you also oblige every competitor to keep track of the ever-shifting status of everyone else, otherwise they’ll find themselves fighting battles they didn’t need to. It taxes not only the body but the mind.
But it doesn’t have to be that way. There are ways of making order without the costs.
There’s one sporting contest, for instance, where tens of thousands of competitors are completely sorted within the time that it takes to hold just a single event. (A Round-Robin tournament with ten thousand players, on the other hand, would require roughly fifty million matchups.) The only caveat is that the time required for the event is determined by its slowest competitors. This sporting contest is the marathon, and it suggests something critical: a race is fundamentally different from a fight.
Consider the difference between boxers and skiers, between fencers and runners. An Olympic boxer must risk concussion O(log n) times, usually from 4 to 6, to make it to the podium; allowing a greater number of athletes into the games would imperil the health of all. But a skeleton racer or ski jumper or halfpipe specialist needs to make only a constant number of gambles with gravity, no matter the size of the field. A fencer puts herself at her opponent’s mercy O(log n) times, but a marathoner must endure only one race. Being able to assign a simple numerical measure of performance results in a constant-time algorithm for status.
This move from “ordinal” numbers (which only express rank) to “cardinal” ones (which directly assign a measure to something’s caliber) naturally orders a set without requiring pairwise comparisons. Accordingly, it makes possible dominance hierarchies that don’t require direct head-to-head matchups. The Fortune 500 list, to the extent that it creates a kind of corporate hierarchy, is one of these. To find the most valuable company in the United States, analysts don’t need to perform due diligence comparing Microsoft to General Motors, then General Motors to Chevron, Chevron to Walmart, and so forth. These seemingly apples-to-oranges contests (how many enterprise software installations equal how many oil futures?) become apples-to-apples in the medium of dollars. Having a benchmark—any benchmark—solves the computational problem of scaling up a sort.
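In code, the difference shows up as ranking by a key rather than by staged contests. The sketch below uses made-up runner names and finish times purely for illustration. Each competitor contributes one measurement, and the ordering is then just bookkeeping on those numbers; the library sort still compares them internally, but nobody has to run a second race.

```python
def rank_by_time(finish_times):
    """Order competitors from fastest to slowest.

    finish_times maps a name to a finish time in seconds.
    """
    return sorted(finish_times, key=finish_times.get)

# Hypothetical data: three runners, one race each.
times = {"runner_a": 9120.4, "runner_b": 8875.0, "runner_c": 10233.9}
print(rank_by_time(times))  # ['runner_b', 'runner_a', 'runner_c']
```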
In Silicon Valley, for instance, there’s an adage about meetings: “You go to the money, the money doesn’t come to you.” Vendors go to founders, founders go to venture capitalists, venture capitalists go to their limited partners. It’s possible for the individuals to resent the basis of this hierarchy, but not really to contest its verdict. As a result, individual pairwise interactions take place with a minimum of jockeying for status. By and large, any pair of people can tell, without needing to negotiate, who is supposed to show what level of respect to whom. Everyone knows where to meet.
Likewise, while maritime right-of-way is governed in theory by an extremely elaborate set of conventions, in practice one straightforward principle determines which ships give way to which: the “Law of Gross Tonnage.” Quite simply, the smaller ship gets out of the way of the larger one. Some animals are also lucky enough to have such clear-cut dominance hierarchies. As Neumann observes, “Look at fish, for example: the bigger one is the dominant one. It’s very simple.” And because it’s so simple, it’s peaceful. Unlike chickens and primates, fish make order without shedding blood.
When we think about the factors that make large-scale human societies possible, it’s easy to focus on technologies: agriculture, metals, machinery. But the cultural practice of measuring status with quantifiable metrics might be just as important. Money, of course, need not be the criterion; a rule like “respect your elders,” for instance, likewise settles questions of people’s status by reference to a common quantity. And the same principle is at work between nations as within them. It is often noted that a benchmark like national GDP—which underlies the invite lists to diplomatic summits such as the G20—is a crude, imperfect measurement. But the existence of any benchmark at all transforms the question of national status from one demanding at least a linearithmic number of tussles and resolutions into something with a single reference point that ranks all. Given that nation-to-nation status disputes often take military form, this saves not only time but lives.
Linearithmic numbers of fights might work fine for small-scale groups; they do in nature. But in a world where status is established through pairwise comparisons—whether they involve exchanging rhetoric or gunfire—the amount of confrontation quickly spirals out of control as society grows. Operating at industrial scale, with many thousands or millions of individuals sharing the same space, requires a leap beyond. A leap from ordinal to cardinal.
Much as we bemoan the daily rat race, the fact that it’s a race rather than a fight is a key part of what sets us apart from the monkeys, the chickens—and, for that matter, the rats.
*This is far from Bradáč’s only record—he can escape from three pairs of handcuffs while underwater in roughly the same amount of time.
*Actually, the average running time for Bubble Sort isn’t any better, as books will, on average, be n/2 positions away from where they’re supposed to end up. A computer scientist will still round n/2 passes of n books up to O(n²).
*On rare occasions, as in boxing—where it is medically unsafe for a boxer to fight again after being recently knocked out—two bronzes are awarded instead.
*It’s interesting to note that NCAA’s March Madness tournament is consciously designed to mitigate this flaw in its algorithm. The biggest problem in Single Elimination, as we’ve said, would seem to be a scenario where the first team that gets eliminated by the winning team is actually the second-best team overall, yet lands in the (unsorted) bottom half. The NCAA works around this by seeding the teams, so that top-ranked teams cannot meet each other in the early rounds. The seeding process appears to be reliable at least in the most extreme case, as a sixteenth-seeded team has never defeated a first seed in the history of March Madness.