The Information: A History, a Theory, a Flood - James Gleick
Chapter Three
Two Wordbooks
(The Uncertainty in Our Writing, the Inconstancy in Our Letters)
In such busie, and active times, there arise more new thoughts of men, which must be signifi’d, and varied by new expressions.
—Thomas Sprat (1667)
A VILLAGE SCHOOLMASTER AND PRIEST made a book in 1604 with a rambling title that began “A Table Alphabeticall, conteyning and teaching the true writing, and understanding of hard usuall English wordes,” and went on with more hints to its purpose, which was unusual and needed explanation:
With the interpretation thereof by plaine English words, gathered for the benefit & helpe of Ladies, Gentlewomen, or any other unskilfull persons.
Whereby they may the more easily and better understand many hard English wordes, which they shall heare or read in Scriptures, Sermons, or elsewhere, and also be made able to use the same aptly themselves.
The title page omitted the name of the author, Robert Cawdrey, but included a motto from Latin—“As good not read, as not to understand”—and situated the publisher with as much formality and exactness as could be expected in a time when the address, as a specification of place, did not yet exist:
At London, Printed by I. R. for Edmund Weaver, & are to be sold at his shop at the great North doore of Paules Church.
CAWDREY’S TITLE PAGE
Even in London’s densely packed streets, shops and homes were seldom to be found by number. The alphabet, however, had a definite order—the first and second letters providing its very name—and that order had been maintained since the early Phoenician times, through all the borrowing and evolution that followed.
Cawdrey lived in a time of information poverty. He would not have thought so, even had he possessed the concept. On the contrary, he would have considered himself to be in the midst of an information explosion, which he himself was trying to abet and organize. But four centuries later, his own life is shrouded in the obscurity of missing knowledge. His Table Alphabeticall appears as a milestone in the history of information, yet of its entire first edition, just one worn copy survived into the future. When and where he was born remain unknown—probably in the late 1530s; probably in the Midlands. Parish registers notwithstanding, people’s lives were almost wholly undocumented. No one has even a definitive spelling for Cawdrey’s name (Cowdrey, Cawdry). But then, no one agreed on the spelling of most names: they were spoken, seldom written.
In fact, few had any concept of “spelling”—the idea that each word, when written, should take a particular predetermined form of letters. The word cony (rabbit) appeared variously as conny, conye, conie, connie, coni, cuny, cunny, and cunnie in a single 1591 pamphlet. Others spelled it differently. And for that matter Cawdrey himself, on the title page of his book for “teaching the true writing,” wrote wordes in one sentence and words in the next. Language did not function as a storehouse of words, from which users could summon the correct items, preformed. On the contrary, words were fugitive, on the fly, expected to vanish again thereafter. When spoken, they were not available to be compared with, or measured against, other instantiations of themselves. Every time people dipped quill in ink to form a word on paper they made a fresh choice of whatever letters seemed to suit the task. But this was changing. The availability—the solidity—of the printed book inspired a sense that the written word should be a certain way, that one form was right and others wrong. First this sense was unconscious; then it began to rise toward general awareness. Printers themselves made it their business.
To spell (from an old Germanic word) first meant to speak or to utter. Then it meant to read, slowly, letter by letter. Then, by extension, just around Cawdrey’s time, it meant to write words letter by letter. The last was a somewhat poetic usage. “Spell Eva back and Ave shall you find,” wrote the Jesuit poet Robert Southwell (shortly before being hanged and quartered in 1595). When certain educators did begin to consider the idea of spelling, they would say “right writing”—or, to borrow from Greek, “orthography.” Few bothered, but one who did was a school headmaster in London, Richard Mulcaster. He assembled a primer, titled “The first part [a second part was not to be] of the Elementarie which entreateth chefelie of the right writing of our English tung.” He published it in 1582 (“at London by Thomas Vautroullier dwelling in the blak-friers by Lud-gate”), including his own list of about eight thousand words and a plea for the idea of a dictionary:
It were a thing verie praiseworthie in my opinion, and no lesse profitable than praise worthie, if some one well learned and as laborious a man, wold gather all the words which we use in our English tung . . . into one dictionarie, and besides the right writing, which is incident to the Alphabete, wold open unto us therein, both their naturall force, and their proper use.
He recognized another motivating factor: the quickening pace of commerce and transportation made other languages a palpable presence, forcing an awareness of the English language as just one among many. “Forenners and strangers do wonder at us,” Mulcaster wrote, “both for the uncertaintie in our writing, and the inconstancie in our letters.” Language was no longer invisible like the air.
Barely 5 million people on earth spoke English (a rough estimate; no one tried to count the population of England, Scotland, or Ireland until 1801). Barely a million of those could write. Of all the world’s languages English was already the most checkered, the most mottled, the most polygenetic. Its history showed continual corruption and enrichment from without. Its oldest core words, the words that felt most basic, came from the language spoken by the Angles, Saxons, and Jutes, Germanic tribes that crossed the North Sea into England in the fifth century, pushing aside the Celtic inhabitants. Not much of Celtic penetrated the Anglo-Saxon speech, but Viking invaders brought more words from Norse and Danish: egg, sky, anger, give, get. Latin came by way of Christian missionaries; they wrote in the alphabet of the Romans, which replaced the runic scripts that spread in central and northern Europe early in the first millennium. Then came the influence of French.
Influence, to Robert Cawdrey, meant “a flowing in.” The Norman Conquest was more like a deluge, linguistically. English peasants of the lower classes continued to breed cows, pigs, and oxen (Germanic words), but in the second millennium the upper classes dined on beef, pork, and mutton (French). By medieval times French and Latin roots accounted for more than half of the common vocabulary. More alien words came when intellectuals began consciously to borrow from Latin and Greek to express concepts the language had not before needed. Cawdrey found this habit irritating. “Some men seek so far for outlandish English, that they forget altogether their mothers language, so that if some of their mothers were alive, they were not able to tell, or understand what they say,” he complained. “One might well charge them, for counterfeyting the Kings English.”
Four hundred years after Cawdrey published his book of words, John Simpson retraced Cawdrey’s path. Simpson was in certain respects his natural heir: the editor of a grander book of words, the Oxford English Dictionary. Simpson, a pale, soft-spoken man, saw Cawdrey as obstinate, uncompromising, and even pugnacious. The schoolteacher was ordained a deacon and then a priest of the Church of England in a restless time, when Puritanism was on the rise. Nonconformity led him into trouble. He seems to have been guilty of “not Conforming himself” to some of the sacraments, such as “the Cross in Baptism, and the Ring in Marriage.” As a village priest he did not care to bow down to bishops and archbishops. He preached a form of equality unwelcome to church authorities. “There was preferred secretly an Information against him for speaking diverse Words in the Pulpit, tending to the depraving of the Book of Common Prayer. . . . And so being judged a dangerous Person, if he should continue preaching, but infecting the People with Principles different from the Religion established.” Cawdrey was degraded from the priesthood and deprived of his benefice. He continued to fight the case for years, to no avail.
All that time, he collected words (“collect, gather”). He published two instructional treatises, one on catechism (“catechiser, that teacheth the principles of Christian religion”) and one on A godlie forme of householde government for the ordering of private families, and in 1604 he produced a different sort of book: nothing more than a list of words, with brief definitions.
Why? Simpson says, “We have already seen that he was committed to simplicity in language, and that he was strong-minded to the point of obstinacy.” He was still preaching—now, to preachers. “Such as by their place and calling (but especially Preachers) as have occasion to speak publiquely before the ignorant people,” Cawdrey declared in his introductory note, “are to bee admonished.” He admonishes them. “Never affect any strange ynckhorne termes.” (An inkhorn was an inkpot; by inkhorn term he meant a bookish word.) “Labour to speake so as is commonly received, and so as the most ignorant may well understand them.” And above all do not affect to speak like a foreigner:
Some far journied gentlemen, at their returne home, like as they love to go in forraine apparrell, so they will pouder their talke with over-sea language. He that commeth lately out of France, will talk French English, and never blush at the matter.
Cawdrey had no idea of listing all the words—whatever that would mean. By 1604 William Shakespeare had written most of his plays, employing a vocabulary of nearly 30,000, but these words were not available to Cawdrey or anyone else. Cawdrey did not bother with the most common words, nor the most inkhorn and Frenchified words; he listed only the “hard usual” words, words difficult enough to need some explanation but still “proper unto the tongue wherein we speake” and “plaine for all men to perceive.” He compiled 2,500. He knew that many were derived from Greek, French, and Latin (“derive, fetch from”), and he marked these accordingly. The book Cawdrey made was the first English dictionary. The word dictionary was not in it.
Although Cawdrey cited no authorities, he had relied on some. He copied the remarks about inkhorn terms and the far-journeyed gentlemen in their foreign apparel from Thomas Wilson’s successful book The Arte of Rhetorique. For the words themselves he found several sources (“source, wave, or issuing foorth of water”). He found about half his words in a primer for teaching reading, called The English Schoolemaister, by Edmund Coote, first published in 1596 and widely reprinted thereafter. Coote claimed that a schoolmaster could teach a hundred students more quickly with his text than forty without it. He found it worthwhile to explain the benefits of teaching people to read: “So more knowledge will be brought into this Land, and moe bookes bought, than otherwise would have been.” Coote included a long glossary, which Cawdrey plundered.
That Cawdrey should arrange his words in alphabetical order, to make his Table Alphabeticall, was not self-evident. He knew he could not count on even his educated readers to be versed in alphabetical order, so he tried to produce a small how-to manual. He struggled with this: whether to describe the ordering in logical, schematic terms or in terms of a step-by-step procedure, an algorithm. “Gentle reader,” he wrote—again adapting freely from Coote—
thou must learne the Alphabet, to wit, the order of the Letters as they stand, perfectly without booke, and where every Letter standeth: as b neere the beginning, n about the middest, and t toward the end. Nowe if the word, which thou art desirous to finde, begin with a then looke in the beginning of this Table, but if with v looke towards the end. Againe, if thy word beginne with ca looke in the beginning of the letter c but if with cu then looke toward the end of that letter. And so of all the rest. &c.
It was not easy to explain. Friar Johannes Balbus of Genoa tried in his 1286 Catholicon. Balbus thought he was inventing alphabetical order for the first time, and his instructions were painstaking: “For example I intend to discuss amo and bibo. I will discuss amo before bibo because a is the first letter of amo and b is the first letter of bibo and a is before b in the alphabet. Similarly . . .” He rehearsed a long list of examples and concluded: “I beg of you, therefore, good reader, do not scorn this great labor of mine and this order as something worthless.”
In the ancient world, alphabetical lists scarcely appeared until around 250 BCE, in papyrus texts from Alexandria. The great library there seems to have used at least some alphabetization in organizing its books. The need for such an artificial ordering scheme arises only with large collections of data, not otherwise ordered. And the possibility of alphabetical order arises only in languages possessing an alphabet: a discrete small symbol set with its own conventional sequence (“abecedarie, the order of the Letters, or hee that useth them”). Even then the system is unnatural. It forces the user to detach information from meaning; to treat words strictly as character strings; to focus abstractly on the configuration of the word. Furthermore, alphabetical ordering comprises a pair of procedures, one the inverse of the other: organizing a list and looking up items; sorting and searching. In either direction the procedure is recursive (“recourse, a running backe againe”). The basic operation is a binary decision: greater than or less than. This operation is performed first on one letter; then, nested as a subroutine, on the next letter; and (as Cawdrey put it, struggling with the awkwardness) “so of all the rest. &c.” This makes for astounding efficiency. The system scales easily to any size, the macrostructure being identical to the microstructure. A person who understands alphabetical order homes in on any one item in a list of a thousand or a million, unerringly, with perfect confidence. And without knowing anything about the meaning.
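The two procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names comes_before and look_up are invented for the example, the sample words are taken from Cawdrey's entries quoted in this chapter, and the comparison assumes plain lowercase letters. The nested letter-by-letter decision is the first function; the search that homes in on an entry by repeated halving is the second.

```python
def comes_before(a: str, b: str) -> bool:
    """Return True if word a sorts strictly before word b.

    The decision is made on the first letter; only if those match
    does it fall through to the next letter, and so of all the rest.
    """
    for x, y in zip(a, b):
        if x != y:
            return x < y          # first differing letter decides
    return len(a) < len(b)        # a shorter prefix sorts first


def look_up(table: list[str], word: str) -> int:
    """Find word in an alphabetically sorted table by repeated halving.

    Returns the index of the word, or -1 if it is absent.
    No knowledge of what any word means is required.
    """
    lo, hi = 0, len(table)
    while lo < hi:
        mid = (lo + hi) // 2
        if comes_before(table[mid], word):
            lo = mid + 1          # look toward the end
        else:
            hi = mid              # look toward the beginning
    if lo < len(table) and table[lo] == word:
        return lo
    return -1


# Sorting and searching: the pair of inverse procedures.
words = sorted(["cypher", "horizon", "citron",
                "crocodile", "abecedarie", "zodiack"])
position = look_up(words, "crocodile")
```

Each pass of the loop discards half of the remaining table, which is why the same procedure serves a list of a thousand or a million entries with only a handful of extra comparisons.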
Not until 1613 was the first alphabetical catalogue made—not printed, but written in two small handbooks—for the Bodleian Library at Oxford. The first catalogue of a university library, made at Leiden, Holland, two decades earlier, was arranged by subject matter, as a shelf list (about 450 books), with no alphabetical index. Of one thing Cawdrey could be sure: his typical reader, a literate, book-buying Englishman at the turn of the seventeenth century, could live a lifetime without ever encountering a set of data ordered alphabetically.
More sensible ways of ordering words came first and lingered for a long time. In China the closest thing to a dictionary for many centuries was the Erya, author unknown, date unknown but probably around the third century BCE. It arranged its two thousand entries by meaning, in topical categories: kinship, building, tools and weapons, the heavens, the earth, plants and animals. Egyptian had word lists organized on philosophical or educational principles; so did Arabic. These lists were arranging not the words themselves, mainly, but rather the world: the things for which the words stood. In Germany, a century after Cawdrey, the philosopher and mathematician Gottfried Wilhelm Leibniz made this distinction explicit:
Let me mention that the words or names of all things and actions can be brought into a list in two different ways, according to the alphabet and according to nature. . . . The former go from the word to the thing, the latter from the thing to the word.
Topical lists were thought provoking, imperfect, and creative. Alphabetical lists were mechanical, effective, and automatic. Considered alphabetically, words are no more than tokens, each placed in a slot. In effect they may as well be numbers.
Meaning comes into the dictionary in its definitions, of course. Cawdrey’s crucial models were dictionaries for translation, especially a 1587 Latin-English Dictionarium by Thomas Thomas. A bilingual dictionary had a clearer purpose than a dictionary of one language alone: mapping Latin onto English made a kind of sense that translating English to English did not. Yet definitions were the point, Cawdrey’s stated purpose being after all to help people understand and use hard words. He approached the task of definition with a trepidation that remains palpable. Even as he defined his words, Cawdrey still did not quite believe in their solidity. Meanings were even more fluid than spellings. Define, to Cawdrey, was for things, not for words: “define, to shew clearely what a thing is.” It was reality, in all its richness, that needed defining. Interpret meant “open, make plaine, to shewe the sence and meaning of a thing.” For him the relationship between the thing and the word was like the relationship between an object and its shadow.
The relevant concepts had not reached maturity:
figurate, to shadowe, or represent, or to counterfaite
type, figure, example, shadowe of any thing
represent, expresse, beare shew of a thing
An earlier contemporary of Cawdrey’s, Ralph Lever, made up his own word: “saywhat, corruptly called a definition: but it is a saying which telleth what a thing is, it may more aptly be called a saywhat.” This did not catch on. It took almost another century—and the examples of Cawdrey and his successors—for the modern sense to come into focus: “Definition,” John Locke finally writes in 1690, “being nothing but making another understand by Words, what Idea the Term defin’d stands for.” And Locke still takes an operational view. Definition is communication: making another understand; sending a message.
Cawdrey borrows definitions from his sources, combines them, and adapts them. In many cases he simply maps one word onto another:
orifice, mouth
baud, whore
helmet, head peece
For a small class of words he uses a special designation, the letter k: “standeth for a kind of.” He does not consider it his job to say what kind. Thus:
crocodile, k beast
alablaster, k stone
citron, k fruit
But linking pairs of words, either as synonyms or as members of a class, can carry a lexicographer only so far. The relationships among the words of a language are far too complex for so linear an approach (“chaos, a confused heap of mingle-mangle”). Sometimes Cawdrey tries to cope by adding one or more extra synonyms, definition by triangulation:
specke, spot, or marke
cynicall, doggish, froward
vapor, moisture, ayre, hote breath, or reaking
For other words, representing concepts and abstractions, further removed from the concrete realm of the senses, Cawdrey needs to find another style altogether. He makes it up as he goes along. He must speak to his reader, in prose but not quite in sentences, and we can hear him struggle, both to understand certain words and to express his understanding.
gargarise, to wash the mouth, and throate within, by stirring some liquor up and downe in the mouth
hipocrite, such a one as in his outward apparrell, countenaunce, & behaviour, pretendeth to be another man, then he is indeede, or a deceiver
buggerie, coniunction with one of the same kinde, or of men with beasts
theologie, divinitie, the science of living blessedly for ever
Among the most troublesome were technical terms from new sciences:
cypher, a circle in numbering, of no value of it selfe, but serveth to make up the number, and to make other figures of more value
horizon, a circle, deviding the halfe of the firmament, from the other halfe which we see not
zodiack, a circle in the heaven, wherein be placed the 12 signes, and in which the Sunne is mooved
Not just the words but the knowledge was in flux. The language was examining itself. Even when Cawdrey is copying from Coote or Thomas, he is fundamentally alone, with no authority to consult.
One of Cawdrey’s hard usual words was science (“knowledge, or skill”). Science did not yet exist as an institution responsible for learning about the material universe and its laws. Natural philosophers were beginning to have a special interest in the nature of words and their meaning. They needed better than they had. When Galileo pointed his first telescope skyward and discovered sunspots in 1611, he immediately anticipated controversy—traditionally the sun was an epitome of purity—and he sensed that science could not proceed without first solving a problem of language:
So long as men were in fact obliged to call the sun “most pure and most lucid,” no shadows or impurities whatever had been perceived in it; but now that it shows itself to us as partly impure and spotty; why should we not call it “spotted and not pure”? For names and attributes must be accommodated to the essence of things, and not the essence to the names, since things come first and names afterwards.
When Isaac Newton embarked on his great program, he encountered a fundamental lack of definition where it was most needed. He began with a semantic sleight of hand: “I do not define time, space, place, and motion, as being well known to all,” he wrote deceptively. Defining these words was his very purpose. There were no agreed standards for weights and measures. Weight and measure were themselves vague terms. Latin seemed more reliable than English, precisely because it was less worn by everyday use, but the Romans had not possessed the necessary words either. Newton’s raw notes reveal a struggle hidden in the finished product. He tried expressions like quantitas materiae. Too hard for Cawdrey: “materiall, of some matter, or importance.” Newton suggested (to himself) “that which arises from its density and bulk conjointly.” He considered more words: “This quantity I designate under the name of body or mass.” Without the right words he could not proceed. Velocity, force, gravity—none of these were yet suitable. They could not be defined in terms of one another; there was nothing in visible nature at which anyone could point a finger; and there was no book in which to look them up.
As for Robert Cawdrey, his mark on history ends with the publication of his Table Alphabeticall in 1604. No one knows when he died. No one knows how many copies the printer made. There are no records (“records, writings layde up for remembrance”). A single copy made its way to the Bodleian Library in Oxford, which has preserved it. All the others disappeared. A second edition appeared in 1609, slightly expanded (“much inlarged,” the title page claims falsely) by Cawdrey’s son, Thomas, and a third and fourth appeared in 1613 and 1617, and there the life of this book ended.
It was overshadowed by a new dictionary, twice as comprehensive, An English Expositour: Teaching the Interpretation of the hardest Words used in our Language, with sundry Explications, Descriptions, and Discourses. Its compiler, John Bullokar, otherwise left as faint a mark on the historical record as Cawdrey did. He was doctor of physic; he lived for some time in Chichester; his dates of birth and death are uncertain; he is said to have visited London in 1611 and there to have seen a dead crocodile; and little else is known. His Expositour appeared in 1616 and went through several editions in the succeeding decades. Then in 1656 a London barrister, Thomas Blount, published his Glossographia: or a Dictionary, Interpreting all such Hard Words of Whatsoever Language, now used in our refined English Tongue. Blount’s dictionary listed more than eleven thousand words, many of which, he recognized, were new, reaching London in the hurly-burly of trade and commerce—
coffa or cauphe, a kind of drink among the Turks and Persians, (and of late introduced among us) which is black, thick and bitter, destrained from Berries of that nature, and name, thought good and very wholesom: they say it expels melancholy.
—or home-grown, such as “tom-boy, a girle or wench that leaps up and down like a boy.” He seems to have known he was aiming at a moving target. The dictionary maker’s “labor,” he wrote in his preface, “would find no end, since our English tongue daily changes habit.” Blount’s definitions were much more elaborate than Cawdrey’s, and he tried to provide information about the origins of words as well.
Neither Bullokar nor Blount so much as mentioned Cawdrey. He was already forgotten. But in 1933, upon the publication of the greatest word book of all, the first editors of the Oxford English Dictionary did pay their respects to his “slim, small volume.” They called it “the original acorn” from which their oak had grown. (Cawdrey: “akecorne, k fruit.”)
Four hundred and two years after the Table Alphabeticall, the International Astronomical Union voted to declare Pluto a nonplanet, and John Simpson had to make a quick decision. He and his band of lexicographers in Oxford were working on the P’s. Pletzel, plish, pod person, point-and-shoot, and polyamorous were among the new words entering the OED. The entry for Pluto was itself relatively new. The planet had been discovered only in 1930, too late for the OED’s first edition. The name Minerva was first proposed and then rejected because there was already an asteroid Minerva. In terms of names, the heavens were beginning to fill up. Then “Pluto” was suggested by Venetia Burney, an eleven-year-old resident of Oxford. The OED caught up by adding an entry for Pluto in its second edition: “1. A small planet of the solar system lying beyond the orbit of Neptune . . . 2. The name of a cartoon dog that made its first appearance in Walt Disney’s Moose Hunt, released in April 1931.”
“We really don’t like being pushed into megachanges,” Simpson said, but he had little choice. The Disney meaning of Pluto had proved more stable than the astronomical sense, which was downgraded to “small planetary body.” Consequences rippled through the OED. Pluto was removed from the list under planet n. 3a. Plutonian was revised (not to be confused with pluton, plutey, or plutonyl).
Simpson was the sixth in a distinguished line, the editors of the Oxford English Dictionary, whose names rolled fluently off his tongue—“Murray, Bradley, Craigie, Onions, Burchfield, so however many fingers that is”—and saw himself as a steward of their traditions, as well as traditions of English lexicography extending back to Cawdrey by way of Samuel Johnson. James Murray in the nineteenth century established a working method based on index cards, slips of paper 6 inches by 4 inches. At any given moment a thousand such slips sat on Simpson’s desk, and within a stone’s throw were millions more, filling metal files and wooden boxes with the ink of two centuries. But the word-slips had gone obsolete. They had become treeware. Treeware had just entered the OED as “computing slang, freq. humorous”; blog was recognized in 2003, dot-commer in 2004, cyberpet in 2005, and the verb to Google in 2006. Simpson himself Googled often. Beside the word-slips his desk held conduits into the nervous system of the language: instantaneous connection to a worldwide network of proxy amateur lexicographers and access to a vast, interlocking set of databases growing asymptotically toward the ideal of All Previous Text. The dictionary had met cyberspace, and neither would be the same thereafter. However much Simpson loved the OED’s roots and legacy, he was leading a revolution, willy-nilly—in what it was, what it knew, what it saw. Where Cawdrey had been isolated, Simpson was connected.
The English language, spoken now by more than a billion people globally, has entered a period of ferment, and the perspective available in these venerable Oxford offices is both intimate and sweeping. The language upon which the lexicographers eavesdrop has become wild and amorphous: a great, swirling, expanding cloud of messaging and speech; newspapers, magazines, pamphlets; menus and business memos; Internet news groups and chat-room conversations; television and radio broadcasts and phonograph records. By contrast, the dictionary itself has acquired the status of a monument, definitive and towering. It exerts an influence on the language it tries to observe. It wears its authoritative role reluctantly. The lexicographers may recall Ambrose Bierce’s sardonic century-old definition: “dictionary, a malevolent literary device for cramping the growth of a language and making it hard and inelastic.” Nowadays they stress that they do not presume (or deign) to disapprove any particular usage or spelling. But they cannot disavow a strong ambition: the goal of completeness. They want every word, all the lingo: idioms and euphemisms, sacred or profane, dead or alive, the King’s English or the street’s. It is an ideal only: the constraints of space and time are ever present and, at the margins, the question of what qualifies as a word can become impossible to answer. Still, to the extent possible, the OED is meant to be a perfect record, perfect mirror of the language.
The dictionary ratifies the persistence of the word. It declares that the meanings of words come from other words. It implies that all words, taken together, form an interlocking structure: interlocking, because all words are defined in terms of other words. This could never have been an issue in an oral culture, where language was barely visible. Only when printing—and the dictionary—put the language into separate relief, as an object to be scrutinized, could anyone develop a sense of word meaning as interdependent and even circular. Words had to be considered as words, representing other words, apart from things. In the twentieth century, when the technologies of logic advanced to high levels, the potential for circularity became a problem. “In giving explanations I already have to use language full blown,” complained Ludwig Wittgenstein. He echoed Newton’s frustration three centuries earlier, but with an extra twist, because where Newton wanted words for nature’s laws, Wittgenstein wanted words for words: “When I talk about language (words, sentences, etc.) I must speak the language of every day. Is this language somehow too coarse and material for what we want to say?” Yes. And the language was always in flux.
James Murray was speaking of the language as well as the book when he said, in 1900, “The English Dictionary, like the English Constitution, is the creation of no one man, and of no one age; it is a growth that has slowly developed itself adown the ages.” The first edition of what became the OED was one of the largest books that had ever been made: A New English Dictionary on Historical Principles, 414,825 words in ten weighty volumes, presented to King George V and President Calvin Coolidge in 1928. The work had taken decades; Murray himself was dead; and the dictionary was understood to be out of date even as the volumes were bound and sewn. Several supplements followed, but not till 1989 did the second edition appear: twenty volumes, totaling 22,000 pages. It weighed 138 pounds. The third edition is different. It is weightless, taking its shape in the digital realm. It may never again involve paper and ink. Beginning in the year 2000, a revision of the entire contents began to appear online in quarterly installments, each comprising several thousand revised entries and hundreds of new words.
Cawdrey had begun work naturally enough with the letter A, and so had James Murray in 1879, but Simpson chose to begin with M. He was wary of the A’s. To insiders it had long been clear that the OED as printed was not a seamless masterpiece. The early letters still bore scars of the immaturity of the uncertain work in Murray’s first days. “Basically he got here, sorted his suitcases out and started setting up text,” Simpson said. “It just took them a long time to sort out their policy and things, so if we started at A, then we’d be making our job doubly difficult. I think they’d sorted themselves out by . . . well, I was going to say D, but Murray always said that E was the worst letter, because his assistant, Henry Bradley, started E, and Murray always said that he did that rather badly. So then we thought, maybe it’s safe to start with G, H. But you get to G and H and there’s I, J, K, and you know, you think, well, start after that.”
The first thousand entries from M to mahurat went online in the spring of 2000. A year later, the lexicographers reached words starting with me: me-ism (a creed for modern times), meds (colloq. for drugs), medspeak (doctors’ jargon), meet-and-greet (a N. Amer. type of social occasion), and an assortment of combined forms under media (baron, circus, darling, hype, savvy) and mega- (pixel, bitch, dose, hit, trend). This was no longer a language spoken by 5 million mostly illiterate inhabitants of a small island. As the OED revised the entries letter by letter, it also began adding neologisms wherever they arose; waiting for the alphabetical sequence became impractical. Thus one installment in 2001 saw the arrival of acid jazz, Bollywood, channel surfing, double-click, emoticon, feel-good, gangsta, hyperlink, and many more. Kool-Aid was recognized as a new word, not because the OED feels obliged to list proprietary names (the original Kool-Ade powdered drink had been patented in the United States in 1927) but because a special usage could no longer be ignored: “to drink the Kool-Aid: to demonstrate unquestioning obedience or loyalty.” The growth of this peculiar expression since the use of a powdered beverage in a mass poisoning in Guyana in 1978 bespoke a certain density of global communication.
But they were no slaves to fashion, these Oxford lexicographers. As a rule a neologism needs five years of solid evidence for admission to the canon. Every proposed word undergoes intense scrutiny. The approval of a new word is a solemn matter. It must be in general use, beyond any particular place of origin; the OED is global, recognizing words from everywhere English is spoken, but it does not want to capture local quirks. Once added, a word cannot come out. A word can go obsolete or rare, but the most ancient and forgotten words have a way of reappearing—rediscovered or spontaneously reinvented—and in any case they are part of the language’s history. All 2,500 of Cawdrey’s words are in the OED, perforce. For thirty-one of them Cawdrey’s little book was the first known usage. For a few Cawdrey is all alone. This is troublesome. The OED is irrevocably committed. Cawdrey, for example, has “onust, loaden, overcharged”; so the OED has “loaded, burdened,” but it is an outlier, a one-off. Did Cawdrey make it up? “I’m tending towards the view that he was attempting to reproduce vocabulary he had heard or seen,” Simpson said. “But I can’t be absolutely sure.” Cawdrey has “hallucinate, to deceive, or blind”; the OED duly gave “to deceive” as the first sense of the word, though it never found anyone else who used it that way. In cases like these, the editors can add their double caveat “Obs. rare.” But there it is.
For the twenty-first-century OED a single source is never enough. Strangely, considering the vastness of the enterprise and its constituency, individual men and women strive to have their own nonce-words ratified by the OED. Nonce-word, in fact, was coined by James Murray himself. He got it in. An American psychologist, Sondra Smalley, coined the word codependency in 1979 and began lobbying for it in the eighties; the editors finally drafted an entry in the nineties, when they judged the word to have become established. W. H. Auden declared that he wanted to be recognized as an OED word coiner—and he was, at long last, for motted, metalogue, spitzy, and others. The dictionary had thus become engaged in a feedback loop. It inspired a twisty self-consciousness in the language’s users and creators. Anthony Burgess whinged in print about his inability to break through: “I invented some years ago the word amation, for the art or act of making love, and still think it useful. But I have to persuade others to use it in print before it is eligible for lexicographicizing (if that word exists)”—he knew it did not. “T. S. Eliot’s large authority got the shameful (in my view) juvescence into the previous volume of the Supplement.” Burgess was quite sure that Eliot simply misspelled juvenescence. If so, the misspelling was either copied or reprised twenty-eight years later by Stephen Spender, so juvescence has two citations, not one. The OED admits that it is rare.
As hard as the OED tries to embody the language’s fluidity, it cannot help but serve as an agent of its crystallization. The problem of spelling poses characteristic difficulties. “Every form in which a word has occurred throughout its history” is meant to be included. So for mackerel (“a well-known sea-fish, Scomber scombrus, much used for food”) the second edition in 1989 listed nineteen alternative spellings. The unearthing of sources never ends, though, so the third edition revised entry in 2002 listed no fewer than thirty: maccarel, mackaral, mackarel, mackarell, mackerell, mackeril, mackreel, mackrel, mackrell, mackril, macquerel, macquerell, macrel, macrell, macrelle, macril, macrill, makarell, makcaral, makerel, makerell, makerelle, makral, makrall, makreill, makrel, makrell, makyrelle, maquerel, and maycril. As lexicographers, the editors would never declare these alternatives to be wrong: misspellings. They do not wish to declare their choice of spelling for the headword, mackerel, to be “correct.” They emphasize that they examine the evidence and choose “the most common current spelling.” Even so, arbitrary considerations come into play: “Oxford’s house style occasionally takes precedence, as with verbs which can end -ize or -ise, where the -ize spelling is always used.” They know that no matter how often and how firmly they disclaim a prescriptive authority, a reader will turn to the dictionary to find out how a word should be spelled. They cannot escape inconsistencies. They feel obliged to include words that make purists wince. A new entry as of December 2003 memorialized nucular: “= nuclear a. (in various senses).” Yet they refuse to count evident misprints found by way of Internet searches. They do not recognize straight-laced, even though statistical evidence finds that bastardized form outnumbering strait-laced. 
For the crystallization of spelling, the OED offers a conventional explanation: “Since the invention of the printing press, spelling has become much less variable, partly because printers wanted uniformity and partly because of a growing interest in language study during the Renaissance.” This is true. But it omits the role of the dictionary itself, arbitrator and exemplar.
For Cawdrey the dictionary was a snapshot; he could not see past his moment in time. Samuel Johnson was more explicitly aware of the dictionary’s historical dimension. He justified his ambitious program in part as a means of bringing a wild thing under control—the wild thing being the language, “which, while it was employed in the cultivation of every species of literature, has itself been hitherto neglected; suffered to spread, under the direction of chance, into wild exuberance; resigned to the tyranny of time and fashion; and exposed to the corruptions of ignorance, and caprices of innovation.” Not until the OED, though, did lexicography attempt to reveal the whole shape of a language across time. The OED becomes a historical panorama. The project gains poignancy if the electronic age is seen as a new age of orality, the word breaking free from the bonds of cold print. No publishing institution better embodies those bonds, but the OED, too, tries to throw them off. The editors feel they can no longer wait for a new word to appear in print, let alone in a respectably bound book, before they must take note. For tighty-whities (men’s underwear), new in 2007, they cite a typescript of North Carolina campus slang. For kitesurfer, they cite a posting to the Usenet newsgroup alt.kite and later a New Zealand newspaper found via an online database. Bits in the ether.
When Murray began work on the new dictionary, the idea was to find the words, and with them the signposts to their history. No one had any idea how many words were there to be found. By then the best and most comprehensive dictionary of English was American: Noah Webster’s, seventy thousand words. That was a baseline. Where were the rest to be discovered? For the first editors of what became the OED, it went almost without saying that the source, the wellspring, should be the literature of the language—particularly the books of distinction and quality. The dictionary’s first readers combed Milton and Shakespeare (still the single most quoted author, with more than thirty thousand references), Fielding and Swift, histories and sermons, philosophers and poets. Murray announced in a famous public appeal in 1879:
A thousand readers are wanted. The later sixteenth-century literature is very fairly done; yet here several books remain to be read. The seventeenth century, with so many more writers, naturally shows still more unexplored territory.
He considered the territory to be large but bounded. The founders of the dictionary explicitly meant to find every word, however many that would ultimately be. They planned a complete inventory. Why should they not? The number of books was unknown but not unlimited, and the number of words in those books was countable. The task seemed formidable but finite.
It no longer seems finite. Lexicographers are accepting the language’s boundlessness. They know by heart Murray’s famous remark: “The circle of the English language has a well-defined centre but no discernable circumference.” In the center are the words everyone knows. At the edges, where Murray placed slang and cant and scientific jargon and foreign border crossers, everyone’s sense of the language differs and no one’s can be called “standard.”
Murray called the center “well defined,” but infinitude and fuzziness can be seen there. The easiest, most common words—the words Cawdrey had no thought of including—require, in the OED, the most extensive entries. The entry for make alone would fill a book: it teases apart ninety-eight distinct senses of the verb, and some of these senses have a dozen or more subsenses. Samuel Johnson saw the problem with these words and settled on a solution: he threw up his hands.
My labor has likewise been much increased by a class of verbs too frequent in the English language, of which the signification is so loose and general, the use so vague and indeterminate, and the senses detorted so widely from the first idea, that it is hard to trace them through the maze of variation, to catch them on the brink of utter inanity, to circumscribe them by any limitations, or interpret them by any words of distinct and settled meaning; such are bear, break, come, cast, full, get, give, do, put, set, go, run, make, take, turn, throw. If of these the whole power is not accurately delivered, it must be remembered, that while our language is yet living, and variable by the caprice of every one that speaks it, these words are hourly shifting their relations, and can no more be ascertained in a dictionary, than a grove, in the agitation of a storm, can be accurately delineated from its picture in the water.
Johnson had a point. These are words that any speaker of English can press into new service at any time, on any occasion, alone or in combination, inventively or not, with hopes of being understood. In every revision, the OED’s entry for a word like make subdivides further and thus grows larger. The task is unbounded in an inward-facing direction.
The more obvious kind of unboundedness appears at the edges. Neologism never ceases. Words are coined by committee: transistor, Bell Laboratories, 1948. Or by wags: booboisie, H. L. Mencken, 1922. Most arise through spontaneous generation, organisms appearing in a petri dish, like blog (c. 1999). One batch of arrivals includes agroterrorism, bada-bing, bahookie (a body part), beer pong (a drinking game), bippy (as in, you bet your ———), chucklesome, cypherpunk, tuneage, and wonky. None are what Cawdrey would have seen as “hard, usual words,” and none are anywhere near Murray’s well-defined center, but they now belong to the common language. Even bada-bing: “Suggesting something happening suddenly, emphatically, or easily and predictably; ‘Just like that!’, ‘Presto!’ ” The historical citations begin with a 1965 audio recording of a comedy routine by Pat Cooper and continue with newspaper clippings, a television news transcript, and a line of dialogue from the first Godfather movie: “You’ve gotta get up close like this and bada-bing! you blow their brains all over your nice Ivy League suit.” The lexicographers also provide an etymology, an exquisite piece of guesswork: “Origin uncertain. Perh. imitative of the sound of a drum roll and cymbal clash. Perh. cf. Italian bada bene mark well.”
The English language no longer has such a thing as a geographic center, if it ever did. The universe of human discourse always has backwaters. The language spoken in one valley diverges from the language of the next valley, and so on. There are more valleys now than ever, even if the valleys are not so isolated. “We are listening to the language,” said Peter Gilliver, an OED lexicographer and resident historian. “When you are listening to the language by collecting pieces of paper, that’s fine, but now it’s as if we can hear everything said anywhere. Take an expatriate community living in a non-English-speaking part of the world, expatriates who live at Buenos Aires or something. Their English, the English that they speak to one another every day, is full of borrowings from local Spanish. And so they would regard those words as part of their idiolect, their personal vocabulary.” Only now they may also speak in chat rooms and on blogs. When they coin a word, anyone may hear. Then it may or may not become part of the language.
If there is an ultimate limit to the sensitivity of lexicographers’ ears, no one has yet found it. Spontaneous coinages can have an audience of one. They can be as ephemeral as atomic particles in a bubble chamber. But many neologisms require a level of shared cultural knowledge. Perhaps bada-bing would not truly have become part of twenty-first-century English had it not been for the common experience of viewers of a particular American television program (though it is not cited by the OED).
The whole word hoard—the lexis—constitutes a symbol set of the language. It is the fundamental symbol set, in one way: words are the first units of meaning any language recognizes. They are recognized universally. But in another way it is far from fundamental: as communication evolves, messages in a language can be broken down and composed and transmitted in much smaller sets of symbols: the alphabet; dots and dashes; drumbeats high and low. These symbol sets are discrete. The lexis is not. It is messier. It keeps on growing. Lexicography turns out to be a science poorly suited to exact measurement. English, the largest and most widely shared language, can be said very roughly to possess a number of units of meaning that approaches a million. Linguists have no special yardsticks of their own; when they try to quantify the pace of neologism, they tend to look to the dictionary for guidance, and even the best dictionary runs from that responsibility. The edges always blur. A clear line cannot be drawn between word and unword.
So we count as we can. Robert Cawdrey’s little book, making no pretense to completeness, contained a vocabulary of only 2,500. We possess now a more complete dictionary of English as it was circa 1600: the subset of the OED comprising words then current. That vocabulary numbers 60,000 and keeps growing, because the discovery of sixteenth-century sources never ends. Even so, it is a tiny fraction of the words used four centuries later. The explanation for this explosive growth, from 60,000 to a million, is not simple. Much of what now needs naming did not yet exist, of course. And much of what existed was not recognized. There was no call for transistor in 1600, nor nanobacterium, nor webcam, nor fen-phen. Some of the growth comes from mitosis. The guitar divides into the electric and the acoustic; other words divide in reflection of delicate nuances (as of March 2007 the OED assigned a new entry to prevert as a form of pervert, taking the view that prevert was not just an error but a deliberately humorous effect). Other new words appear without any corresponding innovation in the world of real things. They crystallize in the solvent of universal information.
What, in the world, is a mondegreen? It is a misheard lyric (as when, for example, the Christian hymn is heard as “Lead on, O kinky turtle . . .”). In sifting the evidence, the OED first cites a 1954 essay in Harper’s Magazine by Sylvia Wright: “What I shall hereafter call mondegreens, since no one else has thought up a word for them.” She explained the idea and the word this way:
When I was a child, my mother used to read aloud to me from Percy’s Reliques, and one of my favorite poems began, as I remember:
Ye Highlands and ye Lowlands,
Oh, where hae ye been?
They hae slain the Earl Amurray,
And Lady Mondegreen.
There the word lay, for some time. A quarter-century later, William Safire discussed the word in a column about language in The New York Times Magazine. Fifteen years after that, Steven Pinker, in his book The Language Instinct, offered a brace of examples, from “A girl with colitis goes by” to “Gladly the cross-eyed bear,” and observed, “The interesting thing about mondegreens is that the mishearings are generally less plausible than the intended lyrics.” But it was not books or magazines that gave the word its life; it was Internet sites, compiling mondegreens by the thousands. The OED recognized the word in June 2004.
A mondegreen is not a transistor, inherently modern. Its modernity is harder to explain. The ingredients—songs, words, and imperfect understanding—are all as old as civilization. Yet for mondegreens to arise in the culture, and for mondegreen to exist in the lexis, required something new: a modern level of linguistic self-consciousness and interconnectedness. People needed to mishear lyrics not just once, not just several times, but often enough to become aware of the mishearing as a thing worth discussing. They needed to have other such people with whom to share the recognition. Until the most modern times, mondegreens, like countless other cultural or psychological phenomena, simply did not need to be named. Songs themselves were not so common; not heard, anyway, on elevators and mobile phones. The word lyrics, meaning the words of a song, did not exist until the nineteenth century. The conditions for mondegreens took a long time to ripen. Similarly, the verb to gaslight now means “to manipulate a person by psychological means into questioning his or her own sanity”; it exists only because enough people saw the 1944 film of that title and could assume that their listeners had seen it, too. Might not the language Cawdrey spoke—which was, after all, the abounding and fertile language of Shakespeare—have found use for such a word? No matter: the technology for gaslight had not been invented. Nor had the technology for motion pictures.
The lexis is a measure of shared experience, which comes from interconnectedness. The number of users of the language forms only the first part of the equation: jumping in four centuries from 5 million English speakers to a billion. The driving factor is the number of connections between and among those speakers. A mathematician might say that messaging grows not geometrically, but combinatorially, which is much, much faster. “I think of it as a saucepan under which the temperature has been turned up,” Gilliver said. “Any word, because of the interconnectedness of the English-speaking world, can spring from the backwater. And they are still backwaters, but they have this instant connection to ordinary, everyday discourse.” Like the printing press, the telegraph, and the telephone before it, the Internet is transforming the language simply by transmitting information differently. What makes cyberspace different from all previous information technologies is its intermixing of scales from the largest to the smallest without prejudice, broadcasting to the millions, narrowcasting to groups, instant messaging one to one.
This comes as quite an unexpected consequence of the invention of computing machinery. At first, that had seemed to be about numbers.