Geekspeak: Why Life + Mathematics = Happiness - Graham Tattersall
SCRABBLING
FOR WORDS
How big is your vocabulary?
You know thousands of words with many different meanings. Jane Austen uses over 6,000 different words in Pride and Prejudice, and you can read them all without the slightest problem. In fact, your passive reading vocabulary probably exceeds 10,000 words. Can you remember the last time you heard or read a word whose meaning you didn’t know?
On the other hand, your active vocabulary – the words you use in everyday speech – will be much more limited. On an average day you’ll probably get by on a few hundred words. And those words say a lot about you: your sex, age and social class.
In the early 1990s, recordings were made of conversations and used to build a database of words in the English language. The database, held at the University of Lancaster, contains over 100 million words spoken by men and women of all ages and occupations. Some interesting facts were pulled out of this data by three researchers, Paul Rayson, Geoffrey Leech and Mary Hodges. One of the most startling is the difference between the kind of words used by men and those used by women. It turned out that there are certain words which act like fingerprints, showing that a conversation is between two men, or between two women.
These are the top three fingerprint words in women’s conversation:
she
her
said
And the three words most characteristic of man-to-man conversation:
fucking
er
the
Those top three female words are instantly recognisable as typical of ‘girl talk’. Just eavesdrop on a conversation between two women chatting near the office coffee machine: ‘And she said that her friend was really upset… And I said to her…’
As for the men, here are a couple of guys leaning over the open bonnet of a car: ‘What’s that fucking wire doing?’ ‘Er, dunno. The battery’s dead.’
Similar studies reveal words that distinguish social groups. Words such as ‘actually’ and ‘really’ are indicative of Groups A, B and C1; ‘bloke’, ‘bloody’ and ‘pound’ are distinctly C2, D and E.
The journey from the equality of baby burbling to speaking in ways that encode your gender, age and social status takes two or three decades, but you can go back to the first moments after your birth quite easily. Start by letting every muscle in your mouth and lips go slack. Now make a noise.
That grunt is called the schwa. It is the most basic, neutral vowel sound, and sounds similar, though not identical, when uttered by people with different mother tongues.
In the first few months after your birth, you’ll start to babble, and by the time you’re coming up to your first birthday you’ll have a few words. Those words use vowel sounds such as u as in ‘mum’ and a as in ‘man’. They’ll be bracketed by primitive consonants or nasal sounds such as m and n to create important words such as ‘mumma’ and ‘dadda’.
Fast-forward to the age of around eighteen months, and you’ll be making much more complex sounds by articulating most vowels and consonants, and introducing l and r sounds.
The extremes of the vowel sounds are the cardinal points of your language. In English, they range from a as in ‘cat’ and i as in ‘hit’, to oo as in ‘hoot’ and aw as in ‘saw’. You can utter a kind of sound circle with the cardinal vowels. Voice them in sequence and you’ll find that the sound changes smoothly from one vowel to the next.
Counting vowels, consonants, nasals, and l and r sounds, a fully developed English speaker can recognise at least forty-five basic sounds. They are called phonemes.
It used to be thought that each phoneme was a distinct acoustic event, but it is now accepted that many are psycho-perceptual. For example, the stop consonant pp in the word ‘apple’ does not exist by itself. A stop consonant is the sound we think we hear ourselves saying when we use our mouth to rapidly stop or start a sound. The pp in ‘apple’ is the sound made when we quickly close our lips to stop making the a sound, and then explosively open them again to continue with the le sound.
We perceive the stop consonant as an actual sound that exists between the a and the le. But if you look at the sound trace of someone saying ‘apple’, you’ll see a period of silence in the middle of the word. That’s the pp in ‘apple’. It doesn’t exist: it’s simply perceived because of the way the a and le are stopped and started.
One of the drawbacks of growing up speaking your mother tongue is losing responsiveness to speech sounds in other languages. This was demonstrated by Japanese researchers. They played sounds to infants while monitoring the frequency with which the infants sucked on a dummy. They sucked more often when there was a recognisable stimulus such as their mother’s face or a familiar sound.
Newborns, who had barely been exposed to their mother tongue, sucked rapidly when they heard any of a wide variety of sounds drawn from many languages. Older infants sucked rapidly only when they heard sounds used in their mother tongue. The researchers inferred that infants lose the ability to distinguish certain sounds when they start to learn a language in which those sounds are absent.
Now, hopefully many years after you stopped sucking because something seemed familiar, you understand, speak and read thousands of words. It’s rather strange that we bother, when just a few hundred words are sufficient for our daily lives.
So, how many words do you know?
It’s possible to work out the size of your passive vocabulary. One approach is to go through every entry in a dictionary and tick off every word you know. But if you’ve got other things to do in life, there’s a smarter way that gives a good estimate in a much shorter time. That method is called statistical sampling.
The idea behind statistical sampling is the same as that used in surveys of, for example, voters. The nation’s intended voting pattern could be found by asking all 30 million voters about their plans for the voting booth. More practically, a representative sample of voters is questioned. The sample might consist of just 1,000 people carefully selected to represent all the localities and social groups in the country.
The same approach can be used to estimate your vocabulary. Sample the ‘population’ of words by opening the dictionary at random 100 times. Each time, look at the first entry at the top of the page. Do you know the meaning of this word? If the answer is yes, add one to your word score. At the end of the exercise, divide your score by the sample size of one hundred to get an estimate of the fraction of words in the dictionary that you know. Multiply that fraction by the total number of words in the dictionary to make an estimate of your vocabulary size.
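The procedure above can be tried out on a computer before you pick up a real dictionary. What follows is only a sketch: the 50,000-entry dictionary and the reader who knows exactly 30,000 of its entries are invented for illustration, and the function name `estimate_vocabulary` is hypothetical.

```python
import random

def estimate_vocabulary(sample_known, dictionary_size):
    """Scale the known fraction of a random sample up to the whole dictionary."""
    fraction_known = sum(sample_known) / len(sample_known)
    return fraction_known * dictionary_size

# Invented numbers for illustration: a 50,000-entry dictionary in which
# the reader happens to know entries 0 to 29,999.
DICTIONARY_SIZE = 50_000
KNOWN_ENTRIES = 30_000

rng = random.Random(1)  # fixed seed so the run is repeatable

# "Open the dictionary at random" 100 times by picking 100 entry numbers.
picks = [rng.randrange(DICTIONARY_SIZE) for _ in range(100)]
sample_known = [entry < KNOWN_ENTRIES for entry in picks]

estimate = estimate_vocabulary(sample_known, DICTIONARY_SIZE)
print(f"Estimated vocabulary: {estimate:,.0f} words (true value: {KNOWN_ENTRIES:,})")
```

Changing the seed gives a different sample and a slightly different estimate, which is exactly the uncertainty discussed next.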
This method works, but you need to be careful: how many times should you dip into the dictionary at random to get a good estimate? Say you do the test twice and find that you know the first word, but not the second. That means that you know 50% of the words in the tiny bit of the dictionary you examined.
But common sense tells you that this estimate is unreliable. It is true that you might know half the dictionary, but it is also possible that you know 10% or 90% of all the words. The two words you chanced upon might have been unusually uncommon, or unusually common. Two out of however many thousand words the dictionary defines is not a representative sample.
Do the trial 10 times, and confidence in the result is greater; 100 times, even better. If you did the trial 1,000 times and found that you knew 500 words, you could argue quite strongly that you really do know about half of all the words in the dictionary.
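That growing confidence can be given a rough number. The sketch below uses the textbook standard error of a proportion, which is my addition and not something spelled out in this chapter. It assumes each dip into the dictionary is an independent random pick, and the factor of 1.96 for a 95% margin relies on a normal approximation that is poor for very small samples such as two words.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1 - p) / n)

# Suppose half the sampled words are known, as in the example above.
for n in (2, 10, 100, 1000):
    margin = 1.96 * standard_error(0.5, n)  # approximate 95% margin
    print(f"{n:>5} samples: 50% known, give or take about {margin:.0%}")
```

Under these assumptions the margin shrinks from about 10 percentage points at 100 samples to about 3 at 1,000, so each tenfold increase in effort buys roughly a threefold improvement in precision.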
To complete the estimate of your vocabulary you’ll need to know the total number of words in the dictionary – preferably without having to count them. This is quite easy: look up the number of the last page in the dictionary, and take that as the number of pages. Next, open the dictionary at random and count the number of different words listed on that page. Multiply the number of pages by the number of words per page, and you have an estimate of the number of words in the dictionary.
I thought I’d better test myself using this statistical sampling technique. The dictionary I used has about 60 entries on each page, and over 800 pages. That’s around 48,000 words altogether.
I opened the dictionary 125 times, and made a tick on a piece of paper if I knew the meaning of the word at the top of the page, and a cross if I didn’t. Like me, you’ll probably find it hard to stop yourself jumping ahead to other entries if the first is unfamiliar. Don’t – that’s cheating, and invalidates the statistical sampling!
The result: there were 25 words whose meaning I didn’t know, so I knew 100 of the 125 I sampled. On that basis, my passive vocabulary is 48,000 multiplied by 100/125. That’s 38,400, or around 40,000 words. It sounds high, but it includes all the possible extensions of the stem of each word. For example, take the word ‘abstract’. The dictionary will include ‘abstractedly’, ‘abstractedness’, and so on. The number of stem words I know is a lot less than 40,000.
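For anyone who wants to check the sums, here they are as a short script using the figures quoted above. The margin of error at the end is an extra, not part of the original test: it assumes the 125 words behave like independent random picks and uses a normal approximation, so treat it as a rough guide only.

```python
import math

ENTRIES_PER_PAGE = 60   # "about 60 entries on each page"
PAGES = 800             # "over 800 pages"
SAMPLED = 125           # times the dictionary was opened
UNKNOWN = 25            # words whose meaning was not known

dictionary_size = ENTRIES_PER_PAGE * PAGES     # 48,000 entries
known = SAMPLED - UNKNOWN                      # 100 words known
fraction_known = known / SAMPLED               # 0.8
estimate = fraction_known * dictionary_size    # 38,400 words

# Rough 95% margin (an assumption added here, not from the text).
std_error = math.sqrt(fraction_known * (1 - fraction_known) / SAMPLED)
margin = 1.96 * std_error * dictionary_size

print(f"Dictionary size: {dictionary_size:,}")
print(f"Estimated vocabulary: {estimate:,.0f}, give or take about {margin:,.0f}")
```

On these assumptions the estimate is good to about 3,400 words either way, so rounding 38,400 up to 'around 40,000' is well within the noise.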
Still, I’m feeling pretty good about myself, so I’m going to exercise my gigantic male vocabulary by introducing the next chapter:
‘The, er, next chapter is, er, fucking interesting…’
SPEAK GEEK
‘IT IS A TRUTH UNIVERSALLY ACKNOWLEDGED THAT A SINGLE MAN IN POSSESSION OF A GOOD FORTUNE MUST BE IN WANT OF A WIFE.’
Some authors are instantly recognisable from their vocabulary. For example, everyone recognises the style of Jane Austen, and many would say that her writing’s distinguishing feature is its abundance of long words. But is this true? A bit of statistical analysis can reveal the answer.
The four longest words used by Jane Austen in Pride and Prejudice have 16 or 17 characters. They are ‘superciliousness’, ‘communicativeness’, ‘disinterestedness’ and ‘misrepresentation’. But just looking at the longest words is not enough: we need to examine the distribution of word lengths over her entire vocabulary, as shown in the graph below:
For comparison, here is the ‘fingerprint’ of the writer Ian McEwan, showing that his vocabulary includes many shorter words:
And, what about this book? In this work I intend to speak with candour, and without misrepresentation or superciliousness, of the accomplishments of the irreproachable retrospections…
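A word-length 'fingerprint' of the kind described above is easy to compute for any text. The sketch below is one possible way of doing it, not necessarily how the graphs in this chapter were produced. It counts each distinct word once, treats any run of letters as a word (so apostrophes and hyphens split words, a deliberate simplification), and uses the opening sentence of Pride and Prejudice as a stand-in for the whole novel. The function name `word_length_distribution` is hypothetical.

```python
import re
from collections import Counter

def word_length_distribution(text):
    """Count how many distinct words of each length appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return Counter(len(word) for word in words)

sample = ("It is a truth universally acknowledged that a single man in "
          "possession of a good fortune must be in want of a wife.")

distribution = word_length_distribution(sample)

# Print a crude text histogram: one '#' per distinct word of that length.
for length in sorted(distribution):
    print(f"{length:>2} letters: {'#' * distribution[length]}")
```

Feed the function the full text of two novels and compare the printed histograms to see whether one author really does lean on longer words than the other.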