Stress Variation in English - Alexander Tokar

3.2 Other resources and tools

The Medical Research Council Psycholinguistic Database (henceforth MRC or MRC database) (http://ota.ox.ac.uk/headers/1054.xml, 07.02.2016) contains “150,837 words and up to 26 linguistic and psycholinguistic attributes for each” (Wilson 1988: 6). Among these attributes are syllabic length and stress pattern and, what is particularly important, for some of the items in the database, phonetic transcriptions are given in which boundaries between syllables are explicitly marked by means of the syllable boundary marker (/). Thus, knowing which phonetic symbols are used in the MRC to represent the sounds of English, the researcher can easily establish, e.g., how many English words of at least three syllables exhibit segmental structures that are in accordance with the Latin Stress Rule, i.e., a heavy penult, whose rhyme contains either a long vowel or a coda consonant, when stress is penultimate vs. a light penult, which ends in a short vowel, when stress is antepenultimate. For example, for casino, in which stress is penultimate, the MRC gives the syllabified phonetic transcription k@/si/n@U, whereas for algebra, in which stress is antepenultimate, the transcription given is &l/dZI/br@. Since the symbol (i), which we find in the penult /si/ of casino, is used in the MRC to represent the long /iː/ of, e.g., bead, whereas the symbol (I), which we find in the penult /dZI/ of algebra, is used in the database to represent the short /ɪ/ of, e.g., bid, we are justified in claiming (proceeding from the assumption that the assignment of stress in English abides by the Latin Stress Rule) that in the trisyllabic word casino stress is regularly penultimate (i.e., the penultimate syllable is heavy) whereas in the trisyllabic algebra stress is regularly antepenultimate (i.e., the penultimate syllable is light). 
For a full list of phonetic symbols used in the MRC to represent English sounds, see the Web page http://websites.psychology.uwa.edu.au/school/MRCDatabase/uwa_mrc.htm (13.03.2016).
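
The penult check described above can be sketched in Python. This is only an illustration under stated assumptions: the function names and the `stressed_from_end` parameter are hypothetical, and the two symbol sets contain only symbols that occur in the casino and algebra examples, so they would have to be completed from the MRC symbol list before any real count is attempted.

```python
# Partial, ASSUMED symbol inventory: only symbols attested in the two examples above.
SHORT_VOWELS = {"I", "@", "&"}            # e.g. "I" as in bid
VOWEL_CHARS = {"i", "I", "@", "&", "U"}   # used to recognise diphthongs such as "@U"


def penult_is_light(transcription: str) -> bool:
    """Return True if the penultimate syllable ends in a short vowel."""
    syllables = transcription.split("/")   # "/" is the MRC syllable boundary marker
    if len(syllables) < 2:
        raise ValueError("need at least two syllables")
    penult = syllables[-2]
    ends_in_short_vowel = penult[-1] in SHORT_VOWELS
    # A vowel symbol directly before the final one is treated as part of a diphthong.
    is_diphthong = len(penult) > 1 and penult[-2] in VOWEL_CHARS
    return ends_in_short_vowel and not is_diphthong


def complies_with_latin_stress_rule(transcription: str, stressed_from_end: int) -> bool:
    """stressed_from_end (hypothetical parameter): 2 = penultimate, 3 = antepenultimate."""
    light = penult_is_light(transcription)
    return (stressed_from_end == 2 and not light) or (stressed_from_end == 3 and light)
```

With these assumptions, `k@/si/n@U` with penultimate stress and `&l/dZI/br@` with antepenultimate stress both count as compliant.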

Alternatively, to study stress pattern–segmental structure correspondences, the researcher can also use Merriam-Webster Online (henceforth MWO) (http://www.merriam-webster.com/, 07.02.2016), which is based upon the 11th edition of Merriam-Webster’s Collegiate Dictionary (2003). Like the phonetic transcriptions in the MRC, some of the transcriptions in the MWO contain a syllable boundary marker, (-), and, just as in the MRC, the phonetic symbols used in the MWO to represent the sounds of English are sometimes not the corresponding symbols of the International Phonetic Alphabet. For example, for casino the MWO gives the transcription /kə-ˈsē-(ˌ)nō/, whereas for algebra the transcription given is /ˈal-jə-brə/ (for an explanation of the symbols used, see http://www.merriam-webster.com/pronsymbols.html, 13.03.2016).

Of particular importance is also the already mentioned Oxford Dictionaries/OD (http://www.oxforddictionaries.com/, 07.02.2016), which comprises 201,079 entries (as of 31.12.2015). Since “Oxford Dictionaries focuses on current language and practical usage” (boldface mine), the OD is essentially the OED without obsolete words and archaic stress patterns, and the two lexicographic resources complement each other nicely. Thus, for instance, for the verb concentrate, the OED gives the transcriptions /ˈkɒnsəntreɪt/ and /kənˈsɛntreɪt/ and states that “[t]he first-mentioned pronunciation, now prevalent, is recent.” In the OD, by contrast, we find only the transcription /ˈkɒns(ə)ntreɪt/, which, coupled with the historical information provided in the OED, allows us to conclude that in the case of the stress patterns /ˈkɒnsəntreɪt/ and /kənˈsɛntreɪt/, we are dealing not with a synchronic stress variation but with a diachronic stress shift, i.e., the stress pattern /kənˈsɛntreɪt/ has been (recently) abandoned by English speakers in favor of the stress pattern /ˈkɒnsəntreɪt/.

To distinguish between actually occurring and archaic stress patterns, the Longman Dictionary of Contemporary English (LDOCE) (http://www.ldoceonline.com/), which consisted of 69,132 entries as of 09.11.2016, can also be relied upon. A major advantage of LDOCE in comparison with the other lexicographic resources named above is that 1,236 of its phonetic transcriptions contain the stress shift symbol (◀). E.g., academic is /ˌækəˈdemɪk◀/ (LDOCE), which means that the two possible stress patterns of this adjective are /ˌækəˈdemɪk/ (with, however, the secondary-stressed syllable /ˌæ/ being phonetically not different from the primary-stressed syllable /ˈde/) and /ˈækədemɪk/, with the latter being used especially in combinations such as academic year or academic study, in which the head noun, modified by academic, is either a monosyllable or an initially-stressed polysyllable.

Note also that although transcriptions in LDOCE do not contain a boundary symbol, the dictionary does give orthographic hyphenations, such as, e.g., a∙ble for able. Using these, the researcher can easily retrieve words exhibiting a particular syllabic length. Thus, for instance, if the boundary symbol (∙) occurs only once, the word under consideration is a disyllable, whereas in a trisyllabic word the same symbol occurs twice, which applies to, e.g., Ko∙re∙a of Korea. The 26,441 hyphenations given in LDOCE for 26,334 orthographically non-identical solidly-spelled items fall into 12,461 (~47.13 %) disyllabic items (i.e., hyphenations in which the boundary symbol occurs once), 8,334 (~31.52 %) trisyllables, 4,003 (~15.14 %) tetrasyllables, 1,351 (~5.11 %) pentasyllables, 251 (~0.95 %) hexasyllables, 34 (~0.13 %) heptasyllables, six (~0.02 %) octasyllables, and one (very long!) word of 18 syllables: Llan∙fair∙pwll∙gwyn∙gyll∙go∙ger∙y∙chwyrn∙dro∙bwll∙llan∙ty∙si∙lio∙go∙go∙goch, which is “a small village on Anglesey in North Wales, famous for being the place with the longest name in the UK” (LDOCE).
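
Counting syllables from such hyphenations amounts to counting boundary symbols. The following Python sketch shows one possible way of doing so; the function names are illustrative and not taken from the study, and the boundary symbol is assumed to be exactly the character shown in the examples above.

```python
from collections import Counter

BOUNDARY = "∙"  # boundary symbol as it appears in the hyphenations quoted above


def syllable_count(hyphenation: str) -> int:
    """One boundary symbol means two syllables, two mean three, and so on."""
    return hyphenation.count(BOUNDARY) + 1


def length_distribution(hyphenations):
    """Map each syllabic length to (number of items, percentage of all items)."""
    counts = Counter(syllable_count(h) for h in hyphenations)
    total = sum(counts.values())
    return {n: (c, 100 * c / total) for n, c in sorted(counts.items())}
```

Applied to the full list of hyphenations, `length_distribution` would yield a breakdown of the kind reported above.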

To find out how frequent a particular word is in contemporary English, the author used the British National Corpus/BNC. To be more precise, the author used 1) the now-defunct BNC Simple Search, which had been available at http://www.natcorp.ox.ac.uk/, 2) the BYU-BNC, which is currently available at http://corpus.byu.edu/bnc/, and 3) the BNC XML edition, which can be downloaded for free at http://ota.ox.ac.uk/desc/2554 (07.02.2016). Note also that 7,879 items in LDOCE are classified into high-, medium-, and lower-frequency words. Thus, for instance, of the 1,542 solidly-spelled high-frequency polysyllables, 55 (~3.57 %) have more than one stress pattern. E.g., the high-frequency word cigarette is /ˌsɪɡəˈret/ in British English (with the diminutive suffix -ette being emphasized via stress, since a cigarette is a small cigar) and /ˈsɪɡəˌret/ vs. /ˌsɪɡəˈret/ in American English (LDOCE). By contrast, in the case of medium- and lower-frequency polysyllables, the corresponding percentages are ~5.44 % (111/2,042) and ~7.52 % (162/2,153). E.g., the medium-frequency word controversy is, according to LDOCE, interchangeably stressed /ˈkɒntrəvɜːsi/ and /kənˈtrɒvəsi/ in British English, and the lower-frequency word barricade is interchangeably stressed /ˈbærəkeɪd/ and /ˌbærəˈkeɪd/ in both British and American English (LDOCE).

For high- vs. medium-frequency words, χ2 (1) = 6.948, p = 0.008; for medium- vs. lower-frequency words, χ2 (1) = 7.514, p = 0.006; for high- vs. lower-frequency words, χ2 (1) = 25, p < 0.000001. These differences are all statistically significant, which supports Berg’s (1999: 137) assertion that infrequency is a prerequisite of stress instability in English.
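
A comparison of this kind can be reproduced with a few lines of Python. The sketch below is not the software used in the study: it computes the uncorrected Pearson statistic for a 2×2 table (no continuity correction), which is an assumption, although one that reproduces the first value reported above. The function and parameter names are illustrative.

```python
import math


def chi_squared_2x2(hits_a: int, total_a: int, hits_b: int, total_b: int):
    """Pearson's chi-squared (1 df, no continuity correction) for two proportions."""
    a, b = hits_a, total_a - hits_a      # group A: with / without stress variation
    c, d = hits_b, total_b - hits_b      # group B: with / without stress variation
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# High- vs. medium-frequency polysyllables: 55/1,542 vs. 111/2,042
chi2, p = chi_squared_2x2(55, 1542, 111, 2042)
print(round(chi2, 3), round(p, 3))
```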

Note, however, that if cases such as /ˌækəˈdemɪk◀/ (i.e., transcriptions in LDOCE that contain the stress shift symbol) are not counted as instances of stress variation, the corresponding χ2-statistics are as follows. For high- vs. medium-frequency words (22/1,542 vs. 29/2,042), χ2 (1) = 0.0003, p = 0.987; for medium- vs. lower-frequency words (29/2,042 vs. 48/2,153), χ2 (1) = 3.809, p = 0.051; for high- vs. lower-frequency words (22/1,542 vs. 48/2,153), χ2 (1) = 3.115, p = 0.078. These differences are not statistically significant.

The point here is that the average syllabic length of a high-frequency polysyllabic English word is ~2.47, but the average syllabic lengths of medium- and lower-frequency polysyllables are ~2.76 and ~2.94 respectively. High-frequency polysyllables thus contain on average fewer syllables than medium-frequency polysyllables, which in turn are as a rule shorter than lower-frequency polysyllables. (This is because derived forms are as a rule used less frequently than their base words (Plag 2003: 111). E.g., the base government is, according to LDOCE, a high-frequency word, but the derivative governmental is a lower-frequency one. Derived forms have on average more syllables than base forms/morphologically simple words.) Accordingly, because medium- and lower-frequency words are on average longer than high-frequency words, the former are more likely to bear not only primary but also secondary stress. Of the 1,542 high-frequency polysyllables, only 102 (~6.61 %) are words such as cigarette, which have both primary and secondary stress. By contrast, in the case of the 4,195 medium- and lower-frequency polysyllables, the number of secondary-stressed words is 601 (~14.33 %). This difference is statistically hugely significant: χ2 (1) = 62, p < 0.000001.

To conclude, because secondary stress is more typical of medium- and lower-frequency words, stress shifts such as /ˌɡʌvəˈmentl◀/ (LDOCE), which involves the promotion of the secondary-stressed syllable /ˌɡʌ/ to the primary-stressed syllable (i.e., /ˈɡʌvəmentl/), are also more typical of medium- and lower-frequency words. The connection between stress variation and frequency of use is thus a very indirect one!

For searches involving regular expressions, the software TextCrawler, Version 2.5.0.0 (DigitalVolcano Software 2013), and GNU grep, Version 2.5.4 (Free Software Foundation, Inc. 2009) were used. For simpler searches (e.g., finding and eliminating duplicates, extracting strings beginning with a particular symbol), Microsoft Excel 2007 was relied upon.

A regular expression, or simply regex, is essentially an advanced search option that is available in many computer programs, such as the above-mentioned TextCrawler. One of the most powerful regex operators is the vertical bar (|), which in combination with parentheses matches alternatives, e.g., the search query (a|e|i|o|u|y) matches any of the vowel symbols used in the English alphabet, whereas the search query (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z) will, by contrast, match any consonantal symbol. Another important construct is the pair of curly braces ({}), with which the researcher can indicate the number of occurrences of the string to be matched. For example, (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z)(a|e|i|o|u|y){2} will match a consonant followed by two vowels (e.g., the string bea- of bead), whereas (b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z)(a|e|i|o|u|y){1,} will, apart from matching bea- of bead, also match bi- of bid, where there is only one vowel occurring immediately after a consonant.
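
The two queries can be tried out in any regex-capable environment. The following Python sketch uses the standard `re` module rather than TextCrawler or grep, so it only illustrates the patterns themselves; the helper name `first_match` is hypothetical.

```python
import re

VOWEL = "(a|e|i|o|u|y)"
CONSONANT = "(b|c|d|f|g|h|j|k|l|m|n|p|q|r|s|t|v|w|x|z)"

# A consonant followed by exactly two vowels vs. by one or more vowels.
two_vowels = re.compile(CONSONANT + VOWEL + "{2}")
one_or_more_vowels = re.compile(CONSONANT + VOWEL + "{1,}")


def first_match(pattern, word):
    """Return the first substring of word matched by pattern, or None."""
    match = pattern.search(word)
    return match.group(0) if match else None


print(first_match(two_vowels, "bead"))          # bea
print(first_match(two_vowels, "bid"))           # None
print(first_match(one_or_more_vowels, "bid"))   # bi
```

The same alternation and brace syntax should also be accepted by GNU grep when extended regular expressions are enabled with the `-E` option.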

Using these two simple regular expressions, it was established by the author that items such as /kəˈsiːnəʊ/ of casino (OD), in which stress is regularly penultimate (i.e., the penultimate syllable /ˈsiː/ contains a long vowel), and items such as /ˈaldʒɪbrə/ of algebra, in which stress is regularly antepenultimate (i.e., the penultimate syllable /dʒɪ/ ends in a short vowel), constitute ~67.15 % of the 19,545 items of three or more syllables for which stress patterns and syllabified phonetic transcriptions are available in the MRC. Similarly, in the MWO dictionary, of the 22,653 antepenults bearing stress, 17,032 (~75.19 %) co-occur with light penults (i.e., cases such as /ˈaldʒɪbrə/ of algebra), whereas in the case of the 14,961 stressed penults occurring in words of three or more syllables, only 4,110 (~27.47 %) can be referred to as light syllables; e.g., in /prə(ʊ)ˈhɪbɪt/ of prohibit, stress falls upon the light penult /ˈhɪ/. English words of three or more syllables are thus by and large (segmentally) compliant with the provisions of the Latin Stress Rule: Penultimately-stressed trisyllables and longer words have as a rule heavy penults, whereas light penults normally occur when stress is antepenultimate. The question that remains to be answered is, however, whether the stress patterns that these words exhibit are indeed due to their segmental structures. Thus, it was pointed out in Chapter 1 that of the non-initially-stressed words in the OD dictionary, 67 % have segmentally longer righthand strings such as /‑ˈhɪbɪt/ of prohibit, which occur in at least one other English word (inhibit). The location of stress in these words can thus also be seen as the location of their root–prefix boundary.

To automatically identify words exhibiting a particular morphological structure (e.g., words beginning with a prefix/ending in a suffix), the online tool Morphological Analysis (https://open.xerox.com/Services/fst-nlp-tools/Consume/Morphological%20Analysis-176, 07.02.2016) was used. E.g., in the case of the derivative abbreviator the analysis returned by the tool is <abbreviate>or}+Noun+Sg, i.e., the word is correctly analyzed as the product of suffixation of the verbal base abbreviate by means of the suffix -or. Morphological Analysis even copes with derivatives in which the addition of a suffix is accompanied by segmental changes. For example, the adjective corrosive is correctly analyzed as the product of suffixation of the verbal base corrode by means of the suffix -ive, i.e., <corrode>ive}+Adj; similarly, for secrecy the segmentation returned is <secret>cy}+Noun+Sg.
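
Segmentations of this shape are easy to process automatically. The Python sketch below parses them with a regular expression. It rests on assumptions inferred solely from the examples quoted in this section: a base enclosed in angle brackets, an optional suffix closed by (}), an optional prefix opened by ({), and trailing tags introduced by (+). It also assumes at most one prefix and one suffix per analysis. The names `ANALYSIS` and `parse_analysis` are hypothetical and not part of the tool.

```python
import re

# ASSUMED output shape: optional "{prefix", then "<base>", optional "suffix}", then "+Tag" items.
ANALYSIS = re.compile(
    r"^(?:\{(?P<prefix>[^<]+))?"    # optional "{prefix"
    r"<(?P<base>[^>]+)>"            # "<base>"
    r"(?:(?P<suffix>[^}+]+)\})?"    # optional "suffix}"
    r"(?P<tags>(?:\+\w+)*)$"        # e.g. "+Noun+Sg"
)


def parse_analysis(analysis: str):
    """Split one analysis string into prefix, base, suffix, and tags (or return None)."""
    match = ANALYSIS.match(analysis)
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        "base": match.group("base"),
        "suffix": match.group("suffix"),
        "tags": [tag for tag in match.group("tags").split("+") if tag],
    }


print(parse_analysis("<abbreviate>or}+Noun+Sg"))
```

An item can then be treated as a suffixed derivative whenever the `suffix` field is not `None`.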

Relying upon base–suffix segmentations similar to these, we can automatically count, e.g., the percentage of suffixed derivatives in English whose stress pattern is the stress pattern of the corresponding base form (counting from left to right, i.e., from the beginning of the word). In the case of, e.g., 14,222 solidly-spelled items from the OD that the tool Morphological Analysis considers to be suffixed derivatives (i.e., for these items, the tool has returned segmentations that contain the suffix symbol (})), this is true of no less than 9,833 (~69.14 %) derivatives. These include the following cases (which have been identified by the author with the help of the software Excel): 1) both the derivative and the base are stressed initially (e.g., /ˈabəsi/ of abbacy and /ˈabət/ of abbot); 2) the derivative is stressed initially and the base is a monosyllable (e.g., /ˈfaktʃʊəl/ of factual and /fakt/ of fact); 3) in the transcriptions under comparison, the primary stress symbol (ˈ) is followed by three identical symbols (e.g., /ˈɒlə/ of /bʌɪˈɒlədʒɪst/ of biologist and /ˈɒlə/ of /bʌɪˈɒlədʒi/ of biology), which helps us find identically-stressed derivatives and bases in which stress is non-initial; additionally, with the help of this strategy, we do not miss cases such as /bʌɪˈaksɪəl/ of biaxial vs. /ˈaksɪs/ of axis, with the former being both a suffixed and a prefixed derivative (the more identical symbols occur after the primary stress symbol in the transcriptions under comparison, the more likely it is that we are dealing with a genuine case of stress preservation; e.g., /ˌrɛfəˈrɛnʃ(ə)l/ of referential and /ˈrɛf(ə)r(ə)nt/ of referent share the string /ˈrɛ/, which comprises two symbols occurring after the primary stress symbol, but the strings under comparison do not count as identical if the threshold is raised to three identical symbols following the primary stress symbol, i.e., /ˈrɛn/ vs. /ˈrɛf/, so that the case of referential vs. referent is not treated in the same way as, e.g., the pair biologist–biology); 4) the location of the primary stress symbol (relative to the left transcription boundary) is identical in the transcriptions under comparison (e.g., in the transcriptions /prʌɪˈɒrətʌɪz/ of prioritize and /prʌɪˈɒrɪti/ of priority, the primary stress symbol is not followed by three identical symbols, i.e., /ˈɒrə/ vs. /ˈɒrɪ/, but because this symbol occupies the same position in both transcriptions, i.e., in both /prʌɪˈɒrətʌɪz/ and /prʌɪˈɒrɪti/ the stress symbol (ˈ) is the sixth symbol counting from the lefthand (/), we have good reasons to assume that stress in the derivative prioritize falls upon the same syllable as in the base priority); and 5) if the position of the primary stress symbol in the base is subtracted from the position of the primary stress symbol in the derivative, the result is either one or minus one (e.g., in /vəːˈbəʊs/ of verbose the symbol (ˈ) occupies the fifth position relative to the left transcription boundary, whereas in /vəˈbɒsɪti/ of verbosity the corresponding number is four).
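
The five cases can be approximated in a few lines of code. The Python sketch below is an illustration, not the Excel procedure actually used; its function names are hypothetical. It assumes that every character of a transcription, including the lefthand slash and the parentheses around optional sounds, counts as one symbol, and that a monosyllabic base carries no stress symbol. Case 1 falls out of the position comparison, since two initially-stressed transcriptions have the stress symbol in the same position.

```python
STRESS = "ˈ"  # primary stress symbol


def stress_position(transcription: str) -> int:
    """1-based position of the primary stress symbol, counting the left '/' as 1."""
    return transcription.index(STRESS) + 1


def post_stress(transcription: str, n: int = 3) -> str:
    """The n symbols that immediately follow the primary stress symbol."""
    start = transcription.index(STRESS) + 1
    return transcription[start:start + n]


def preserves_stress(derivative: str, base: str, n: int = 3) -> bool:
    if STRESS not in base:
        # Case 2: unmarked monosyllabic base, derivative stressed initially.
        return stress_position(derivative) == 2
    if post_stress(derivative, n) == post_stress(base, n):
        # Case 3: n identical symbols after the stress symbol.
        return True
    # Cases 1, 4, and 5: stress positions are identical or differ by one.
    return abs(stress_position(derivative) - stress_position(base)) <= 1


print(preserves_stress("/bʌɪˈɒlədʒɪst/", "/bʌɪˈɒlədʒi/"))       # biologist vs. biology
print(preserves_stress("/ˌrɛfəˈrɛnʃ(ə)l/", "/ˈrɛf(ə)r(ə)nt/"))  # referential vs. referent
```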

An obvious alternative to these strategies (especially 3–5) is to count the Levenshtein distance (Levenshtein 1966), i.e., the number of edits required to transform one string of symbols into another string. E.g., the Levenshtein distance between /ˈban/ of /əˈbandənm(ə)nt/ and /ˈban/ of /əˈband(ə)n/ is zero (because these strings are identical), but the Levenshtein distance between /ˈtəː/ of /dɪˈtəːmɪn/ and /ˈtɜː/ of /dɪˈtɜːmɪnəbl/ is one, because the nucleus of the stressed syllable is, according to the OD, non-identical in the transcriptions under comparison, even though the stress pattern is the same. (Note that phonetically, /ˈtəː/ of /dɪˈtəːmɪn/ might be very similar to /ˈtɜː/ of /dɪˈtɜːmɪnəbl/, but the only thing that matters now is that the symbol /ə/ of /ˈtəː/ is not identical to the symbol /ɜ/ of /ˈtɜː/; to obtain the stressed syllable /ˈtɜː/ of /dɪˈtɜːmɪnəbl/ out of the stressed syllable /ˈtəː/ of /dɪˈtəːmɪn/, we should thus replace /ə/ with /ɜ/.) If the Levenshtein distance between the strings under comparison (which should include the primary stress symbol followed by at least three further symbols) is either zero or one (i.e., no or only one edit is required to obtain the primary-stressed syllable of the derivative out of the primary-stressed syllable of the base), the case under consideration is most likely an instance of stress preservation.
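
For readers who wish to experiment with this criterion, the following Python sketch implements the standard dynamic-programming computation of the Levenshtein distance and applies it to the stress symbol plus the three symbols that follow it. The helper names and the default window of three symbols are illustrative choices, not a description of the software used by the author.

```python
STRESS = "ˈ"  # primary stress symbol


def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    previous = list(range(len(t) + 1))
    for i, symbol_s in enumerate(s, start=1):
        current = [i]
        for j, symbol_t in enumerate(t, start=1):
            cost = 0 if symbol_s == symbol_t else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def stressed_string(transcription: str, n: int = 3) -> str:
    """The primary stress symbol together with the n symbols that follow it."""
    start = transcription.index(STRESS)
    return transcription[start:start + 1 + n]


def likely_stress_preservation(derivative: str, base: str, n: int = 3) -> bool:
    return levenshtein(stressed_string(derivative, n), stressed_string(base, n)) <= 1


print(levenshtein("ˈtəː", "ˈtɜː"))
print(likely_stress_preservation("/dɪˈtɜːmɪnəbl/", "/dɪˈtəːmɪn/"))
```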

Using these strategies, we can also establish whether a derivative, in addition to preserving the stress, also preserves the segmental structure of the base. Thus, for instance, because the Levenshtein distance between the strings /ˈbɒs/ of /vəˈbɒsɪti/ and /ˈbəʊ/ of /vəːˈbəʊs/ is two (i.e., the symbols /ə/ and /ʊ/ should be replaced with the symbols /ɒ/ and /s/) and because the location of the primary stress symbol relative to the left transcription boundary is not identical in the transcriptions under comparison, the pronunciation of the base form verbose can be said to be only partially preserved in the derived form verbosity. The transcription /ˈadm(ə)r(ə)lti/ of the derivative admiralty can, by contrast, be regarded as an instance of “agglutinative stress,” which means that “suffixes are simply hooked on, glued on, or ‘agglutinated’ to a word without influencing its structure” (Poldauf 1984: 50–51). Of the 14,222 suffixed derivatives in the OD, no less than 7,486 (~52.64 %) can be regarded as instances of agglutinative suffixation, which means that the transcriptions of these words’ base forms are part of the transcriptions of the corresponding derived forms; the usual case is when the transcription of a base form occurs at the beginning of the transcription of the corresponding derived form. E.g., the transcription /ˈadm(ə)r(ə)l/ of the base admiral occurs (in the very same form) at the beginning of the transcription /ˈadm(ə)r(ə)lti/ of the derivative admiralty. In the case of prefixed suffixed derivatives, however, the transcription of the base form occurs in the middle of the transcription of the corresponding derivative, which is true of, e.g., /ˈsteɪtsmən/ of /ʌnˈsteɪtsmənlʌɪk/.
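
The test for agglutinative suffixation boils down to a substring check. A minimal Python sketch, with a hypothetical function name and hypothetical return labels, might look as follows; it assumes that transcriptions are compared symbol by symbol after the enclosing slashes have been removed.

```python
def agglutination(derivative: str, base: str):
    """Classify where the base transcription occurs inside the derivative transcription.

    Returns "initial" if the derivative begins with the base, "non-initial" if the
    base occurs later in the derivative, and None if it does not occur at all.
    """
    d, b = derivative.strip("/"), base.strip("/")
    if d.startswith(b):
        return "initial"
    if b in d:
        return "non-initial"
    return None


print(agglutination("/ˈadm(ə)r(ə)lti/", "/ˈadm(ə)r(ə)l/"))    # admiralty vs. admiral
print(agglutination("/ʌnˈsteɪtsmənlʌɪk/", "/ˈsteɪtsmən/"))    # unstatesmanlike vs. statesman
print(agglutination("/vəˈbɒsɪti/", "/vəːˈbəʊs/"))             # verbosity vs. verbose
```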

In addition to attributing, e.g., the stress pattern /prʌɪˈɒrətʌɪz/ of the derivative prioritize to the stress pattern /prʌɪˈɒrɪti/ of the base priority, it can be argued that stress in the derivative is antepenultimate simply because the penultimate syllable /rə/ is light. Likewise, in the case of /ˈabəsi/ of abbacy, /bʌɪˈɒlədʒɪst/ of biologist, and /vəˈbɒsɪti/ of verbosity, the Latin Stress Rule also requires antepenultimate stress because the corresponding penultimate syllables /bə/, /lə/, and /sɪ/ are light (and, from a purely diachronic point of view, stress in the English word verbosity is antepenultimate because it is antepenultimate in the Latin etymon verbōsitās (Dictionary.com), which has a short vowel in the penult). Indeed, of all items in the MRC database, 12,866 are, according to the tool Morphological Analysis, suffixed derivatives, i.e., for these items, the tool has returned segmentations that contain the suffix symbol (}). Stress indications and syllabified phonetic transcriptions are available in the MRC for 4,990 of these items whose syllabic length is no less than three. Of these 4,990 items, 1,329 have regular penultimate stress (e.g., /əˈbjuːzə/ of abuser (OD), where stress falls on the heavy penult /ˈbjuː/) and in 1,909 items, stress is regularly antepenultimate (e.g., /ˈaktɪvɪst/ of activist (OD), where the penult /tɪ/ is light). In other words, of the 4,990 suffixed derivatives in English whose syllabic length is at least three, 3,238 (~64.89 %) have stress patterns that are superficially in accordance with the Latin Stress Rule: Stress is penultimate when the penult is heavy and antepenultimate when the penult is light. 
Notice, however, that abidance by the Latin Stress Rule is more typical of trisyllabic suffixed derivatives: Of the 2,439 trisyllabic suffixed derivatives for which syllabified phonetic transcriptions and stress indications are given in the MRC, 1,081 are items such as /əˈbjuːzə/ of abuser, in which stress is regularly penultimate (i.e., the penultimate syllable is heavy), and 789 are items such as /ˈaktɪvɪst/ of activist, in which stress is regularly antepenultimate (i.e., the penultimate syllable is light). Thus, in the case of the 1,870 trisyllabic suffixed derivatives (~76.67 %), the location of stress can be said to be in compliance with the provisions of the Latin Stress Rule. By contrast, in the case of the 2,551 suffixed derivatives whose syllabic length is no less than four and for which syllabified phonetic transcriptions and stress indications are available in the MRC, the same is true of only 1,368 (~53.63 %) items: 1,120 are derivatives such as /əˈbɪlɪti/ of ability (OD), in which stress is regularly antepenultimate (i.e., the penultimate syllable /lɪ/ is light), and 248 are derivatives such as /ˌanɪkˈdəʊtl/ of anecdotal (OD), in which stress is regularly penultimate (i.e., the penultimate syllable /ˈdəʊ/ is heavy). The difference of 1,368/2,551 vs. 1,870/2,439 is statistically hugely significant: χ2 (1) = 291, p < 0.000001.

As for disyllables, the segmentations returned by Morphological Analysis allow us to conclude that the location of stress in a disyllabic English word (among other things) crucially depends upon whether it begins with a prefix vs. ends in a suffix. As for prefixed derivatives (i.e., those for which the tool Morphological Analysis has returned segmentations that contain the prefix symbol ({); e.g., {a<live>+Adj, for alive), the proportions in the OD dictionary among the 23,147 initially- vs. 4,585 finally-stressed disyllables are (55/23,147=)~0.24 % vs. (538/4,585=)~11.73 %, χ2 (1) = 2,417, p < 0.000001, which counts as a statistically hugely significant difference. As for suffixed derivatives, the corresponding proportions are (2,800/23,147=)~12.1 % vs. (321/4,585=)~7 %, χ2 (1) = 99, p < 0.000001, which also counts as an extremely significant difference. Final stress is thus the preferred stress pattern of disyllables beginning with a prefix whereas disyllables ending in a suffix are, by contrast, more frequently pronounced in English with initial stress. (The latter claim is also supported by the fact that of the 1,918 disyllables in the MRC that the tool Morphological Analysis considers to be suffixed derivatives, 1,797 (~93.69 %) are stressed on their first syllables. E.g., lover is stressed /ˈlʌvə/ (OD). Final stress in disyllabic suffixed derivatives is thus virtually non-existent in the English language. It can be found in, e.g., /əˈmʌŋst/ of amongst (OD), whose base is the finally-stressed /əˈmʌŋ/ of among (OD), to which the suffix -st, which on its own does not constitute a syllable, was added. Additionally, because some English suffixes bring about important semantic distinctions (e.g., lessor–lessee, standing for granters vs. holders of a lease), some English disyllables have final stress as an emphatic alternative to initial stress. In all other cases when the base of a disyllabic suffixed derivative is monosyllabic, stress in the derived form is always placed upon the only syllable constituting the base form; e.g., stress in lover falls upon the syllable constituting its monosyllabic base love.)

Interestingly, initial stress is in English the preferred stress pattern not only of actual suffixed derivatives but also of disyllables such as, e.g., music, which is not segmentable into the base muse and the suffix -ic the way, e.g., cubic is segmentable into the base cube and the suffix -ic. Thus, of the 23,147 initially-stressed disyllables in the OD, 259 (~1.12 %) end orthographically in -ic, and, what is particularly important, of these, only 12 (~4.63 %) are, according to the tool Morphological Analysis, actual -ic-derivatives: calcic, centric, cubic, cyclic, cystic, fistic, metric, mythic, rhythmic, scenic, spheric, and splenic. By contrast, of the 4,585 finally-stressed disyllables, only five (~0.11 %) end orthographically in -ic. E.g., the shorter word ridic is stressed /rɪˈdɪk/ (OD), preserving the stress of the longer word ridiculous: /rɪˈdɪkjʊləs/ (OD). Since the difference of 259/23,147 vs. 5/4,585 is statistically hugely significant (χ2 (1) = 41, p < 0.000001), we are justified in claiming that initial stress is the preferred stress of an -ic-disyllable irrespective of whether it is a genuine -ic-derivative like cubic or a word like music (comic, magic, panic, topic, etc.). The same is true of the orthographic strings -a, -al, -an, -and, -ard, -cy, -dom, -er, -ess, -et, -ful, -ia, -ian, -ie, -ing, -ion, -is, -ish, -ite, -less, -let, -ling, -ly, -ness, -o, -oid, -ous, -ry, -ship, -some, -ty, -ule, -um, -ward, -way, and -y, which, according to the tool Morphological Analysis, also occur in English as suffixes. The median χ2-statistic of these strings is 18.71 (with the minimum being ~3.961 (-ule) and the maximum ~296 (-er)), which means that the median p-value is ~0.000015. 
Notice also that the total number of initially-stressed disyllables that end orthographically in these 36 strings is 10,748 and the total number of initially-stressed disyllabic suffixed derivatives (according to the tool Morphological Analysis) is only 2,800. Thus, disyllables ending in these strings (e.g., any, finish, medal, river, topic, etc.) are (as a rule) stressed initially even when these strings are not actual suffixes.

As for prefixed derivatives, notice also that with the help of the tool Morphological Analysis we can find emphatically-stressed English words. An example is subcategory, whose morphological structure is correctly analyzed by the tool as {sub<category>. The stress pattern of this prefixed derivative is, according to the OD, /ˈsʌbkatɪɡ(ə)ri/, which is emphatic stress falling upon the semantically important prefix sub-, i.e., a subcategory is a “secondary or subordinate category” (OD). Another similar case is the stress pattern /ˈkaʊntəprəˌpəʊz(ə)l/ of counterproposal (OD). Since the counter-words countercheck, counterclaim, counterdemonstration, countermarch, countermove, counteroffensive, counterplot, counterpunch, counterspy, and countertenor are, according to the OD, also all pronounced with primary stress falling upon the prefix counter-, we argue that the meaning inherent in this prefix, “movement or effect in the opposite direction” (OD), is emphasized in English via stress. Another semantic structure capable of giving rise to emphatic prefix stress is “considerably more/less than the base.” E.g., a megaton, which corresponds to one million tons, is stressed /ˈmɛɡətʌn/ (OD), and, similarly, a millisecond, which is one thousandth of a second, is stressed /ˈmɪlɪsɛk(ə)nd/ (OD). Similarly, the fact that archangel is, according to the OD, more frequently stressed /ˈɑːkeɪndʒ(ə)l/ than /ɑːkˈeɪndʒ(ə)l/ is neatly attributable to the fact that an archangel is “[a]n angel of greater than ordinary rank” (OD).

Unfortunately, the tool Morphological Analysis, which copes well with identifying suffixed and prefixed derivatives, returns identical analyses for, e.g., the morphologically simple algebra (<algebra>+Noun+Sg) and the compound battlefield (<battlefield>+Noun+Sg). Given this fact, compounds were, when necessary, identified in a different manner. Using the formulae =LEFT, =RIGHT, and =LEN in Microsoft Excel 2007, an orthographic string constituting a word of the English language was cut into all possible pairs of substrings; e.g., battlefield was cut into the substrings b and attlefield, ba and ttlefield, bat and tlefield, batt and lefield, battl and efield, battle and field, battlef and ield, battlefi and eld, battlefie and ld, and battlefiel and d. Using the software grep, the resultant combinations were then searched for in the BNC. Since compounds in English are often spelled interchangeably (e.g., English Word Stress (Poldauf 1984) and English Word-Stress (Fudge 1984) are the titles of two different monographs on English word stress, which were both published in 1984), a solidly-spelled compound can be expected to have a separately-spelled alternative. Since of the possible segmentations of the orthographic string battlefield, only battle field makes sense from a semantic point of view, it can be expected to occur in a representative corpus of the English language, such as the BNC: Despite her many affairs – one of her immortal duties being to entertain heroes slain on the battle field – she loved Odin dearly and wept tears of gold when she lost him.
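
The splitting step can also be expressed compactly in Python instead of Excel. In the sketch below, which is only an illustration with hypothetical function names, `corpus_pairs` is a stand-in for the set of separately-spelled two-word sequences attested in a corpus such as the BNC; how that set is obtained (e.g., via grep) is outside the scope of the example.

```python
def two_part_splits(word: str):
    """All ways of cutting word into a non-empty left and a non-empty right substring."""
    return [(word[:i], word[i:]) for i in range(1, len(word))]


def attested_splits(word: str, corpus_pairs):
    """Keep only those splits that occur as separately-spelled pairs in corpus_pairs.

    corpus_pairs is a HYPOTHETICAL set of (left, right) tuples standing in for
    the results of a corpus search.
    """
    return [pair for pair in two_part_splits(word) if pair in corpus_pairs]


toy_corpus = {("battle", "field")}  # toy data for illustration only
print(two_part_splits("battlefield"))
print(attested_splits("battlefield", toy_corpus))
```

A word of n letters yields n − 1 splits, so the eleven-letter battlefield yields the ten pairs listed above.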

This approach produces more precise results than a simple orthographic analysis. Thus, for instance, the word question also happens to be orthographically segmentable into the strings quest and ion, which both exist in English as nouns. However, in contrast to the combination battle field of battlefield, the combination quest ion of question does not make sense from a semantic point of view and thus does not occur in the BNC. The word question is therefore not wrongly analyzed as a compound. (But note that, e.g., the derived adjective notable, which is also not a compound, happens to be orthographically segmentable into the strings not and able, which occur in the BNC in the form not able 451 times. This method is thus not entirely unproblematic either.)

Naturally, to quickly find out whether the word under consideration is a compound, a lexicographer’s intuition can also be taken into consideration. For example, in Dictionary.com (which as of 07.02.2017 consisted of 237,057 entries), a compound analysis, such as, e.g., blue + print for blueprint, is given for 5,707 solidly-spelled English words.

Thus, with the help of these two strategies (i.e., searches for separate occurrences in the BNC and compound analyses given in Dictionary.com), it was established by the author that of the 23,147 initially-stressed disyllables in the OD, no less than 2,477 (~10.7 %) are compounds. E.g., the disyllabic compound airline is stressed /ˈɛːlʌɪn/ (OD). By contrast, in the case of the 4,585 finally-stressed disyllables, this is true of only 145 (~3.16 %) words. E.g., the disyllabic compound backstage is stressed /bakˈsteɪdʒ/ (OD). Since the difference of 2,477/23,147 vs. 145/4,585 is statistically very significant (χ2 (1) = 254, p < 0.000001), we are justified in claiming that compounds in English are indeed, as is argued by many authors, more frequently left- than right-prominent (e.g., Bell & Plag 2013: 130, reporting that “[i]n running speech, about one third of NN compounds […] are stressed on N2, while two thirds are stressed on N1 […]”).

Finally, observe that the 6,227 monosyllables in the OD contain on average ~4.39 orthographic symbols. Given this number, it comes as no surprise that of the 6,198 initially-stressed disyllables (in the same dictionary) that contain eight and more orthographic symbols, 1,724 (~27.82 %) count morphologically as compounds (in accordance with the strategies described above). By contrast, of the 16,949 initially-stressed disyllables that contain no more than seven orthographic symbols, only 753 (~4.44 %) are compounds. The difference of 1,724/6,198 vs. 753/16,949 is statistically hugely significant—χ2 (1) = 2,594, p < 0.000001—which means that disyllabic compounds should be mainly sought among disyllables that contain at least eight orthographic symbols. (A related finding is that disyllables containing eight and more orthographic symbols are more frequently stressed initially than finally: 6,198/23,147(=~26.78 %) vs. 999/4,585(=~21.79 %), χ2 (1) = 50, p < 0.000001. Because 1) disyllabic compounds are made up of two monosyllables, which contain on average ~4.39 orthographic symbols, and 2) compounds in English are more frequently left- than right-prominent, words that contain eight and more orthographic symbols are more frequently stressed on their first syllables.)

To establish whether a difference between words exhibiting different stress patterns counts as statistically significant, Pearson’s (1900) chi-squared test (χ2) was performed in the majority of the situations requiring such a comparison. Calculations were performed with the help of the software Minitab, Version 15.1.30.0 (Minitab Inc 2007) and Microsoft Excel 2007. Statistical significance is (regarded as) achieved when p < 0.05, but notice that the lower the p-value, the more significant the difference under consideration. Thus, especially in connection with the result p < 0.000001, which means that the χ2-statistic (with one degree of freedom) exceeds the critical value of 23.92812699, this monograph will sometimes use the modifying adverb hugely (as well as highly/extremely/very), e.g., This difference is statistically hugely significant.
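
The critical value quoted above can be checked numerically. The Python sketch below is not the Minitab or Excel calculation used for the monograph: it relies on the fact that, for one degree of freedom, the upper-tail probability of the chi-squared distribution equals erfc(√(χ2/2)), and it finds the critical value by simple bisection. The function names and the search interval are illustrative assumptions.

```python
import math


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-squared statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def critical_value_df1(alpha: float, lo: float = 0.0, hi: float = 200.0,
                       iterations: int = 200) -> float:
    """Bisection search for the statistic whose p-value equals alpha."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if p_value_df1(mid) > alpha:
            lo = mid   # p still too large: the critical value lies higher
        else:
            hi = mid
    return (lo + hi) / 2


print(critical_value_df1(0.05))
print(critical_value_df1(0.000001))
```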
