Auditory phonetic representations in the superior temporal gyrus

ECoG, which involves placing electrodes directly onto the surface of the brain, cannot easily record from the primary auditory cortex (PAC), because the PAC is tucked away inside the Sylvian fissure, along the dorsal aspect of the temporal lobe. At the same time, because ECoG measures the summed postsynaptic electrical currents of neurons with millisecond resolution, it is sensitive to rapid neural responses at the timescale of individual syllables, or even individual phones. By contrast, fMRI measures hemodynamic responses: changes in blood flow that are related to neural activity but unfold on the order of seconds. In recent years, the use of ECoG has revolutionized the study of speech in auditory neuroscience, and an exemplary case is a recent paper by Mesgarani et al. (2014).

Mesgarani et al. (2014) used ECoG to investigate the linguistic‐phonetic representation of auditory speech in the STG of six epileptic patients. These patients listened passively to spoken sentences taken from the TIMIT corpus (Garofolo et al., 1993) while ECoG was recorded from their brains. The recordings were then analyzed to discover patterns in the neural responses to individual speech sounds (for a summary of the experimental setup, see Figure 3.7, panels A–C). The authors used a phonemic analysis of the TIMIT dataset to group the neural responses at each electrode according to the phoneme that evoked them. For examples, see panel D of Figure 3.7, which compares the responses to different speech sounds at five sample electrodes, labeled e1 to e5. The key observation is that an electrode such as e1 gives similar responses for /d/ and /b/ but not for /d/ and /s/, and that each of the electrodes shown responds strongly to some groups of speech sounds but not to others. Given these data, we can ask: do STG neurons group, or classify, speech segments through the similarity of their response patterns? And, if so, which classification scheme do they use?
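
To make this grouping step concrete, the following is a minimal sketch of phoneme‐aligned epoching, assuming the recording has already been reduced to a NumPy array of response amplitude (samples × electrodes) and the TIMIT annotations to (phoneme, onset sample) pairs. The function name, window lengths, and data layout are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from collections import defaultdict

def epoch_by_phoneme(recording, annotations, sr, pre=0.05, post=0.25):
    """Group fixed-length response windows by the phoneme at their onset.

    recording   : (n_samples, n_electrodes) array, e.g. high-gamma amplitude
    annotations : iterable of (phoneme_label, onset_sample) pairs
    sr          : sampling rate of the recording in Hz
    pre, post   : window extent around each onset, in seconds (assumed values)
    """
    n_pre, n_post = int(pre * sr), int(post * sr)
    epochs = defaultdict(list)
    for phoneme, onset in annotations:
        if onset - n_pre < 0 or onset + n_post > len(recording):
            continue  # skip onsets whose window falls outside the recording
        epochs[phoneme].append(recording[onset - n_pre:onset + n_post])
    # Average across all instances of each phoneme -> (n_window, n_electrodes)
    return {ph: np.mean(trials, axis=0) for ph, trials in epochs.items()}
```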

Linguists and phoneticians often analyze individual speech sounds into feature classes based, for example, on either the manner or the place of articulation that is characteristic of that speech act. Thus, /d/, /b/, and /t/ are all members of the plosive manner‐of‐articulation class because they are produced by an obstruction followed by a sudden release of air through the vocal tract, and /s/ and /f/ belong to the fricative class because both are generated by turbulent air hissing through a tight constriction in the vocal tract. At the same time, /d/ and /s/ also belong to the alveolar place‐of‐articulation class because, for both phonemes, the tip of the tongue is brought up toward the alveolar ridge just behind the top row of the teeth. In contrast, /b/ has a labial place of articulation because, to articulate /b/, the airflow is constricted at the lips. Manner features are often associated with particular acoustic characteristics: plosives involve characteristically brief intervals of silence followed by a short noise burst, while fricatives exhibit sustained aperiodic noise spread over a wide part of the spectrum. Classifying speech sounds by place and manner of articulation is certainly very popular among speech scientists, and is also implied in the structure of the International Phonetic Alphabet (IPA), but it is by no means the only possible scheme. Speech sounds can also be described and classified according to alternative acoustic properties or perceptual features, such as loudness and pitch. An example feature that is harder to characterize in articulatory or acoustic terms is sonority. Sonority defines a scale of perceived loudness (Clements, 1990) such that vowels are the most sonorous, glides are the next most sonorous, followed by liquids, nasals, and finally obstruents (i.e. fricatives and plosives). Despite the idea of sonority as a multitiered scale, phonemes are sometimes lumped into just two groups, sonorant and nonsonorant, with everything but the obstruents counting as sonorant.
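
These class memberships amount to a simple lookup table. The sketch below encodes a few of the manner and place assignments just described, together with the ordinal sonority scale; the inventory is deliberately truncated, and every name in it is ours, chosen for illustration.

```python
# A truncated feature table for the phonemes discussed in the text.
# Keys are IPA symbols; values give (manner, place) classes.
FEATURES = {
    "d": ("plosive",   "alveolar"),
    "t": ("plosive",   "alveolar"),
    "b": ("plosive",   "labial"),
    "p": ("plosive",   "labial"),
    "s": ("fricative", "alveolar"),
    "f": ("fricative", "labiodental"),
}

# Ordinal sonority scale (Clements, 1990): higher = more sonorous.
SONORITY = {"vowel": 4, "glide": 3, "liquid": 2, "nasal": 1, "obstruent": 0}

def same_class(a, b, dimension=0):
    """True if two phonemes share a manner (0) or place (1) class."""
    return FEATURES[a][dimension] == FEATURES[b][dimension]

print(same_class("d", "b"))     # True: both plosives (shared manner)
print(same_class("d", "s"))     # False: plosive vs. fricative
print(same_class("d", "s", 1))  # True: both alveolar (shared place)
```

A table like this mirrors the e1 example above: an electrode whose response generalizes over /d/ and /b/ but not /d/ and /s/ behaves as if it consulted the manner column rather than the place column.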


Figure 3.7 Feature‐based representations in the human STG. (A) shows left‐hemisphere cortex with black dots indicating ECoG electrodes. (B) shows an example acoustic stimulus (the phrase “and what eyes they were”), including orthography, waveform, spectrogram, and IPA transcription. (C) shows time‐aligned neural responses to the acoustic stimulus. The electrodes (y‐axis) are sorted spatially (anterior to posterior), with time (in seconds) along the x‐axis. (D) shows sample phoneme responses by electrode. For five electrodes (e1 to e5), the plots show cortical selectivity for English phonemes (y‐axis) as a function of time (x‐axis), with phoneme onsets indicated by vertical dashed lines. The phoneme selectivity index (PSI) is a summary, over time, of how selective the cortical response is for each phoneme. (E) shows phoneme responses (PSIs) for all electrodes, arranged for hierarchical clustering analyses. (F) and (G) show clustering analyses by phoneme and by electrode, respectively, with reference to phonetic features. For example, (G) shows that electrodes can be grouped by their selectivity for obstruents versus sonorants.

Source: Mesgarani et al., © 2014, The American Association for the Advancement of Science.
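
The caption describes the PSI only as a summary, over time, of how selective an electrode's response is for each phoneme. Without relying on the paper's exact definition, the sketch below computes a simple stand‐in, the z‐scored peak response per phoneme at one electrode, which captures the same intuition but makes no claim to reproduce the published index. It assumes the per‐phoneme mean epochs produced by the epoching sketch above.

```python
import numpy as np

def selectivity_profile(mean_epochs, electrode):
    """Crude, illustrative stand-in for a per-phoneme selectivity index.

    mean_epochs : dict of phoneme -> (n_window, n_electrodes) mean response,
                  as returned by the epoching sketch above
    electrode   : column index of the electrode of interest
    Returns phoneme -> z-scored peak response; values well above zero mark
    phonemes to which this electrode responds unusually strongly.
    """
    peaks = {ph: ep[:, electrode].max() for ph, ep in mean_epochs.items()}
    vals = np.array(list(peaks.values()))
    mu, sd = vals.mean(), vals.std()
    return {ph: (peak - mu) / sd for ph, peak in peaks.items()}
```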

As these examples illustrate, there could in principle be many different ways in which speech sounds are grouped. To ask which grouping is “natural” or “native” for the STG, Mesgarani et al. (2014) applied hierarchical clustering to the neural responses to speech, examples of which can be seen in the ECoG recordings depicted in Figure 3.7, panel D. The results of the clustering analysis appear in Figure 3.7, panels E–G. Perhaps surprisingly, Mesgarani et al. (2014) discovered that the STG is organized primarily by manner‐of‐articulation features and secondarily by place‐of‐articulation features. The prominence of manner‐of‐articulation features can be seen by clustering the phonemes directly (Figure 3.7, panel F): on the right‐side dendrogram we find neat clusters of plosives /d b g p k t/, fricatives /ʃ z s f θ/, and nasals /m n ŋ/. Manner‐of‐articulation features also stand out when the electrodes are clustered (Figure 3.7, panel G). Going up a column from the bottom dendrogram, we can find the darkest cells (those with the greatest selectivity), and then follow their rows to the left to identify the phonemes for which that electrode's response was strongest. The electrode indexed by the leftmost column, for example, recorded neural activity selective for the plosives /d b g p k t/. In this way, we may also find electrodes that respond to conjunctions of manner‐ and place‐of‐articulation features: the electrode in the fifth column from the left, for example, responds to the bilabial plosives /b p/. Thus, the types of features that phoneticians have long employed for classifying speech sounds turn out to be reflected in the neural response patterns across the STG. Mesgarani et al. (2014) argue that this pattern of organization, prioritizing manner over place‐of‐articulation features, is most consistent with auditory‐perceptual theories of feature hierarchies (Stevens, 2002; Clements, 1985). Auditory‐perceptual theories contrast, for instance, with articulatory or gestural theories, which Mesgarani et al. (2014) assert would have prioritized place‐of‐articulation features (Fowler, 1986).
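
In outline, this kind of two‐way clustering is easy to reproduce with standard tools. The sketch below clusters a phoneme × electrode matrix of selectivity values along both axes, as in panels F and G; the matrix is random placeholder data, its dimensions are arbitrary, and the Ward linkage is a common default rather than the paper's documented choice.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Placeholder selectivity matrix (phonemes x electrodes); real PSI values
# from panel E would go here. The dimensions are arbitrary.
psi = rng.random((30, 100))

# Cluster phonemes by their selectivity profiles across electrodes (panel F)
# and electrodes by their profiles across phonemes (panel G).
phoneme_linkage = linkage(psi, method="ward")
electrode_linkage = linkage(psi.T, method="ward")

# Cut the phoneme tree into a handful of groups; with real data, such groups
# would correspond to manner classes like plosives, fricatives, and nasals.
groups = fcluster(phoneme_linkage, t=5, criterion="maxclust")
print(groups)  # cluster label for each phoneme row
```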

The clustering analyses in Figure 3.7 (panels F and G) are doubly rich: they simultaneously support a broadly auditory‐perceptual view of sound representations in the STG and reveal the limitations of that view. For instance, in the right‐side cluster (panel F), the phonemes /v/ and /ð/ do not cluster with the other fricatives /ʃ z s f θ/. Instead, these two fricatives cluster with the sonorants; indeed, /v/ and /ð/ cluster most closely with a group of high front and central vowels and glides, /j ɪ i ʉ/. This odd grouping may reflect noise at some level of the experiment or analysis, but it raises the intriguing possibility that the STG actually groups /j ɪ i ʉ v ð/ together, and thus does not strictly follow established phonetic conventions. In addition to articulatory, acoustic, and auditory phonetics, then, studies such as this one on the cortical response to speech may pave the way to innovative neural feature analyses. We would, however, emphasize that these are early results in the field: the use of discrete segmental phonemes may, for example, be considered a useful first approximation to analyses using more complex, overlapping feature representations.
