The Handbook of Speech Perception

Neural processing of feedback

There is an extensive literature on the neural substrates supporting speech production (see Guenther, 2016, for a review). Much of this is based on mapping the speech‐production network using fMRI (Guenther, Ghosh, & Tourville, 2006). Our focus here is narrower: how the nervous system processes speech sounds produced by the talker. The neural processing of self‐produced sound requires mechanisms that differentiate sound produced by oneself from sound produced by others. Two coexisting processes may play a role here: (1) perceptual suppression of external sounds and voices, and (2) specialized processing of one’s own speech (Eliades & Wang, 2008). Cortical suppression serves distinct adaptive functions depending on the species. In nonhuman primates, for example, the ability to discern self‐vocalization from external sound promotes antiphonal calling, whereby the animal must recognize its species‐specific call and respond by producing the same call (Miller & Wang, 2006). Takahashi, Fenley, and Ghazanfar (2016) have invoked the development of self‐monitoring and self‐recognition as essential to the development of coordinated turn taking in marmoset monkeys.

Vocal production in nonhuman primates has long been associated with suppressed neural firing in the auditory cortex (Muller‐Preuss & Ploog, 1981), which begins just prior to the onset of vocalization (Eliades & Wang, 2003). The same effect has been shown in humans: during open brain surgery, vocalization led to suppression of one third of superior temporal gyrus neurons (Creutzfeldt, Ojemann, & Lettich, 1989). This suppression preceded vocalization by approximately 100 ms and subsided about 1 second after vocalization ended. In contrast, when another person spoke while the recorded individual remained silent, temporal gyrus activity was not suppressed. It has therefore been postulated that the same cortical regions that suppress auditory stimuli are responsible for the production and control of speech. With respect to auditory feedback, studies in unanesthetized marmoset monkeys have confirmed that specific neuronal populations in the auditory cortex are sensitive to vocal feedback whereas others are not (Eliades & Wang, 2008). Neurons that are suppressed during vocal production show increased firing in response to altered feedback, and thus appear to be especially sensitive to errors during speech production. At the same time, a smaller proportion of neurons that are generally excited during production show reduced firing in response to altered feedback. Although these neural response changes could in principle reflect changes in the vocal signal itself caused by the feedback perturbations, playing back recordings of the normal and altered vocalizations does not elicit the same differential firing pattern, indicating that the sensitivity depends on active vocal production (Eliades & Wang, 2008).

Muller‐Preuss and Ploog (1981) found that most neurons in the primary auditory cortex of unanesthetized squirrel monkeys that were excited by playback of self‐vocalization were either weakened or completely inhibited during phonation. However, approximately half of superior temporal gyrus (primary auditory cortex) neurons did not show this distinction (Muller‐Preuss & Ploog, 1981). This pattern reflects phonation‐dependent suppression in specific populations of auditory cortical neurons. Electrocorticography data from humans have also supported the idea that specific portions of the auditory cortex support auditory feedback processing (Chang et al., 2013).

In a magnetoencephalography (MEG) study, Houde and colleagues (2002) directly investigated whether vocalization‐induced auditory cortex suppression results from a neural comparison between an incoming signal (auditory feedback) and an internal “prediction” of that signal. They created a discrepancy, or “mismatch,” between signal and expectation by altering the auditory feedback: participants heard a sum of their speech and white noise that lasted the duration of their utterance. When feedback was altered with this gated noise (speech plus white noise), self‐produced speech no longer suppressed the M100 amplitude in the auditory cortex, whereas suppression was observed during normal self‐produced speech. These findings therefore support a forward model in which expected auditory feedback during talking produces suppression of the auditory cortex.
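The forward‐model account can be sketched computationally. The toy simulation below is an illustration under our own assumptions (the function, gain values, and mismatch threshold are hypothetical, not taken from Houde et al.): an efference‐copy prediction of the auditory feedback is compared with the incoming signal, and the evoked response is suppressed only when the two match.

```python
import numpy as np

def auditory_response(feedback: np.ndarray,
                      predicted: np.ndarray,
                      baseline_gain: float = 1.0,
                      suppression: float = 0.5,
                      tolerance: float = 0.1) -> float:
    """Scalar stand-in for evoked response amplitude (e.g. the M100)."""
    # Prediction error: how far the incoming signal departs from the
    # efference-copy prediction of the auditory consequences of speaking.
    mismatch = np.mean(np.abs(feedback - predicted))
    if mismatch < tolerance:
        # Feedback matches the prediction: response is suppressed.
        return baseline_gain * suppression
    # Mismatch (e.g. gated-noise feedback): full response amplitude.
    return baseline_gain

rng = np.random.default_rng(0)
speech = rng.standard_normal(100)
prediction = speech.copy()          # accurate internal prediction

normal = auditory_response(speech, prediction)
altered = auditory_response(speech + rng.standard_normal(100), prediction)
print(normal, altered)              # suppressed (0.5) vs. unsuppressed (1.0)
```

The key design point mirrors the MEG result: suppression is not a blanket property of hearing one’s own voice, but is gated by the match between feedback and prediction.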

To determine whether a forward‐model system truly regulates suppression of the human auditory cortex during speech production, Heinks‐Maldonado and colleagues (2005) examined event‐related potentials (the N100) during speech production. Like Houde et al. (2002), they found that N100 amplitude was reduced in response to unaltered vocalization relative to both pitch‐shifted feedback and feedback replaced by a different voice. Furthermore, during passive listening, neither perturbation produced any N100 amplitude differences. This suggests that suppression of the auditory cortex is greatest when afferent sensory feedback matches the expected outcome, specifically during speech production (Heinks‐Maldonado et al., 2005).

Functional magnetic resonance imaging (fMRI) studies have broadly concurred with the electrophysiological evidence. For example, Tourville, Reilly, and Guenther (2008) compared the blood‐oxygen‐level‐dependent (BOLD) response in trials with a first‐formant feedback shift to trials with unmodified auditory feedback. This comparison showed activation in posterior temporal regions, consistent with previous findings in which noise masked the speech (Christoffels, Formisano, & Schiller, 2007), auditory feedback was delayed (Hashimoto & Sakai, 2003), or vocal pitch feedback was altered (Zarate & Zatorre, 2008; see also MEG studies of pitch shifts: Franken et al., 2018). In the same shift versus no‐shift comparison, Tourville, Reilly, and Guenther (2008) also reported greater activity in the right‐hemisphere ventral premotor cortex. They interpreted their findings as supporting the DIVA model components that perform auditory‐error detection and compensatory motor responses.
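The auditory‐error‐driven compensation invoked here can be illustrated with a minimal sketch in the spirit of the DIVA feedback loop (the function, gain, and formant values are our hypothetical choices, not the model’s published implementation): the error between the heard and intended first formant drives a corrective motor command opposing the perturbation.

```python
def feedback_control(target_f1_hz: float,
                     produced_f1_hz: float,
                     shift_hz: float,
                     gain: float = 0.3) -> float:
    """One step of auditory-error-driven compensation (illustrative)."""
    heard_f1 = produced_f1_hz + shift_hz       # perturbed auditory feedback
    auditory_error = heard_f1 - target_f1_hz   # auditory-error detection
    correction = -gain * auditory_error        # compensatory motor command
    return produced_f1_hz + correction

# A +100 Hz upward F1 shift draws production downward, opposing the shift.
f1 = 700.0
for _ in range(5):
    f1 = feedback_control(target_f1_hz=700.0, produced_f1_hz=f1, shift_hz=100.0)
print(round(f1))  # → 617, i.e. produced F1 has moved below the 700 Hz target
```

A gain below 1 means each step compensates only partially, which is consistent with the general behavioral observation that talkers oppose formant perturbations without fully cancelling them on any single utterance.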

These studies and many others support the existence of neural mechanisms that use the auditory character of the talker’s speech output to control articulation. However, the challenge of mapping high‐level computational models onto behavioral and neural data remains: it is difficult to determine which levels of description are necessary and what the units within each level should be. In short, while there may be only a single neural architecture that manages fluent speech, many abstract cognitive models could be consistent with that architecture (see Griffiths, Lieder, & Goodman, 2015, for a discussion of constraints on cognitive modeling). An additional approach is to examine ontogeny for relationships between perception and production.
