The Handbook of Speech Perception (multiple authors)

Real‐time manipulations of auditory feedback


Apart from clinical evidence, behavioral studies of auditory feedback in speech have been carried out for more than a century. In 1911 the otolaryngologist Étienne Lombard published “Le signe de l’élévation de la voix” (“The symptom of the raised voice”; Lombard, 1911), in which he noted a patient’s tendency to speak more loudly when a loud noise was transmitted to one ear. This was the first published evidence for a feedback mechanism through which real‐time speech perception could influence speech production (Brumm & Zollinger, 2011), and, more than 100 years later, the Lombard effect remains the most persistent and robust feedback phenomenon in psycholinguistic research on speech production.

A notable feature of real‐time speech corrections is that they appear to be largely involuntary and often occur without awareness. In one study, speakers wearing headphones continued to raise their voices when loud noise was played, even after an interviewer told them they were doing so (Mahl, 1972). Although humans can learn to inhibit the Lombard effect (Pick et al., 1989), it persists in spontaneous speech and has been observed in young children (Siegel et al., 1976) as well as in Old World monkeys (Sinnott, Stebbins, & Moody, 1975), whales (Parks et al., 2011), and many songbird species (see Cynx et al., 1998; Kobayasi & Okanoya, 2003; Leonard & Horn, 2005).

Other types of feedback distortion also elicit compensatory responses. In a common paradigm, the formants (resonances of the vocal tract) in the speaker’s auditory feedback are shifted away from those actually produced: for example, a speaker might produce the vowel /ɛ/ but hear themselves say /æ/. In response, the speaker may compensate by shifting their own production in the opposite direction in frequency; in this example, they might produce a vowel closer to /ɪ/ (Houde & Jordan, 1998; Purcell & Munhall, 2006). Interestingly, such compensation is often incomplete: the magnitude of the response is smaller than that of the perturbation, and individual variability is considerable (MacDonald, Purcell, & Munhall, 2011). Figure 4.1 shows perturbations of the first formant (F1) and the average compensation (dots) (MacDonald, Goldberg, & Munhall, 2010). Three perturbations to F1 are introduced in steps over a series of trials. The dots show that, on average, subjects responded in a manner that counteracted the perturbation. However, even for the smallest perturbation of 50 Hz, compensation is incomplete. Subjects change their production by less than 50 Hz, even though they are capable of a compensation large enough to correct this error, as evidenced by their response to the 200 Hz perturbation at the end of the series.
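The partial‐compensation pattern can be expressed as a simple ratio. The sketch below is a minimal illustration, not the analysis used in the cited studies: it assumes F1 values normalized to the baseline mean (as in Figure 4.1) and defines compensation as the production change that opposes the perturbation, expressed as a percentage of the perturbation. The names `Trial`, `percent_compensation`, and `heard_f1`, and the example numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One trial of a hypothetical F1-perturbation experiment (values in Hz)."""
    baseline_f1: float   # mean F1 during the unperturbed baseline phase
    perturbation: float  # shift added to the F1 the speaker hears (+ raises F1)
    produced_f1: float   # F1 the speaker actually produced on this trial


def percent_compensation(trial: Trial) -> float:
    """Production change opposing the perturbation, as % of the perturbation.

    100 means the perturbation is fully cancelled; 0 to 100 means partial
    compensation; negative values mean the speaker followed the shift.
    """
    if trial.perturbation == 0:
        raise ValueError("perturbation must be non-zero")
    production_change = trial.produced_f1 - trial.baseline_f1
    # Compensation moves opposite to the perturbation, hence the sign flip.
    return -production_change * 100.0 / trial.perturbation


def heard_f1(trial: Trial) -> float:
    """F1 in the altered feedback: what was produced plus the applied shift."""
    return trial.produced_f1 + trial.perturbation


# Hypothetical values for illustration only, normalized so baseline = 0 Hz.
t = Trial(baseline_f1=0.0, perturbation=50.0, produced_f1=-20.0)
print(percent_compensation(t))  # 40.0
print(heard_f1(t))              # 30.0
```

With 40 percent compensation in this made‐up trial, the F1 the speaker hears is still 30 Hz above baseline; only a value of 100 percent would cancel the perturbation entirely.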

Vocal‐pitch perturbations produce the same pattern of partial compensation and individual variability. When the fundamental frequency (F0) of the feedback is raised or lowered, talkers tend to compensate by shifting their F0 in the direction opposite to the perturbation (Burnett et al., 1998; Jones & Keough, 2008). Specific instructions combined with intensive training can reduce, but not eliminate, such pitch compensation (Zarate & Zatorre, 2008). As with the Lombard effect, compensation for formant and pitch perturbations appears to be largely automatic (Munhall et al., 2009).


Figure 4.1 Perturbation (solid line) and average compensation (dots) of the first formant frequency in hertz. The frequencies have been normalized to the mean of the baseline phase.

(Source: Adapted from MacDonald, Goldberg, & Munhall, 2010).

Feedback perturbations produce similar responses in birdsong. Pitch‐shifting the auditory feedback of single notes yields a compensatory response in which vocal output shifts in the direction opposite to the perturbation (Sober & Brainard, 2009). As in humans, this response is often incomplete: Sober and Brainard (2009) found that, on average, a 100 percent pitch shift yielded only a 50 percent change in response. Unlike in humans, however, this compensation is not immediate. In the same experiment, the compensatory pitch changes developed over a two‐week period, and once the pitch‐shift stimulus was removed, pitch returned to baseline only gradually. In humans, compensation for feedback perturbations is observed within single testing sessions (see Purcell & Munhall, 2006; Terband, van Brenk, & van Doornik‐van der Zee, 2014; Zarate & Zatorre, 2008) and even within single trials (Tourville, Reilly, & Guenther, 2008), and speech acoustics return to baseline slowly within a session after the perturbation is removed (Purcell & Munhall, 2006). The reasons for these interspecies differences are unclear; however, the evidence overwhelmingly supports the notion that both humans and songbirds actively correct for “errors” in vocal production by comparing vocal output with some form of target in real time.

A notable exception to direct compensation is the response to delayed auditory feedback (DAF), in which a time delay is introduced between speech production and audition. DAF nearly always produces errors and disrupts the flow of speech. In unaltered speech, the delay between speaking and hearing one’s own speech is about 1 millisecond (Yates, 1963). When this interval is artificially lengthened, numerous changes in speech occur: vocal intensity rises, speaking rate slows, and stuttering and word repetitions are common (Chase et al., 1961). In birdsong, DAF yields errors similar to those seen in humans: zebra finches produce more stuttering (more repetitions of introductory notes) and more syllable omissions when feedback is delayed (Cynx & von Rad, 2001).
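The DAF manipulation itself amounts to a delay line between microphone and headphones. The sketch below is a minimal illustration, assuming digitized audio handled one sample at a time at a fixed sample rate; the class `DelayLine`, the helper `delay_in_samples`, the 16 kHz rate, and the 200 ms delay are illustrative assumptions, not parameters from the studies cited above.

```python
from collections import deque


def delay_in_samples(delay_ms: float, sample_rate_hz: int) -> int:
    """Convert a feedback delay in milliseconds to a whole number of samples."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return round(delay_ms * sample_rate_hz / 1000)


class DelayLine:
    """Hypothetical fixed delay line: each output is the input from d samples earlier."""

    def __init__(self, delay_ms: float, sample_rate_hz: int) -> None:
        d = delay_in_samples(delay_ms, sample_rate_hz)
        # Pre-fill with silence so the first d outputs are zeros.
        self._buffer = deque([0.0] * d)

    def process(self, sample: float) -> float:
        """Push one microphone sample; return the sample sent to the headphones."""
        self._buffer.append(sample)
        return self._buffer.popleft()


# Illustrative settings (assumptions): 16 kHz audio, 200 ms artificial delay.
sample_rate = 16_000
daf = DelayLine(delay_ms=200, sample_rate_hz=sample_rate)
mic_samples = [1.0] * 5000  # stand-in for a digitized utterance
heard = [daf.process(s) for s in mic_samples]
print(delay_in_samples(200, sample_rate))  # 3200
print(heard[3199], heard[3200])            # 0.0 1.0
```

At 16 kHz, the roughly 1 ms natural delay corresponds to about 16 samples, whereas a 200 ms artificial delay corresponds to 3,200; the speaker hears each sound only after that many samples have elapsed.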

One unique aspect of DAF is that it cannot readily be compensated for. Unlike feedback about vocal pitch, loudness, spectral detail, or even the detailed timing of utterances (e.g. Mitsuya, MacDonald, & Munhall, 2014), all of which define the intended characteristics of the signal, DAF reflects the transmission speed of the sensorimotor system. Feedback timing therefore constrains how feedback can be used in speech motor control. Recently, Mitsuya, Munhall, and Purcell (2017) showed that the amount of compensation for perturbed formant frequencies decreased linearly with increasing feedback delay. In this study, a 200 Hz perturbation to F1 in the auditory feedback was introduced with a 100 ms feedback delay. Every 10 trials the delay was reduced by 10 ms, while the magnitude of the frequency perturbation remained constant. The magnitude of F1 compensation grew as the delay was reduced. These findings demonstrate that auditory feedback arriving beyond a temporal window ceases to serve as an effective control signal for speech production.
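The stepped design described above can be made concrete as a trial schedule. This is a minimal sketch under stated assumptions: it covers only the perturbation phase (no baseline or washout trials), and it assumes the delay keeps falling until it reaches 0 ms, which the passage does not specify. `TrialSpec` and `delay_schedule` are hypothetical names.

```python
from typing import NamedTuple


class TrialSpec(NamedTuple):
    trial: int        # 1-based trial number within the perturbation phase
    delay_ms: int     # auditory feedback delay on this trial
    f1_shift_hz: int  # F1 perturbation, held constant across trials


def delay_schedule(start_delay_ms: int = 100, step_ms: int = 10,
                   trials_per_step: int = 10, f1_shift_hz: int = 200,
                   final_delay_ms: int = 0) -> list[TrialSpec]:
    """Stepped schedule: the delay drops by step_ms every trials_per_step trials.

    final_delay_ms = 0 is an assumption; the passage does not say where the
    descent stopped.
    """
    if step_ms <= 0 or trials_per_step <= 0:
        raise ValueError("step_ms and trials_per_step must be positive")
    schedule = []
    delay = start_delay_ms
    while delay >= final_delay_ms:
        for _ in range(trials_per_step):
            schedule.append(TrialSpec(len(schedule) + 1, delay, f1_shift_hz))
        delay -= step_ms
    return schedule


schedule = delay_schedule()
print(len(schedule))          # 110
print(schedule[10].delay_ms)  # 90
```

Holding `f1_shift_hz` constant while only `delay_ms` changes is what allows differences in compensation to be attributed to feedback timing rather than to perturbation size.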

Collectively, these findings provide consistent support for the importance of auditory feedback in the development and maintenance of spoken language. Feedback processing is evident for a variety of attributes of spoken language, and the data imply the existence of some form of articulatory/acoustic goals that are supported by perceptual feedback. However, the mechanisms underlying this process remain unclear.

