The Handbook of Speech Perception (multiple authors), page 63

Auditory feedback and vocal learning

Much is made of the uniqueness of human language and, at the speech level, these uniqueness claims frequently focus on the perceptual skills of the developing infant. However, the less emphasized side of communication, speaking, is clearly a specialized behavior. Humans are part of a small cohort of species that are classified as vocal learners and that acquire the sounds in their adult repertoire through social learning (Petkov & Jarvis, 2012). This trait seems to be an example of convergent evolution in a few mammalian (humans, dolphins, whales, seals, sea lions, elephants, and bats) and bird (songbirds, parrots, and hummingbirds) species. The behavioral similarities shown by these disparate species are mirrored by their neuroanatomy and gene expression. In a triumph of behavioral, genetic, and neuroanatomic research, a consortium of scientists has shown similarities in brain pathways for vocal learners that are not observed in species that do not learn their vocal repertoires (Pfenning et al., 2014).

As shown by the studies of deafness summarized earlier, hearing is vital for vocal learning. However, the role that auditory feedback plays in speech development is unclear. There are few human developmental studies that manipulate feedback in the early stages of speech development. Jones and colleagues (Scheerer, Liu, & Jones, 2013; Scheerer, Jacobson, & Jones, 2016, 2019) have shown that children as young as two years of age show compensation for F0 perturbations. However, feedback perturbation of segmental properties such as vowel formant frequency shows a different pattern of results. MacDonald et al. (2012) tested children at two and four years of age, as well as adults, in a formant feedback perturbation paradigm. By the age of four, young children acted like adults and partially compensated in response to F1 and F2 perturbations. At the age of two, however, the toddlers showed two significant patterns in response to feedback perturbations (see Figure 4.2). First, on average there was no evidence of compensatory behavior when the two‐year‐old children were presented with altered feedback of vowel formants. Second, they produced utterances that were remarkably variable. Variability is one of the hallmarks of early speech and birdsong development. But what role does this variability play in development, and does feedback processing have a developmental profile? We will return to these questions.

The DIVA model proposes that early vocal development involves a closed‐loop imitation process driven initially by two stages of babbling and later a vocal learning stage that directly involves corrective feedback processing. In the first babbling phase, random articulatory motions generate somatosensory and acoustic consequences, and a mapping of the developing vocal tract as a speech device takes place. Separately, infants learn the perceptual categories of their native language. This is crucial to the model. As Guenther (1995, p. 599) states, “the model starts out with the ability to perceive all of the sounds that it will eventually learn to produce.” When the second stage of babbling begins, it involves the mapping between phonetic categories developed through perceptual learning and articulation. The babbling during this period tunes the feedback system to permit corrective responses to detected errors. In the next imitation phase, infants hear adult syllables that they try to reproduce. In a cyclical process involving sensory feedback from actual productions and better feedforward commands, the system shapes the early utterances toward the native language.
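The cyclical process described above can be illustrated with a deliberately simplified sketch. It is not the DIVA implementation: it reduces the vocal tract to a single formant value in hertz, and the function name `imitation_loop`, the parameters `feedback_gain` and `learning_rate`, and all numeric values are assumptions introduced here for illustration only. On each trial the toy system produces a sound from its current feedforward command, compares what it hears with the auditory target, and uses part of the resulting correction to improve the next feedforward command.

```python
def imitation_loop(target_hz, start_hz, n_trials=30,
                   feedback_gain=0.5, learning_rate=0.5):
    """Toy one-dimensional imitation cycle (illustrative, not DIVA itself).

    All parameter names and default values are hypothetical.
    Returns the formant value produced on each trial.
    """
    feedforward_hz = start_hz
    produced = []
    for _ in range(n_trials):
        # Production follows the current feedforward command; feedback is unaltered.
        heard_hz = feedforward_hz
        produced.append(heard_hz)
        # Mismatch between the auditory target and what was heard.
        error_hz = target_hz - heard_hz
        # Corrective response proposed by the feedback controller.
        correction_hz = feedback_gain * error_hz
        # Part of the correction is folded into the next feedforward command.
        feedforward_hz += learning_rate * correction_hz
    return produced


if __name__ == "__main__":
    trace = imitation_loop(target_hz=700.0, start_hz=500.0)
    print(f"first trial: {trace[0]:.1f} Hz, last trial: {trace[-1]:.1f} Hz")
```

In this sketch the production error shrinks by a constant factor on every trial, so the output approaches the target smoothly. That smooth convergence is a property of the toy update rule, not a claim about how infant productions actually change.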

Simulations by Guenther and his students support the logic of this account. However, there are significant concerns. First among these is that the data supporting this process are weak or, in the case of MacDonald et al.’s (2012) results, contradict the hypothesis. Early speech feedback processing and the shaping of speech production targets are not well attested. Second, the proposal relies on a strong relationship between the representations of speech perception and speech production. Surprisingly, this relationship is controversial.

Models like DIVA predict a phenomenon that is frequently assumed but is not strongly supported by actual data: babbling drift. The hypothesis that the sounds of babbling drift over time was first proposed by Roger Brown (1958). Brown suggested that the phonetic repertoire in the babbling of infants slowly comes to resemble the phonetics of the language environment that they are exposed to, and gradually excludes sounds that are absent from the native language. As the review by Best et al. (2016) indicates, the support for this idea is mixed, particularly from transcription studies and perceptual studies where naive listeners attempted to identify the language environment of the infant’s babbling.

There is broad agreement that early babbling has common characteristics across languages and a somewhat limited phonetic repertoire. There is no such consensus about the evidence from later babbling. Older transcription studies of late babbling were often plagued by small sample sizes and bias issues inherent in transcription. Studies with larger sample sizes, however, still show conflicting patterns of results. For example, de Boysson‐Bardies and Vihman (1991) reported that the prevalence of consonants of different manners and places of articulation in the babbling of 12‐month‐old infants from English, French, Japanese, and Swedish homes corresponded to the distributions of consonants in their language environments. In contrast, a number of other transcription studies have failed to find such differences (e.g. Kern, Davis, & Zink, 2009; Lee, Davis, & MacNeilage, 2010).


Figure 4.2 Average F1 (circles) and F2 (triangles) frequency estimates across time for adults (top panel), young children (middle panel), and toddlers (bottom panel). The formant frequencies have been normalized to the average baseline frequencies. The shaded area indicates when subjects were given altered auditory feedback (from MacDonald et al., 2012).

Source: MacDonald et al., © 2012, Elsevier.

Another approach has been to use recordings of infants babbling as perceptual stimuli and ask adult listeners to identify the infants’ native language. These studies have also shown mixed results, with some experiments reporting that listeners can discriminate the home language of the infants (e.g. de Boysson‐Bardies, Sagart, & Durand, 1984) while others showed no perceptual difference (e.g. Thevenin et al., 1985). The more serious concern about these studies is that listeners were likely tuning in to prosodic differences in the babbling rather than the segmental differences that would be predicted by babbling drift. The ability to perceptually distinguish the language of babbling has been shown for low‐pass filtered stimuli (e.g. Whalen, Levitt, & Wang, 1991), and this supports the idea that prosodic differences are driving these results. A recent controlled study (Lee et al., 2017) with a large number of stimuli found that perceptual categorizations of Chinese‐ and English‐learning babies’ utterances at 8, 10, and 12 months of age were only reliable for a small subset of the stimuli (words or canonical syllables that resembled words). These effects were modest and suggest that early lexical development rather than babbling may be where the home language shows its earliest influence.

Direct measurements of babbling acoustics have shown evidence for babbling drift, albeit only small effects. For example, Whalen, Levitt, and Goldstein (2007) measured voice onset time (VOT) in French‐ and English‐learning infants at ages 9 and 12 months. No differences were observed in VOT or in the duration of prevoicing. However, there was a greater incidence of prevoicing in the French‐learning infants, which corresponds to adult French–English differences.

The most serious concern from the existing data is that there is no evidence for speech‐production tuning of targets based on production errors. MacDonald et al.’s (2012) data suggest that young children do not correct errors. However, there are several caveats to that conclusion. First, the magnitude of the perturbation may have to be within a critical range, and the perturbations for all ages in MacDonald et al. were the same in hertz. It is possible that younger children require larger perturbations to elicit compensations. A related issue is that the perturbations may have fallen within the noisy categories that the children were producing. The variability of production may be an indicator of the category status. However, even if this were true, it raises the question: How could an organism learn to produce adult targets under these conditions? The challenges are enormous. Juveniles in all species have vocal tracts that do not match their parents’ vocal tracts. Birds and other species show marked production variance as juveniles (e.g. Bertram et al., 2014). There is no obvious feedback‐based mechanism that permits the mapping from adult targets to young productions (see Messum & Howard, 2015). Error correction as normally envisioned in motor control may not be engaged.
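The caveat about perturbation magnitude can be made concrete with a small sketch. It expresses a fixed perturbation in hertz in units of a speaker's own baseline variability, so that the same shift can be compared across a consistent speaker and a variable one. The function name `perturbation_in_sd_units`, the 200 Hz shift, and both sets of F1 values are hypothetical numbers chosen for illustration; none of them are taken from MacDonald et al. (2012).

```python
from statistics import stdev


def perturbation_in_sd_units(perturbation_hz, baseline_hz):
    """Size of a formant shift relative to baseline production variability.

    baseline_hz: formant values (Hz) from unperturbed productions.
    Returns the shift divided by the sample standard deviation of the baseline.
    """
    return abs(perturbation_hz) / stdev(baseline_hz)


# Hypothetical baseline F1 values (Hz); illustrative only, not measured data.
adult_f1 = [690.0, 700.0, 710.0, 695.0, 705.0]        # tightly clustered
toddler_f1 = [850.0, 1000.0, 1150.0, 900.0, 1100.0]   # widely scattered

shift_hz = 200.0  # hypothetical perturbation, identical in hertz for both groups

if __name__ == "__main__":
    print(f"adult:   {perturbation_in_sd_units(shift_hz, adult_f1):.1f} SD")
    print(f"toddler: {perturbation_in_sd_units(shift_hz, toddler_f1):.1f} SD")
```

With these made-up numbers, the same shift is many standard deviations outside the consistent speaker's baseline but only slightly beyond the variable speaker's own scatter. That is one way to picture how a perturbation could fall within a noisy production category and fail to register as an error.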
