The Handbook of Language and Speech Disorders
3.3.5 Prosody Perception
Another informational layer of speech that is affected by spectral degradation is the perception of prosody. The functions of prosody in language are threefold. First, prosody signals the meaning or the morphological and syntactic structure of linguistic elements at several levels, such as words, sentences, and larger units of discourse. This is commonly referred to as linguistic prosody. For instance, it distinguishes certain segmentally identical words, such as REcord vs. reCORD (lexical stress); marks word grouping (phrasing), such as blue bottle vs. bluebottle; and marks given as opposed to new information (topic vs. focus), such as My COLleague was supposed to do this, as opposed to My colleague was supposed to do THIS, where capitals indicate sentential accents. Second, prosody reflects the emotional state of the speaker or their attitude toward their utterance, and this attribute of speech is termed emotional prosody. For example, any utterance may, in principle, be pronounced in a sad, happy, angry, or fearful way, or with any other emotion. Attitudes such as surprise, irony, and sarcasm may also be employed to signal a speaker’s stance with regard to the truthfulness of the utterance. Finally, indexical prosody is suprasegmental information about the identity of the speaker, such as age, health, and provenance (Lehiste, 1970; Rietveld & van Heuven, 2016). Prosody is mainly conveyed by means of variation in intensity (stress), fundamental frequency (F0, perceived as pitch and, at the sentence level, referred to as intonation), duration (of any linguistic unit, as well as pauses in speech), and voice quality, for example, harshness, strain, and creakiness (Rietveld & van Heuven, 2016). The current discussion will focus on the ability of CI listeners to perceive linguistic and emotional prosody.
An important investigation of linguistic prosody was performed by Meister et al. (2015). They presented CI users and NH controls with incrementally manipulated F0, intensity, and duration cues for word stress, and also measured the participants’ just‐noticeable‐difference discrimination thresholds for these phonetic dimensions. The researchers showed that the clinical group’s performance was compromised by the F0 and intensity cue manipulations, but not by manipulation of the duration cue, suggesting that CI users relied more on duration than on the other cues. A similar pattern was observed in the discrimination thresholds reported in the study, which, compared with NH listeners, were least elevated for duration (51 ms for CI vs. 40 ms for NH), more elevated for intensity (3.9 dB for CI vs. 1.8 dB for NH), and most elevated for F0 (5.8 semitones for CI vs. 1.5 semitones for NH) (cf. Kalathottukaren, Purdy, & Ballard, 2015; See, Driscoll, Gfeller, Kliethermes, & Oleson, 2013). O’Halpin (2009) found that school‐aged children with CIs were outperformed by their NH peers on phrase/compound word discrimination (blue bottle vs. bluebottle) and on identification of two‐way (It’s a BLUE book vs. It’s a blue BOOK) and three‐way sentence accent positions (The BOY is painting a boat vs. The boy is PAINTING a boat vs. The boy is painting a BOAT). Furthermore, the CI children had larger discrimination thresholds for F0, and relatively smaller discrimination thresholds for intensity and duration, when tested with manipulated nonsense disyllables. For each cue, these discrimination thresholds correlated with the scores on the linguistic prosody tasks, which indicates that prosody perception may be supported by psychophysical capabilities in CI children.
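To put the F0 thresholds above in perspective, semitone values can be converted to frequency ratios using the standard relation st = 12·log2(f2/f1). The sketch below (not from the study itself, just the arithmetic behind the reported units) shows that the NH threshold of 1.5 semitones corresponds to roughly a 9% change in F0, whereas the CI threshold of 5.8 semitones corresponds to roughly a 40% change:

```python
import math

def semitones_to_ratio(st):
    """Frequency ratio corresponding to a pitch interval in semitones."""
    return 2.0 ** (st / 12.0)

def ratio_to_semitones(ratio):
    """Pitch interval in semitones corresponding to a frequency ratio."""
    return 12.0 * math.log2(ratio)

# Thresholds reported by Meister et al. (2015):
nh_ratio = semitones_to_ratio(1.5)  # NH listeners
ci_ratio = semitones_to_ratio(5.8)  # CI listeners

print(round(nh_ratio, 3))  # ~1.091, i.e. a ~9% change in F0
print(round(ci_ratio, 3))  # ~1.398, i.e. a ~40% change in F0
```

For a female voice with an average F0 around 200 Hz, this means a CI listener in that study needed a shift to roughly 280 Hz before the difference became reliably detectable, while an NH listener needed only about 218 Hz.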
The relationship between psychophysical and speech perceptual performance may not be as straightforward when it comes to postlingually deafened adults. Morris, Magnusson, Faulkner, Jönsson, and Juul (2013) entered discrimination thresholds for F0, intensity, duration, and vowel quality (operationalized as the first formant) into a logistic regression analysis with prosody identification as the dependent variable. The prosodic tasks were vowel length, word stress, and phrase/compound word identification, and these were performed both in quiet and in a 10 dB SNR noise background. Only the discrimination threshold for intensity emerged as a significant predictor, indicating that adult CI recipients who can better make use of intensity changes are also better at these types of linguistic‐prosodic tasks.
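The analysis in Morris et al. (2013) asks, in essence, whether a listener’s discrimination threshold predicts the odds of correct prosody identification. The sketch below illustrates the general form of such an analysis with entirely hypothetical listener data (one predictor, a minimal gradient-descent fit rather than the statistical software the authors would have used); a negative fitted weight corresponds to the study’s finding that larger intensity thresholds go with poorer identification:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit a one-predictor logistic regression by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)   # predicted probability of correct identification
            w -= lr * (p - y) * x    # gradient step on the weight
            b -= lr * (p - y)        # gradient step on the intercept
    return w, b

# Hypothetical data: each listener's intensity discrimination threshold (dB)
# and whether they succeeded on a word-stress identification task (1 = correct).
thresholds = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
correct    = [1,   1,   1,   1,   0,   0,   0,   0]

w, b = fit_logistic(thresholds, correct)
print(w < 0)  # True: larger thresholds predict poorer identification
```

In the actual study, F0, duration, and vowel-quality thresholds were entered alongside intensity, and only the intensity coefficient reached significance.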
In a recent review on emotional prosody processing, Jiam, Caldwell, Deroche, Chatterjee, and Limb (2017) concluded that CI users have considerable difficulties perceiving and producing emotional prosody. For example, on a four‐way emotion identification test using nonsense words (Gilbers et al., 2015), CI users scored around 45% correct, NH listeners presented with vocoder simulations around 70%, and NH listeners presented with unprocessed speech over 90% (chance level was 25%). Another review of emotional prosody (Picou, Singh, Goy, Russo, Hickson, et al., 2018) attributes the deficits to CI recipients’ poor pitch processing capabilities. CI users rely more on other cues (such as intensity and duration) than NH listeners do, but despite their poor pitch discrimination abilities, they still rely to some extent on pitch cues. This would explain their difficulties in emotional prosody perception, where pitch information can be very important. Another study by Picou et al. (2018) found no relationship between the degree of hearing loss, as measured by pure‐tone average thresholds, or pitch, loudness, and gap detection sensitivity, on the one hand, and the recognition of the emotions with which semantically neutral sentences were pronounced, on the other. This literature shows that deficits in CI recipients’ emotional prosody perception seem more difficult to explain in terms of lower‐level psychophysical abilities than deficits in linguistic prosody perception. One speculative explanation is that emotion processing operates relatively independently of the processing of linguistic structure, because perception of the emotional content of speech is more basic and universal (Scherer, Banse, & Wallbott, 2001) and less domain‐specific; for instance, emotions can also be conveyed by visual cues.
It has been demonstrated that problems with emotional prosody perception have repercussions for CI recipients’ social development, and this is, of course, a problem that is more pronounced in prelingually deafened CI recipients, who have developed their receptive communication exclusively through their implants. Wiefferink, Rieffe, Ketelaar, De Raeve, and Frijns (2013) found that Dutch CI children between 2.5 and 5 years old performed worse than hearing peers on facial and situational emotion understanding and on general expressive and receptive language development. The language test scores correlated with the emotion tests that required verbal processing, suggesting that linguistic development can predict emotional development to some extent. Mancini et al. (2016) confirmed the connection between linguistic and emotional development for a group of 72 children aged between 4 and 11 years. However, 79% of their cohort showed no deviant understanding of emotion, perhaps because their participants were older and a larger percentage of them used oral language exclusively.
In summary, in their daily communication with the device, CI listeners may miss out on many, if not most, aspects of prosody. In prelingually deafened CI children this seems to be due to their more basic problems with pitch and spectral perception, although the causal picture is less clear for postlingually deafened CI recipients, and for the perception of emotional prosody by all CI recipients. Emotional prosody perception deficits in CI children have been shown to have consequences for more general socio‐emotional development.