Conclusion

Our journey through the auditory pathway has finally reached the end. It was a substantial trip, through the ear and auditory nerve, brainstem and midbrain, and many layers of cortical processing. We have seen how, along that path, speech information is initially encoded by some 30,000 auditory nerve fibers firing hundreds of thousands of impulses a second, and how their activity patterns across the tonotopic array encode formants, while their temporal firing patterns encode temporal fine structure cues to pitch and voicing. We have learned how, as these activity patterns then propagate and fan out over the millions of neurons of the auditory brainstem and midbrain, information from both ears is combined to add cues to sound‐source direction. Furthermore, temporal fine structure information gets recoded, so that temporal firing patterns at higher levels of the auditory brain no longer need to be read out with sub‐millisecond precision, and information about the pitch and timbre of speech sounds is instead encoded by a distributed and multiplexed firing‐rate code. We have seen that the neural activity patterns at levels up to and including the primary auditory cortex are generally thought to represent predominantly physical acoustic or relatively low‐level psychoacoustic features of speech sounds, that these representations are then transformed into increasingly phonetic representations at the level of the STG, and then into semantic representations as we move beyond the STG into the frontal and parietal brain areas. Finally, we have seen how notions of embodied meaning, as well as of statistical learning, are shaping our thinking about how the brain represents the meaning of speech.
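The contrast drawn above between a rate-place (tonotopic) code for formants and a temporal fine-structure code for pitch can be made concrete with a toy simulation. The following Python sketch is ours, not the chapter's: every specific in it is an illustrative assumption (a 120 Hz voice pitch, formants near 700 and 1200 Hz, and 30 log-spaced frequency bands standing in for the tonotopic array of nerve fibers), and real auditory-nerve models are far richer.

    # Minimal, illustrative sketch (not from the chapter): contrasts the two
    # neural codes described above on a synthetic vowel. All parameter values
    # are assumptions chosen for the toy example.
    import numpy as np
    from scipy.signal import lfilter

    fs = 16000                                   # sample rate (Hz)
    t = np.arange(int(fs * 0.5)) / fs            # 0.5 s of signal

    # Source-filter vowel: a glottal pulse train at F0, shaped by two
    # resonances that stand in for vocal-tract formants.
    f0 = 120.0
    pulses = (np.arange(len(t)) % round(fs / f0) == 0).astype(float)

    def resonator(x, fc, bw):
        """Second-order IIR resonance at fc Hz, bandwidth bw Hz (formant proxy)."""
        r = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * fc / fs
        return lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], x)

    vowel = resonator(pulses, 700.0, 130.0) + resonator(pulses, 1200.0, 130.0)

    # 1) Rate-place code: mean energy per log-spaced band, a stand-in for the
    #    average firing rate across the tonotopic array; the profile of rates
    #    across channels peaks near the formants.
    spectrum = np.abs(np.fft.rfft(vowel)) ** 2
    freqs = np.fft.rfftfreq(len(vowel), 1 / fs)
    edges = np.geomspace(100.0, 4000.0, 31)      # 30 toy "fibers"
    profile = np.array([spectrum[(freqs >= lo) & (freqs < hi)].mean()
                        for lo, hi in zip(edges[:-1], edges[1:])])
    peaks = sorted(np.sqrt(edges[i] * edges[i + 1])
                   for i in np.argsort(profile)[-2:])
    print(f"rate-place peaks near {peaks[0]:.0f} Hz and {peaks[1]:.0f} Hz")

    # 2) Temporal code: the autocorrelation of the waveform's fine structure
    #    peaks at the pitch period, so the strongest peak in the 60-400 Hz
    #    lag range recovers F0 from spike timing alone.
    ac = np.correlate(vowel, vowel, mode="full")[len(vowel) - 1:]
    lo, hi = int(fs / 400), int(fs / 60)
    lag = lo + int(np.argmax(ac[lo:hi]))
    print(f"temporal-code pitch estimate: {fs / lag:.1f} Hz")

Run as written, the first readout recovers the formant positions from average band energies alone, while the second recovers the pitch purely from the timing structure of the waveform, mirroring the division of labor between place and temporal codes described above.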

By the time they reach these meaning‐representing levels of the brain, the waves of neural activity racing up the auditory pathway will have passed through at least a dozen anatomical processing stations, each composed of anywhere from a few hundred thousand to hundreds of millions of neurons, and each richly and reciprocally interconnected both internally and with the previous and next levels in the processing hierarchy. We hope readers will share our sense of awe when we consider that it takes a spoken word only a modest fraction of a second to travel through this entire, stunningly intricate network and be transformed from sound wave to meaning.

Remember that the picture painted here of a feed‐forward hierarchical network that transforms acoustics to phonetics to semantics is a highly simplified one. It is well grounded in scientific evidence, but it is necessarily a rather selective telling of the story as we understand it to date. Recent years have been a particularly productive time in auditory neuroscience, as insights from animal research, human brain imaging, human patient data and ECoG studies, and artificial intelligence have begun to come together to provide the framework of understanding we have attempted to outline here. But many important details remain unknown, and, while we feel fairly confident that the insights and ideas presented here will stand the test of time, we must be aware that future work may not just complement and refine but even overturn some of the ideas that we currently put forward as our best approximations to the truth. One thing we are absolutely certain of, though, is that studying how human brains speak to each other will remain a profoundly rewarding intellectual pursuit for many years to come.
