Modelling Perceptual Effects of Phonology with ASR Systems
- Bing'er Jiang, Department of Linguistics, McGill University, Montreal, Quebec, Canada
- Ewan Dunbar, Laboratoire de Linguistique Formelle, CNRS - Université Paris Diderot - Sorbonne Paris Cité, Paris, France
- Morgan Sonderegger, McGill University, Montreal, Quebec, Canada
- Meghan Clayards, McGill University, Montreal, Quebec, Canada
- Emmanuel Dupoux, CoML, INRIA, Paris, France
Abstract

This paper explores the minimal knowledge a listener needs to compensate for phonological assimilation, one kind of phonological process responsible for variation in speech. We used standard automatic speech recognition models to represent English and French listeners. We found that, first, some types of models show language-specific assimilation patterns comparable to those shown by human listeners. Like English listeners, the models compensate more for place assimilation than for voicing assimilation when trained on English, and, like French listeners, they show the opposite pattern when trained on French. Second, the models which best predict the human pattern use contextually sensitive acoustic models and language models, which capture allophony and phonotactics, but do not make use of higher-level knowledge of a lexicon or word boundaries. Finally, some models overcompensate for assimilation, showing a (super-human) ability to recover the underlying form even in the absence of the triggering phonological context, pointing to an incomplete neutralization not exploited by human listeners.