Text Matters but Speech Influences: A Computational Analysis of Syntactic Ambiguity Resolution

Abstract

How humans resolve syntactic ambiguity has long been a topic of interest in linguistics. It is also one of the most challenging problems for spoken language understanding (SLU) systems. Because syntactic ambiguity is intertwined with prosody and semantics, computational approaches to speech intention identification are expected to benefit from observations of the human language processing mechanism. In this regard, we address the task with attentive recurrent neural networks that exploit acoustic and textual features simultaneously, and we reveal how the two modalities interact to derive sentence meaning. Using a speech corpus recorded from Korean scripts of syntactically ambiguous utterances, we show that co-attention frameworks, namely multi-hop attention and cross-attention, achieve significantly superior performance in disambiguating speech intention. Further analysis demonstrates that the computational models reflect the internal relationship between auditory and linguistic processes.
