Phonemic learning based on articulatory-acoustic speech representations

Abstract

Infants learn to imitate and recognize words at an early age, but phonemic awareness develops later, guided for example by the acquisition of literacy. We investigate the hypothesis that speech representations in the brain are formed partly through articulatory-acoustic learning, and that these representations may serve as a basis for learning an additional mapping to phonemes. We train a convolutional recurrent neural network with an articulatory branch and a phonemic branch for multitask learning. When the network is trained on real conversational speech aligned with synthesized articulation, the articulatory representation is shown to boost phoneme recognition accuracy, provided that the first convolutional layers are shared between the two branches. This supports the hypothesis that the representations involved in speech perception formed in the brain during childhood may be partly based on articulatory learning, with an additional mapping from these low-level speech representations to phonemes learned later.
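To make the architecture concrete, the following is a minimal NumPy sketch of the multitask idea described above: a shared convolutional front end feeding a simple recurrent layer, followed by an articulatory regression branch and a phonemic classification branch. All dimensions (40 acoustic features, 12 articulatory channels, 30 phoneme classes), the kernel size, and the vanilla RNN are illustrative assumptions, not the paper's actual configuration; training with the joint losses is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, w, b):
    # Valid 1-D convolution over time with ReLU.
    # x: (T, C_in), w: (K, C_in, C_out), b: (C_out,)
    K = w.shape[0]
    T_out = x.shape[0] - K + 1
    out = np.empty((T_out, w.shape[2]))
    for t in range(T_out):
        out[t] = np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def simple_rnn(h, Wx, Wh):
    # Vanilla tanh RNN over the time axis; h: (T, C) -> (T, H).
    s = np.zeros(Wh.shape[0])
    states = []
    for t in range(h.shape[0]):
        s = np.tanh(h[t] @ Wx + s @ Wh)
        states.append(s)
    return np.array(states)

# Hypothetical sizes: 100 frames of 40-dim acoustic features,
# 12 articulatory tracks, 30 phoneme classes.
T, F, H = 100, 40, 32
n_artic, n_phones = 12, 30

x = rng.standard_normal((T, F))                     # acoustic input frames

# Shared convolutional front end (shared by both branches).
w_shared = rng.standard_normal((5, F, 64)) * 0.1
b_shared = np.zeros(64)
h = conv1d_relu(x, w_shared, b_shared)              # (T - 4, 64)

# Shared recurrent layer on top of the convolutional features.
Wx = rng.standard_normal((64, H)) * 0.1
Wh = rng.standard_normal((H, H)) * 0.1
r = simple_rnn(h, Wx, Wh)                           # (T - 4, H)

# Articulatory branch: per-frame regression to articulator positions.
W_a = rng.standard_normal((H, n_artic)) * 0.1
artic = r @ W_a                                     # (T - 4, n_artic)

# Phonemic branch: per-frame phoneme posteriors via softmax.
W_p = rng.standard_normal((H, n_phones)) * 0.1
logits = r @ W_p
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)           # (T - 4, n_phones)
```

In a multitask setup of this kind, gradients from both an articulatory loss (e.g. mean squared error on `artic`) and a phonemic loss (e.g. cross-entropy on `probs`) would flow back into the shared layers, which is the mechanism by which articulatory learning can shape the representation used for phoneme recognition.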
