Learning from the acoustic signal: Error-driven learning of low-level acoustics discriminates vowel and consonant pairs
- Jessie Nixon, Quantitative Linguistics, University of Tübingen, Tübingen, Germany
- Fabian Tomaschek, Quantitative Linguistics, University of Tübingen, Tübingen, Germany
Abstract
Until the last couple of decades, research on speech acquisition generally assumed that infants were born with innate knowledge of a universal set of phonetic features that occurred across all the world's languages and that learning one's native language involved selecting the appropriate subset from this broader group. Over the last two decades, this account has given way to the idea that speech sounds are learned via general learning mechanisms. Statistical clustering models have become the most common way to explain how infants learn the sounds of their language. Over the first few months of life, infants go from being able to discriminate the sounds of all languages, to perceiving speech sounds in a way that is increasingly honed to their native language. However, recent empirical and computational evidence suggests that purely statistical clustering methods may not be sufficient to explain speech sound acquisition. The present study used discriminative, error-driven learning, an implementation of the Rescorla-Wagner model, to model early development of speech perception. Expectations about the upcoming acoustic speech signal were learned from the surrounding speech signal, with spectral slices from a spontaneous speech corpus as both inputs and outputs of the model. After training, the model was tested on vowel and fricative continua. The model was able to discriminate both the vowels and the consonants, with decreasing activation along the continuum for steps further away from the target sounds. Inspection of cue weights showed that both the vowels and the fricatives were discriminated based on cues in the expected spectral frequency ranges. Vowel discrimination occurred in the frequency bands corresponding to vowel formants; fricative discrimination occurred in the lowest frequencies corresponding to presence versus absence of voicing.
These results suggest that a discriminative error-driven approach may provide a viable alternative to statistical clustering for modelling early infant speech acquisition.
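As a rough illustration of the error-driven learning rule named in the abstract, the following minimal sketch applies a Rescorla-Wagner style (delta-rule) update to a toy set of learning events. It is not the authors' implementation: the binary cue and outcome coding, the learning rate, the toy events, and the function names `rw_update` and `train` are all assumptions made for illustration. In the study, cues and outcomes were derived from spectral slices of a speech corpus rather than from hand-made vectors.

```python
import numpy as np


def rw_update(weights, cues, outcomes, learning_rate=0.1):
    """Apply one error-driven update in place and return the weights.

    weights  : (n_cues, n_outcomes) array of cue-outcome association strengths
    cues     : (n_cues,) 0/1 array marking which cues are present
    outcomes : (n_outcomes,) 0/1 array marking which outcomes occur
    """
    activation = cues @ weights        # summed support for each outcome
    error = outcomes - activation      # prediction error for each outcome
    # Only cues that are present (value 1) have their weights adjusted.
    weights += learning_rate * np.outer(cues, error)
    return weights


def train(events, n_cues, n_outcomes, learning_rate=0.1):
    """Run the update over a sequence of (cues, outcomes) events."""
    weights = np.zeros((n_cues, n_outcomes))
    for cues, outcomes in events:
        rw_update(weights,
                  np.asarray(cues, dtype=float),
                  np.asarray(outcomes, dtype=float),
                  learning_rate)
    return weights


# Hypothetical toy data: cues 0 and 1 each predict a different outcome,
# while cue 2 occurs in every event and is therefore uninformative.
events = [([1, 0, 1], [1, 0]), ([0, 1, 1], [0, 1])] * 200
weights = train(events, n_cues=3, n_outcomes=2)
print(np.round(weights, 2))
```

In this toy setting, the cue that reliably predicts an outcome ends up with a stronger weight to that outcome than the shared cue does, and the competing cue acquires a negative weight. This is the sense in which learned cue weights can be inspected to see which parts of the input discriminate between outcomes.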