A Computational Model of Early Word Learning from the Infant’s Point of View
- Satoshi Tsutsui, Indiana University, Bloomington, Indiana, United States
- Arjun Chandrasekaran, Max Planck Institute for Intelligent Systems, Germany
- Md. Alimoor Reza, Indiana University, Bloomington, Indiana, United States
- David Crandall, Indiana University, Bloomington, Indiana, United States
- Chen Yu, Department of Psychological and Brain Sciences, Indiana University, Bloomington, Indiana, United States
Abstract
Human infants have the remarkable ability to learn the associations between object names and visual objects from inherently ambiguous experiences. Researchers in cognitive science and developmental psychology have built formal models that implement in-principle learning algorithms, and have then tested those models on pre-selected and pre-cleaned datasets to see whether they can find statistical regularities in the input data. In contrast to these earlier modeling approaches, the present study used egocentric video and gaze data collected from infant learners during natural toy play with their parents. This allowed us to capture the learning environment from the learner's own point of view. We then used a Convolutional Neural Network (CNN) model to process sensory data from the infant's point of view and learn name-object associations from scratch. As the first model that takes raw egocentric video to simulate infant word learning, the present study provides a proof of principle that the problem of early word learning can be solved using the actual visual data perceived by infant learners. Moreover, we conducted simulation experiments to systematically determine how visual, perceptual, and attentional properties of infants' sensory experiences may affect word learning.