A Simple Computational Model of Salience Map Formation in the Brain
- Abe Leite, Cognitive Science Program, Indiana University, Bloomington, Indiana, United States
Abstract
Many convolutional neural network (CNN)-based approaches are excellent functional models of visual attention but lack cognitive and biological interpretations. In this work, I offer a novel, cross-disciplinary justification for the Deep Gaze 1 model, which calculates salience as a weighted average of feature maps from a pre-trained CNN. In the cognitive realm, experiments demonstrate that visual attention depends on multiple levels of real-world features (e.g., edges, text, faces). This dependence is well modeled using features from a naturalistically trained CNN. Furthermore, neuroscience research strongly suggests that visual attention is computed in the superior colliculus using information from multiple levels of the ventral visual stream; all information flow in Deep Gaze follows analogous pathways. To encourage broader adoption of this model, whose source code remains unpublished, I offer a readable implementation with minor changes for biological plausibility. The implementation is validated on the MIT1003 dataset using features from MobileNetV2, with results comparable to those of the original Deep Gaze.
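The core readout described above, salience as a weighted average of CNN feature maps, can be illustrated with a small sketch. It assumes the feature maps have already been extracted from a pre-trained CNN (such as MobileNetV2) and resized to a common resolution. The function names, the toy "edge" and "face" maps, and the final softmax step (used here to turn the salience map into a fixation probability distribution) are illustrative assumptions, not the author's implementation. Any blurring or center-bias stages the full model may use are omitted.

```python
import math
from typing import List, Sequence

Map = List[List[float]]  # a 2-D feature or salience map, rows x columns


def weighted_average(feature_maps: Sequence[Map], weights: Sequence[float]) -> Map:
    """Combine same-sized feature maps into a single salience map.

    Each output pixel is sum_k(w_k * F_k[i][j]) / sum_k(w_k).
    Hypothetical helper for illustration only.
    """
    if not feature_maps or len(feature_maps) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(feature_maps, weights):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += w * fmap[i][j] / total
    return out


def softmax_2d(salience: Map) -> Map:
    """Assumed final step: convert salience into probabilities that sum to 1."""
    peak = max(max(row) for row in salience)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in salience]
    z = sum(sum(row) for row in exps)
    return [[v / z for v in row] for row in exps]


# Toy example: a low-level "edge" map and a high-level "face" map (hypothetical values).
edges = [[1.0, 0.0], [0.0, 0.0]]
faces = [[0.0, 0.0], [0.0, 1.0]]
salience = weighted_average([edges, faces], [1.0, 3.0])
fixation_density = softmax_2d(salience)
```

With weights of 1 and 3, the face location receives three times the salience of the edge location (0.75 vs. 0.25). In a trained model, these weights would be learned from fixation data rather than set by hand.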