Grounding Spatial Language in Perception by Combining Concepts in a Neural Dynamic Architecture

Abstract
We present a neural dynamic architecture that grounds in perception sentences that combine multiple concepts through nested spatial relations. Grounding entails that the model receives features and relations as categorical inputs and matches them to objects in space-continuous neural maps that represent visual input. The architecture is based on the neural principles of dynamic field theory. It autonomously generates sequences of processing steps in continuous time, based solely on highly recurrent connectivity. Simulations show that the architecture can ground sentences of varying complexity. We thus address two major challenges of nested relations: how concepts may appear in multiple different relational roles within the same sentence, and how, in such a scenario, various grounding outcomes may be “tried out” in a form of hypothesis testing. We close by discussing empirical evidence for the crucial assumptions and choices made in developing the architecture.
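The space-continuous neural maps mentioned above are, in dynamic field theory, activation fields governed by Amari-style neural field dynamics: localized input can drive the field through an instability into a self-stabilized peak, which is the elementary act of grounding a concept at a location. The following is a minimal one-dimensional sketch of such a field; all parameter values (resting level, kernel strengths, input location) are illustrative assumptions, not taken from the paper's architecture.

```python
import numpy as np

N = 181                        # number of spatial samples
x = np.arange(N)
h = -5.0                       # resting level: field sits below threshold
tau = 10.0                     # time constant of the field dynamics
dt = 1.0                       # Euler integration step

def gauss(center, width):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def sigmoid(u, beta=4.0):
    # Threshold function: only supra-threshold activation interacts.
    return 1.0 / (1.0 + np.exp(-beta * u))

# Lateral interaction: local excitation plus global inhibition,
# the combination that supports a single self-stabilized peak.
w_exc = 8.0 * gauss(N // 2, 5.0)   # excitatory kernel (centered for convolution)
c_inh = 1.0                        # global inhibition strength

u = h * np.ones(N)                 # field activation, starts at rest
inp = 7.0 * gauss(60, 5.0)         # localized input, e.g. a perceived object

for _ in range(200):
    f = sigmoid(u)
    lateral = np.convolve(f, w_exc, mode="same") - c_inh * f.sum()
    u += (dt / tau) * (-u + h + inp + lateral)

# After relaxation, a supra-threshold peak marks the input location,
# while the rest of the field remains suppressed below threshold.
```

In a full DFT architecture, many such fields over visual space and feature dimensions are coupled, and the sequential processing steps the abstract describes emerge from instabilities in these coupled dynamics rather than from an external controller.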
