Learning what is relevant for rewards via value-based serial hypothesis testing
- Mingyu Song, Princeton Neuroscience Institute, Princeton University, Princeton, New Jersey, United States
- Yael Niv, Princeton University, Princeton, New Jersey, United States
- Mingbo Cai, International Research Center for Neurointelligence, University of Tokyo, Tokyo, Japan
Abstract
Learning what is relevant for reward is a ubiquitous and crucial task in daily life, where stochastic reward outcomes can depend on an unknown number of task dimensions. We designed a paradigm tailored to study such complex scenarios. In the experiment, participants configured three-dimensional stimuli by selecting features for each dimension and received probabilistic feedback. Participants selected more rewarding features over time, demonstrating learning. To investigate the learning process, we tested two learning strategies, feature-based reinforcement learning and serial hypothesis testing, and found evidence for both. The extent to which each strategy was engaged depended on the instructed task complexity: when instructed that there were fewer relevant dimensions (and therefore fewer reward-generating rules were possible) people tended to serially test hypotheses, whereas they relied more on learning feature values when more dimensions were relevant. To explain the behavioral dependency on task complexity and instructions, we tested variants of the value-based serial hypothesis testing model. We found evidence that participants constructed their hypothesis space based on the instructed task condition, but they failed to use all the information provided (e.g., reward probabilities). Our current best model can qualitatively capture the difference in choice behavior and performance across task conditions.
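To make the feature-based reinforcement learning strategy concrete, the following is a minimal sketch of one common formulation: a delta-rule learner in which the value of a stimulus is the sum of its feature values. It is an illustration under stated assumptions, not the model fitted in this work. The number of features per dimension, the learning rate, and the function names are all hypothetical.

```python
N_DIMS = 3       # three stimulus dimensions, as in the task
N_FEATURES = 3   # features per dimension: an assumption for illustration


def stimulus_value(values, stimulus):
    """Sum the learned values of the features that make up a stimulus."""
    return sum(values[d][f] for d, f in enumerate(stimulus))


def update_feature_values(values, stimulus, reward, learning_rate=0.1):
    """Apply one delta-rule update to every feature of the chosen stimulus.

    The learning rate is a hypothetical free parameter, not a fitted value.
    """
    prediction_error = reward - stimulus_value(values, stimulus)
    new_values = [row[:] for row in values]  # leave the input untouched
    for d, f in enumerate(stimulus):
        new_values[d][f] += learning_rate * prediction_error
    return new_values


# Start with all feature values at zero, then learn from one rewarded choice.
values = [[0.0] * N_FEATURES for _ in range(N_DIMS)]
values = update_feature_values(values, stimulus=(0, 2, 1), reward=1.0)
```

In this sketch, only the features of the chosen stimulus are updated, so credit for a reward is spread across all three dimensions regardless of which ones are actually relevant. A serial hypothesis testing learner would instead hold one candidate rule at a time and decide whether to keep or replace it.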