Value-of-Information based Arbitration between Model-based and Model-free Control

Abstract

There have been numerous attempts to explain general learning behaviours using model-based and model-free methods. Model-based control is flexible but computationally expensive because it plans, whereas model-free control is fast but inflexible. Multiple arbitration schemes have been suggested to combine the data efficiency of model-based control with the computational efficiency of model-free control. In this context, we propose a quantitative 'value-of-information' based arbitration between the two controllers in order to establish a general computational framework for skill learning. The interacting model-based and model-free reinforcement learning processes are arbitrated using an uncertainty-based estimate of the value of information. We further show that our algorithm performs better than Q-learning as well as Q-learning with experience replay.
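
To make the idea of uncertainty-based arbitration concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes that each action carries a cached (model-free) value estimate together with a variance, and it uses a simple heuristic for the value of information: the uncertainty of an action divided by its value gap to the best competing action. The names `ActionEstimate`, `value_of_information`, `choose_controller`, and the `planning_cost` threshold are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ActionEstimate:
    mean: float      # cached (model-free) value estimate for the action
    variance: float  # uncertainty about that estimate


def value_of_information(estimates: Dict[str, ActionEstimate],
                         action: str,
                         eps: float = 1e-6) -> float:
    """Heuristic VOI (an assumption of this sketch): high when the action's
    value is uncertain and close to the best competing action's value.
    Requires at least two actions."""
    best_other = max(e.mean for a, e in estimates.items() if a != action)
    gap = abs(estimates[action].mean - best_other)
    return estimates[action].variance / (gap + eps)


def choose_controller(estimates: Dict[str, ActionEstimate],
                      planning_cost: float) -> str:
    """Plan with the model only when the largest VOI exceeds the cost of planning."""
    voi = max(value_of_information(estimates, a) for a in estimates)
    return "model-based" if voi > planning_cost else "model-free"


# Well-separated, confident estimates: planning is unlikely to change the choice.
confident = {"left": ActionEstimate(1.0, 0.01), "right": ActionEstimate(0.0, 0.01)}
# Close, uncertain estimates: extra information from planning may change the choice.
uncertain = {"left": ActionEstimate(0.5, 0.2), "right": ActionEstimate(0.45, 0.2)}

print(choose_controller(confident, planning_cost=0.1))
print(choose_controller(uncertain, planning_cost=0.1))
```

Under these assumptions, the agent falls back on cheap cached values when its estimates are confident and well separated, and spends computation on planning only when the expected benefit of resolving uncertainty outweighs the planning cost.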
