Probabilistic Inference for Learning Control

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
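
As a rough illustration of what closed-form policy evaluation under uncertainty can look like, the sketch below propagates a Gaussian state distribution through time and sums an expected cost at each step. It is not the method of the paper: a linear-Gaussian model (the hypothetical matrices A, B, Q) stands in for the learned probabilistic dynamics model, the policy is assumed linear (u = K x), and a saturating cost 1 - exp(-0.5 (x - t)^T W (x - t)) is assumed. All function and parameter names are illustrative.

```python
import numpy as np

def propagate(m, S, A, B, K, Q):
    """One step of x' = A x + B u + noise with assumed linear policy u = K x.

    Maps the state distribution N(m, S) to the next-state distribution;
    Q stands in for the model's predictive uncertainty.
    """
    A_cl = A + B @ K                      # closed-loop transition matrix
    return A_cl @ m, A_cl @ S @ A_cl.T + Q

def expected_saturating_cost(m, S, target, W):
    """Closed-form E[1 - exp(-0.5 (x - t)^T W (x - t))] for x ~ N(m, S)."""
    d = m - target
    M = np.eye(len(m)) + S @ W
    quad = d @ W @ np.linalg.solve(M, d)  # d^T W (I + S W)^{-1} d
    return 1.0 - np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(M))

def expected_return(m0, S0, A, B, K, Q, target, W, horizon):
    """Sum of expected per-step costs along the predicted state distributions."""
    m, S = m0, S0
    total = 0.0
    for _ in range(horizon):
        m, S = propagate(m, S, A, B, K, Q)
        total += expected_saturating_cost(m, S, target, W)
    return total
```

Because every step is an analytic function of the policy parameters (here K), the expected return in this simplified setting can be differentiated with respect to them, which is the property that makes gradient-based policy improvement possible.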

Marc Peter Deisenroth and Carl Edward Rasmussen
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
In Proceedings of the 28th International Conference on Machine Learning (ICML), June 2011, Bellevue, WA, USA.
Last modified: Mon Jun 6 17:46:14 BST 2011