Probabilistic Inference for Learning Control
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
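
The idea of carrying model uncertainty through long-term predictions can be illustrated with a toy sketch. This is not the method of the paper: it assumes a hypothetical scalar system x' = a*x + u + w with one unknown coefficient a, swaps in conjugate Bayesian linear regression for the probabilistic dynamics model, treats a as independent of the state at every step when matching moments, and replaces the analytic policy gradients with a plain grid search over a linear gain k. All names (fit_posterior, expected_cost, TRUE_A, and so on) are made up for this illustration.

```python
import random


def fit_posterior(xs, us, xs_next, noise_var, prior_var=10.0):
    """Gaussian posterior over the unknown coefficient a in x' = a*x + u + w.

    Assumes a zero-mean Gaussian prior on a and a known noise variance.
    """
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    weighted = sum(x * (xn - u) for x, u, xn in zip(xs, us, xs_next)) / noise_var
    return weighted / precision, 1.0 / precision


def expected_cost(k, a_mean, a_var, noise_var, x0_mean, x0_var, horizon):
    """Closed-form expected sum of x_t^2 under the policy u = k*x.

    The closed-loop coefficient c = a + k is Gaussian with mean a_mean + k and
    variance a_var. Each step matches the first two moments of x' = c*x + w,
    treating c as independent of x (an approximation made for this sketch).
    """
    c_mean = a_mean + k
    c_second = a_var + c_mean * c_mean          # E[c^2]
    mu, var = x0_mean, x0_var
    total = 0.0
    for _ in range(horizon):
        x_second = var + mu * mu                # E[x^2]
        new_mu = c_mean * mu
        new_var = c_second * x_second - new_mu * new_mu + noise_var
        mu, var = new_mu, new_var
        total += var + mu * mu                  # expected quadratic cost
    return total


# Collect one short rollout with random controls (hypothetical true system).
random.seed(0)
TRUE_A, NOISE_VAR = 1.2, 0.01
x = 1.0
xs, us, xs_next = [], [], []
for _ in range(5):
    u = random.uniform(-1.0, 1.0)
    x_new = TRUE_A * x + u + random.gauss(0.0, NOISE_VAR ** 0.5)
    xs.append(x)
    us.append(u)
    xs_next.append(x_new)
    x = x_new

# Learn the model, then pick the gain with the lowest predicted cost.
a_mean, a_var = fit_posterior(xs, us, xs_next, NOISE_VAR)
gains = [i / 100 for i in range(-250, 51)]
best_k = min(
    gains,
    key=lambda k: expected_cost(k, a_mean, a_var, NOISE_VAR, 1.0, 0.0, 10),
)
```

In this sketch the predicted cost grows with a_var, so a policy evaluated under an uncertain model is penalized relative to one evaluated under a point estimate. That is the toy analogue of accounting for model bias during planning.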
Marc Peter Deisenroth and Carl Edward Rasmussen
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
in Proceedings of the 28th International Conference on Machine
Learning (ICML), June 2011, Bellevue, WA, USA
Last modified: Mon Jun 6 17:46:14 BST 2011