Robert received his M.Sc. in Computer Science in 2017 from the Technical University of Darmstadt, where he worked with Prof. Gerhard Neumann, Prof. Jan Peters and Prof. Stefan Roth. He joined the group as a Ph.D. student in October 2017 and is supervised by Prof. Carl Rasmussen. His research interests include reinforcement learning and Bayesian machine learning. Robert receives funding from the EPSRC.
Publications
Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences
Robert Pinsler, Riad Akrour, Takayuki Osa, Jan Peters, Gerhard Neumann, May 2018. (In IEEE International Conference on Robotics and Automation). Brisbane, Australia.
Abstract:
While reinforcement learning has led to promising results in robotics, defining an informative reward function is challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. We propose to learn reward functions from both the robot and the human perspectives to improve on both efficiency metrics. Learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to a low-dimensional outcome space. Learning a reward function from the robot perspective circumvents the need for a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.
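The core modeling idea, learning a reward from human rankings over a low-dimensional outcome space, can be sketched with a Bradley-Terry preference model over trajectory outcomes. The sketch below is illustrative only; the function names and the synthetic task are hypothetical and do not reproduce the paper's implementation:

```python
# Illustrative sketch: preference learning with a Bradley-Terry model
# over a low-dimensional outcome space (hypothetical names and task).
import numpy as np

rng = np.random.default_rng(0)

def fit_utility(out_a, out_b, prefs, lr=0.5, steps=500):
    """Fit a linear utility u(o) = w @ o from pairwise preferences by
    gradient descent on the Bradley-Terry negative log-likelihood."""
    w = np.zeros(out_a.shape[1])
    diff = out_a - out_b
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(diff @ w)))   # P(a preferred over b)
        w -= lr * (-(prefs - p) @ diff) / len(prefs)
    return w

# Synthetic demo: 2-D outcomes (say, distance-to-goal and effort); a human
# ranks trajectory pairs according to an unknown utility over outcomes.
true_w = np.array([-1.0, -0.3])
out_a = rng.normal(size=(200, 2))
out_b = rng.normal(size=(200, 2))
prefs = ((out_a - out_b) @ true_w + 0.1 * rng.normal(size=200) > 0).astype(float)

w_hat = fit_utility(out_a, out_b, prefs)
print("recovered utility direction:", w_hat / np.linalg.norm(w_hat))
```

Because preferences are expressed over a handful of outcome dimensions rather than raw trajectories, relatively few human queries suffice to pin down the utility, which is what drives the feedback efficiency discussed in the abstract.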
Bayesian batch active learning as sparse subset approximation
Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato, 2019. (In Advances in Neural Information Processing Systems 32). Vancouver, Canada.
Abstract:
Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.
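The batch-construction idea can be illustrated with a small sketch: represent each candidate point by a random projection of its expected log-likelihood contribution, and pick points whose summed representation best approximates the full pool (a stand-in for the complete-data posterior). The greedy residual matching below is a simplified substitute for the paper's Frank-Wolfe optimization, and all names and dimensions are hypothetical:

```python
# Illustrative sketch: diverse batch selection as sparse subset
# approximation (greedy residual matching, not the paper's Frank-Wolfe).
import numpy as np

rng = np.random.default_rng(1)

# Pool of N unlabeled points, each summarized by a J-dimensional random
# projection of its (expected) log-likelihood contribution.
N, J = 1000, 64
phi = rng.normal(size=(N, J))

def select_batch(phi, batch_size):
    """Greedily pick points whose summed projections best approximate the
    full-pool sum, encouraging batches that are informative and diverse."""
    target = phi.sum(axis=0)          # proxy for the complete-data posterior
    residual = target.copy()
    batch = []
    for _ in range(batch_size):
        scores = phi @ residual       # alignment with what is still missing
        scores[batch] = -np.inf       # no repeated queries
        n = int(np.argmax(scores))
        batch.append(n)
        residual -= phi[n]            # downweight directions already covered
    return batch

print(select_batch(phi, batch_size=10))
```

Subtracting each selected point from the residual is what decorrelates the queries: a second point that merely duplicates the first scores poorly, so the batch spreads out instead of piling onto one region, in the spirit of the diverse batches described above.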
Factored Contextual Policy Search with Bayesian Optimization
Robert Pinsler, Peter Karkus, Andras Kupcsik, David Hsu, Wee Sun Lee, May 2019. (In IEEE International Conference on Robotics and Automation). Montreal, Canada.
Abstract:
Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different task contexts. Contextual policy search offers data-efficient learning and generalization by explicitly conditioning the policy on a parametric context space. In this paper, we further structure the contextual policy representation. We propose to factor contexts into two components: target contexts that describe the task objectives, e.g. target position for throwing a ball; and environment contexts that characterize the environment, e.g. initial position or mass of the ball. Our key observation is that experience can be directly generalized over target contexts. We show that this can be easily exploited in contextual policy search algorithms. In particular, we apply factorization to a Bayesian optimization approach to contextual policy search in both sampling-based and active learning settings. Our simulation results show faster learning and better generalization in various robotic domains. See our supplementary video: https://youtu.be/MNTbBAOufDY.
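The key observation, that a single rollout's outcome yields a reward for every target context after the fact, can be illustrated on a toy throwing task. The dynamics and reward below are hypothetical placeholders, not the paper's benchmark:

```python
# Illustrative sketch: one rollout is reused across many target contexts
# (hypothetical toy dynamics, not the paper's setup).
import numpy as np

def rollout(theta, env_context):
    """Simulate one throw: theta is the launch angle, env_context scales
    the throw (e.g. ball mass). Returns the landing position -- the
    *outcome*, which does not depend on any target."""
    return env_context * np.sin(2.0 * theta)  # toy projectile range

def reward(outcome, target):
    """Reward for a given target context, computed *after* the rollout."""
    return -(outcome - target) ** 2

# One physical rollout ...
theta, env = 0.6, 1.2
outcome = rollout(theta, env)

# ... generalizes directly over many target contexts: no new samples.
for g in np.linspace(0.0, 1.5, 5):
    print(f"target={g:.2f}  reward={reward(outcome, g):+.3f}")
```

Because the reward is recomputed from the stored outcome rather than re-experienced, a Bayesian optimization surrogate over (environment context, target context, policy parameters) can be updated with many synthetic context-reward pairs per physical rollout, which is the source of the faster learning reported in the abstract.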