Marc Deisenroth's Homepage
University of Cambridge, Department of Engineering
Computational and Biological Learning Lab

NIPS Workshop on Probabilistic Approaches for Robotics and Control

Invited Talks

Speaker Title Abstract Slides
Dieter Fox
GP-BayesFilters: Gaussian Process Regression for Bayesian Filtering
Bayes filters recursively estimate the state of dynamical systems from streams of sensor data. Key components of each Bayes filter are probabilistic prediction and observation models. In robotics, these models are typically based on parametric descriptions of the physical process generating the data. In this talk I will show how non-parametric Gaussian process prediction and observation models can be integrated into different versions of Bayes filters, namely particle filters and extended and unscented Kalman filters. The resulting GP-BayesFilters can have several advantages over standard filters. Most importantly, GP-BayesFilters do not require an accurate, parametric model of the system. Given enough training data, they enable improved tracking accuracy compared to parametric models, and they degrade gracefully with increased model uncertainty. We extend Gaussian Process Latent Variable Models to train GP-BayesFilters from partially or fully unlabeled training data. The techniques are evaluated in the context of visual tracking of a micro blimp and IMU-based tracking of a slot car. [pdf]
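To make the idea of a GP prediction model inside a Bayes filter concrete, here is a minimal 1-D sketch of a particle filter whose motion model is a Gaussian process learned from (x_t, x_{t+1}) pairs. It assumes a squared-exponential kernel with fixed (not learned) hyperparameters, a Gaussian observation model, and a made-up system `f`; the helpers `gp_fit`, `gp_predict`, and `pf_step` are hypothetical names, and this is not the GP-BayesFilters implementation from the talk.

```python
import numpy as np

def sq_exp(a, b, length=1.0):
    """Squared-exponential kernel (unit signal variance) between 1-D arrays."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_fit(x, y, noise=0.1):
    """Store training inputs, K^-1 y and K for later predictions."""
    K = sq_exp(x, x) + noise**2 * np.eye(len(x))
    return {"x": x, "alpha": np.linalg.solve(K, y), "K": K, "noise": noise}

def gp_predict(gp, xs):
    """GP predictive mean and latent variance at test inputs xs."""
    Ks = sq_exp(xs, gp["x"])                    # shape (m, n)
    mean = Ks @ gp["alpha"]
    v = np.linalg.solve(gp["K"], Ks.T)          # shape (n, m)
    var = 1.0 - np.sum(Ks * v.T, axis=1)        # unit signal variance
    return mean, np.maximum(var, 1e-12)

def pf_step(particles, z, gp, obs_std, rng):
    """Propagate particles through the GP dynamics, then weight and resample."""
    mean, var = gp_predict(gp, particles)
    # Process noise = GP predictive uncertainty plus the assumed noise level.
    particles = mean + rng.normal(size=mean.shape) * np.sqrt(var + gp["noise"] ** 2)
    log_w = -0.5 * ((z - particles) / obs_std) ** 2   # Gaussian observation model
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

# Hypothetical system: learn the dynamics from 50 noisy (x_t, x_{t+1}) pairs.
rng = np.random.default_rng(0)
f = lambda x: 0.8 * x + 0.5 * np.sin(x)
x_train = rng.uniform(-3, 3, 50)
gp = gp_fit(x_train, f(x_train) + 0.1 * rng.normal(size=50))

x_true, particles = 2.0, rng.uniform(-3, 3, 500)
for _ in range(20):
    x_true = f(x_true) + 0.1 * rng.normal()
    z = x_true + 0.2 * rng.normal()
    particles = pf_step(particles, z, gp, obs_std=0.2, rng=rng)
estimate = particles.mean()
```

Using the GP's predictive variance as part of the process noise is one way to let the filter widen its uncertainty where training data are sparse.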
Drew Bagnell
Imitation Learning and Purposeful Prediction: Probabilistic and Non-probabilistic Methods
Programming robot behavior remains a challenging task. While it is often easy to abstractly define or even demonstrate a desired behavior, designing a controller that embodies the same behavior is difficult, time-consuming, and ultimately expensive. The machine learning paradigm offers the promise of enabling "programming by demonstration" for developing high-performance robotic systems. Unfortunately, many "behavioral cloning" approaches that use the classical tools of supervised learning (e.g., decision trees, neural networks, or support vector machines) do not fit the needs of modern robotic systems. Classical statistics and supervised machine learning exist in a vacuum: predictions made by these algorithms are explicitly assumed not to affect the world in which they operate.
In practice, robotic systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to myopic, poor-quality robot performance. While planning algorithms have shown success in many real-world applications ranging from legged locomotion to outdoor unstructured navigation, such algorithms rely on fully specified cost functions that map sensor readings and environment models to quantifiable costs. Such cost functions are usually designed and programmed by hand. Recently, our group has developed a set of techniques that learn these functions from human demonstration. These algorithms apply an Inverse Optimal Control (IOC) approach to find a cost function for which planned behavior mimics an expert's demonstration.
I'll discuss these methodologies, both probabilistic and otherwise, for imitation learning. I'll focus on the Principle of Causal Maximum Entropy, which generalizes the classical Maximum Entropy Principle, widely used in many fields including physics, statistics, and computer vision, to problems of decision making and control. This generalization enables MaxEnt to apply to a new class of problems, including Inverse Optimal Control and activity forecasting. The approach further elucidates the intimate connections between probabilistic inference and optimal control.
I'll consider case studies in activity forecasting of drivers and pedestrians as well as imitation learning of robotic locomotion and rough-terrain navigation. These case studies highlight key challenges in applying the algorithms in practical settings that use state-of-the-art planners and are constrained by efficiency requirements and imperfect expert demonstrations.
[pdf]
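In maximum-entropy formulations of inverse optimal control, the forward step replaces the hard max of the Bellman backup with a log-sum-exp ("soft" value iteration), so the modeled demonstrator picks actions with probability pi(a|s) = exp(Q(s,a) - V(s)). The sketch below shows only that backup on a hypothetical 5-state deterministic chain with a hand-set reward; the outer loop that fits reward weights to demonstrations (e.g. by matching feature expectations) is omitted, and the chain, reward, and function names are assumptions rather than the speaker's code.

```python
import math

N_STATES, GOAL, HORIZON = 5, 4, 10
ACTIONS = (-1, 0, +1)  # move left, stay, move right

def step(s, a):
    """Deterministic chain dynamics, clipped at both ends."""
    return min(max(s + a, 0), N_STATES - 1)

def reward(s, a):
    """Hypothetical reward: -1 per time step until the goal state is reached."""
    return 0.0 if s == GOAL else -1.0

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def soft_value_iteration():
    """Backward soft Bellman recursion; returns the time-0 stochastic policy."""
    V = [0.0] * N_STATES                      # terminal soft value
    for _ in range(HORIZON):
        Q = [[reward(s, a) + V[step(s, a)] for a in ACTIONS] for s in range(N_STATES)]
        V = [logsumexp(q) for q in Q]         # soft max replaces the hard max
    # pi(a|s) = exp(Q(s,a) - V(s)) at the first time step
    return [[math.exp(q - v) for q in qs] for qs, v in zip(Q, V)]

policy = soft_value_iteration()
```

Every action keeps nonzero probability, which is what lets such a model assign likelihood to imperfect demonstrations while still favoring moves toward the goal.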
Evangelos Theodorou
Reinforcement Learning in High Dimensional State Spaces: A Path Integral Approach
With the goal of generating more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical techniques from optimal control and dynamic programming with modern learning techniques from statistical estimation theory. In this vein, we propose using the framework of stochastic optimal control with path integrals to derive a novel approach to RL with parameterized policies. While solidly grounded in value function estimation and optimal control based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations, policy improvements can be transformed into an approximation problem of a path integral, which has no open parameters other than the exploration noise. The resulting algorithm can be conceived of as model-based, semi-model-based, or even model-free, depending on how the learning problem is structured. The update equations carry no danger of numerical instabilities, as neither matrix inversions nor gradient learning rates are required. Our new algorithm shows interesting similarities with previous RL research in the framework of probability matching and provides intuition for why the slightly heuristically motivated probability matching approach can actually perform well. Empirical evaluations demonstrate significant performance improvements over gradient-based policy learning and scalability to high-dimensional control problems. Finally, a learning experiment on a simulated 12-degree-of-freedom robot dog illustrates the functionality of our algorithm in a complex robot learning scenario. We believe that Policy Improvement with Path Integrals (PI^2) currently offers one of the most efficient, numerically robust, and easy-to-implement algorithms for RL based on trajectory roll-outs. [pdf]
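The claim that the update needs neither matrix inversions nor a gradient learning rate can be seen in a stripped-down, episodic version of the probability-weighted update: perturb the parameters, score each roll-out, turn costs into probabilities exp(-S/lambda), and average the perturbations with those probabilities. This is a simplified sketch, not full PI^2: it assumes one cost per roll-out (no per-time-step costs-to-go or projection of the noise), a made-up quadratic task, and hypothetical names such as `pi2_style_update`.

```python
import numpy as np

def pi2_style_update(theta, cost_fn, rng, n_rollouts=20, noise_std=0.3, lam=0.1):
    """One simplified, episodic probability-weighted parameter update."""
    eps = rng.normal(scale=noise_std, size=(n_rollouts, theta.size))  # exploration noise
    costs = np.array([cost_fn(theta + e) for e in eps])               # roll-out costs S_k
    # Probability of each roll-out: exp(-S_k / lambda), shifted for numerical safety.
    p = np.exp(-(costs - costs.min()) / lam)
    p /= p.sum()
    # Probability-weighted average of the noise: no gradient, no learning rate.
    return theta + p @ eps

# Hypothetical task: reach a target parameter vector under a quadratic cost.
target = np.array([1.0, -2.0, 0.5])
cost = lambda th: float(np.sum((th - target) ** 2))
rng = np.random.default_rng(1)
theta = np.zeros(3)
for _ in range(100):
    theta = pi2_style_update(theta, cost, rng)
```

Here the exploration noise level and the temperature `lam` are the only tuning knobs, which mirrors the abstract's point about having few open parameters.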
Jovan Popovic
Linear Bellman Combination for Simulation of Human Motion
Simulation of natural human motion is challenging because the relevant system dynamics are high-dimensional, underactuated (there is no direct control over global position and orientation), and non-smooth (ground contacts are frequent and intermittent). To succeed, a control policy must look ahead to determine stabilizing actions, and it must optimize to generate lifelike motion. In this talk, we will review recently developed control systems that yield high-quality agile movements for three-dimensional human simulations. Creating such controllers requires intensive computer optimization, which reveals a need to reuse as many control policies as possible. We will partially address this problem with an efficient combination that creates a new optimal control policy by reusing a set of optimal controllers for related tasks. It remains to be seen whether the same approach can also be applied to the control systems needed to generate lifelike human motion. [pdf]
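Reuse of this kind rests on the linearity of the Bellman equation for linearly solvable control problems: if z = exp(-v) is the "desirability" (exponentiated negative cost-to-go), a weighted sum of primitive solutions solves the task whose terminal desirability is the same weighted sum. The sketch below checks this on a hypothetical discrete first-exit problem (a 7-state random-walk chain, made-up state and terminal costs, and an illustrative `desirability` helper); real character controllers live in continuous, high-dimensional spaces, so this only illustrates why combination is exact in the linear setting.

```python
import numpy as np

# Hypothetical first-exit problem on a chain: states 0 and N-1 are terminal.
N = 7
P = np.zeros((N, N))                 # passive dynamics: symmetric random walk
for i in range(1, N - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
q = np.full(N, 0.1)                  # running state cost on interior states
interior = np.arange(1, N - 1)
terminal = np.array([0, N - 1])

def desirability(z_terminal):
    """Solve z = exp(-q) * (P z) on interior states, with z fixed at terminals."""
    G = np.diag(np.exp(-q[interior]))
    A = np.eye(len(interior)) - G @ P[np.ix_(interior, interior)]
    b = G @ P[np.ix_(interior, terminal)] @ z_terminal
    z = np.empty(N)
    z[terminal] = z_terminal
    z[interior] = np.linalg.solve(A, b)
    return z

# Two primitive tasks: "exit left" and "exit right" (terminal costs 0 vs. 5).
z_left = desirability(np.exp(-np.array([0.0, 5.0])))
z_right = desirability(np.exp(-np.array([5.0, 0.0])))
w = np.array([0.3, 0.7])
z_mix = w[0] * z_left + w[1] * z_right
# Solving the mixed task directly gives the same desirability.
z_direct = desirability(w[0] * z_left[terminal] + w[1] * z_right[terminal])
v_mix = -np.log(z_mix)               # cost-to-go of the composite task
```

Note that the composite task's terminal cost is -log(w1 exp(-c1) + w2 exp(-c2)), a soft minimum of the primitive costs, rather than a weighted sum of them.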
Konrad Körding
Estimating the Sources of Motor Errors
Motor adaptation is usually defined as the process by which our nervous system produces accurate movements while the properties of our bodies and our environment continuously change. Many experimental and theoretical studies have characterized this process by assuming that the nervous system uses internal models to compensate for motor errors. Here we extend these approaches and construct a probabilistic model that not only compensates for motor errors but also estimates the sources of these errors. These estimates dictate how the nervous system should generalize. For example, estimated changes in limb properties will affect movements across the workspace but not movements made with the other limb. We extend previous studies in that area to account for temporal and context effects. This extended model explains aspects of savings along with aspects of generalization. [pdf]
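One simple way to make "estimating the sources of errors" concrete is a Kalman filter whose hidden state holds one disturbance per candidate source. The sketch below assumes two sources (a change in the right limb and a change in the environment), slowly decaying random-walk dynamics with made-up volatilities, and scalar errors; it is not the model from the talk. Right-arm errors are explained by both sources, while only the environmental estimate transfers to the left arm, so the predicted generalization follows from how credit is assigned.

```python
import numpy as np

# State: [d_limb, d_env], hypothetical disturbances in arbitrary error units.
A = np.eye(2) * 0.99            # slow decay of each disturbance (assumed)
Qn = np.diag([0.02, 0.01])      # assumed volatility: limb changes judged more likely
H_right = np.array([1.0, 1.0])  # right-arm errors reflect both sources
H_left = np.array([0.0, 1.0])   # left-arm movements only share the environment
R = 0.1                         # motor/sensory noise variance (assumed)

def kf_update(m, S, err, H):
    """One predict-and-update step of a Kalman filter over the error sources."""
    m, S = A @ m, A @ S @ A.T + Qn
    k = S @ H / (H @ S @ H + R)       # Kalman gain for a scalar observation
    m = m + k * (err - H @ m)
    S = S - np.outer(k, H @ S)
    return m, S

m, S = np.zeros(2), np.eye(2) * 0.1
for _ in range(30):                   # adaptation: constant error of 1.0 on the right arm
    m, S = kf_update(m, S, 1.0, H_right)
predicted_right = H_right @ m         # compensation on the trained arm
predicted_left = H_left @ m           # how much adaptation transfers to the other arm
```

Changing the assumed volatilities in `Qn` shifts credit between the two sources and therefore changes the predicted transfer to the untrained arm.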
Marc Toussaint
Approximate Inference Control
Approximate Inference Control (AICO) is a method for solving Stochastic Optimal Control (SOC) problems. The general idea is to think of control as the problem of computing a posterior over trajectories and control signals conditioned on constraints and goals. Since exact inference is infeasible in realistic scenarios, the key to high-speed planning and control algorithms is the choice of approximations. In this talk I will introduce the general approach, discuss its intimate relations to DDP and to current research on Kalman's duality, and discuss the approximations that we use to move towards real-time planning in high-dimensional robotic systems. I will also mention recent work on using Expectation Propagation and truncated Gaussians for inference under hard constraints and limits as they typically arise in robotics (collision and joint limit constraints). [pdf]
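In the one case where exact inference is easy, linear-Gaussian dynamics, the "posterior over controls conditioned on a goal" can be computed by plain Gaussian conditioning. The sketch below assumes a deterministic 1-D system x_{t+1} = x_t + u_t, an independent Gaussian prior on each control (playing the role of a control cost), and a goal treated as a near-noiseless observation of the final state; all values and names are illustrative, and AICO's actual contribution, approximate message passing for nonlinear systems, is not shown.

```python
import numpy as np

T, x0, goal = 10, 0.0, 5.0
su2 = 1.0      # prior variance of each control (plays the role of a control cost)
sg2 = 1e-4     # variance of the goal "observation" on the final state

# Deterministic 1-D dynamics x_{t+1} = x_t + u_t, so x_{1..T} = x0 + L @ u.
L = np.tril(np.ones((T, T)))
H = L[-1:]                                           # 1 x T: x_T = x0 + H @ u

# Gaussian conditioning of u ~ N(0, su2 I) on the observation x_T = goal.
S_u = su2 * np.eye(T)
gain = S_u @ H.T @ np.linalg.inv(H @ S_u @ H.T + sg2)   # T x 1
u_post = gain @ np.array([goal - x0])                    # posterior mean controls
S_post = S_u - gain @ H @ S_u                            # posterior covariance
x_post = x0 + L @ u_post                                 # posterior mean trajectory
```

Because the posterior is Gaussian, its mean is also the minimizer of the quadratic cost sum_t u_t^2/(2 su2) + (x_T - goal)^2/(2 sg2), which is one simple view of the inference/control correspondence the abstract refers to.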
Miroslav Karny
Probabilistic Design: Promises and Prospects
The Fully Probabilistic Design (FPD) describes probabilistically both the behaviour of the closed control loop and the desired closed-loop behaviour. The optimal control strategy is selected as the minimiser of the Kullback-Leibler divergence between these distributions. The approach yields: (i) an explicit minimiser whose evaluation reduces to a conceptually feasible solution of integral equations; (ii) a randomised optimal strategy; (iii) standard Bayesian designs as a proper subset of FPDs; (iv) a common probabilistic language in which uncertain knowledge, multiple control goals, and optimisation constraints can be expressed. It implies: (i) an easier approximation than for the dynamic programming counterpart; (ii) a naturally explorative optimal strategy; (iii) a goal-expressing ideal distribution that can be tailored, even recursively, to the observed closed-loop behaviour; (iv) an opportunity to automatically harmonise knowledge and goals within a flat cooperation structure of decentralised tasks. The importance of the last point is confirmed by the large number of societal and industrial problems that cannot be governed in a centralised way. The anticipated decentralised solution based on FPD may concern either a number of interacting, locally independent elements that have their own local goals but must collaborate to reach a common group goal (e.g., cooperative robots, multi-agent systems), or a set of independent elements with their own goals that need to coordinate their activities (e.g., transportation). The talk will recall the basic properties of FPD and discuss the promise of exploiting its potential.
[pdf]
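For a single decision step with discrete actions a and outcomes y, the explicit, randomised minimiser mentioned in the abstract can be written out directly. If the closed loop is pi(a) f(y|a) and the ideal factorises as I(a) I(y|a), then KL = sum_a pi(a) [log(pi(a)/I(a)) + omega(a)] with omega(a) = KL(f(.|a) || I(.|a)), which is minimised by pi(a) proportional to I(a) exp(-omega(a)). The sketch below assumes a made-up model with 3 actions and 2 outcomes and hypothetical helper names; the multi-step case requires the integral-equation recursion the abstract refers to.

```python
import numpy as np

def kl_joint(pi, f, ideal_y, ideal_a):
    """KL divergence between the closed-loop joint pi(a) f(y|a) and the ideal joint."""
    joint = pi[:, None] * f
    ideal = ideal_a[:, None] * ideal_y
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / ideal[mask])))

def fpd_one_step(f, ideal_y, ideal_a):
    """One-step FPD minimiser: pi(a) proportional to ideal_a(a) * exp(-omega(a))."""
    omega = np.sum(f * np.log(f / ideal_y), axis=1)   # KL(f(.|a) || ideal(.|a)); f > 0 assumed
    pi = ideal_a * np.exp(-omega)
    return pi / pi.sum()

# Hypothetical model: 3 actions, 2 outcomes; the ideal strongly prefers outcome 0.
f = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
ideal_y = np.tile([0.95, 0.05], (3, 1))
ideal_a = np.full(3, 1 / 3)
pi_opt = fpd_one_step(f, ideal_y, ideal_a)
```

The resulting strategy is randomised: every action keeps positive probability, with the most probability on the action whose outcome distribution is closest to the ideal.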
Nicholas Roy
Planning under Uncertainty using Distributions over Posteriors
Modern control theory has provided a large number of tools for dealing with probabilistic systems. However, most of these tools solve for local policies; there are relatively few tools for solving for complex plans that, for instance, gather information. In contrast, the planning community has provided ways to compute plans that handle complex probabilistic uncertainty, but these often do not work well for large or continuous problems. Recently, our group has developed planners that can efficiently search for complex plans in probabilistic domains by taking advantage of local solutions provided by feedback and open-loop controllers, and by predicting a distribution over the posteriors. This approach of planning over distributions of posteriors can incorporate a surprisingly wide variety of sensor models and objective functions. I will show results in a couple of domains, including helicopter flight in GPS-denied environments. [pdf]
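In the linear-Gaussian special case, "predicting a distribution over the posteriors" before a measurement arrives has a closed form: the posterior variance does not depend on the measurement value, and the posterior mean is distributed around the prior mean with variance equal to the prior variance minus the posterior variance. The sketch below, a scalar Kalman update with assumed numbers and a hypothetical `predicted_posteriors` helper, computes this and checks it by Monte Carlo; the planners described in the talk handle much richer sensor models.

```python
import numpy as np

def predicted_posteriors(mu, var, obs_var):
    """Before measuring, predict the posterior of a scalar Kalman update.

    Returns the posterior variance (independent of the measurement value) and the
    mean and variance of the distribution over possible posterior means.
    """
    k = var / (var + obs_var)          # Kalman gain
    post_var = (1 - k) * var
    return post_var, mu, var - post_var

# Monte Carlo check: simulate states and measurements, apply the actual update.
rng = np.random.default_rng(0)
mu, var, obs_var = 0.0, 4.0, 1.0
post_var, m_means, v_means = predicted_posteriors(mu, var, obs_var)
x = rng.normal(mu, np.sqrt(var), 200_000)
z = x + rng.normal(0.0, np.sqrt(obs_var), x.size)
means = mu + (var / (var + obs_var)) * (z - mu)
```

Because `post_var` is known in advance, a planner can compare candidate sensing actions (different `obs_var`) without simulating individual measurements.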
Roderick Murray-Smith
Probabilistic Control in Human Computer Interaction
Continuous interaction with computers can be treated as a control problem subject to various sources of uncertainty. We present examples of interaction based on multiple noisy sensors (capacitive sensing, location and bearing sensing, and EEG) in domains that rely on inference about user intention, and where the use of particle filters can improve performance. We use the "H-metaphor" for automated, flexible handover of the level of autonomy in control, as a function of the certainty of the user's control actions, in a fashion analogous to 'loosening the reins' when riding a horse. Integrating the inference mechanisms with probabilistic feedback designs can have a significant effect on behaviour, and some examples are presented. (Joint work with John Williamson, Simon Rogers and Steven Strachan.) [pdf]
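As a rough illustration of coupling intention inference with a certainty-dependent handover, the sketch below runs a bootstrap particle filter over a user's intended 1-D target from noisy sensor readings and then blends user and automation commands according to how certain the estimate is. The target, sensor noise, jitter, and especially the linear mapping from particle spread to user authority in `blend` are hypothetical choices for illustration, not the H-metaphor implementation from the talk.

```python
import numpy as np

def pf_intention(readings, rng, n=1000, sensor_std=0.5, jitter=0.05):
    """Bootstrap particle filter over a slowly drifting intended 1-D target."""
    particles = rng.uniform(-5, 5, n)                     # broad prior over intentions
    for z in readings:
        particles = particles + rng.normal(0, jitter, n)  # allow slow drift of intention
        log_w = -0.5 * ((z - particles) / sensor_std) ** 2
        w = np.exp(log_w - log_w.max())
        particles = rng.choice(particles, size=n, p=w / w.sum())
    return particles.mean(), particles.std()

def blend(user_cmd, auto_cmd, spread, max_spread=1.0):
    """Hypothetical handover rule: certain user intent keeps authority ("tight reins")."""
    user_authority = float(np.clip(1.0 - spread / max_spread, 0.0, 1.0))
    return user_authority * user_cmd + (1 - user_authority) * auto_cmd, user_authority

rng = np.random.default_rng(0)
true_target = 2.0
readings = true_target + rng.normal(0, 0.5, 15)
est, spread = pf_intention(readings, rng)
cmd, authority = blend(user_cmd=1.0, auto_cmd=0.0, spread=spread)
```

With few or very noisy readings the spread stays large and authority shifts toward the automated command; as evidence accumulates, control returns to the user.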
Bert Kappen
KL Control Theory and Decision Making under Uncertainty
KL control theory consists of a class of control problems for which the control computation can be solved as a graphical model inference problem. In this talk, we show how to apply this theory in the context of a delayed choice task and for collaborating agents. We first introduce the KL control framework. Then we show that in a delayed reward task, when the future is uncertain, it is optimal to delay the timing of one's decision. We show preliminary results on human subjects that confirm this prediction. Subsequently, we discuss two-player games, such as the stag-hunt game, where collaboration can improve or worsen as a result of recursive reasoning about the opponent's actions. The Nash equilibria appear as local minima of the optimal cost-to-go, but may disappear when the monetary gain decreases. This behaviour is in agreement with experimental findings in humans. [pdf]
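The "delay your decision" effect can be seen in a standard toy problem from path-integral control, which is closely related to KL control: a 1-D particle with dynamics dx = u dt + dxi, control cost u^2/2, noise variance nu per unit time, and two equally good targets at x = -1 and x = +1 at the end time (end cost zero at the targets, very large elsewhere). The optimal control is u = nu * d/dx log psi, where psi sums the diffusion kernels to the two targets; for time-to-go tau this gives u(x, tau) = (tanh(x/(nu tau)) - x)/tau. The sketch below, with an assumed nu = 1 and illustrative function names, evaluates both the numerical and the closed form; it is not the delayed-choice experiment from the talk.

```python
import math

NU = 1.0  # noise variance per unit time (assumed); control cost weight R = 1

def desirability(x, time_to_go):
    """psi(x, t), up to a constant: diffusion from x ending at target -1 or +1."""
    s2 = NU * time_to_go
    return sum(math.exp(-(x - y) ** 2 / (2 * s2)) for y in (-1.0, 1.0))

def control_numeric(x, time_to_go, h=1e-5):
    """u = nu * d/dx log psi, evaluated by central differences."""
    d = math.log(desirability(x + h, time_to_go)) - math.log(desirability(x - h, time_to_go))
    return NU * d / (2 * h)

def control_closed_form(x, time_to_go):
    """Closed form of the same control for the two-target problem."""
    return (math.tanh(x / (NU * time_to_go)) - x) / time_to_go

early = control_closed_form(0.2, 2.0)   # long time-to-go
late = control_closed_form(0.2, 0.1)    # close to the end time
```

With a long time-to-go the control at x = 0.2 is small and even points back toward the middle (about -0.05), while close to the end time it commits strongly to the nearer target (about 7.6): the choice between targets is postponed while the future is still uncertain.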
Emanuel Todorov
Linear Bellman Equations: Theory and Applications
I will provide a brief overview of a class of stochastic optimal control problems recently developed by our group as well as by Bert Kappen's group. This problem class is quite general and yet has a number of unique properties, including linearity of the exponentially transformed (Hamilton-Jacobi) Bellman equation, duality with Bayesian inference, convexity of the inverse optimal control problem, compositionality of optimal control laws, and a path-integral representation of the exponentially transformed value function. I will then focus on function approximation methods that exploit the linearity of the Bellman equation, and illustrate how such methods scale to high-dimensional continuous dynamical systems. Computing the weights for a fixed set of basis functions can be done very efficiently by solving a large but sparse linear problem. This enables us to work with hundreds of millions of (localized) bases. Still, the volume of a high-dimensional state space is too large to be filled with localized bases, forcing us to consider adaptive methods for positioning and shaping those bases. Several such methods will be compared. [pdf]
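The linearity property can be checked in the simplest tabular setting. In this problem class the cost is q(x) plus the KL divergence between the controlled transitions u(.|x) and passive dynamics p(.|x); with z = exp(-v), the Bellman equation becomes the linear equation z = exp(-q) * (P z), and the optimal transitions are u*(x'|x) proportional to p(x'|x) z(x'). The sketch below, on a hypothetical 9-state lazy random walk with made-up costs, solves the linear equation by fixed-point iteration and verifies that the result satisfies the original, nonlinear KL-form Bellman equation; the function-approximation methods in the talk replace this table of z values with a weighted sum of basis functions.

```python
import numpy as np

N = 9
P = np.zeros((N, N))                    # passive dynamics: lazy random walk on the chain
for i in range(1, N - 1):
    P[i, i - 1], P[i, i], P[i, i + 1] = 0.25, 0.5, 0.25
q = np.full(N, 0.2)                     # interior state cost (assumed)
q_terminal = np.array([3.0, 0.0])       # exiting right is cheaper (assumed)
interior, terminal = np.arange(1, N - 1), np.array([0, N - 1])

# Linear Bellman equation z = exp(-q) * (P z), iterated as a linear fixed point.
z = np.ones(N)
z[terminal] = np.exp(-q_terminal)
for _ in range(2000):
    z[interior] = np.exp(-q[interior]) * (P[interior] @ z)
v = -np.log(z)                          # optimal cost-to-go

# Optimal transition probabilities: u*(x'|x) proportional to p(x'|x) z(x').
U = P[interior] * z[None, :]
U /= U.sum(axis=1, keepdims=True)
```

The iteration is a contraction here because exp(-q) < 1 on interior states and probability mass leaks to the terminal states, so no nonlinear minimisation over controls is ever performed.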
PASCAL2

© University of Cambridge, Department of Engineering
Information provided by Marc Deisenroth (mpd37)