| Speaker |
Title |
Abstract |
Slides |
| Dieter Fox |
GP-BayesFilters: Gaussian Process Regression for Bayesian Filtering |
Bayes filters recursively estimate the state of dynamical systems from streams
of sensor data. Key components of each Bayes filter are probabilistic
prediction and observation models. In robotics, these models are typically
based on parametric descriptions of the physical process generating the data. In this
talk I will show how non-parametric Gaussian process prediction and observation
models can be integrated into different versions of Bayes filters, namely
particle filters and extended and unscented Kalman filters. The resulting
GP-BayesFilters can have several advantages over standard filters. Most
importantly, GP-BayesFilters do not require an accurate, parametric model of the
system. Given enough training data, they enable improved tracking accuracy
compared to parametric models, and they degrade gracefully with increased model
uncertainty. We extend Gaussian Process Latent Variable Models to train
GP-BayesFilters from partially or fully unlabeled training data. The techniques
are evaluated in the context of visual tracking of a micro blimp and IMU-based
tracking of a slotcar.
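
As a rough illustration of where the prediction and observation models enter a Bayes filter, the sketch below runs a plain bootstrap particle filter on a toy 1-D system. It is not the GP-BayesFilter from the talk: the `predict` and `likelihood` functions are hypothetical parametric stand-ins, and the noise levels are assumed values. In a GP-based variant, these two functions are the pieces that learned Gaussian process models would replace.

```python
import numpy as np

def particle_filter_step(particles, z, predict, likelihood, rng):
    """One bootstrap particle filter update: predict, weight, resample."""
    # Prediction: propagate each particle through the stochastic prediction model.
    predicted = predict(particles, rng)
    # Correction: weight each particle by the observation likelihood p(z | x).
    weights = likelihood(z, predicted)
    weights = weights / np.sum(weights)
    # Resampling: draw particles in proportion to their weights.
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx]

# Hypothetical parametric models for a 1-D system (illustrative assumptions).
def predict(x, rng):
    # State advances by 1.0 per step with small Gaussian process noise.
    return x + 1.0 + rng.normal(0.0, 0.1, size=x.shape)

def likelihood(z, x):
    # Unnormalised Gaussian observation model with standard deviation 0.5.
    return np.exp(-0.5 * ((z - x) / 0.5) ** 2)

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=2000)  # samples from the prior
true_state = 0.0
for _ in range(10):
    true_state += 1.0
    z = true_state + rng.normal(0.0, 0.5)    # simulated noisy measurement
    particles = particle_filter_step(particles, z, predict, likelihood, rng)

estimate = float(np.mean(particles))         # posterior mean estimate
```

The filter step only calls the two model functions, so swapping in different models does not change the filtering loop itself.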
|
[pdf]
|
| Drew Bagnell |
Imitation Learning and Purposeful Prediction: Probabilistic and
Non-probabilistic Methods
|
Programming robot behavior remains a challenging task. While it is often
easy to abstractly define or even demonstrate a desired behavior, designing
a controller that embodies the same behavior is difficult, time consuming,
and ultimately expensive. The machine learning paradigm offers the promise
of enabling "programming by demonstration" for developing high-performance
robotic systems. Unfortunately, many "behavioral cloning" approaches that
utilize the classical tools of supervised learning (e.g. decision trees,
neural networks, or support vector machines) do not fit the needs of modern
robotic systems. Classical statistics and supervised machine learning exist
in a vacuum: predictions made by these algorithms are explicitly assumed to
not affect the world in which they operate.
In practice, robotic systems are often built atop sophisticated planning
algorithms that efficiently reason far into the future; consequently,
ignoring these planning algorithms in lieu of a supervised learning approach
often leads to myopic and poor-quality robot performance. While planning
algorithms have shown success in many real-world applications ranging from
legged locomotion to outdoor unstructured navigation, such algorithms rely
on fully specified cost functions that map sensor readings and environment
models to quantifiable costs. Such cost functions are usually manually
designed and programmed. Recently, our group has developed a set of
techniques that learn these functions from human demonstration. These
algorithms apply an Inverse Optimal Control (IOC) approach to find a cost
function for which planned behavior mimics an expert's demonstration.
I'll discuss these methodologies, both probabilistic and otherwise, for
imitation learning. I'll focus on the Principle of Causal Maximum Entropy
that generalizes the classical Maximum Entropy Principle, widely used in
many fields including physics, statistics, and computer vision, to problems
of decision making and control. This generalization enables MaxEnt to apply
to a new class of problems including Inverse Optimal Control and activity
forecasting. This approach further elucidates the intimate connections
between probabilistic inference and optimal control.
I'll consider case studies in activity forecasting of drivers and pedestrians
as well as the imitation learning of robotic locomotion and rough-terrain
navigation. These case studies highlight key challenges in applying the
algorithms in practical settings that utilize state-of-the-art planners and
are constrained by efficiency requirements and imperfect expert
demonstration. |
[pdf]
|
| Evangelos Theodorou |
Reinforcement Learning in High Dimensional State
Spaces: A Path Integral Approach |
With the goal of generating more scalable algorithms with higher
efficiency and fewer open parameters, reinforcement learning (RL) has
recently moved towards combining classical techniques from optimal
control and dynamic programming with modern learning techniques from
statistical estimation theory. In this vein, this paper proposes using
the framework of stochastic optimal control with path integrals to
derive a novel approach to RL with parameterized policies. While
solidly grounded in value function estimation and optimal control
based on the stochastic Hamilton-Jacobi-Bellman (HJB) equations,
policy improvements can be transformed into an approximation problem
of a path integral which has no open parameters other than the
exploration noise. The resulting algorithm can be conceived of as
model-based, semi-model-based, or even model free, depending on how
the learning problem is structured. The update equations have no
danger of numerical instabilities as neither matrix inversions nor
gradient learning rates are required. Our new algorithm demonstrates
interesting similarities with previous RL research in the framework of
probability matching and provides intuition for why the slightly
heuristically motivated probability matching approach can actually
perform well. Empirical evaluations demonstrate significant
performance improvements over gradient-based policy learning and
scalability to high-dimensional control problems. Finally, a learning
experiment on a simulated 12 degree-of-freedom robot dog illustrates
the functionality of our algorithm in a complex robot learning
scenario. We believe that Policy Improvement with
Path Integrals, or PI^2, currently offers one of the
most efficient, numerically robust, and easy to implement algorithms
for RL based on trajectory roll-outs. |
[pdf]
|
| Jovan Popovic |
Linear Bellman Combination for Simulation of Human Motion |
Simulation of natural human motion is challenging because the
relevant system dynamics is high-dimensional, underactuated—no
direct control over global position and orientation—and
non-smooth—frequent and intermittent ground contacts. In order
to succeed, a control policy must look ahead to determine
stabilizing actions and it must optimize to generate lifelike
motion. In this talk, we will review recently developed control
systems that yield high-quality agile movements for
three-dimensional human simulations. Creating such controllers
requires intensive computer optimization and reveals a need for
reusing as many control policies as possible. We will answer
this problem partially with an efficient combination that creates
a new optimal control policy by reusing a set of optimal controls
for related tasks. It remains to be seen if the same approach
can also be applied to control systems needed to generate
lifelike human motion. |
[pdf]
|
| Konrad Körding |
Estimating the Sources of Motor Errors |
Motor adaptation is usually defined as the process by which
our nervous system produces accurate movements while the properties of
our bodies and our environment continuously change. Many experimental
and theoretical studies have characterized this process by assuming
that the nervous system uses internal models to compensate for motor
errors. Here we extend these approaches and construct a probabilistic
model that not only compensates for motor errors but estimates the
sources of these errors. These estimates dictate how the nervous
system should generalize. For example, estimated changes of limb
properties will affect movements across the workspace but not
movements with the other limb. We extend previous studies in that area
to account for temporal and context effects. This extended model
explains aspects of savings along with aspects of generalization.
|
[pdf]
|
| Marc Toussaint |
Approximate Inference Control
|
Approximate Inference Control (AICO) is a method for solving
Stochastic Optimal Control (SOC) problems. The general idea is to
think of control as the problem of computing a posterior over
trajectories and control signals conditioned on constraints and
goals. Since exact inference is infeasible in realistic scenarios, the
key for high-speed planning and control algorithms is the choice of
approximations. In this talk I will introduce the general approach,
discuss its intimate relations to DDP and the current research on
Kalman's duality, and discuss the approximations that we use to get
towards real-time planning in high-dimensional robotic systems. I will
also mention recent work on using Expectation Propagation and
truncated Gaussians for inference under hard constraints and limits as
they typically arise in robotics (collision and joint limit
constraints).
|
[pdf]
|
| Miroslav Karny |
Probabilistic Design: Promises and Prospects |
The Fully Probabilistic Design (FPD) suggests a probabilistic description
of the closed control loop behaviour as well as desired closed-loop
behaviour. The optimal control strategy is selected as the minimiser of the
Kullback-Leibler divergence of these distributions. The approach yields: (i)
an explicit minimiser with the evaluation reduced to a conceptually feasible
solution of integral equations; (ii) a randomised optimal strategy; (iii) a
proper subset of FPDs formed via standard Bayesian designs; (iv) uncertain
knowledge, multiple control goals, and optimisation constraints can be expressed
in the common probabilistic language. It implies: (i) an easier approximation
of the dynamic programming counterpart; (ii) the optimal strategy is
naturally explorative; (iii) the goals-expressing ideal distribution can be,
even recursively, tailored to the observed closed-loop behavior; (iv) an
opportunity to automatically harmonise knowledge and goals within a flat
cooperation structure of decentralised tasks.
The importance of the last point is confirmed by the large number of
societal/industrial problems that cannot be governed in a centralised way. The
anticipated decentralised solution based on the FPD may concern either a number
of interacting, locally independent elements, which have their local goals,
but have to collaborate to reach a common group goal (e.g. cooperative robots,
multi-agent systems, etc.); or a set of independent elements with their own goals
that need to coordinate their activities (e.g. transportation).
The talk will recall the basic properties of FPD and discuss the promises of
exploiting the FPD potential.
[pdf]
|
[pdf]
|
| Nicholas Roy |
Planning under Uncertainty using Distributions over Posteriors |
Modern control theory has provided a large number of tools for dealing
with probabilistic systems. However, most of these tools solve for
local policies; there are relatively few tools for solving for complex
plans that, for instance, gather information. In contrast, the
planning community has provided ways to compute plans that handle
complex probabilistic uncertainty, but these often don't work for
large or continuous problems. Recently, our group has developed
techniques for planners that can efficiently search for complex plans
in probabilistic domains by taking advantage of local solutions
provided by feedback and open-loop controllers, and predicting a
distribution over the posteriors. This approach of planning over
distributions of posteriors can incorporate a surprisingly wide
variety of sensor models and objective functions. I will show some
results in a couple of domains including helicopter flight in
GPS-denied environments. |
[pdf]
|
| Roderick Murray-Smith |
Probabilistic Control in Human Computer Interaction |
Continuous interaction with computers can be treated as a control problem
subject to various sources of uncertainty. We present examples of
interaction based on multiple noisy sensors (capacitive sensing, location-
and bearing sensing and EEG), in domains which rely on inference about user
intention, and where the use of particle filters can improve performance. We
use the "H-metaphor" for automated, flexible handover of level of autonomy
in control, as a function of the certainty of control actions from the user,
in an analogous fashion to 'loosening the reins' when
horse-riding. Integration of the inference mechanisms with probabilistic
feedback designs can have a significant effect on behaviour, and some
examples are presented. (Joint work with John Williamson, Simon Rogers and
Steven Strachan). |
[pdf]
|
| Bert Kappen |
KL Control Theory and Decision making under Uncertainty |
KL control theory consists of a class of control problems for which
the control computation can be solved as a graphical model inference
problem. In this talk, we show how to apply this theory in the context of
a delayed choice task and for collaborating agents. We first introduce
the KL control framework. Then we show that in a delayed reward task
when the future is uncertain it is optimal to delay the timing of your
decision. We show preliminary results on human subjects that confirm
this prediction. Subsequently, we discuss two-player games, such as the
stag-hunt game, where collaboration can improve or worsen as a result
of recursive reasoning about the opponent's actions. The Nash equilibria
appear as local minima of the optimal cost to go, but may disappear when
monetary gain decreases. This behaviour is in agreement with
experimental findings in humans.
|
[pdf]
|
| Emanuel Todorov |
Linear Bellman Equations: Theory and Applications |
I will provide a brief overview of a class of stochastic optimal control
problems recently developed by our group as well as by Bert Kappen's group.
This problem class is quite general and yet has a number of unique
properties, including linearity of the exponentially-transformed
(Hamilton-Jacobi) Bellman equation, duality with Bayesian inference,
convexity of the inverse optimal control problem, compositionality of
optimal control laws, and path-integral representation of the
exponentially-transformed value function. I will then focus on function
approximation methods that exploit the linearity of the Bellman equation,
and illustrate how such methods scale to high-dimensional continuous
dynamical systems. Computing the weights for a fixed set of basis functions
can be done very efficiently by solving a large but sparse linear problem.
This enables us to work with hundreds of millions of (localized) bases.
Still, the volume of a high-dimensional state space is too large to be
filled with localized bases, forcing us to consider adaptive methods for
positioning and shaping those bases. Several such methods will be compared.
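
To make the linearity property concrete, here is a minimal sketch on a tiny discrete first-exit problem, assuming the standard form of this problem class: the exponentially transformed value function z = exp(-v) satisfies z(x) = exp(-q(x)) * sum over x' of p(x'|x) z(x'), where p is the passive dynamics and q the state cost. The six-state chain, the cost values, and all variable names are illustrative assumptions, and the problem is solved exactly with a dense solver rather than with the basis-function methods discussed in the talk.

```python
import numpy as np

n = 6                      # states 0..5 on a chain; state 5 is the terminal goal
q = np.full(n, 0.2)        # hypothetical state cost per step
q[-1] = 0.0                # no cost at the goal

# Passive dynamics p(x'|x): unbiased random walk, reflecting at state 0.
P = np.zeros((n, n))
for x in range(n - 1):
    P[x, max(x - 1, 0)] += 0.5
    P[x, x + 1] += 0.5

z = np.ones(n)
z[-1] = np.exp(-q[-1])     # boundary condition at the terminal state

# Linear Bellman equation on non-terminal states: z = exp(-q) * (P z).
# Rearranged into a linear system A z_interior = b.
G = np.diag(np.exp(-q[:-1]))
A = np.eye(n - 1) - G @ P[:-1, :-1]
b = (G @ P[:-1, -1]) * z[-1]
z[:-1] = np.linalg.solve(A, b)

v = -np.log(z)             # optimal cost-to-go

# Optimal controlled transitions: u*(x'|x) is proportional to p(x'|x) z(x').
U = P * z
U[:-1] = U[:-1] / U[:-1].sum(axis=1, keepdims=True)
```

In this toy problem the optimal transition probabilities shift mass toward the goal relative to the passive random walk, and the whole solution comes from one linear solve instead of an iterative minimisation over actions.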
|
[pdf]
|