David Kristjanson Duvenaud
E-mail me at: dkd23@cam.ac.uk
My interests lie mainly in machine learning, nonparametric modeling, and inference. My advisors at Cambridge are Carl Rasmussen and Zoubin Ghahramani. My previous advisor was Kevin Murphy at the University of British Columbia, where I worked mostly on machine vision.
I spent last summer at the Max Planck Institute for Intelligent Systems. I spent the two summers before that at Google Research, in the Video Content Analysis group. In 2006, I co-founded Invenia, a small energy forecasting and trading firm where I still consult.
Publications
Warped Mixtures for Nonparametric Cluster Shapes. If you fit a mixture of Gaussians to a single cluster, but that cluster is curved or heavy-tailed, your model will report that the data contains many clusters! We instead introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows one to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani. Uncertainty in Artificial Intelligence, 2013. [preprint | code | slides | bibtex]
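Here is a minimal sketch of the generative side of this idea. It makes several assumptions: a fixed two-component latent mixture in 2-D, and an independent squared-exponential GP prior on each output dimension of the warping function. All names (`sample_warped_mixture`, `se_kernel`, `cholesky`) are hypothetical. The model in the paper is richer; for instance, it also has to infer the latent points and the warping. Treat this only as an illustration of how round latent clusters can come out curved in observed space.

```python
import math
import random


def se_kernel(a, b, lengthscale=1.0):
    # Squared-exponential kernel between two latent points (lists of floats).
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)


def cholesky(K):
    # Lower-triangular L with L @ L^T == K; K must be positive definite.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_warped_mixture(n, means, latent_std, out_dim, seed=0, jitter=1e-6):
    rng = random.Random(seed)
    # 1. Latent points from an equal-weight Gaussian mixture with isotropic noise.
    labels = [rng.randrange(len(means)) for _ in range(n)]
    latent = [[rng.gauss(m, latent_std) for m in means[c]] for c in labels]
    # 2. A random warping function: each output dimension is an independent GP draw,
    #    sampled jointly at the latent points as L @ eps, where K = L @ L^T.
    K = [[se_kernel(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(latent)]
         for i, a in enumerate(latent)]
    L = cholesky(K)
    columns = []
    for _ in range(out_dim):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        columns.append([sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)])
    # 3. Observed data: one out_dim-dimensional point per latent point.
    observed = [list(row) for row in zip(*columns)]
    return labels, latent, observed
```

For example, `sample_warped_mixture(100, [[-2.0, 0.0], [2.0, 0.0]], 0.5, 3)` returns 100 three-dimensional points. Each cluster is a smooth, nonlinear image of a round latent blob.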
Structure Discovery in Nonparametric Regression through Compositional Kernel Search. How could an AI do statistics? To search through an open-ended class of structured nonparametric regression models, we introduce a simple grammar which specifies composite kernels. These structured models often allow an interpretable decomposition of the function being modeled, as well as long-range extrapolation. Many common regression methods are special cases of this large family of models. David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani. International Conference on Machine Learning, 2013. [preprint | code | slides | bibtex]
Active Learning of Model Evidence using Bayesian Quadrature. Instead of the usual Monte Carlo methods for computing integrals of likelihood functions, we construct a model of the likelihood function and infer its integral conditioned on a set of evaluations. This lets us evaluate the likelihood wherever it is most informative, instead of running a Markov chain. The upshot is that we need many fewer samples to estimate integrals. Michael Osborne, David Duvenaud, Roman Garnett, Carl Rasmussen, Stephen Roberts, Zoubin Ghahramani. Neural Information Processing Systems, 2012. [pdf | code | slides | bibtex]
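Here is a rough sketch of the core computation: vanilla Bayesian quadrature in one dimension. It is not the paper's active-learning scheme. It assumes:

- a zero-mean GP prior on the integrand with a squared-exponential kernel;
- a Gaussian N(mu, sigma^2) as the measure over x.

With those choices the kernel integrals z_i have a closed form, and the posterior mean of the integral is z^T K^{-1} f. `bq_estimate` and `solve` are hypothetical helper names.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def bq_estimate(nodes, values, ell=1.0, mu=0.0, sigma=1.0, jitter=1e-10):
    """Posterior mean of Z = integral of f(x) N(x; mu, sigma^2) dx, given f at the nodes."""
    # Gram matrix of the SE kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2)) on the nodes.
    K = [[math.exp(-(a - b) ** 2 / (2 * ell ** 2)) + (jitter if i == j else 0.0)
          for j, b in enumerate(nodes)] for i, a in enumerate(nodes)]
    # Kernel mean z_i = integral of k(x, x_i) N(x; mu, sigma^2) dx (closed form here).
    s2 = ell ** 2 + sigma ** 2
    z = [math.sqrt(ell ** 2 / s2) * math.exp(-(x - mu) ** 2 / (2 * s2)) for x in nodes]
    # E[Z | data] = z^T K^{-1} f; K is symmetric, so solve K w = z for the weights.
    weights = solve(K, z)
    return sum(w * f for w, f in zip(weights, values))
```

As an example, take nodes at -3, -2, ..., 3 and `values = [math.exp(-(x - 1) ** 2 / 2) for x in nodes]`. The estimate then matches the exact integral sqrt(1/2)·exp(-1/4), because that integrand is one of the kernel functions centred at a node. Choosing where to evaluate next, which is the point of the paper, would use the model's uncertainty about the integral; that part is omitted.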
Optimally-Weighted Herding is Bayesian Quadrature. We prove several connections between herding, an efficient procedure for estimating moments that minimizes a worst-case error, and Bayesian Quadrature, a model-based way of estimating integrals. It turns out that both optimize the same criterion, and that Bayesian Quadrature does so optimally. Among other things, this means that we can place worst-case error bounds on the optimal Bayesian estimator! The talk slides also show equivalences between Bayesian Quadrature and kernel two-sample tests, the Hilbert-Schmidt Independence Criterion, and the MAP objective of determinantal point processes. Ferenc Huszár and David Duvenaud. Uncertainty in Artificial Intelligence, 2012. [pdf | code | slides | talk | bibtex]
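This is a small numerical check of the "same criterion" claim. It rests on assumptions of our own: a one-dimensional N(0, sigma^2) target, an SE kernel and a fixed set of nodes. For weights w, the squared worst-case error over the unit ball of the kernel's RKHS is kpp - 2 w^T z + w^T K w, where kpp is the double integral of k under p × p.

- Uniform weights are what herding uses.
- Bayesian Quadrature's weights K^{-1} z minimize this quadratic for the given nodes.

Function names are hypothetical, and node selection (herding's greedy step) is not shown.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def se(a, b, ell):
    return math.exp(-(a - b) ** 2 / (2 * ell ** 2))


def kernel_mean(x, ell, sigma):
    # z(x) = integral of k(x, x') N(x'; 0, sigma^2) dx' for the SE kernel.
    s2 = ell ** 2 + sigma ** 2
    return math.sqrt(ell ** 2 / s2) * math.exp(-x ** 2 / (2 * s2))


def worst_case_error_sq(nodes, weights, ell=1.0, sigma=1.0):
    """Squared worst-case error (MMD^2) of sum_i w_i f(x_i) as an estimate of E_p[f]."""
    kpp = math.sqrt(ell ** 2 / (ell ** 2 + 2 * sigma ** 2))  # double integral of k under p x p
    cross = sum(w * kernel_mean(x, ell, sigma) for w, x in zip(weights, nodes))
    quad = sum(wi * wj * se(a, b, ell)
               for wi, a in zip(weights, nodes) for wj, b in zip(weights, nodes))
    return kpp - 2 * cross + quad


def bq_weights(nodes, ell=1.0, sigma=1.0, jitter=1e-10):
    """Bayesian Quadrature weights K^{-1} z, the minimizer of the quadratic above."""
    K = [[se(a, b, ell) + (jitter if i == j else 0.0) for j, b in enumerate(nodes)]
         for i, a in enumerate(nodes)]
    return solve(K, [kernel_mean(x, ell, sigma) for x in nodes])
```

For nodes [-1.5, -0.5, 0.3, 1.2], `worst_case_error_sq(nodes, bq_weights(nodes))` should never exceed `worst_case_error_sq(nodes, [0.25] * 4)`, the error of uniform herding-style weights.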
Additive Gaussian Processes. We use an algebraic trick to efficiently integrate over all possible ways of modeling a function as a sum of low-dimensional functions. When functions have this low-dimensional additive structure, we can extrapolate further than with standard Gaussian process models. David Duvenaud, Hannes Nickisch, Carl Rasmussen. Neural Information Processing Systems, 2011. [pdf | code | slides | bibtex]
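Here is a sketch of the kind of algebraic trick involved; to our understanding it is the Newton-Girard identities. The order-n additive kernel sums products of one-dimensional base kernels over all n-element subsets of input dimensions. That sum is the n-th elementary symmetric polynomial e_n of the base kernel values. Newton-Girard computes every e_n from power sums p_k without enumerating subsets:

e_n = (1/n) Σ_{k=1..n} (-1)^(k-1) e_{n-k} p_k

One variance per order is an assumption of this sketch, and all function names are hypothetical.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(z, max_order):
    """e_0 .. e_max_order of the values z, via the Newton-Girard identities."""
    power_sums = [sum(zi ** k for zi in z) for k in range(max_order + 1)]
    e = [1.0]
    for n in range(1, max_order + 1):
        # e_n = (1/n) * sum_{k=1..n} (-1)^(k-1) * e_{n-k} * p_k
        e.append(sum((-1) ** (k - 1) * e[n - k] * power_sums[k]
                     for k in range(1, n + 1)) / n)
    return e


def additive_kernel(base_values, order_variances):
    """Sum over orders r of order_variances[r-1] * e_r(base_values).

    base_values[i] is a one-dimensional base kernel k_i(x_i, x'_i) on input dimension i.
    """
    e = elementary_symmetric(base_values, len(order_variances))
    return sum(v * e[r] for r, v in enumerate(order_variances, start=1))


def brute_force_order(base_values, n):
    # Direct sum over all n-element subsets of dimensions, for checking only.
    return sum(prod(c) for c in combinations(base_values, n))
```

The brute-force version is exponential in the number of dimensions when all orders are included. The recursion only needs the power sums plus a small loop over orders.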
Multiscale Conditional Random Fields for Semi-supervised Labeling and Classification. How can we take advantage of images labeled only by what objects they contain? By combining information across different scales, we use image labels to infer what different objects look like at the pixel level, and where they occur in images. David Duvenaud, Benjamin Marlin, Kevin Murphy. Canadian Conference on Computer and Robot Vision, 2011. [pdf | code | slides | bibtex]
Causal Learning without DAGs. When predicting the results of new actions, it's often better to simply average over flexible conditional models than to attempt to identify the true causal structure as embodied by a directed acyclic graph. David Duvenaud, Daniel Eaton, Kevin Murphy, Mark Schmidt. Journal of Machine Learning Research, W&CP, 2010. [pdf | code | slides | poster | bibtex]
Videos
HarlMCMC Shake. Two short animations illustrate the differences between a Metropolis-Hastings (MH) sampler and a Hamiltonian Monte Carlo (HMC) sampler, set to the Harlem Shake. This inspired several follow-up videos: benchmark your MCMC algorithm on these distributions! Tamara Broderick and David Duvenaud. [youtube | code]
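Here is a minimal sketch of each sampler side by side. It assumes a one-dimensional standard normal target and hand-picked tuning: a random-walk step of 1.0 for MH, and step size 0.2 with 20 leapfrog steps for HMC. It is not the code behind the videos.

```python
import math
import random


def log_density(x):
    # Assumed target: standard normal, up to an additive constant.
    return -0.5 * x * x


def grad_log_density(x):
    return -x


def metropolis_hastings(n, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings: blind Gaussian proposals, then accept/reject."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


def hmc(n, step_size=0.2, n_leapfrog=20, seed=0):
    """Hamiltonian Monte Carlo: gradient-guided trajectories, then accept/reject."""
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n):
        p = rng.gauss(0.0, 1.0)  # fresh momentum each iteration
        x_new, p_new = x, p
        # Leapfrog: half momentum step, alternating full steps, final half step.
        p_new += 0.5 * step_size * grad_log_density(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_density(x_new)
        p_new += 0.5 * step_size * grad_log_density(x_new)
        # Accept/reject corrects for leapfrog error; H = -log p(x) + p^2 / 2.
        h_old = -log_density(x) + 0.5 * p * p
        h_new = -log_density(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples
```

The essential difference between the two:

- MH proposes random-walk moves that ignore the shape of the target.
- HMC follows the gradient of the log density along a simulated trajectory, and uses the accept/reject step only to correct for integration error.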
Visualizing a draw from a deep Gaussian process. A short video showing an initially simple distribution being warped by successive random functions. Andreas Damianou and Neil Lawrence are working on variational inference in these models. [youtube]
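To make "warped by successive random functions" concrete, here is a sketch that draws a deep GP sample at a finite set of inputs. Each layer's outputs are fed in as the next layer's inputs. The sketch assumes scalar layers, an SE kernel with unit lengthscale, and a small jitter for numerical stability. The names are hypothetical, and this is not the code behind the video.

```python
import math
import random


def cholesky(K):
    # Lower-triangular L with L @ L^T == K; K must be positive definite.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_gp_layer(inputs, rng, lengthscale=1.0, jitter=1e-6):
    """Jointly sample f(inputs) for one f ~ GP(0, SE kernel) on scalar inputs."""
    K = [[math.exp(-(a - b) ** 2 / (2 * lengthscale ** 2)) + (jitter if i == j else 0.0)
          for j, b in enumerate(inputs)] for i, a in enumerate(inputs)]
    L = cholesky(K)
    eps = [rng.gauss(0.0, 1.0) for _ in inputs]
    return [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(len(inputs))]


def sample_deep_gp(inputs, n_layers=4, seed=0):
    """Return [x, f1(x), f2(f1(x)), ...]: each layer warps the previous layer's outputs."""
    rng = random.Random(seed)
    layers = [list(inputs)]
    for _ in range(n_layers):
        layers.append(sample_gp_layer(layers[-1], rng))
    return layers
```

Plotting a histogram of each entry of `sample_deep_gp(inputs)` shows how the distribution of points changes as the layers stack up.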
Evolution of Locomotion. A fun project from my undergrad: using the genetic algorithm (a terrible algorithm!) to learn locomotion strategies. The plan was for the population to learn to walk, but instead they evolved falling, rolling and shaking strategies. Eventually they exploited numerical problems in the physics engine to achieve arbitrarily high fitness, without ever having learned to walk! [youtube]
Misc
The Kernel Cookbook. Have you ever wondered which kernel to use for Gaussian process regression? This tutorial goes through the basic properties of functions that you can express by choosing or combining kernels, along with lots of examples. [html]
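Here are a few of the standard building blocks, as a sketch:

- squared-exponential, periodic and linear kernels;
- helpers for adding and multiplying kernels.

Sums and products of valid kernels are valid kernels. `is_positive_semidefinite` spot-checks this by attempting a Cholesky factorization of a jittered Gram matrix, which is a heuristic test, not a proof. Parameter names and defaults are assumptions, not taken from the tutorial.

```python
import math


def se(x, y, ell=1.0, var=1.0):
    # Squared exponential: smooth functions with lengthscale ell.
    return var * math.exp(-(x - y) ** 2 / (2 * ell ** 2))


def periodic(x, y, period=1.0, ell=1.0, var=1.0):
    # Periodic: functions that repeat exactly every `period`.
    return var * math.exp(-2 * math.sin(math.pi * abs(x - y) / period) ** 2 / ell ** 2)


def linear(x, y, c=0.0, var=1.0):
    # Linear: straight-line functions that are zero at x = c.
    return var * (x - c) * (y - c)


def add(k1, k2):
    return lambda x, y: k1(x, y) + k2(x, y)


def mul(k1, k2):
    return lambda x, y: k1(x, y) * k2(x, y)


def is_positive_semidefinite(k, xs, jitter=1e-8):
    """Heuristic check: does the jittered Gram matrix admit a Cholesky factorization?"""
    n = len(xs)
    K = [[k(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Locally periodic: the pattern repeats, but correlations fade with distance.
locally_periodic = mul(lambda x, y: se(x, y, ell=3.0), periodic)
```

Negating a kernel, as in `lambda x, y: -se(x, y)`, fails the check immediately, since its Gram matrix has negative diagonal entries.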
Causal Learning Contest: Winning Entry. I won a biological prediction contest, the DREAM4 Predictive Signaling Network Modeling Challenge, using a very simple model inspired by our causal learning paper: a Gaussian process regression from actions to protein concentrations. My takeaway: for prediction, it's usually better to learn an ensemble of flexible nonparametric models than to try to identify a single, more interpretable model. [writeup]
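Here is a minimal sketch of GP regression from actions to a measured concentration. Several things are assumptions made for illustration:

- the encoding: each action is a binary vector saying which treatments were applied;
- the SE kernel and the noise level;
- the function names.

The contest data and the actual entry are not reproduced here.

```python
import math


def se_kernel(a, b, ell=1.0, var=1.0):
    # Squared-exponential kernel between two action vectors.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return var * math.exp(-0.5 * sq_dist / ell ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior_mean(train_x, train_y, test_x, noise_var=0.01):
    """Posterior mean k(x*, X) (K + noise_var I)^{-1} y of zero-mean GP regression."""
    K = [[se_kernel(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(train_x)]
         for i, a in enumerate(train_x)]
    alpha = solve(K, train_y)
    return [sum(se_kernel(x, xi) * ai for xi, ai in zip(train_x, alpha)) for x in test_x]
```

Predictions for an unseen combination of treatments borrow strength from similar combinations. Far from all training actions, the mean falls back to the prior mean of zero.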
M.Sc. Thesis: Multiscale Conditional Random Fields for Machine Vision. A CRF-based method for doing semi-supervised learning in machine vision, by letting evidence flow between different scales of the image. The 2011 CRV paper is a clearer write-up of the work, but without quite as many figures. University of British Columbia, 2010. [pdf | code | bibtex]
Talks
Introduction to Probabilistic Programming and Automated Inference. Computational and Biological Learning Lab, University of Cambridge, March 2013. What is the most general class of statistical models? And how can we perform inference without designing custom algorithms? Just as automated inference algorithms make working with graphical models easy (e.g. BUGS), a new class of automated inference procedures is being developed for the more general case of Turing-complete generative models. In this tutorial, we introduce the practice of specifying generative models as programs which produce a stochastic output, and then automatically performing inference on the execution trace of that program, conditioned on the program having produced a specific output. We give examples of how to specify complex models and run inference in several ways, including recent advances in automatic Hamiltonian Monte Carlo and variational inference. With James Robert Lloyd. [related links | slides]
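The simplest possible version of "inference on the execution trace, conditioned on the output" is rejection sampling. Run the program many times and keep only the traces that produced the observed output. This is far cruder than the Hamiltonian Monte Carlo and variational methods mentioned above. The program and helper names are hypothetical.

```python
import random


def program(rng):
    """Hypothetical generative program: flip two fair coins, output whether either is heads."""
    a = rng.random() < 0.5
    b = rng.random() < 0.5
    return {"a": a, "b": b, "output": a or b}


def rejection_sample(program, observed_output, n_samples, seed=0):
    """Keep execution traces whose output matches the observation."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_samples:
        trace = program(rng)
        if trace["output"] == observed_output:
            accepted.append(trace)
    return accepted


traces = rejection_sample(program, True, 20000)
p_a = sum(t["a"] for t in traces) / len(traces)  # exact answer: 0.5 / 0.75 = 2/3
```

Rejection sampling becomes hopeless when the observed output is rare, which is why smarter automated inference procedures matter.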
Meta-reasoning and Bounded Rationality. Tea talk, Feb 2013. Metareasoning is simply decision theory applied to choosing computations: thinking about what to think about. It formalizes an otherwise ad hoc part of inference and decision-making, and is presumably necessary for automated modeling. HTML slide software thanks to Christian Steinruecken. [slides]
Sanity Checks. Tea talk, April 2012. When can we trust our experiments? I've collected some simple sanity checks that catch a wide class of bugs. [slides]
