David Kristjanson Duvenaud
I'm a final-year Ph.D. candidate whose interests lie mainly in machine learning, inference, and nonparametric modeling. My advisors at Cambridge are Carl Rasmussen and Zoubin Ghahramani. My previous advisor was Kevin Murphy at the University of British Columbia, where I worked mostly on machine vision.
This fall I'll be starting a postdoc at the Harvard Intelligent Probabilistic Systems group, working on model-based optimization with Ryan Adams.
I spent a summer at the Max Planck Institute for Intelligent Systems, and the two summers before that at Google Research, in the Video Content Analysis group.
I cofounded Invenia, an energy forecasting and trading firm where I still consult.
Email me at: dkd23@cam.ac.uk
Publications
PhD Thesis: Automatic Model Construction with Gaussian Processes
I tried to provide a readable introduction to the automatic model-building project. Chapter 2 even contains a detailed tutorial on how to express structure using kernels.
Probabilistic ODE Solvers with Runge-Kutta Means
We show that some standard differential equation solvers are equivalent to Gaussian process predictive means, giving them a natural way to handle uncertainty.
Michael Schober, David Duvenaud, Philipp Hennig. Under review. pdf | bibtex
Automatic Construction and Natural-Language Description of Nonparametric Regression Models
We wrote a program that automatically writes reports summarizing automatically constructed models.
James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani. Association for the Advancement of Artificial Intelligence (AAAI), 2014. pdf | slides | code | example report (airline) | example report (solar) | more examples | bibtex
Avoiding Pathologies in Very Deep Networks
To help suggest better deep neural network architectures, we analyze the related problem of constructing useful priors on compositions of functions. We study deep Gaussian processes, a type of infinitely wide, deep neural network. We also examine deep covariance functions, obtained by composing infinitely many feature transforms. Finally, we characterize the model class you get if you apply dropout to Gaussian processes.
David Duvenaud, Oren Rippel, Ryan Adams, Zoubin Ghahramani. Artificial Intelligence and Statistics, 2014. pdf | code | slides | video of 50-layer warping | bibtex
Active Learning of Intuitive Control Knobs for Synthesizers Using Gaussian Processes
To enable composers to directly adjust personalized high-level qualities during sound synthesis, our system actively learns functions that map from the space of synthesizer control parameters to perceived levels of high-level qualities.
Anna Huang, David Duvenaud, Kenneth Arnold, Brenton Partridge, Josiah Wolf Oberholtzer, Krzysztof Gajos. Intelligent User Interfaces, 2014. pdf | slides | workshop version | example sounds | bibtex
Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces
We may wish to optimize over neural network architectures with an unknown number of layers. To relate performance data gathered for different architectures, we define a new kernel for conditional parameter spaces that explicitly includes information about which parameters are relevant in a given structure.
Kevin Swersky, David Duvenaud, Jasper Snoek, Frank Hutter, Michael Osborne. NIPS workshop on Bayesian optimization, 2013. pdf | bibtex
Warped Mixtures for Nonparametric Cluster Shapes
If you fit a mixture of Gaussians to a single cluster that is curved or heavy-tailed, your model will report that the data contains many clusters! To fix this problem, we simply warp a latent mixture of Gaussians to produce nonparametric cluster shapes. The low-dimensional latent mixture model summarizes the properties of the high-dimensional clusters (or density manifolds) describing the data.
Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani. Uncertainty in Artificial Intelligence, 2013. pdf | code | slides | talk | bibtex
Structure Discovery in Nonparametric Regression through Compositional Kernel Search
How could an AI do statistics? To search through an open-ended class of structured, nonparametric regression models, we introduce a simple grammar which specifies composite kernels. These structured models often allow an interpretable decomposition of the function being modeled, as well as long-range extrapolation. Many common regression methods are special cases of this large family of models.
David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani. International Conference on Machine Learning, 2013. pdf | code | short slides | long slides | bibtex
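To make the idea of a kernel grammar concrete, here is a minimal sketch of a single search step. It assumes kernel expressions are nested Python tuples, uses three illustrative base-kernel names (SE, Per, Lin), and implements only a simplified set of expansion rules (replace any subexpression S with S + B or S * B, or swap one base kernel for another). The paper's actual grammar and search procedure may differ in detail.

```python
from typing import List, Tuple, Union

# A kernel expression is either a base-kernel name or a tuple (op, left, right),
# where op is "+" or "*". The base-kernel names are illustrative assumptions.
Expr = Union[str, Tuple[str, "Expr", "Expr"]]
BASE_KERNELS = ["SE", "Per", "Lin"]


def expand(expr: Expr) -> List[Expr]:
    """Return every expression reachable by applying one grammar rule once."""
    neighbours: List[Expr] = []
    # Rules applied at this node: S -> S + B and S -> S * B.
    for base in BASE_KERNELS:
        neighbours.append(("+", expr, base))
        neighbours.append(("*", expr, base))
    if isinstance(expr, str):
        # Rule B -> B': swap a base kernel for a different base kernel.
        neighbours.extend(b for b in BASE_KERNELS if b != expr)
    else:
        op, left, right = expr
        # Recurse so the rules can also be applied to any subexpression.
        neighbours.extend((op, new_left, right) for new_left in expand(left))
        neighbours.extend((op, left, new_right) for new_right in expand(right))
    return neighbours


def to_string(expr: Expr) -> str:
    """Render an expression with explicit parentheses."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_string(left)} {op} {to_string(right)})"
```

A greedy search would score each neighbour, for instance by fitting its hyperparameters and computing a model-selection criterion such as BIC, keep the best one, and expand again.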
Active Learning of Model Evidence using Bayesian Quadrature
Instead of the usual Monte Carlo-based methods for computing integrals of likelihood functions, we construct a model of the likelihood function and infer its integral conditioned on a set of evaluations. This allows us to evaluate the likelihood wherever doing so is most informative, instead of running a Markov chain. The upshot is that we need many fewer samples to estimate integrals.
Michael Osborne, David Duvenaud, Roman Garnett, Carl Rasmussen, Stephen Roberts, Zoubin Ghahramani. Neural Information Processing Systems, 2012. pdf | code | slides | bibtex
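As a rough illustration of the underlying idea, the sketch below computes the Bayesian quadrature estimate of Z = ∫ f(x) N(x; mu, sigma^2) dx in one dimension. It assumes a zero-mean GP prior on f with a squared-exponential kernel, for which the kernel integrals have a closed form. It does not implement the active selection of evaluation points or the likelihood-specific modeling from the paper; the evaluation points are simply given.

```python
import math


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    return variance * math.exp(-0.5 * (a - b) ** 2 / lengthscale ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def bq_estimate(xs, ys, mu=0.0, sigma=1.0, lengthscale=1.0, variance=1.0, jitter=1e-10):
    """Posterior mean of Z = integral of f(x) N(x; mu, sigma^2) dx, given f(xs) = ys."""
    K = [[se_kernel(a, b, lengthscale, variance) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    # z_i = integral of k(x, x_i) N(x; mu, sigma^2) dx, closed form for an SE kernel.
    s2 = lengthscale ** 2 + sigma ** 2
    z = [variance * math.sqrt(lengthscale ** 2 / s2) * math.exp(-0.5 * (xi - mu) ** 2 / s2)
         for xi in xs]
    # K is symmetric, so z^T K^{-1} y equals (K^{-1} z)^T y.
    weights = solve(K, z)
    return sum(w * y for w, y in zip(weights, ys))


# Example: the exact value of the integral of x^2 N(x; 0, 1) dx is 1.
xs = [float(i) for i in range(-5, 6)]
estimate = bq_estimate(xs, [x ** 2 for x in xs])
```

Because the estimate is linear in the observed values, the weights K^{-1}z depend only on the evaluation points and the kernel, so they can be reused for any function evaluated at the same points.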
Optimally-Weighted Herding is Bayesian Quadrature
We prove several connections between herding, an efficient procedure for estimating moments that minimizes a worst-case error, and Bayesian Quadrature, a model-based way of estimating integrals. It turns out that both optimize the same criterion, and that Bayesian Quadrature does so optimally. Among other things, this means that we can place worst-case error bounds on the optimal Bayesian estimator! The talk slides also contain strong equivalences between Bayesian Quadrature and kernel two-sample tests, the Hilbert-Schmidt Independence Criterion, and the determinantal point process MAP objective.
Ferenc Huszár and David Duvenaud. Uncertainty in Artificial Intelligence, 2012. pdf | code | slides | talk | bibtex
Additive Gaussian Processes
When functions have additive structure, we can extrapolate further than with standard Gaussian process models. We use an algebraic trick to efficiently integrate over exponentially many ways of modeling a function as a sum of low-dimensional functions.
David Duvenaud, Hannes Nickisch, Carl Rasmussen. Neural Information Processing Systems, 2011. pdf | code | slides | bibtex
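One way to realize such a trick is sketched below. If z_i = k_i(x_i, x_i') is a one-dimensional kernel on input dimension i, then the sum over all d-way interaction terms is the d-th elementary symmetric polynomial e_d(z_1, ..., z_D). The Newton-Girard identities compute every e_d from power sums in polynomial time, instead of enumerating all subsets. The squared-exponential base kernels and the per-order weights `order_variances` are assumptions made for illustration, not the paper's code.

```python
import itertools
import math


def elementary_symmetric(z, max_order):
    """Return [e_0, ..., e_max_order] of the values z via the Newton-Girard identities."""
    # Power sums p_k = sum_i z_i^k (p_0 is unused).
    power_sums = [sum(zi ** k for zi in z) for k in range(max_order + 1)]
    e = [1.0]
    for n in range(1, max_order + 1):
        # n * e_n = sum_{k=1..n} (-1)^(k-1) e_{n-k} p_k
        e.append(sum((-1) ** (k - 1) * e[n - k] * power_sums[k]
                     for k in range(1, n + 1)) / n)
    return e


def additive_kernel(x, y, lengthscales, order_variances):
    """Sum over orders d of order_variances[d-1] * e_d(z), z_i a 1-D SE kernel on dimension i."""
    z = [math.exp(-0.5 * (a - b) ** 2 / l ** 2) for a, b, l in zip(x, y, lengthscales)]
    e = elementary_symmetric(z, len(order_variances))
    return sum(v * e[d] for d, v in enumerate(order_variances, start=1))


def brute_force_order(z, order):
    """Direct sum over all subsets of size `order`; exponential cost, used only for checking."""
    return sum(math.prod(c) for c in itertools.combinations(z, order))
```

Computing the power sums costs O(D R) and the recursion O(R^2) for interactions up to order R, whereas the direct sum visits every subset.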
Multiscale Conditional Random Fields for Semi-supervised Labeling and Classification
How can we take advantage of images labeled only by what objects they contain? By combining information across different scales, we use whole-image labels to infer what objects look like at the pixel level, and where they occur in images.
David Duvenaud, Benjamin Marlin, Kevin Murphy. Conference on Computer and Robot Vision, 2011. pdf | code | slides | bibtex
Causal Learning without DAGs
When predicting the results of new actions, it's sometimes better to simply average over flexible conditional models than to attempt to identify the true causal structure as embodied by a directed acyclic graph.
David Duvenaud, Daniel Eaton, Kevin Murphy, Mark Schmidt. Journal of Machine Learning Research W&CP, 2010. pdf | code | slides | poster | bibtex
Videos
Visualizing Mappings of Deep Functions
By viewing deep networks as a prior on functions, we can ask which architectures give rise to which sorts of mappings. Here we visualize a mapping drawn from a deep Gaussian process, using the input-connected architecture described in "Avoiding Pathologies in Very Deep Networks".
youtube | code

Machine Learning to Drive
Andrew McHutchon and Carl Rasmussen are working on a model-based reinforcement learning system that can learn from small amounts of experience. For fun, we hooked up a 3D physics engine to the learning system and tried to get it to learn to drive a simple two-wheeled car in a certain direction, starting with no knowledge of the dynamics. It took only about 10 seconds of practice to solve the problem, although not in real time. Details are in the video description.
By Andrew McHutchon and David Duvenaud. youtube | related paper
HarlMCMC Shake
Two short animations illustrate the differences between a Metropolis-Hastings (MH) sampler and a Hamiltonian Monte Carlo (HMC) sampler, to the tune of the Harlem Shake. This inspired several follow-up videos: benchmark your MCMC algorithm on these distributions!
By Tamara Broderick and David Duvenaud. youtube | code
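To make the contrast in these animations concrete, here is a minimal 1-D sketch of both samplers. The target (a standard Gaussian), the step sizes, and the trajectory length are arbitrary assumptions, not the settings used in the videos. Random-walk MH needs only density evaluations but moves in small, diffusive steps; HMC uses the gradient of the log density to simulate Hamiltonian dynamics, which lets it propose distant points that are still usually accepted.

```python
import math
import random


def log_density(x):
    """Unnormalized log density of the target; a standard Gaussian is assumed here."""
    return -0.5 * x * x


def grad_log_density(x):
    return -x


def metropolis_hastings(num_samples, step_size=1.0, x=0.0, rng=None):
    rng = rng or random.Random(0)
    samples = []
    for _ in range(num_samples):
        proposal = x + rng.gauss(0.0, step_size)  # symmetric random-walk proposal
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples


def hamiltonian_monte_carlo(num_samples, step_size=0.2, num_leapfrog=20, x=0.0, rng=None):
    rng = rng or random.Random(0)
    samples = []
    for _ in range(num_samples):
        p = rng.gauss(0.0, 1.0)  # resample the momentum
        x_new, p_new = x, p
        # Leapfrog integration of H(x, p) = -log_density(x) + p^2 / 2.
        p_new += 0.5 * step_size * grad_log_density(x_new)
        for i in range(num_leapfrog):
            x_new += step_size * p_new
            if i < num_leapfrog - 1:
                p_new += step_size * grad_log_density(x_new)
        p_new += 0.5 * step_size * grad_log_density(x_new)
        # Metropolis correction for the integrator's discretization error.
        h_old = -log_density(x) + 0.5 * p * p
        h_new = -log_density(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples
```

Comparing the autocorrelation of the two chains on the same target shows the qualitative difference the animations illustrate.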
Visualizing a draw from a deep Gaussian process
A short video showing an initially simple distribution being warped by successive random functions. Andreas Damianou and Neil Lawrence are working on variational inference in these models.
youtube

Evolution of Locomotion
A fun project from undergrad: using a genetic algorithm (a terrible algorithm!) to learn locomotion strategies. The plan was for the population to learn to walk, but instead they evolved falling, rolling, and shaking strategies. Eventually they exploited numerical problems in the physics engine to achieve arbitrarily high fitness, without ever having learned to walk!
youtube
Misc
Kernel Cookbook
Have you ever wondered which kernel to use for Gaussian process regression? This tutorial goes through the basic properties of functions that you can express by choosing or combining kernels, along with lots of examples.
html
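A minimal sketch of combining kernels by addition and multiplication follows. It assumes common textbook forms of the squared-exponential, periodic, and linear kernels; the cookbook may parameterize them differently. Sums and products of valid kernels are again valid kernels, so composite structure can be built from simple pieces.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[float, float], float]


def squared_exp(lengthscale: float = 1.0, variance: float = 1.0) -> Kernel:
    return lambda x, y: variance * math.exp(-0.5 * (x - y) ** 2 / lengthscale ** 2)


def periodic(period: float = 1.0, lengthscale: float = 1.0, variance: float = 1.0) -> Kernel:
    return lambda x, y: variance * math.exp(
        -2.0 * math.sin(math.pi * abs(x - y) / period) ** 2 / lengthscale ** 2)


def linear(offset: float = 0.0, variance: float = 1.0) -> Kernel:
    return lambda x, y: variance * (x - offset) * (y - offset)


def add(k1: Kernel, k2: Kernel) -> Kernel:
    return lambda x, y: k1(x, y) + k2(x, y)


def mul(k1: Kernel, k2: Kernel) -> Kernel:
    return lambda x, y: k1(x, y) * k2(x, y)


def gram(k: Kernel, xs: Sequence[float]) -> List[List[float]]:
    """Covariance matrix of the kernel evaluated at every pair of inputs."""
    return [[k(a, b) for b in xs] for a in xs]


# Locally periodic: a repeating pattern whose shape can drift slowly.
locally_periodic = mul(squared_exp(lengthscale=10.0), periodic(period=1.0))
# Linear trend plus a periodic component.
trend_plus_season = add(linear(), periodic(period=1.0))
```

Multiplying a long-lengthscale squared-exponential kernel by a periodic kernel gives functions that repeat approximately but can change shape slowly, while adding a linear kernel superimposes a trend on the periodic component.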
Causal Learning Contest: Winning Entry
I won a biological prediction contest, the DREAM4 Predictive Signaling Network Modeling Challenge, using a very simple model inspired by our causal learning paper: a Gaussian process regression from actions to protein concentrations. My takeaway: for prediction, it's usually better to learn an ensemble of flexible nonparametric models than to try to identify a single, more interpretable model.
write-up
M.Sc. Thesis: Multiscale Conditional Random Fields for Machine Vision.
We developed a CRF-based method for semi-supervised learning in machine vision, which lets evidence flow between different scales of the image. The 2011 CRV paper is a clearer write-up of the work, but without quite as many figures.
University of British Columbia, 2010. pdf | code | bibtex

Talks
Fast Random Feature Expansions
A fundamental result of Johnson and Lindenstrauss states that one may randomly project a collection of data points into a lower-dimensional space while approximately preserving pairwise distances. Recent developments have gone even further: nonlinear randomised projections can be used to approximate kernel machines and scale them to datasets with millions of features and samples. In this talk we will explore the theoretical aspects of the random projection method and demonstrate its effectiveness on nonlinear regression problems. We also motivate and describe the recent 'Fastfood' method.
With David Lopez-Paz. Computational and Biological Learning Lab, University of Cambridge, November 2013. slides | video | code
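As a rough sketch of the nonlinear random projections mentioned above, the random Fourier features construction of Rahimi and Recht maps inputs through cos(w·x + b) with random w and b, so that inner products of the features approximate an RBF kernel. This sketch uses a dense Gaussian projection; Fastfood replaces that dense matrix with structured matrices for speed, which is not attempted here. The function names and the number of features are illustrative assumptions.

```python
import math
import random


def random_fourier_features(dim, num_features, lengthscale=1.0, seed=0):
    """Return a map z(.) with z(x) . z(y) ~= exp(-|x - y|^2 / (2 lengthscale^2))."""
    rng = random.Random(seed)
    # Frequencies drawn from the RBF kernel's spectral density, N(0, I / lengthscale^2).
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return feature_map


def rbf(x, y, lengthscale=1.0):
    """Exact RBF kernel, for comparison with the random-feature approximation."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq / lengthscale ** 2)
```

A linear model trained on z(x) then behaves approximately like a kernel machine with the RBF kernel, at a cost linear in the number of data points.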
Introduction to Probabilistic Programming and Automated Inference
What is the most general class of statistical models? And how can we perform inference without designing custom algorithms? Just as automated inference algorithms make working with graphical models easy (e.g. BUGS), a new class of automated inference procedures is being developed for the more general case of Turing-complete generative models. In this tutorial, we introduce the practice of specifying generative models as programs that produce a stochastic output, and then automatically performing inference on the execution trace of that program, conditioned on the program having produced a specific output. We give examples of how to specify complex models and run inference in several ways, including recent advances in automatic Hamiltonian Monte Carlo and variational inference.
With James Robert Lloyd. Computational and Biological Learning Lab, University of Cambridge, March 2013. related links | slides
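The simplest (and least efficient) way to condition a generative program on its output is rejection sampling: run the program many times and keep only the execution traces that produced the observed output. The two-coin program below is a hypothetical toy example, and this sketch does not implement the Hamiltonian Monte Carlo or variational methods mentioned in the abstract.

```python
import random


def generative_program(rng):
    """Toy stochastic program: flip two fair coins and report whether any came up heads."""
    first = rng.random() < 0.5
    second = rng.random() < 0.5
    trace = {"first": first, "second": second}  # record of the random choices made
    output = first or second
    return trace, output


def rejection_inference(program, observed_output, num_runs, seed=0):
    """Keep only execution traces whose output matches the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(num_runs):
        trace, output = program(rng)
        if output == observed_output:
            accepted.append(trace)
    return accepted


traces = rejection_inference(generative_program, observed_output=True, num_runs=20000)
p_first_heads = sum(t["first"] for t in traces) / len(traces)
# Exact answer: P(first heads | at least one heads) = (1/2) / (3/4) = 2/3.
```

Rejection sampling wastes every run that disagrees with the observation, which becomes hopeless when the observed output is improbable; that is the motivation for the more sophisticated trace-based inference methods the tutorial covers.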
Metareasoning and Bounded Rationality
Metareasoning is simply decision theory applied to choosing computations: thinking about what to think about. It formalizes an otherwise ad hoc part of inference and decision-making, and is presumably necessary for automated modeling. HTML slide software thanks to Christian Steinruecken.
Tea talk, February 2013. slides
Sanity Checks
When can we trust our experiments? I've collected some simple sanity checks that catch a wide class of bugs. Roger Grosse and I also wrote a short tutorial (part one and part two) on how to write correct machine learning code. Richard Mann also wrote a gripping blog post about the aftermath of finding a subtle bug in one of his landmark papers.
Tea talk, April 2012. slides
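One widely used sanity check, though not necessarily one of those in the slides, is to compare a hand-derived gradient against central finite differences. The sketch below uses a logistic-regression loss purely as an example; `check_gradient` is a hypothetical helper name.

```python
import math
import random


def logistic_loss(w, xs, ys):
    """Average negative log-likelihood of logistic regression with labels in {0, 1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        a = sum(wi * xi for wi, xi in zip(w, x))
        # log(1 + exp(a)) - y * a, computed in a numerically stable way.
        total += max(a, 0.0) + math.log1p(math.exp(-abs(a))) - y * a
    return total / len(xs)


def logistic_grad(w, xs, ys):
    """Analytic gradient of logistic_loss: mean of (sigmoid(a) - y) * x."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        a = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-a))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return [g / len(xs) for g in grad]


def check_gradient(f, grad_f, w, eps=1e-6):
    """Largest relative error between grad_f(w) and central finite differences of f."""
    analytic = grad_f(w)
    worst = 0.0
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        numeric = (f(w_plus) - f(w_minus)) / (2 * eps)
        denom = max(abs(numeric), abs(analytic[j]), 1e-12)
        worst = max(worst, abs(numeric - analytic[j]) / denom)
    return worst
```

A correct gradient typically gives a relative error many orders of magnitude below 1, while a mistake such as forgetting to divide by the number of data points shows up as an error close to 1.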