Publications

Compact Approximations to Bayesian Predictive Distributions

Edward Snelson, Zoubin Ghahramani, August 2005. (In 22nd International Conference on Machine Learning). Bonn, Germany. Omnipress.

We provide a general framework for learning precise, compact, and fast representations of the Bayesian predictive distribution for a model. This framework is based on minimizing the KL divergence between the true predictive density and a suitable compact approximation. We consider various methods for doing this, both sampling-based approximations and deterministic approximations such as expectation propagation. These methods are tested on a mixture-of-Gaussians model for density estimation and on binary linear classification, with both synthetic data sets for visualization and several real data sets. Our results show significant reductions in prediction time and memory footprint.
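
As a rough illustration of the sampling-based route: minimizing KL(p‖q) over a compact family q amounts to a maximum-likelihood fit of q to samples drawn from the true predictive. The sketch below does this with a small Gaussian mixture; the predictive samples and the component count are placeholders for illustration, not anything from the paper.

```python
# Minimal sketch of the sampling-based route: minimizing KL(p || q) over a
# compact family q is (up to a constant) maximizing E_p[log q], which can be
# estimated with samples drawn from the true predictive distribution p.
# The predictive samples and the number of components are illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for samples from an expensive Bayesian predictive density
# p(y* | x*, D), e.g. produced by MCMC over model parameters.
predictive_samples = np.concatenate([
    rng.normal(-2.0, 0.5, size=5000),
    rng.normal(1.5, 1.0, size=5000),
]).reshape(-1, 1)

# Compact approximation q: a small mixture of Gaussians fit by maximum
# likelihood on the samples, i.e. an estimate of argmin_q KL(p || q).
q = GaussianMixture(n_components=3, random_state=0).fit(predictive_samples)

# Cheap, fixed-size predictor: evaluating q costs O(#components),
# independent of how many posterior samples were needed to characterise p.
test_points = np.linspace(-5, 5, 7).reshape(-1, 1)
print(q.score_samples(test_points))   # log q(y*) at a few test locations
```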

Sparse Gaussian Processes using Pseudo-inputs

Edward Snelson, Zoubin Ghahramani, 2006. (In Advances in Neural Information Processing Systems 18). Edited by Y. Weiss, B. Schölkopf, J. Platt. Cambridge, MA. The MIT Press.

We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by gradient-based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method with O(NM²) training cost and O(M²) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input-dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
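
As a rough sketch of where the O(NM²) cost comes from, the snippet below evaluates an SPGP-style negative log marginal likelihood for a low-rank-plus-diagonal covariance (Knm Kmm⁻¹ Kmn plus a diagonal correction and noise) using Cholesky factors and the matrix inversion and determinant lemmas. The RBF kernel, jitter, hyperparameter values, and function names are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the SPGP-style covariance: Q = Knm Kmm^{-1} Kmn (rank M) plus a
# diagonal correction diag(Knn - Q) + noise. The Woodbury identity keeps the
# cost at O(N M^2). Kernel choice, jitter and hyperparameters are placeholders.
import numpy as np

def rbf(A, B, lengthscale=1.0, signal_var=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale ** 2)

def spgp_neg_log_marglik(X, y, Xu, noise_var=0.1, jitter=1e-6):
    N, M = X.shape[0], Xu.shape[0]
    Kmm = rbf(Xu, Xu) + jitter * np.eye(M)
    Knm = rbf(X, Xu)
    knn_diag = np.full(N, 1.0)                      # rbf(x, x) = signal_var

    Lm = np.linalg.cholesky(Kmm)
    V = np.linalg.solve(Lm, Knm.T)                  # M x N, so Q = V^T V
    q_diag = (V ** 2).sum(0)
    lam = knn_diag - q_diag + noise_var             # diagonal correction + noise

    # Woodbury / determinant lemma applied to (Q + diag(lam)):
    B = np.eye(M) + (V / lam) @ V.T
    Lb = np.linalg.cholesky(B)
    beta = np.linalg.solve(Lb, (V / lam) @ y)

    logdet = 2 * np.log(np.diag(Lb)).sum() + np.log(lam).sum()
    quad = y @ (y / lam) - beta @ beta
    return 0.5 * (logdet + quad + N * np.log(2 * np.pi))

# Toy usage: N data points, M << N pseudo-inputs (whose locations would be
# optimised by gradients in the actual method).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Xu = rng.uniform(-3, 3, size=(10, 1))
print(spgp_neg_log_marglik(X, y, Xu))
```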

Variable noise and dimensionality reduction for sparse Gaussian processes

Edward Snelson, Zoubin Ghahramani, 2006. (In 22nd Conference on Uncertainty in Artificial Intelligence). Edited by R. Dechter, T. S. Richardson. AUAI Press.

The sparse pseudo-input Gaussian process (SPGP) is a new approximation method for speeding up GP regression in the case of a large number of data points N. The approximation is controlled by the gradient optimization of a small set of M pseudo-inputs, thereby reducing complexity from O(N³) to O(NM²). One limitation of the SPGP is that this optimization space becomes impractically large for high-dimensional data sets. This paper addresses this limitation by performing automatic dimensionality reduction. A projection of the input space to a low-dimensional space is learned in a supervised manner, alongside the pseudo-inputs, which now live in this reduced space. The paper also investigates the suitability of the SPGP for modeling data with input-dependent noise. A further extension of the model is made to make it even more powerful in this regard: we learn an uncertainty parameter for each pseudo-input. The combination of sparsity, reduced dimension, and input-dependent noise makes it possible to apply GPs to much larger and more complex data sets than was previously practical. We demonstrate the benefits of these methods on several synthetic and real-world problems.
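
A minimal sketch of the dimensionality-reduction idea, under assumed names and shapes: the kernel is evaluated on linearly projected inputs Px, so the pseudo-inputs live directly in the reduced G-dimensional space. In the method itself the projection, pseudo-inputs and hyperparameters are learned jointly by gradient-based optimization; here they are random placeholders.

```python
# Sketch of the supervised dimensionality-reduction idea: evaluate the kernel
# on linearly projected inputs P x, so the pseudo-inputs live in the reduced
# G-dimensional space. P, the pseudo-inputs and the hyperparameters would be
# learned jointly; the values below are random placeholders for illustration.
import numpy as np

def rbf(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(0)
D, G, N, M = 20, 2, 500, 15        # input dim, reduced dim, data points, pseudo-points

X = rng.standard_normal((N, D))            # original high-dimensional inputs
P = rng.standard_normal((G, D))            # projection to be learned (placeholder)
Xu_reduced = rng.standard_normal((M, G))   # pseudo-inputs in the reduced space

# Kernel between projected real inputs and reduced-space pseudo-inputs:
Knm = rbf(X @ P.T, Xu_reduced)
print(Knm.shape)                   # (N, M): cost grows with G, not D
```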

Local and global sparse Gaussian process approximations

Edward Snelson, Zoubin Ghahramani, 2007. (In 11th International Conference on Artificial Intelligence and Statistics). Edited by M. Meila, X. Shen. Omnipress.

Gaussian process (GP) models are flexible probabilistic nonparametric models for regression, classification and other tasks. Unfortunately they suffer from computational intractability for large data sets. Over the past decade many different approximations have been developed to reduce this cost. Most of these can be termed global approximations, in that they try to summarize all the training data via a small set of support points. A different approach is that of local regression, where many local experts account for their own part of the input space. In this paper we start by investigating the regimes in which these different approaches work well or fail. We then proceed to develop a new sparse GP approximation which is a combination of both the global and local approaches. Theoretically we show that it is derived as a natural extension of the framework developed by Quiñonero-Candela and Rasmussen for sparse GP approximations. We demonstrate the benefits of the combined approximation on some 1D examples for illustration, and on some large real-world data sets.
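
For intuition about the local approach the abstract contrasts with global ones, here is a hedged sketch of a plain local-experts baseline: cluster the inputs and fit an independent GP per cluster. This is not the combined local/global approximation developed in the paper, and the scikit-learn components are simply one convenient way to illustrate the idea.

```python
# Generic sketch of the *local* approach: partition the inputs into regions
# and let an independent GP expert handle each region. Each expert only sees
# ~N/K points, so its cubic cost is (N/K)^3 instead of N^3.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)

# Assign points to K local experts.
K = 10
clusters = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
experts = []
for k in range(K):
    idx = clusters.labels_ == k
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X[idx], y[idx])
    experts.append(gp)

# Predict with the expert responsible for each test point's region.
X_test = np.array([[-4.0], [0.0], [4.0]])
assignments = clusters.predict(X_test)
preds = [experts[k].predict(X_test[i:i + 1])[0] for i, k in enumerate(assignments)]
print(preds)
```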

Warped Gaussian Processes

Edward Snelson, Carl Edward Rasmussen, Zoubin Ghahramani, December 2004. (In Advances in Neural Information Processing Systems 16). Edited by S. Thrun, L. Saul, B. Schölkopf. Cambridge, MA, USA. The MIT Press. ISBN: 0-262-20152-6.

We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that the transformed data is well modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.
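
A small sketch of the warping idea, with placeholder parameter values: a monotonic sum-of-tanh map of the kind used in the paper sends the observed targets into a latent space modelled by an ordinary GP, and its derivative supplies the Jacobian term that enters the observation-space log likelihood.

```python
# Sketch of the warping idea: a monotonic map z = g(y) sends the observed
# targets into a latent space modelled by an ordinary GP. A sum-of-tanh warp
# (positive coefficients keep it monotonic) is one such parameterisation;
# the coefficient values below are placeholders, not learned ones.
import numpy as np

def warp(y, a, b, c):
    """z = g(y) = y + sum_i a_i * tanh(b_i * (y + c_i)), with a_i, b_i >= 0."""
    return y + np.sum(a * np.tanh(b * (y[:, None] + c)), axis=1)

def warp_grad(y, a, b, c):
    """dg/dy, the Jacobian term needed in the warped-GP log likelihood."""
    return 1.0 + np.sum(a * b / np.cosh(b * (y[:, None] + c)) ** 2, axis=1)

# Placeholder warp parameters (in the model they are learned jointly with the
# GP hyperparameters by maximising the marginal likelihood).
a = np.array([1.0, 0.5])
b = np.array([2.0, 1.0])
c = np.array([0.0, -1.0])

y = np.linspace(-2, 2, 5)
z = warp(y, a, b, c)

# Observation-space log likelihood = GP log likelihood of z + log Jacobian:
log_jacobian = np.log(warp_grad(y, a, b, c)).sum()
print(z, log_jacobian)
```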
