Publications

Deep Bayesian Active Learning with Image Data

Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. In this paper we combine recent advances in Bayesian deep learning with the active learning framework in a practical way. We develop an active learning framework for high-dimensional data, a task that has so far proved extremely challenging, with very sparse existing literature (a sketch of the acquisition step is given below).
Yarin Gal, Riashat Islam, Zoubin Ghahramani
Bayesian Deep Learning workshop, NIPS, 2016
[PDF] [Poster] [BibTex]
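
A minimal sketch of the kind of acquisition step the framework relies on, assuming a hypothetical stochastic_predict function that returns class probabilities with dropout kept active at test time (predictive entropy is one of several acquisition functions of this flavour):

```python
import numpy as np

def max_entropy_acquisition(stochastic_predict, pool_x, T=20, batch=10):
    """Score unlabelled pool points by predictive entropy under MC dropout
    and return the indices of the `batch` most uncertain points."""
    # Approximate the predictive distribution by averaging T stochastic
    # forward passes (dropout stays ON, so repeated calls differ).
    probs = np.mean([stochastic_predict(pool_x) for _ in range(T)], axis=0)
    # Predictive entropy is highest where the model is most uncertain.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-batch:]
```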

Thesis: Uncertainty in Deep Learning

So I finally submitted my PhD thesis. In it I organised the already published results on how to obtain uncertainty in deep learning, and collected lots of bits and pieces of new research I had lying around (which I hadn't had the time to publish yet).
Yarin Gal
PhD Thesis, 2016
[PDF] [Blog post] [BibTex]

A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

We present a new technique for recurrent neural network regularisation, relying on recent results at the intersection of Bayesian modelling and deep learning. Our RNN dropout variant is theoretically motivated and its effectiveness is demonstrated empirically, with the new approach improving on the single model state-of-the-art in language modelling with the Penn Treebank (73.4 test perplexity). This extends our arsenal of variational tools in deep learning.
Yarin Gal, Zoubin Ghahramani
arXiv, 2015
[arXiv] [Software] [BibTex]
Data-Efficient Machine Learning workshop, ICML, 2016
[Paper] [Poster]
NIPS, 2016
[Paper] [BibTex]
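
A minimal numpy sketch of the variant's key property: the same dropout masks are reused at every time step, one for the input connections and one for the recurrent connections (the vanilla tanh cell and the shapes are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def variational_rnn_forward(x_seq, W_x, W_h, b, p=0.25):
    """One forward pass of a vanilla RNN with the dropout variant:
    the SAME masks are applied at every time step."""
    h = np.zeros(W_h.shape[0])
    # Sample the two masks once per sequence, not once per time step.
    mask_x = rng.binomial(1, 1 - p, size=x_seq.shape[1]) / (1 - p)
    mask_h = rng.binomial(1, 1 - p, size=h.shape) / (1 - p)
    for x_t in x_seq:
        h = np.tanh(W_x @ (x_t * mask_x) + W_h @ (h * mask_h) + b)
    return h
```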

Improving PILCO with Bayesian Neural Network Dynamics Models

We address PILCO's shortcomings by replacing its Gaussian process with a Bayesian deep dynamics model, while maintaining the framework's probabilistic nature and its data-efficiency benefits. This task poses several interesting difficulties. First, we have to handle small data, and neural networks are notoriously prone to overfitting. Furthermore, we must retain PILCO's ability to capture 1) dynamics model output uncertainty and 2) input uncertainty (both sketched below).
Yarin Gal, Rowan McAllister, Carl E. Rasmussen
Data-Efficient Machine Learning workshop, ICML, 2016
[Paper] [Abstract] [Poster] [BibTex]
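
A rough sketch of carrying both kinds of uncertainty through one step of such a model, with hypothetical names: input uncertainty lives in a cloud of state particles, and model output uncertainty in a list of sampled dynamics functions (e.g. one MC dropout mask each):

```python
import numpy as np

def propagate_particles(dynamics_samples, state_particles, action):
    """One step of particle-based uncertainty propagation: each particle
    (input uncertainty) is pushed through its own posterior draw of the
    dynamics function (model output uncertainty)."""
    next_particles = []
    for i, s in enumerate(state_particles):
        f = dynamics_samples[i % len(dynamics_samples)]  # one posterior draw
        next_particles.append(f(np.concatenate([s, action])))
    return np.array(next_particles)
```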

On Modern Deep Learning and Variational Inference

Bayesian modelling and variational inference are rooted in Bayesian statistics, and easily benefit from the vast literature in the field. In contrast, deep learning lacks a solid mathematical grounding. Instead, empirical developments in deep learning are often justified by metaphors, evading the unexplained principles at play. In this paper we extend previous results casting modern deep learning models as performing approximate variational inference in a Bayesian setting, and survey open problems for future research.
Yarin Gal, Zoubin Ghahramani
Advances in Approximate Bayesian Inference workshop, NIPS, 2015
[PDF] [Poster] [BibTex]
We thank the workshop organisers for the travel award.
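
Schematically, the correspondence the paper extends: the dropout training objective can be read as a Monte Carlo estimate of the negative evidence lower bound for an approximating distribution $q(\omega)$ over the network weights (the notation here is mine, not the paper's),

$$\mathcal{L}_{\text{dropout}} \;\propto\; -\sum_{n=1}^{N} \int q(\omega)\, \log p(\mathbf{y}_n \mid \mathbf{x}_n, \omega)\, \mathrm{d}\omega \;+\; \mathrm{KL}\big(q(\omega) \,\|\, p(\omega)\big),$$

with the weight-decay terms playing the role of the KL regulariser.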

Rapid Prototyping of Probabilistic Models: Emerging Challenges in Variational Inference

Perhaps ironically, the deep learning community is far closer to our vision of "automated modelling" than the probabilistic modelling community. Many complex models in deep learning can be easily implemented and tested, while variational inference (VI) techniques require specialised knowledge and long development cycles, making them extremely challenging for non-experts. We discuss a possible solution lifted from manufacturing. Similar ideas in deep learning have led to rapid development in model complexity, speeding up the innovation cycle.
Yarin Gal
Advances in Approximate Bayesian Inference workshop, NIPS, 2015
[PDF] [Poster] [BibTex]
We thank the workshop organisers for the travel award.

Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference

We present an efficient Bayesian convolutional neural network (convnet). The model offers better robustness to over-fitting on small data and achieves a considerable improvement in classification accuracy compared to previous approaches. We give state-of-the-art results on CIFAR-10 following our insights.
Yarin Gal, Zoubin Ghahramani
arXiv, 2015
[arXiv] [Software] [BibTex]
ICLR workshop, 2016
[CMT Reviews] [OpenReview] [BibTex]
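
A minimal Keras-style sketch of the idea, with layer sizes and dropout probability as my own assumptions: dropout is applied after every convolution layer, and kept stochastic at test time so several forward passes can be averaged:

```python
import tensorflow as tf

# Dropout after every convolution layer (sizes are illustrative).
model = tf.keras.Sequential([
    tf.keras.Input((32, 32, 3)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# MC averaging at test time: `training=True` keeps dropout stochastic.
x = tf.random.normal((4, 32, 32, 3))  # a dummy batch of images
probs = tf.reduce_mean([model(x, training=True) for _ in range(50)], axis=0)
```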

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

We show that dropout in multilayer perceptron models (MLPs) can be interpreted as a Bayesian approximation. We obtain results for modelling uncertainty with dropout MLPs, extracting information that has so far been thrown away by existing models. This mitigates the problem of representing uncertainty in deep learning without sacrificing computational performance or test accuracy (a sketch of the estimator is given below).
Yarin Gal, Zoubin Ghahramani
arXiv, 2015
[arXiv] [BibTex] [Appendix] [BibTex] [Software]
Invited for presentation at the First Deep Learning Symposium at NIPS 2015.
ICML, 2016
[Paper] [Presentation] [Poster] [BibTex]
We thank ICML for the travel award.
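
A minimal sketch of the resulting estimator, assuming a hypothetical stochastic_forward function that runs the existing network with dropout kept on (tau is the model precision discussed in the paper):

```python
import numpy as np

def mc_dropout_predict(stochastic_forward, x, T=100, tau=1.0):
    """Predictive mean and variance from T stochastic forward passes:
    the sample mean gives the prediction, and the sample scatter plus
    the inverse model precision gives the uncertainty estimate."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])
    mean = samples.mean(axis=0)
    var = samples.var(axis=0) + 1.0 / tau  # epistemic + observation noise
    return mean, var
```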

Dropout as a Bayesian Approximation: Insights and Applications

Deep learning techniques lack the ability to reason about uncertainty over their features. We show that a multilayer perceptron (MLP) with arbitrary depth and non-linearities, with dropout applied after every weight layer, is mathematically equivalent to an approximation to a well-known Bayesian model. This paper is a short version of the appendix of "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning".
Yarin Gal, Zoubin Ghahramani
Deep Learning Workshop, ICML, 2015
[PDF] [Poster] [BibTex]
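
The equivalence yields simple moment estimates from $T$ stochastic forward passes $\hat{\mathbf{y}}^*_t$ of the dropout network, with $\tau$ the model precision:

$$\mathbb{E}[\mathbf{y}^*] \approx \frac{1}{T}\sum_{t=1}^{T} \hat{\mathbf{y}}^*_t, \qquad \mathrm{Var}[\mathbf{y}^*] \approx \tau^{-1}\mathbf{I} + \frac{1}{T}\sum_{t=1}^{T} \hat{\mathbf{y}}^{*\top}_t \hat{\mathbf{y}}^*_t - \mathbb{E}[\mathbf{y}^*]^\top \mathbb{E}[\mathbf{y}^*]$$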

An Infinite Product of Sparse Chinese Restaurant Processes

We define a new process that gives a natural generalisation of the Indian buffet process (used for binary feature allocation) to categorical latent features. For this we take advantage of different limit parametrisations of the Dirichlet process and its generalisation, the Pitman–Yor process.
Yarin Gal, Tomoharu Iwata, Zoubin Ghahramani
10th Conference on Bayesian Nonparametrics (BNP), 2015
[Presentation] [BibTex]
We thank BNP for the travel award.
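
For context, a minimal sampler for the building block being generalised, the Chinese restaurant process representation of the Dirichlet process (the code and the concentration parameter alpha are illustrative, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)

def chinese_restaurant_process(n, alpha=1.0):
    """Sample a random partition of n customers: each joins an existing
    table with probability proportional to its size, or opens a new
    table with probability proportional to alpha."""
    tables, assignments = [], []
    for _ in range(n):
        weights = np.array(tables + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(tables):
            tables.append(1)   # a new table is opened
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments
```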

Improving the Gaussian Process Sparse Spectrum Approximation by Representing Uncertainty in Frequency Inputs

Standard sparse pseudo-input approximations to the Gaussian process (GP) cannot handle complex functions well. Sparse spectrum alternatives attempt to address this but are known to over-fit. We use variational inference for the sparse spectrum approximation to avoid both issues, and extend the approximate inference to the distributed and stochastic domains (the underlying feature map is sketched below).
Yarin Gal, Richard Turner
ICML, 2015
[PDF] [Presentation] [Poster] [Software] [BibTex]
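
A numpy sketch of the feature map underlying sparse spectrum approximations: trigonometric features at M sampled frequencies approximate the kernel, and the paper's contribution is to represent uncertainty over these frequency inputs rather than fixing them (the RBF spectral density below is my assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_spectrum_features(X, omegas):
    """Trigonometric feature map: K(X, X') is approximated by
    phi(X) @ phi(X').T using M sampled frequencies `omegas` (M, D)."""
    proj = X @ omegas.T  # (N, M) projections onto the frequencies
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(omegas.shape[0])

# For an RBF kernel, frequencies come from a Gaussian spectral density:
X = rng.normal(size=(100, 2))
omegas = rng.normal(size=(20, 2))
phi = sparse_spectrum_features(X, omegas)  # (100, 40) features
K_approx = phi @ phi.T                     # approximates the RBF kernel
```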

Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity: the number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have achieved significant improvements on supervised tasks with such data by embedding observations in a continuous space to capture similarities between them. Building on these ideas, we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data.
Yarin Gal, Yutian Chen, Zoubin Ghahramani
Workshop on Advances in Variational Inference, NIPS, 2014
[PDF] [Poster] [Presentation] [BibTex]
ICML, 2015
[PDF] [Presentation] [Poster] [Software] [BibTex]
We thank Google DeepMind for the travel award.

Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

We develop parallel inference for sparse Gaussian process regression and latent variable models. These processes are used to model functions in a principled way, and for non-linear dimensionality reduction, in linear time complexity. Using parallel inference we allow the models to work on much larger datasets than before (the map-reduce pattern is sketched below).
Yarin Gal, Mark van der Wilk, Carl E. Rasmussen
Workshop on New Learning Models and Frameworks for Big Data, ICML, 2014
[arXiv] [Presentation] [Software] [BibTex]
NIPS, 2014
[PDF] [BibTex]
We thank NIPS for the travel award.
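
A toy sketch of the parallelisation pattern, heavily simplified: each node computes local statistics over its shard of the data, and a master sums them to update the global variational parameters (the statistics in the paper involve the inducing points; the ones below are placeholders):

```python
import numpy as np
from multiprocessing import Pool

def local_statistics(shard):
    """Per-node map step: placeholder statistics that sum across shards."""
    return shard.T @ shard, shard.sum(axis=0)

if __name__ == "__main__":
    X = np.random.default_rng(0).normal(size=(10000, 5))
    shards = np.array_split(X, 4)
    with Pool(4) as pool:
        stats = pool.map(local_statistics, shards)
    # Reduce step: the master sums the partial statistics.
    total_outer = sum(s[0] for s in stats)
    total_sum = sum(s[1] for s in stats)
```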

Feature Partitions and Multi-View Clusterings

We define a new combinatorial structure that unifies Kingman's random partitions and Broderick, Pitman, and Jordan's feature frequency models. This structure underlies non-parametric multi-view clustering models, where data points are simultaneously clustered into different possible clusterings. The de Finetti measure is a product of paintbox constructions. Studying the properties of feature partitions allows us to understand the relations between the models they underlie and share algorithmic insights between them.
Yarin Gal, Zoubin Ghahramani
International Society for Bayesian Analysis (ISBA), 2014
[Link] [Poster]
We thank ISBA for the travel award.

Dirichlet Fragmentation Processes

We introduce a new class of models over trees based on the theory of fragmentation processes. The Dirichlet Fragmentation Process Mixture Model is an example model derived from this new class. This model has efficient and simple inference, and significantly outperforms existing approaches for hierarchical clustering and density modelling.
Hong Ge, Yarin Gal, Zoubin Ghahramani
In submission, 2014
[PDF] [BibTex]

Pitfalls in the use of Parallel Inference for the Dirichlet Process

We show that the recently suggested parallel inference for the Dirichlet process is conceptually invalid. The Dirichlet process is important for many fields such as natural language processing. However, the suggested inference would not work in most real-world applications.
Yarin Gal, Zoubin Ghahramani
Workshop on Big Learning, NIPS, 2013
[PDF] [Presentation] [BibTex]
ICML, 2014
[PDF] [Talk] [Presentation] [Poster] [BibTex]

Variational Inference in the Gaussian Process Latent Variable Model and Sparse GP Regression – a Gentle Tutorial

We present an in-depth and self-contained tutorial for sparse Gaussian process (GP) regression. We also explain GP latent variable models, a tool for non-linear dimensionality reduction. The sparse approximation reduces the time complexity of the models from cubic to linear, but its development is scattered across the literature; the various results are collected here.
Yarin Gal, Mark van der Wilk
Tutorial, 2014
[arXiv] [BibTex]

Semantics, Modelling, and the Problem of Representation of Meaning – a Brief Survey of Recent Literature

Over the past 50 years many have debated what representation should be used to capture the meaning of natural language utterances. Recently, new requirements for such representations have been raised in research. Here I survey some of the interesting representations suggested to meet these new needs.
Yarin Gal
Literature survey, 2013
[arXiv] [BibTex]

A Systematic Bayesian Treatment of the IBM Alignment Models

We used a non-parametric process, the hierarchical Pitman–Yor process, in models that align words between pairs of sentences. These alignment models are used at the core of all machine translation systems. We obtained a significant improvement in translation using the process.
Yarin Gal, Phil Blunsom
North American Chapter of the Association for Computational Linguistics (NAACL), 2013
[PDF] [Presentation] [BibTex]

Relaxing HMM Alignment Model Assumptions for Machine Translation Using a Bayesian Approach

We used a non-parametric process, the hierarchical Pitman–Yor process, to relax some of the restricting assumptions often used in machine translation. When a long history of word alignments is not available, the process falls back on shorter histories in a principled way.
Yarin Gal
Master's Dissertation, 2012
[PDF] [BibTex]

Overcoming Alpha-Beta Limitations Using Evolved Artificial Neural Networks

We trained a feed-forward neural network to play checkers. The network acts as both the value function for a minimax algorithm and a heuristic for pruning tree branches in a reinforcement learning setting. No supervised signal was used for training: a set of networks was assessed by playing against each other, and the winning networks' weights were adapted following an evolution strategies (ES) algorithm (one generation is sketched below).
Yarin Gal, Mireille Avigal
Machine Learning and Applications (IEEE), 2010
[Paper] [BibTex]
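
A toy sketch of one generation of such a scheme, with hypothetical names: fitness(w1, w2) plays one match between two weight vectors and returns +1, 0, or -1 from the first player's perspective:

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(population, fitness, sigma=0.05, n_survivors=5):
    """One generation: rank networks by round-robin tournament score,
    keep the winners, and refill the population with Gaussian-perturbed
    copies of their weight vectors."""
    scores = [sum(fitness(w, o) for o in population if o is not w)
              for w in population]
    order = np.argsort(scores)[::-1]
    survivors = [population[i] for i in order[:n_survivors]]
    offspring = [w + sigma * rng.normal(size=w.shape)
                 for w in survivors
                 for _ in range(len(population) // n_survivors - 1)]
    return survivors + offspring
```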

Contact me

Email

yg279 -at- cam.ac.uk

Post

Cambridge University
Engineering Department
Cambridge, CB2 1PZ
United Kingdom