I’m a PhD student, supervised by José Miguel Hernández-Lobato and Zoubin Ghahramani. I did my undergraduate degree at the University of Oxford, supervised by Frank Wood and Jan-Willem van de Meent. My core interests are exploration in reinforcement learning, Bayesian optimisation, and bandits.
Publications
Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
Javier Antorán, David Janz, James Urquhart Allingham, Erik A. Daxberger, Riccardo Barbano, Eric T. Nalisnick, José Miguel Hernández-Lobato, 2022. (In 39th International Conference on Machine Learning). PMLR.
Abstract
The linearised Laplace method for estimating model uncertainty has received renewed attention in the Bayesian deep learning community. The method provides reliable error bars and admits a closed-form expression for the model evidence, allowing for scalable selection of model hyperparameters. In this work, we examine the assumptions behind this method, particularly in conjunction with model selection. We show that these interact poorly with some now-standard tools of deep learning, namely stochastic approximation methods and normalisation layers, and make recommendations for how to better adapt this classic method to the modern setting. We provide theoretical support for our recommendations and validate them empirically on MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers.
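To give a flavour of the closed-form evidence mentioned above, here is a minimal PyTorch toy, not the paper's code: a small MLP is trained to a MAP estimate, the Jacobian of its outputs yields the generalised Gauss-Newton posterior precision A = JᵀJ/σ² + αI, and the Laplace log evidence log p(D) ≈ log p(D|θ*) + log p(θ*) + (P/2) log 2π − (1/2) log|A| follows. The prior precision `alpha` and noise variance `sigma2` are illustrative assumed values; in practice one would maximise the evidence over them, which is exactly the model-selection step the paper examines.

```python
import torch

torch.manual_seed(0)

# Toy 1-D regression data (illustrative)
N = 64
X = torch.linspace(-2, 2, N).unsqueeze(-1)
y = torch.sin(3 * X) + 0.1 * torch.randn(N, 1)

# Small MLP trained to a MAP estimate under a Gaussian prior
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
params = list(net.parameters())
opt = torch.optim.Adam(params, lr=1e-2)
alpha, sigma2 = 1.0, 0.1**2  # assumed prior precision and noise variance
for _ in range(2000):
    opt.zero_grad()
    nll = ((net(X) - y) ** 2).sum() / (2 * sigma2)
    reg = alpha / 2 * sum((p ** 2).sum() for p in params)
    (nll + reg).backward()
    opt.step()

# Jacobian of the network outputs w.r.t. the parameters, one backward pass per point
def flat_grad(out):
    gs = torch.autograd.grad(out, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in gs])

preds = net(X)
J = torch.stack([flat_grad(preds[i, 0]) for i in range(N)])  # shape (N, P)
P = J.shape[1]

# GGN posterior precision of the linearised model: A = J^T J / sigma2 + alpha * I
A = J.T @ J / sigma2 + alpha * torch.eye(P)

# Laplace log evidence: log p(D|theta*) + log p(theta*) + (P/2) log 2pi - (1/2) log|A|
theta_norm2 = sum((p.detach() ** 2).sum() for p in params)
log_lik = (-0.5 * N * torch.log(torch.tensor(2 * torch.pi * sigma2))
           - ((preds.detach() - y) ** 2).sum() / (2 * sigma2))
log_evidence = (log_lik - 0.5 * alpha * theta_norm2
                + 0.5 * P * torch.log(torch.tensor(alpha)) - 0.5 * torch.logdet(A))
print(f"linearised Laplace log evidence at alpha={alpha}: {log_evidence.item():.1f}")
```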
Bandit optimisation of functions in the Matérn kernel RKHS
David Janz, David Burt, Javier Gonzalez, 2020. (In 23rd International Conference on Artificial Intelligence and Statistics).
Abstract
We consider the problem of optimising functions in the reproducing kernel Hilbert space (RKHS) of a Matérn kernel with smoothness parameter ν over the domain [0,1]^d under noisy bandit feedback. Our contribution, the π-GP-UCB algorithm, is the first practical approach with guaranteed sublinear regret for all ν > 1 and d ≥ 1. Empirical validation suggests better performance and drastically improved computational scalability compared with its predecessor, Improved GP-UCB.
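For readers who want the flavour of this setting, below is a minimal GP-UCB loop on a discretised [0,1] domain using scikit-learn's Matérn kernel. This is the plain GP-UCB baseline with a fixed confidence multiplier, not π-GP-UCB itself (whose extra machinery is what yields the regret guarantee for all ν > 1); the objective, length-scale and `beta` are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical unknown objective on [0, 1]; bandit feedback = f(x) + noise
f = lambda x: np.sin(10 * x) * x
noise_sd = 0.1
grid = np.linspace(0, 1, 500).reshape(-1, 1)  # discretised domain

kernel = Matern(length_scale=0.2, nu=2.5)  # smoothness nu = 2.5, within the nu > 1 regime
beta = 2.0                                  # fixed confidence multiplier (heuristic)

X, y = [], []
for t in range(30):
    if X:
        gp = GaussianProcessRegressor(kernel=kernel, alpha=noise_sd**2, optimizer=None)
        gp.fit(np.array(X), np.array(y))
        mu, sd = gp.predict(grid, return_std=True)
        x = grid[np.argmax(mu + beta * sd)]  # query the upper-confidence-bound maximiser
    else:
        x = grid[rng.integers(len(grid))]    # first query at random
    X.append(x)
    y.append(f(x[0]) + noise_sd * rng.standard_normal())

best = max(range(len(X)), key=lambda i: y[i])
print(f"best query x = {X[best][0]:.3f}, noisy value = {y[best]:.3f}")
```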
Successor Uncertainties: exploration and uncertainty in temporal difference learning
David Janz, Jiri Hron, Przemyslaw Mazur, José Miguel Hernández-Lobato, Katja Hofmann, Sebastian Tschiatschek, 2019. (In 33rd Conference on Neural Information Processing Systems).
Abstract
Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy-to-implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.
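The construction at the heart of SU can be sketched in a tabular toy: learn successor features ψ(s,a) by temporal differences, maintain a Bayesian linear-regression posterior over reward weights w, and Thompson-sample w each episode so that the sampled Q(s,a) = ψ(s,a)ᵀw drives exploration. The sketch below is illustrative only, a hypothetical sparse-reward chain with hand-picked constants, not the paper's neural-network implementation, and it omits parts of the full method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse-reward chain: states 0..L-1, actions {0: left, 1: right},
# reward only for reaching the last state -- the regime where dithering exploration fails
L, gamma, horizon = 10, 0.99, 50
n_sa = 2 * L
idx = lambda s, a: 2 * s + a
phi = np.eye(n_sa)                       # one-hot state-action features

psi = np.zeros((n_sa, n_sa))             # successor features psi(s,a) ~ E[sum_t gamma^t phi_t]
prec = np.eye(n_sa)                      # reward-weight posterior precision (prior N(0, I))
b = np.zeros(n_sa)                       # precision-times-mean accumulator
lr, beta = 0.2, 100.0                    # SF learning rate, assumed reward-noise precision

for episode in range(300):
    # Thompson step: sample reward weights, inducing a sampled Q(s,a) = psi(s,a) @ w
    mu = np.linalg.solve(prec, b)
    w = rng.multivariate_normal(mu, np.linalg.inv(prec))
    s = 0
    for _ in range(horizon):
        a = int(np.argmax(psi[[idx(s, 0), idx(s, 1)]] @ w))
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, L - 1)
        r = float(s2 == L - 1)
        # TD update of the successor features toward phi(s,a) + gamma * psi(s',a')
        a2 = int(np.argmax(psi[[idx(s2, 0), idx(s2, 1)]] @ w))
        psi[idx(s, a)] += lr * (phi[idx(s, a)] + gamma * psi[idx(s2, a2)] - psi[idx(s, a)])
        # Bayesian linear-regression update of the reward posterior from (phi(s,a), r)
        x = phi[idx(s, a)]
        prec += beta * np.outer(x, x)
        b += beta * r * x
        s = s2
        if r > 0:
            break  # episode ends at the goal

mu = np.linalg.solve(prec, b)
print("posterior mean reward for the goal transition:", round(mu[idx(L - 2, 1)], 2))
```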