John Bronskill

Email: jfb54@cam.ac.uk

John Bronskill is a postdoc in the Machine Learning Group at the University of Cambridge supervised by Richard Turner . He received a Bachelor’s degree and a Master’s degree in Electrical Engineering from the University of Toronto and a PhD in Engineering from the University of Cambridge.

Publications

Data and computation efficient meta-learning

John Bronskill, November 2020. University of Cambridge, Cambridge, UK.

Abstract▼ URL

In order to make predictions with high accuracy, conventional deep learning systems require large training datasets consisting of thousands or millions of examples and long training times measured in hours or days, consuming high levels of electricity with a negative impact on our environment. It is desirable to have have machine learning systems that can emulate human behavior such that they can quickly learn new concepts from only a few examples. This is especially true if we need to quickly customize or personalize machine learning models to specific scenarios where it would be impractical to acquire a large amount of training data and where a mobile device is the means for computation. We define a data efficient machine learning system to be one that can learn a new concept from only a few examples (or shots) and a computation efficient machine learning system to be one that can learn a new concept rapidly without retraining on an everyday computing device such as a smart phone. In this work, we design, develop, analyze, and extend the theory of machine learning systems that are both data efficient and computation efficient. We present systems that are trained using multiple tasks such that it “learns how to learn” to solve new tasks from only a few examples. These systems can efficiently solve new, unseen tasks drawn from a broad range of data distributions, in both the low and high data regimes, without the need for costly retraining. Adapting to a new task requires only a forward pass of the example task data through the trained network making the learning of new tasks possible on mobile devices. In particular, we focus on few-shot image classification systems, i.e. machine learning systems that can distinguish between numerous classes of objects depicted in digital images given only a few examples of each class of object to learn from.

TaskNorm: rethinking batch normalization for meta-learning

John Bronskill, Jonathan Gordon, James Requeima, Sebastian Nowozin, Richard E. Turner, 2020. (In 37th International Conference on Machine Learning). Proceedings of Machine Learning Research.

Abstract▼ URL

Modern meta-learning approaches for image classification rely on increasingly deep networks to achieve state-of-the-art performance, making batch normalization an essential component of meta-learning pipelines. However, the hierarchical nature of the meta-learning setting presents several challenges that can render conventional batch normalization ineffective, giving rise to the need to rethink normalization in this setting. We evaluate a range of approaches to batch normalization for meta-learning scenarios, and develop a novel approach that we call TASKNORM. Experiments on fourteen datasets demonstrate that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for both gradient based and gradient free meta-learning approaches. Importantly, TASKNORM is found to consistently improve performance. Finally, we provide a set of best practices for normalization that will allow fair comparison of meta-learning algorithms.

Memory efficient meta-learning with large images

John Bronskill, Daniela Massiceti, Massimiliano Patacchiola, Katja Hofmann, Sebastian Nowozin, Richard E. Turner, 2021. (In Advances in Neural Information Processing Systems 35).

Abstract▼ URL

Meta learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task’s entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or trade-offs between task and image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task’s training images. This enables us to perform a forward pass on a task’s entire training set but realize significant memory savings by back-propagating only a random subset of these images which we show is an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and 3 of the 4 parts of the challenging VTAB+ MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.

Meta-learning probabilistic inference for prediction

Jonathan Gordon, John Bronskill, Matthias Bauer, Sebastian Nowozin, Richard Turner, April 2019. (In 7th International Conference on Learning Representations). New Orleans.

Abstract▼ URL

This paper introduces a new framework for data efficient and versatile learning. Specifically: 1) We develop ML-PIP, a general framework for Meta-Learning approximate Probabilistic Inference for Prediction. ML-PIP extends existing probabilistic interpretations of meta-learning to cover a broad class of methods. 2) We introduce , an instance of the framework employing a flexible and versatile amortization network that takes few-shot learning datasets as inputs, with arbitrary numbers of shots, and outputs a distribution over task-specific parameters in a single forward pass. Versa substitutes optimization at test time with forward passes through inference networks, amortizing the cost of inference and relieving the need for second derivatives during training. 3) We evaluate on benchmark datasets where the method sets new state-of-the-art results, and can handle arbitrary number of shots, and for classification, arbitrary numbers of classes at train and test time. The power of the approach is then demonstrated through a challenging few-shot ShapeNet view reconstruction task.

Orbit: A real-world few-shot dataset for teachable object recognition

Daniela Massiceti, Luisa Zintgraf, John Bronskill, Lida Theodorou, Matthew Tobias Harris, Edward Cutrell, Cecily Morrison, Katja Hofmann, Simone Stumpf, 2021. (In Proceedings of the IEEE/CVF International Conference on Computer Vision).

Abstract▼ URL

Object recognition has made great advances in the last decade, but predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset and benchmark, grounded in the real-world application of teachable object recognizers for people who are blind/low-vision. The dataset contains 3,822 videos of 486 objects recorded by people who are blind/low-vision on their mobile phones. The benchmark reflects a realistic, highly challenging recognition problem, providing a rich playground to drive research in robustness to few-shot, high-variation conditions. We set the benchmark’s first state-of-the-art and show there is massive scope for further innovation, holding the potential to impact a broad range of real-world vision applications including tools for the blind/low-vision community.

Attacking Few-Shot Classifiers with Adversarial Support Poisoning

Elre T. Oldewage, John Bronskill, Richard E. Turner, 2021. (In A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning, Workshop at ICML 2021).

Abstract▼ URL

This paper examines the robustness of deployed few-shot meta-learning systems when they are fed an imperceptibly perturbed few-shot dataset, showing that the resulting predictions on test inputs can become worse than chance. This is achieved by developing a novel attack, Adversarial Support Poisoning or ASP, which crafts a poisoned set of examples. When even a small subset of malicious data points is inserted into the support set of a meta-learner, accuracy is significantly reduced. We evaluate the new attack on a variety of few-shot classification algorithms and scenarios, and propose a form of adversarial training that significantly improves robustness against both poisoning and evasion attacks.

Adversarial Attacks are a Surprisingly Strong Baseline for Poisoning Few-Shot Meta-Learners

Elre T. Oldewage, John Bronskill, Richard E. Turner, 2022. (In I Can’t Believe It’s Not Better, Workshop at Neurips 2022).

Abstract▼

This paper examines the robustness of deployed few-shot meta-learning systems when they are fed an imperceptibly perturbed few-shot dataset. We attack amortized meta-learners, which allows us to craft colluding sets of inputs that are tailored to fool the system’s learning algorithm when used as training data. Jointly crafted adversarial inputs might be expected to synergistically manipulate a classifier, allowing for very strong data-poisoning attacks that would be hard to detect. We show that in a white box setting, these attacks are very successful and can cause the target model’s predictions to become worse than chance. However, in opposition to the well-known transferability of adversarial examples in general, the colluding sets do not transfer well to different classifiers. We explore two hypotheses to explain this: ‘overfitting’ by the attack, and mismatch between the model on which the attack is generated and that to which the attack is transferred. Regardless of the mitigation strategies suggested by these hypotheses, the colluding inputs transfer no better than adversarial inputs that are generated independently in the usual way.

Fast and Flexible Multi-Task Classification using Conditional Neural Adaptive Processes

James Requeima, Jonathan Gordon, John Bronskill, Sebastian Nowozin, Richard E. Turner, 2019. (In Advances in Neural Information Processing Systems 33).

Abstract▼ URL

The goal of this paper is to design image classification systems that, after an initial multi-task training phase, can automatically adapt to new tasks encountered at test time. We introduce a conditional neural process based approach to the multi-task classification setting for this purpose, and establish connections to the meta- and few-shot learning literature. The resulting approach, called CNAPs, comprises a classifier whose parameters are modulated by an adaptation network that takes the current task’s dataset as input. We demonstrate that CNAPs achieves state-of-the-art results on the challenging Meta-Dataset benchmark indicating high-quality transfer-learning. We show that the approach is robust, avoiding both over-fitting in low-shot regimes and under-fitting in high-shot regimes. Timing experiments reveal that CNAPs is computationally efficient at test-time as it does not involve gradient based adaptation. Finally, we show that trained models are immediately deployable to continual learning and active learning where they can outperform existing approaches that do not leverage transfer learning.

Fs-mol: A few-shot learning dataset of molecules

Megan Stanley, John Bronskill, Krzysztof Maziarz, Hubert Misztela, Jessica Lanini, Marwin Segler, Nadine Schneider, Marc Brockschmidt, 2021. (In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)).

Abstract▼ URL

Small datasets are ubiquitous in drug discovery as data generation is expensive and can be restricted for ethical reasons (eg in vivo experiments). A widely applied technique in early drug discovery to identify novel active molecules against a protein target is modelling quantitative structure-activity relationships (QSAR). It is known to be extremely challenging, as available measurements of compound activities range in the low dozens or hundreds. However, many such related datasets exist, each with a small number of datapoints, opening up the opportunity for few-shot learning after pre-training on a substantially larger corpus of data. At the same time, many few-shot learning methods are currently evaluated in the computer-vision domain. We propose that expansion into a new application, as well as the possibility to use explicitly graph-structured data, will drive exciting progress in few-shot learning. Here, we provide a few-shot learning dataset (FS-Mol) and complementary benchmarking procedure. We define a set of tasks on which few-shot learning methods can be evaluated, with a separate set of tasks for use in pre-training. In addition, we implement and evaluate a number of existing single-task, multi-task, and meta-learning approaches as baselines for the community. We hope that our dataset, support code release, and baselines will encourage future work on this extremely challenging new domain for few-shot learning.