
Wide Mean-Field Bayesian Neural Networks Ignore the Data

Beau Coker, Wessel P. Bruinsma, David R. Burt, Weiwei Pan, Finale Doshi-Velez, 2022. (In 25th International Conference on Artificial Intelligence and Statistics).

Bayesian neural networks (BNNs) combine the expressive power of deep learning with the advantages of Bayesian formalism. In recent years, the analysis of wide, deep BNNs has provided theoretical insight into their priors and posteriors. However, we have no analogous insight into their posteriors under approximate inference. In this work, we show that mean-field variational inference entirely fails to model the data when the network width is large and the activation function is odd. Specifically, for fully-connected BNNs with odd activation functions and a homoscedastic Gaussian likelihood, we show that the optimal mean-field variational posterior predictive (i.e., function space) distribution converges to the prior predictive distribution as the width tends to infinity. We generalize aspects of this result to other likelihoods. Our theoretical results are suggestive of underfitting behavior previously observed in BNNs. While our convergence bounds are non-asymptotic and constants in our analysis can be computed, they are currently too loose to be applicable in standard training regimes. Finally, we show that the optimal approximate posterior need not tend to the prior if the activation function is not odd, showing that our statements cannot be generalized arbitrarily.

The Indian Buffet Process: Scalable Inference and Extensions

Finale Doshi-Velez, August 2009. University of Cambridge, Cambridge, UK.

Many unsupervised learning problems seek to identify hidden features from observations. In many real-world situations, the number of hidden features is unknown. To avoid specifying the number of hidden features a priori, one can use the Indian Buffet Process (IBP): a nonparametric latent feature model that does not bound the number of active features in a dataset. While elegant, the lack of efficient inference procedures for the IBP has prevented its application in large-scale problems. The core contributions of this thesis are three new inference procedures that allow inference in the IBP to be scaled from a few hundred to 100,000 observations. This thesis contains three parts: (1) An introduction to the IBP and a review of inference techniques and extensions. The first chapters summarise three constructions for the IBP and review all currently published inference techniques. Appendix C reviews extensions of the IBP to date. (2) Novel techniques for scalable Bayesian inference. This thesis presents three new inference procedures: (a) an accelerated Gibbs sampler for efficient Bayesian inference in a broad class of conjugate models, (b) a parallel, asynchronous Gibbs sampler that allows the accelerated Gibbs sampler to be distributed across multiple processors, and (c) a variational inference procedure for the IBP. (3) A framework for structured nonparametric latent feature models. We also present extensions to the IBP to model more sophisticated relationships between the co-occurring hidden features, providing a general framework for correlated non-parametric feature models.

The Infinite Partially Observable Markov Decision Process

Finale Doshi-Velez, December 2009. (In Advances in Neural Information Processing Systems 23). Cambridge, MA, USA. The MIT Press.

The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems.

Accelerated Gibbs sampling for the Indian buffet process

Finale Doshi-Velez, Zoubin Ghahramani, June 2009. (In 26th International Conference on Machine Learning). Edited by Léon Bottou, Michael Littman. Montréal, QC, Canada. Omnipress.

We often seek to identify co-occurring hidden features in a set of observations. The Indian Buffet Process (IBP) provides a non-parametric prior on the features present in each observation, but current inference techniques for the IBP often scale poorly. The collapsed Gibbs sampler for the IBP has a running time cubic in the number of observations, and the uncollapsed Gibbs sampler, while linear, is often slow to mix. We present a new linear-time collapsed Gibbs sampler for conjugate likelihood models and demonstrate its efficacy on large real-world datasets.

Accelerated sampling for the Indian Buffet Process

Finale Doshi-Velez, Zoubin Ghahramani, 2009. (In ICML). Edited by Andrea Pohoreckyj Danyluk, Léon Bottou, Michael L. Littman. ACM. ACM International Conference Proceeding Series. ISBN: 978-1-60558-516-1.

We often seek to identify co-occurring hidden features in a set of observations. The Indian Buffet Process (IBP) provides a nonparametric prior on the features present in each observation, but current inference techniques for the IBP often scale poorly. The collapsed Gibbs sampler for the IBP has a running time cubic in the number of observations, and the uncollapsed Gibbs sampler, while linear, is often slow to mix. We present a new linear-time collapsed Gibbs sampler for conjugate likelihood models and demonstrate its efficacy on large real-world datasets.

Correlated non-parametric latent feature models

F. Doshi-Velez, Z. Ghahramani, June 2009. (In Conference on Uncertainty in Artificial Intelligence (UAI 2009)). Montréal, QC, Canada. AUAI Press.

We are often interested in explaining data through a set of hidden factors or features. To allow for an unknown number of such hidden features, one can use the IBP: a non-parametric latent feature model that does not bound the number of active features in a dataset. However, the IBP assumes that all latent features are uncorrelated, making it inadequate for many real-world problems. We introduce a framework for correlated non-parametric feature models, generalising the IBP. We use this framework to generate several specific models and demonstrate applications on real-world datasets.

A Comparison of Human and Agent Reinforcement Learning in Partially Observable Domains

Finale Doshi-Velez, Zoubin Ghahramani, 2011. (In 33rd Annual Meeting of the Cognitive Science Society). Boston, MA.

It is commonly stated that reinforcement learning (RL) algorithms learn more slowly than humans. In this work, we investigate this claim using two standard problems from the RL literature. We compare the performance of human subjects to RL techniques. We find that context (the meaningfulness of the observations) plays a significant role in the rate of human RL. Moreover, without contextual information, humans often fare much worse than classic algorithms. Comparing the detailed responses of humans and RL algorithms, we also find that humans appear to employ rather different strategies from standard algorithms, even in cases where their performance was indistinguishable from that of the algorithms. Our research both sheds light on human RL and provides insights for improving RL algorithms.

Large Scale Non-parametric Inference: Data Parallelisation in the Indian Buffet Process

Finale Doshi-Velez, David Knowles, Shakir Mohamed, Zoubin Ghahramani, December 2009. (In Advances in Neural Information Processing Systems 23). Cambridge, MA, USA. The MIT Press.

Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, the high-dimensional averages required for Bayesian methods can be slow, especially with the unbounded representations used by nonparametric models. We address the challenge of scaling Bayesian inference to the increasingly large datasets found in real-world applications. We focus on parallelisation of inference in the Indian Buffet Process (IBP), which allows data points to have an unbounded number of sparse latent features. Our novel MCMC sampler divides a large data set between multiple processors and uses message passing to compute the global likelihoods and posteriors. This algorithm, the first parallel inference scheme for IBP-based models, scales to datasets orders of magnitude larger than have previously been possible.

Variational inference for the Indian buffet process

F. Doshi-Velez, K.T. Miller, J. Van Gael, Y.W. Teh, April 2009. (In 12th International Conference on Artificial Intelligence and Statistics). Clearwater Beach, FL, USA. Journal of Machine Learning Research.

The Indian Buffet Process (IBP) is a nonparametric prior for latent feature models in which observations are influenced by a combination of hidden features. For example, images may be composed of several objects and sounds may consist of several notes. Latent feature models seek to infer these unobserved features from a set of observations; the IBP provides a principled prior in situations where the number of hidden features is unknown. Current inference methods for the IBP have all relied on sampling. While these methods are guaranteed to be accurate in the limit, samplers for the IBP tend to mix slowly in practice. We develop a deterministic variational method for inference in the IBP based on a truncated stick-breaking approximation, provide theoretical bounds on the truncation error, and evaluate our method in several data regimes.
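The abstract above names a truncated stick-breaking approximation without spelling it out. As a rough illustration only, the sketch below draws a binary feature matrix from the prior side of such a truncation, assuming the standard stick-breaking construction of the IBP (stick variables v_k ~ Beta(alpha, 1), feature probabilities equal to their running products). The function name `sample_truncated_ibp` and its parameters are hypothetical, and the variational updates themselves are not shown.

```python
import random


def sample_truncated_ibp(num_obs, alpha, truncation, seed=0):
    """Draw feature probabilities and a binary matrix from a truncated
    stick-breaking IBP prior (illustrative sketch, not the paper's code)."""
    rng = random.Random(seed)

    # Stick variables v_k ~ Beta(alpha, 1); the probability of feature k
    # is the running product pi_k = v_1 * ... * v_k, so pi_k is non-increasing.
    pis = []
    running_product = 1.0
    for _ in range(truncation):
        running_product *= rng.betavariate(alpha, 1.0)
        pis.append(running_product)

    # Each observation switches feature k on independently with probability pi_k.
    Z = [[1 if rng.random() < p else 0 for p in pis] for _ in range(num_obs)]
    return pis, Z


pis, Z = sample_truncated_ibp(num_obs=5, alpha=2.0, truncation=10)
for row in Z:
    print(row)
```

Because the feature probabilities shrink with k, features beyond the truncation level are rarely active, which is the intuition for why a finite truncation can approximate the unbounded model.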

Variational Inference for the Indian Buffet Process

Finale Doshi-Velez, Kurt T. Miller, Jurgen Van Gael, Yee Whye Teh, April 2009. University of Cambridge, Computational and Biological Learning Laboratory, Department of Engineering.

The Indian Buffet Process (IBP) is a nonparametric prior for latent feature models in which observations are influenced by a combination of hidden features. For example, images may be composed of several objects and sounds may consist of several notes. Latent feature models seek to infer these unobserved features from a set of observations; the IBP provides a principled prior in situations where the number of hidden features is unknown. Current inference methods for the IBP have all relied on sampling. While these methods are guaranteed to be accurate in the limit, samplers for the IBP tend to mix slowly in practice. We develop a deterministic variational method for inference in the IBP based on truncating to finite models, provide theoretical bounds on the truncation error, and evaluate our method in several data regimes. This technical report is a longer version of Doshi-Velez et al. (2009).
