Jurgen Van Gael

now a CTO & Board Member, Stripe Payments UK Ltd

Email:

Publications

Scaling the iHMM: Parallelization versus Hadoop

Sébastien Bratières, Jurgen Van Gael, Andreas Vlachos, Zoubin Ghahramani, 2010. (In Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology). Bradford, UK. IEEE Computer Society. DOI: 10.1109/CIT.2010.223. ISBN: 978-0-7695-4108-2.

Abstract▼ URL

This paper compares parallel and distributed implementations of an iterative, Gibbs sampling, machine learning algorithm. Distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM Beal, Ghahramani and Rasmussen, 2002, in which parameters are learnt using an instance blocked Gibbs sampling, with a step consisting of a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However our focus here is on runtime performance, as opposed to NLP-relevant scores, embodied by iteration duration, ease of development, deployment and debugging.

Variational inference for the Indian buffet process

F. Doshi-Velez, K.T. Miller, J. Van Gael, Y.W. Teh, April 2009. (In 12th International Conference on Artificial Intelligence and Statistics). Clearwater Beach, FL, USA. Journal of Machine Learning Research.

Abstract▼ URL

The Indian Buffet Process (IBP) is a nonparametric prior for latent feature models in which observations are influenced by a combination of hidden features. For example, images may be composed of several objects and sounds may consist of several notes. Latent feature models seek to infer these unobserved features from a set of observations; the IBP provides a principled prior in situations where the number of hidden features is unknown. Current inference methods for the IBP have all relied on sampling. While these methods are guaranteed to be accurate in the limit, samplers for the IBP tend to mix slowly in practice. We develop a deterministic variational method for inference in the IBP based on a truncated stick-breaking approximation, provide theoretical bounds on the truncation error, and evaluate our method in several data regimes.

Variational Inference for the Indian Buffet Process

Finale Doshi-Velez, Kurt T. Miller, Jurgen Van Gael, Yee Whye Teh, April 2009. University of Cambridge, Computational and Biological Learning Laboratory, Department of Engineering.

Abstract▼ URL

The Indian Buffet Process (IBP) is a nonparametric prior for latent feature models in which observations are influenced by a combination of hidden features. For example, images may be composed of several objects and sounds may consist of several notes. Latent feature models seek to infer these unobserved features from a set of observations; the IBP provides a principled prior in situations where the number of hidden features is unknown. Current inference methods for the IBP have all relied on sampling. While these methods are guaranteed to be accurate in the limit, samplers for the IBP tend to mix slowly in practice. We develop a deterministic variational method for inference in the IBP based on truncating to infinite models, provide theoretical bounds on the truncation error, and evaluate our method in several data regimes. This technical report is a longer version of Doshi-Velez et al. (2009).

Ranking Biomedical Passages for Relevance and Diversity: University of Wisconsin, Madison at TREC Genomics 2006

A. B. Goldberg, D. Andrzejewski, J. Van Gael, B. Settles, X. Zhu, M. Craven, November 2006. (In Proceedings of the Fifteenth Text REtrieval Conference (TREC 2006)). Gaithersburg, MD, USA.

Abstract▼ URL

We report on the University of Wisconsin, Madison’s experience in the TREC Genomics 2006 track, which asks participants to retrieve passages from scientific articles that satisfy biologists’ information needs. An emphasis is placed on returning relevant passages that discuss different aspects of the topic. Using an off-the-shelf information retrieval (IR) engine, we focused on query generation and reranking query results to encourage relevance and diversity. For query generation, we automatically identify noun phrases from the topic descriptions, and use online resources to gather synonyms as expansion terms. Our first submission uses the baseline IR engine results. We rerank the passages using a naive clustering-based approach in our second run, and we test GRASSHOPPER, a novel graph-theoretic algorithm based on absorbing random walks, in our third run. While our aspect-level results appear to compare favorably with other participants on average, our query generation techniques failed to produce adequate query results for several topics, causing our passage and document-level evaluation scores to suffer. Furthermore, we surprisingly achieved higher aspect-level scores using the initial ranking than our methods aimed specifically at promoting diversity. While this sounds discouraging, we have several ideas as to why this happened and hope to produce new methods that correct these shortcomings.

Improving diversity in ranking using absorbing random walks

A.B. Goldberg, X. Zhu, J. Van Gael, D. Andrzejewski, April 2007. (In Proceedings of NAACL HLT). Rochester, NY, USA.

URL

Bayesian Knowledge Corroboration with Logical Rules and User Feedback

G. Kasneci, J. Van Gael, T. Graepel, R. Herbrich, September 2010. (In European Conference on Machine Learning (ECML)). Barcelona, Spain.

Abstract▼ URL

Current knowledge bases suffer from either low coverage or low accuracy. The underlying hypothesis of this work is that user feedback can greatly improve the quality of automatically extracted knowledge bases. The feedback could help quantify the uncertainty associated with the stored statements and would enable mechanisms for searching, ranking and reasoning at entity-relationship level. Most importantly, a principled model for exploiting user feedback to learn the truth values of statements in the knowledge base would be a major step forward in addressing the issue of knowledge base curation. We present a family of probabilistic graphical models that builds on user feedback and logical inference rules derived from the popular Semantic-Web formalism of RDFS [1]. Through internal inference and belief propagation, these models can learn both, the truth values of the statements in the knowledge base and the reliabilities of the users who give feedback. We demonstrate the viability of our approach in extensive experiments on real-world datasets, with feedback collected from Amazon Mechanical Turk.

Message Passing Algorithms for the Dirichlet Diffusion Tree

David A. Knowles, Jurgen Van Gael, Zoubin Ghahramani, 2011. (In 28th International Conference on Machine Learning).

Abstract▼ URL

We demonstrate efficient approximate inference for the Dirichlet Diffusion Tree (Neal, 2003), a Bayesian nonparametric prior over tree structures. Although DDTs provide a powerful and elegant approach for modeling hierarchies they haven’t seen much use to date. One problem is the computational cost of MCMC inference. We provide the first deterministic approximate inference methods for DDT models and show excellent performance compared to the MCMC alternative. We present message passing algorithms to approximate the Bayesian model evidence for a specific tree. This is used to drive sequential tree building and greedy search to find optimal tree structures, corresponding to hierarchical clusterings of the data. We demonstrate appropriate observation models for continuous and binary data. The empirical performance of our method is very close to the computationally expensive MCMC alternative on a density estimation problem, and significantly outperforms kernel density estimators.

Comment: web site

Traffic Classification in Information Poor Environments

C. Rotsos, J. Van Gael, A.W. Moore, Z. Ghahramani, July 2010. (In 1st International Workshop on Traffic Analysis and Classification (IWCMC ’10)). Caen, France.

Abstract▼ URL

Traffic classification using machine learning continues to be an active research area. The majority of work in this area uses off-the-shelf machine learning tools and treats them as black-box classifiers. This approach turns all the modelling complexity into a feature selection problem. In this paper, we build a problem-specific solution to the traffic classification problem by designing a custom probabilistic graphical model. Graphical models are a modular framework to design classifiers which incorporate domain-specific knowledge. More specifically, our solution introduces semi-supervised learning which means we learn from both labelled and unlabelled traffic flows. We show that our solution performs competitively compared to previous approaches while using less data and simpler features.

Probabilistic Graphical Models for Semi-Supervised Traffic Classification

Charalampos Rotsos, Jurgen Van Gael, Andrew W. Moore, Zoubin Ghahramani, 2010. (In The 6th International Wireless Communications and Mobile Computing Conference). Caen, France.

Abstract▼ URL

Traffic classification using machine learning continues to be an active research area. The majority of work in this area uses off-the-shelf machine learning tools and treats them as black-box classifiers. This approach turns all the modelling complexity into a feature selection problem. In this paper, we build a problem-specific solution to the traffic classification problem by designing a custom probabilistic graphical model. Graphical models are a modular framework to design classifiers which incorporate domain-specific knowledge. More specifically, our solution introduces semi-supervised learning which means we learn from both labelled and unlabelled traffic flows. We show that our solution performs competitively compared to previous approaches while using less data and simpler features.

Beam sampling for the infinite hidden Markov model

Jurgen Van Gael, Yunus Saatçi, Yee-Whye Teh, Zoubin Ghahramani, 2008. (In 25th International Conference on Machine Learning). Helsinki, Finland. Association for Computing Machinery.

Abstract▼ URL

The infinite hidden Markov model is a non-parametric extension of the widely used hidden Markov model. Our paper introduces a new inference algorithm for the infinite Hidden Markov model called beam sampling. Beam sampling combines slice sampling, which limits the number of states considered at each time step to a finite number, with dynamic programming, which samples whole state trajectories efficiently. Our algorithm typically outperforms the Gibbs sampler and is more robust. We present applications of iHMM inference using the beam sampler on changepoint detection and text prediction problems.

The infinite factorial hidden Markov model

J. Van Gael, Y.W. Teh, Z. Ghahramani, December 2008. (In Advances in Neural Information Processing Systems 21). Edited by D. Koller, D. Schuurmans, L. Bottou, Y. Bengio. Cambridge, MA, USA. The MIT Press.

Abstract▼ URL

The infinite factorial hidden Markov model is a non-parametric extension of the factorial hidden Markov model. Our model defines a probability distribution over an infinite number of independent binary hidden Markov chains which together produce an observable sequence of random variables. Central to our model is a new type of non-parametric prior distribution inspired by the Indian Buffet Process which we call the Indian Buffet Markov Process.

The infinite HMM for unsupervised PoS tagging

J. Van Gael, A. Vlachos, Z. Ghahramani, August 2009. (In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP)). Singapore. Association for Computational Linguistics. ISBN: 978-1-932432-62-6.

Abstract▼ URL

We extend previous work on fully unsupervised part-of-speech tagging. Using a non-parametric version of the HMM, called the infinite HMM (iHMM), we address the problem of choosing the number of hidden states in unsupervised Markov models for PoS tagging. We experiment with two non-parametric priors, the Dirichlet and Pitman-Yor processes, on the Wall Street Journal dataset using a parallelized implementation of an iHMM inference algorithm. We evaluate the results with a variety of clustering evaluation metrics and achieve equivalent or better performances than previously reported. Building on this promising result we evaluate the output of the unsupervised PoS tagger as a direct replacement for the output of a fully supervised PoS tagger for the task of shallow parsing and compare the two evaluations.

Correlation clustering for crosslingual link detection

J. Van Gael, X. Zhu, January 2007. (In International Joint Conference on Artificial Intelligence). Edited by Manuela M. Veloso. Hyderabad, India.

Abstract▼ URL

The crosslingual link detection problem calls for identifying news articles in multiple languages that report on the same news event. This paper presents a novel approach based on constrained clustering. We discuss a general way for constrained clustering using a recent, graph-based clustering framework called correlation clustering. We introduce a correlation clustering implementation that features linear program chunking to allow processing larger datasets. We show how to apply the correlation clustering algorithm to the crosslingual link detection problem and present experimental results that show correlation clustering improves upon the hierarchical clustering approaches commonly used in link detection, and, hierarchical clustering approaches that take constraints into account.