Google awards $750,000 for “The Automatic Statistician”

The Automatic Statistician, a project led by Zoubin Ghahramani, has won a $750,000 Google Focused Research Award. The award is a no-strings-attached donation to support research on this topic in the Cambridge Machine Learning Group.

Automating the process of statistical modeling would have a tremendous impact on fields that currently rely on expert statisticians, machine learning researchers, and data scientists. Such expertise in the data sciences is increasingly in demand, especially with the growth of Big Data problems in science and industry. The Automatic Statistician is a system that explores an open-ended space of possible statistical models to discover a good explanation of the data, and then produces a detailed report with figures and natural-language text. The Cambridge group, including PhD students James Lloyd and David Duvenaud, working with Roger Grosse and Joshua Tenenbaum at MIT, has developed an early version of this system. It not only automatically produces a 10-15 page report describing patterns discovered in the data, but also returns a statistical model with state-of-the-art extrapolation performance, as evaluated on real time-series data sets from various domains. The system is based on reasoning over an open-ended language of nonparametric models using Bayesian inference.

Kevin P. Murphy, Senior Research Scientist at Google, says: “In recent years, machine learning has made tremendous progress in developing models that can accurately predict future data. However, there are still several obstacles in the way of its more widespread use in the data sciences. The first problem is that current ML methods still require considerable human expertise in devising appropriate features and models. The second problem is that the output of current methods, while accurate, is often hard to understand, which makes it hard to trust. The ‘automatic statistician’ project from Cambridge aims to address both problems by using Bayesian model selection strategies to automatically choose good models and features, and by interpreting the resulting fit in easy-to-understand terms through human-readable, automatically generated reports. This is a very promising direction for ML research, which is likely to find many applications at Google and beyond.”

The ultimate aim of the Automatic Statistician is to produce an artificially intelligent (AI) system for statistics and the data sciences.

Visit the project’s website at http://www.automaticstatistician.com/.