MPhil Advanced Computer Science Projects

Potential projects supervised by Zoubin Ghahramani (zoubin - at - eng.cam.ac.uk)


The Automated Statistician

We are living in an era of abundant data: data is transforming the sciences, society and commerce. However, to unleash its power we need to be able to turn this data into useful knowledge. While the discipline of Data Science has emerged over the last few years (and is a natural extension of the much more established field of Statistics), it is clear that there are not enough Data Scientists to meet the growing demand for this expertise. This project will help the Cambridge Machine Learning group in its ambitious effort to build an Automated Statistician (or Data Scientist). The overall goal is to use Bayesian reasoning and machine learning as a foundation for learning model structure from data, and for making sense of data. Imagine a system where a user uploads some data and the Automated Statistician generates a useful report about it. This report could contain many things: inferences about individual data points (e.g. outlier detection), inferences about data types, and inferences about plausible models given the data. You will work together with PhD students and postdocs in the group. A specific subproject will be developed early on based on the student's interests. Excellent programming skills and knowledge of probabilistic modelling are required. This is a collaboration with Google and MIT.


Bayesian Nonparametric Machine Learning

One of the most important trends in modern machine learning is the use of flexible probabilistic models. The theoretical tools that underpin these models come from the field of Bayesian nonparametrics, and make extensive use of probability theory and stochastic processes (Gaussian, Poisson, Dirichlet and Lévy processes, among others). The machine learning group at Cambridge is one of the world's centres for research on this topic. The project will involve research extending the current state of the art in this field, in conjunction with PhD students and postdocs in the group. A specific subproject will be developed early on based on the student's interests. This project is ideal for mathematically strong MPhil students interested in pursuing a PhD in machine learning.


Probabilistic Programming in Church

Probabilistic Programming is a new paradigm in Machine Learning in which a fully expressive programming language is used to define a probabilistic model, and an automated inference engine then operates on the program's execution traces to perform statistical inference. We are collaborating with MIT and Oxford on a project advancing the state of the art in this area, specifically looking at implementing sophisticated Bayesian models. This project involves implementing and testing a variety of non-trivial models in the Church / Venture programming language (a variant of Lisp / Scheme), identifying weaknesses of the current probabilistic programming paradigm, and investigating directions for improving this framework.
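
To give a flavour of the paradigm, here is a minimal sketch in Python rather than Church or Venture. It is an illustration only: the function names (model, condition, query, rejection_sample) are hypothetical and do not correspond to any Church or Venture API, and rejection sampling stands in for the more sophisticated inference engines those systems provide. The model is an ordinary program that flips two fair coins; the inference routine runs it repeatedly, keeps only the traces consistent with the observation that at least one coin is heads, and estimates the posterior probability that both are heads (exactly 1/3).

```python
import random


def model(rng):
    """Generative program: flip two fair coins and record the trace."""
    a = rng.random() < 0.5
    b = rng.random() < 0.5
    return {"a": a, "b": b}


def condition(trace):
    """Observation: at least one of the coins came up heads."""
    return trace["a"] or trace["b"]


def query(trace):
    """Quantity of interest: did both coins come up heads?"""
    return trace["a"] and trace["b"]


def rejection_sample(model, condition, query, n_samples, seed=0):
    """Generic inference by rejection: run the program, discard traces
    that violate the condition, and average the query over the rest."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n_samples:
        trace = model(rng)
        if condition(trace):
            accepted.append(query(trace))
    return sum(accepted) / len(accepted)


# Monte Carlo estimate of P(both heads | at least one head); exact value is 1/3.
estimate = rejection_sample(model, condition, query, n_samples=20000)
print(estimate)
```

The point of the sketch is the separation of concerns: the model is written once as plain code, and the inference routine knows nothing about coins. Rejection sampling becomes impractical when the observation is unlikely under the prior, which is one reason general-purpose inference remains a research question in this area.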