# MPhil Advanced Computer Science Projects

Potential projects supervised by Zoubin Ghahramani
(`zoubin - at - eng.cam.ac.uk`)

## The Automated Statistician

We are living in an era of abundant data: data is transforming the
sciences, society and commerce. However, to unleash the power of this
data we need to be able to turn it into useful
knowledge. While the discipline of Data Science has emerged over the
last few years (and is a natural extension of the much more
established field of Statistics), it is abundantly clear that there
are not enough Data Scientists to meet the growing demand for this
expertise. This project contributes to the Cambridge Machine
Learning group's ambitious effort to build an Automated
Statistician (or Data Scientist). The overall aim of that effort is
to use Bayesian reasoning and machine learning as a foundation for
learning model structure from data and for making sense of
data. Imagine a system where a user uploads some data and the
Automated Statistician generates a useful report about this data. This
report could contain many things: inferences about individual data
points (e.g. outlier detection), inferences about data types, and
inferences about plausible models given the data. Your goal will be to
contribute to this effort, working together with PhD
students and postdocs in the group. A specific subproject will be
developed early on based on the student's interests. Excellent
programming skills and knowledge of probabilistic modelling will be
required. This is a collaboration with Google and MIT.
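
To make "inferences about plausible models given the data" concrete, here is a minimal sketch of Bayesian model comparison on a toy problem. It is not part of the Automated Statistician itself. The two candidate models (a fair coin versus a coin with unknown bias under a uniform prior), the equal prior over models, and all function names are assumptions chosen for illustration.

```python
import math


def log_evidence_fair(heads, tails):
    """Log marginal likelihood under the 'fair' model: every flip has p = 0.5."""
    return (heads + tails) * math.log(0.5)


def log_evidence_biased(heads, tails):
    """Log marginal likelihood under the 'biased' model.

    The bias p has a uniform Beta(1, 1) prior, so the evidence is the
    Beta function B(heads + 1, tails + 1), computed here via log-gamma.
    """
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))


def posterior_model_probs(heads, tails):
    """Posterior probability of each model, assuming equal prior model weights."""
    log_ev = {
        "fair": log_evidence_fair(heads, tails),
        "biased": log_evidence_biased(heads, tails),
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_ev.values())
    weights = {name: math.exp(value - top) for name, value in log_ev.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


balanced = posterior_model_probs(heads=5, tails=5)
skewed = posterior_model_probs(heads=18, tails=2)
```

With balanced data the simpler fair-coin model is preferred, while strongly skewed data favours the biased model. The marginal likelihood penalises the more flexible model unless the data call for it.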

## Bayesian Nonparametric Machine Learning

One of the most important trends in modern machine learning is the use
of flexible probabilistic models. The theoretical tools that underpin
this come from the field of Bayesian nonparametrics, and make
extensive use of probability theory and stochastic processes
(Gaussian, Poisson, Dirichlet, Lévy processes, etc.). The machine
learning group at Cambridge is one of the world's centres for research
on this topic. The project will involve research extending the current
state of the art in this field, in conjunction with PhD students and
postdocs in the group. A specific subproject will be developed early
on based on the student's interests. This project is ideal for
mathematically strong MPhil students interested in pursuing a PhD in
machine learning.
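
As a small illustration of one of the stochastic processes listed above, the partition structure induced by a Dirichlet process can be simulated with the Chinese restaurant process. The sketch below is illustrative only. The function name, the `alpha` concentration parameter name and the chosen values are assumptions, not taken from any group code.

```python
import random


def chinese_restaurant_process(n_customers, alpha, rng):
    """Sample a random partition of n_customers with concentration alpha.

    Customer i (counting from 0) joins existing table k with probability
    size_k / (i + alpha) and opens a new table with probability
    alpha / (i + alpha).
    """
    table_sizes = []
    assignments = []
    for _ in range(n_customers):
        # Unnormalised weights: one per existing table, plus one for a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)  # open a new table
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes


rng = random.Random(0)
assignments, table_sizes = chinese_restaurant_process(100, alpha=2.0, rng=rng)
```

The number of tables is not fixed in advance and grows with the amount of data, which is the sense in which such models are nonparametric.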

## Probabilistic Programming in `Church`

Probabilistic Programming is a new paradigm in Machine Learning whereby
a fully expressive programming language is used to define a
probabilistic model, and an automated inference engine then runs on
the program traces so as to implement statistical inference. We are
collaborating with MIT and Oxford on a project advancing the state of
the art in this area, specifically looking at implementing
sophisticated Bayesian models. This project involves implementing and
testing a variety of non-trivial models in the `Church / Venture`
programming language (a variant of Lisp / Scheme), identifying
weaknesses of the current probabilistic programming paradigm, and
investigating directions for improving this framework.
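
The idea of writing a model as a program and leaving inference to a generic engine can be sketched in a few lines. The example below is in Python rather than Church or Venture and does not use their syntax or APIs. The "engine" is plain rejection sampling, a deliberately simple stand-in for the trace-based inference described above, and the coin-flip model and all names are assumptions for illustration.

```python
import random


def model(rng, n_flips):
    """A generative program: draw a coin bias, then simulate n_flips flips."""
    bias = rng.random()  # prior: bias ~ Uniform(0, 1)
    heads = sum(rng.random() < bias for _ in range(n_flips))
    return bias, heads


def rejection_posterior(observed_heads, n_flips, n_runs, rng):
    """Generic inference by rejection sampling.

    Run the program many times and keep the bias from each run whose
    simulated data matches the observation exactly.
    """
    samples = []
    for _ in range(n_runs):
        bias, heads = model(rng, n_flips)
        if heads == observed_heads:
            samples.append(bias)
    return samples


rng = random.Random(0)
samples = rejection_posterior(observed_heads=8, n_flips=10, n_runs=20000, rng=rng)
posterior_mean = sum(samples) / len(samples)
```

For this model the exact posterior is Beta(9, 3) with mean 0.75, so `posterior_mean` should land near that value. Rejection sampling becomes impractical as models and data grow, which is one reason more sophisticated inference engines are needed.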