Probabilistic Machine Learning 4F13 Michaelmas 2024

Keywords: Machine learning, probabilistic modelling, graphical models, approximate inference, Bayesian statistics

Taught By: Carl Edward Rasmussen

The main page for the course is on moodle, but if you're just attending lectures there is no need to access the moodle site.

Code and Term: 4F13 Michaelmas term

Year: 4th year (part IIB) Engineering and MPhil in Machine Learning and Machine Intelligence; the lectures are also open to students in any department (but if you want to take it for credit, you need to make arrangements for assessment within your own department, as our capacity to mark coursework is already severely stretched).

Structure & Assessment: 14 lectures, 2 coursework revisions, 3 pieces of coursework. The assessment depends on which cohort you are in. For undergraduate students and PhD students, the assessment will be based on the reports you hand in for each of the three pieces of coursework (there will be no final exam); the three pieces of coursework carry equal weight. For students in the Machine Learning and Machine Intelligence (MLMI) MPhil programme (the MLMI module number is MLMI17), the assessment will be by a short oral exam, which will be held on Friday December 6th, somewhere in the interval 8:00-17:00. The MLMI students don't hand in written reports, but should work on the coursework questions in preparation for the oral. More information on the exact format of the oral will follow.

Format: This year the course will be taught in person, in LT 1, weekly on Mondays 9:00-10:00 and Tuesdays 9:00-10:00, with the first lecture on Monday Oct 14th. There will also be an (entirely optional) office hour on Thursdays 15:00-16:00 (first time Oct 17th) in the CBL seminar room BE4-38 (4th floor Baker Building).

Prerequisites: A good background in statistics, calculus, linear algebra, and computer science. 3F3 Signal and Pattern Processing. You should thoroughly review the maths in the following cribsheet [pdf] [ps] before the start of the course. The following Matrix Cookbook is also a useful resource. If you want to do the optional coursework you need to know Matlab or Octave, or be willing to learn it on your own. Any student or researcher at Cambridge meeting these requirements is welcome to attend the lectures. Students wishing to take it for credit should consult with the course lecturers.

Textbook: There is no required textbook. However, the material covered is treated in several excellent recent textbooks:

Kevin P. Murphy, Machine Learning: a Probabilistic Perspective, the MIT Press (2012).

David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press (2012), available freely on the web.

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer (2006).

David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press (2003), available freely on the web.

Lecture Syllabus

This year, the exposition of the material will be centered around three specific machine learning areas: 1) supervised non-parametric probabilistic inference using Gaussian processes, 2) the TrueSkill ranking system and 3) the Latent Dirichlet Allocation model for unsupervised learning in text.

The organisation of the handouts is changing. This year the material will be structured into small chunks, each containing a single core concept. Printed handouts won't be provided at the lectures, but will be available on this web site. I recommend that you don't bring printed slides to the lectures, but of course you can do so if you think it works better for you.

Note: the links in the table below aren't up to date. If you want to see lecture slides from a similar but not identical course taught previously, go to the Michaelmas 2021 course website, but be warned that the slides may change slightly.


Introduction to Probabilistic Machine Learning (2L):
Modelling data
Linear in the parameters regression
Likelihood and the concept of noise
Probability fundamentals
Bayesian inference and prediction with finite regression models
Marginal likelihood
Gaussian Processes (3L):
Parameters and functions
Gaussian Process, wee sequential generation demo
Correspondence between linear models and GPs
Should we use finite or infinite models?
Covariance functions
Quick introduction to the gpml toolbox
Probabilistic Ranking (3L):
Introduction to ranking
Gibbs sampling
Gibbs sampling demo, matlab script
Gibbs sampling in the TrueSkill model
Factor graphs
Message passing in TrueSkill
Approximation by moment matching
Modelling Document Collections
Models of text
Discrete binary distributions
Categorical, multinomial, discrete distributions
Modelling Document Collections
Simple categorical and mixture models
Learning in models with latent variables: the EM algorithm
Modelling Document Collections
Gibbs sampling in mixture models, collapsed Gibbs
Latent Dirichlet Allocation topic models

Coursework

Coursework is to be submitted via moodle in electronic form no later than 12:00 noon on the due date. If you are not an engineering undergraduate, please make sure you are signed up for the module on moodle; if you are in doubt, check with Kimberly Cole (kc429@cam.ac.uk) in room BE4-45. Each of the three pieces of coursework carries equal weight in the evaluation. Each piece of coursework will be updated about two weeks before it is due; coursework 1 is up to date. The due dates this year are:

Coursework #1
Coursework 1 is about regression using Gaussian processes. You will need the following files: cw1a.mat and cw1e.mat.
Due: Friday 8th November, 2024 at 12:00 noon online.

Coursework #2
Coursework 2 will be about Probabilistic Ranking. This is the data file: tennis_data.mat. For Matlab, use cw2.m, gibbsrank.m and eprank.m; for Python, use coursework2.ipynb, cw2.py, gibbsrank.py and eprank.py.
Due: Friday 22nd November, 2024 at 12:00 noon online.

Coursework #3
Coursework 3 is about the Latent Dirichlet Allocation (LDA) model. You will need the data file kos_doc_data.mat, and either the Matlab code bmm.m, lda.m and sampDiscrete.m, or the Python code bmm.py, lda.py and sampleDiscrete.py.
Due: Friday 6th December, 2024 at 12:00 noon online.