Unsupervised Learning 2005 Course Web Page

Gatsby Computational Neuroscience Unit
University College London

Keywords: Machine learning, probabilistic modelling, graphical models, approximate inference, Bayesian statistics

For a summary of the entire course you can read the following chapter:

Ghahramani (2004) Unsupervised Learning. In Bousquet, O., Raetsch, G. and von Luxburg, U. (eds) Advanced Lectures on Machine Learning LNAI 3176. Springer-Verlag.

Code: COMP GI02 / COMP 4c51 / Gatsby

Year: MSc in Intelligent Systems, PhD course at the Gatsby Unit

Prerequisites: A good background in statistics, calculus, linear algebra, and computer science. You should thoroughly review the maths in the following cribsheet [pdf] [ps] before the start of the course. You must either know Matlab or Octave, be taking a class on Matlab/Octave, or be willing to learn it on your own. Any student or researcher at UCL meeting these requirements is welcome to attend the lectures. Students wishing to take it for credit should consult the course lecturer by email.

Term: 1, 2005

Time: 11.00 to 13.00 Mondays and Thursdays

Location: 4th floor, Gatsby Unit, 17 Queen Square

Taught By: Zoubin Ghahramani and Maneesh Sahani

Teaching Assistant: Richard Turner.

Homework Assignments: All assignments (coursework) for this course are to be handed in to the Gatsby Unit, not to the CS department. Please hand in all assignments at the beginning of the lecture on the due date to either Zoubin or Richard. Late assignments will be penalised. If you are unable to come to class, you can also hand in assignments to Alexandra Boss, Room 408, Gatsby Unit.

Late Assignment Policy: Assignments handed in late will be penalised by 10% for every weekday they are late, until the answers are discussed in a review session. NO CREDIT will be given for assignments handed in after the answers are discussed in the review session.
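
As a rough illustration of the arithmetic in the late policy, here is a minimal Python sketch. It is not an official marking tool and rests on several assumptions that the policy does not spell out: the 10% is taken as 10 percentage points of the raw mark per weekday, an assignment handed in on the review date itself still receives credit, and the review date is passed in by the caller. The function names and the review date used in the example are hypothetical.

```python
from datetime import date, timedelta


def weekdays_late(due: date, handed_in: date) -> int:
    """Count the weekdays (Mon-Fri) strictly after `due`, up to and including `handed_in`."""
    count = 0
    day = due
    while day < handed_in:
        day += timedelta(days=1)
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            count += 1
    return count


def penalised_mark(raw_mark: float, due: date, handed_in: date, review: date) -> float:
    """Apply the late policy under the assumptions stated above (hypothetical helper)."""
    if handed_in > review:
        # Assumption: no credit once the review date has passed.
        return 0.0
    # Assumption: 10 percentage points of the raw mark per weekday late, capped at 100.
    penalty_percent = min(100, 10 * weekdays_late(due, handed_in))
    return raw_mark * (100 - penalty_percent) / 100


# Assignment 1 is due Thursday 13 October 2005; the review date below is made up.
due = date(2005, 10, 13)
review = date(2005, 10, 24)
print(penalised_mark(80, due, date(2005, 10, 17), review))  # Friday and Monday count: 64.0
```

Under these assumptions, an assignment handed in on the following Monday is two weekdays late, because the weekend is skipped.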

Textbook: There is no required textbook. However, I recommend the following two textbooks as excellent sources for many of the topics here, and I will be occasionally assigning reading from them:

David J.C. MacKay (2003) Information Theory, Inference, and Learning Algorithms, Cambridge University Press. (also available online)
Christopher M. Bishop (in preparation) Pattern Recognition and Machine Learning.

NOTE: If you want to see lecture slides from last year click on the 2004 course website, but be warned that the slides may change this year.

Tentative Dates and Titles, Topics, and Materials

Oct 3 and Oct 6: Introduction and Statistical Foundations
Topics:
- Maximum Likelihood
- Bayesian learning
- The relation to coding length
- Supervised vs Unsupervised vs Reinforcement Learning
Materials:
- Lecture Slides
- Assignment 1 (due Thurs Oct 13)
Suggested Further Readings:
- Cribsheet [pdf] [ps] of Basic Maths Needed for Machine Learning
- Nuances of Probability Theory by Tom Minka
- Probability Theory: The Logic of Science by E.T. Jaynes
- Mike Jordan (1986) Introduction to Linear Algebra [djvu format]
- Daniel Osherson (1990) chapter on Judgement, which discusses Dutch Books [djvu format]
- Sam Roweis' notes on matrix algebra
- Tom Minka's notes on matrix algebra
- Probability and Statistics Online Reference
- Chapters 1 and 2 of Bishop, C. M. (draft) Pattern Recognition and Machine Learning. Hard copies will be made available in class.

Oct 10 and Oct 13: Latent Variable Models
Topics:
- Mixture of Gaussians (MoG) and k-means
- Factor Analysis (FA) and PCA
Materials:
- Lecture Slides
Suggested Further Readings:
- David MacKay's Book, Chapters 20, 22 and 23 on k-means and MoG
- Chapters 4 and 6 of Bishop, C. M. (draft) Pattern Recognition and Machine Learning. Hard copies will be made available in class.
- Max Welling's Class Notes on PCA and FA [pdf] [ps]

Oct 17 and Oct 20: The EM Algorithm
Topics:
- General Theory
- Application to MoG and to FA
- Extensions
Materials:
- Lecture Slides
- Assignment 2 (due Oct 27)
- binarydigits.txt
- bindigit.m
Suggested Further Readings:
- Chapter 4 of Bishop, C. M. (draft) Pattern Recognition and Machine Learning.

Oct 24 and Oct 27: Latent Variable Time Series Models
Topics:
- Hidden Markov Models (HMMs)
- Forward-Backward and Viterbi
- Linear Dynamical Systems
- Kalman Filtering (KF) and Extended KF
- Hybrid and Nonlinear Time Series Models
Materials:
- Lecture Slides
- Matlab Demo of State Space Model
Suggested Further Readings:
- Chapter 7 of Bishop, C. M. (draft) Pattern Recognition and Machine Learning.
- Ghahramani, Z. and Hinton, G.E. (1996) Parameter estimation for linear dynamical systems.
- Minka, T. (1999) From Hidden Markov Models to Linear Dynamical Systems
- Welling (2002) The Kalman Filter (class notes).

Oct 31: Introduction to Graphical Models I
Topics:
- Conditional Independence
- Undirected Graphs (Markov Networks)
- Hammersley-Clifford Theorem
- Directed Graphs (Bayesian Networks)
- Factor Graphs
Materials:
- Lecture Slides
Suggested Further Readings:
- Chapter 3 of Bishop, C. M. (draft) Pattern Recognition and Machine Learning.
- The following three related articles appear in Arbib (ed): The Handbook of Brain Theory and Neural Networks (2nd edition)
- Jordan and Weiss (2002) Probabilistic Inference in Graphical Models
- Ghahramani (2002) Graphical Models: Parameter Learning
- Heckerman (2002) Graphical Models: Structure Learning
- Shachter (1998) Bayes Ball

Nov 3: Introduction to Graphical Models II
Topics:
- Belief Propagation
Materials:
- Belief Propagation Slides (Fluffy and Moby)
- Factor Graph Propagation Slides
- Assignment 3 (due Mon Nov 14)
- Data Sets: geyser.txt, data1.txt

Nov 7 and Nov 10: Reading Week
NO LECTURES

Nov 14 and Nov 17: Hierarchical and Nonlinear Models
Topics:
- Independent Components Analysis (ICA)
- Sigmoid Belief Networks
- Boltzmann Machines
Materials:
- Lecture Slides
- Assignment 4 (due Mon Nov 28)
- Demo
Suggested Further Readings:
- Max Welling's Notes on ICA
- David MacKay's Book, Ch 34 on ICA

Nov 21 and Nov 24: Sampling and Markov Chain Monte Carlo Methods
Topics:
- Monte Carlo: simple Monte Carlo, Rejection Sampling, Importance Sampling
- Markov chain Monte Carlo (MCMC): Gibbs Sampling, Metropolis, Hybrid Monte Carlo and other methods
Materials:
- Lecture Slides (MCMC)
Suggested Further Readings:
- David MacKay's Book, Ch 29 and 30 on Monte Carlo methods
- A more in-depth treatment of Monte Carlo methods is in Radford Neal's Technical Report
- The following textbook is also good: Monte Carlo Statistical Methods (2nd Ed) by Christian P. Robert, George Casella. Springer Texts in Statistics. 2005.

Nov 28: Variational Approximations
Topics:
- Review of EM
- Variational lower bounds and mean field methods
- The Binary Latent Factor Model
- Variational Message Passing
- Expectation Propagation
Materials:
- Lecture Slides (Variational)
Suggested Further Readings:
- David MacKay's Book, Ch 33 on variational methods
- Winn and Bishop, Variational Message Passing
- Jordan et al's Introduction to Variational Methods [ps.gz] [pdf]
- Tom Minka's Roadmap to EP

Dec 1: Bayesian Model Comparison
Topics:
- Occam's Razor
- Model comparison and averaging
- BIC, Laplace and sampling approximations
- Variational Bayesian EM algorithm
Materials:
- Lecture Slides (Bayesian Model Comparison)
- Assignment 5 (due Fri Dec 16)
- Data: images.jpg
- Code: genimages.m, MStep.m
Suggested Reading:
- Ghahramani (2004) Unsupervised Learning. In Bousquet, O., Raetsch, G. and von Luxburg, U. (eds) Advanced Lectures on Machine Learning LNAI 3176. Springer-Verlag. This book chapter is a summary of the whole course.

Dec 5 and Dec 8: NO LECTURES

Thurs Dec 15: Review Session in Gatsby Unit Basement B10 (led by Richard Turner)

Aims: This course provides students with an in-depth introduction to statistical modelling and unsupervised learning techniques. It presents probabilistic approaches to modelling and their relation to coding theory and Bayesian statistics. A variety of latent variable models will be covered, including mixture models (used for clustering), dimensionality reduction methods, time series models such as hidden Markov models (used in speech recognition and bioinformatics), independent components analysis, hierarchical models, and nonlinear models. The course will present the foundations of probabilistic graphical models (e.g. Bayesian networks and Markov networks) as an overarching framework for unsupervised modelling. We will cover Markov chain Monte Carlo sampling methods and variational approximations for inference. Time permitting, students will also learn about other topics in machine learning.

Learning Outcomes: To be able to understand the theory of unsupervised learning systems; to have in-depth knowledge of the main models used in unsupervised learning; to understand the methods of exact and approximate inference in probabilistic models; to be able to recognise which models are appropriate for different real-world applications of machine learning methods.

Method: Lecture presentations with associated class problems.

Assessment:

The course has the following assessment components:
- Written Examination (2.5 hours, 50%)
- Weekly Assignments (50%)
To pass this course, students must:
- Obtain an overall pass mark for all sections combined

Course Location:

Gatsby Unit
17 Queen Square [map]
Mondays and Thursdays 11:00 - 13:00

Tel:

Zoubin 020 7679 1199
