## Probabilistic Machine Learning chunks

The topics and concepts taught in the Probabilistic Machine Learning course are broken down into a number of chunks, which are detailed on this page. The goal of this organisation is to help students identify and find material. Chunks are designed to be concise, fairly self-contained, and clearly labeled with content, prerequisites and relationships to other chunks.

The entire course falls naturally into three parts: Gaussian processes, probabilistic ranking and text modeling.

## Part I: Supervised non-parametric probabilistic inference using Gaussian processes

In a nutshell, Part I is concerned with the following:

1. Modelling data
• goals of building a model
• requirements for good models
• data, parameters and latent variables
2. Linear in the parameters regression
• making predictions, concept of a model
• least squares fit, and the Normal equations
• requires linear algebra
• model complexity: underfitting and overfitting
3. Likelihood and the concept of noise
• Gaussian independent and identically distributed (iid) noise
• Maximum likelihood fitting
• Equivalence to least squares
• Motivation for inference with multiple hypotheses
4. Probability fundamentals
• Medical example
• Joint, conditional and marginal probabilities
• The two rules of probability: sum and product
• Bayes' rule
5. Bayesian inference and prediction with finite regression models
• Likelihood and prior
• Posterior and predictive distribution, with algebra and pictorially
• the marginal likelihood
6. Background: Some useful Gaussian and Matrix equations
• matrix inversion lemma
• mean and variance of Gaussian
• mean and variance of projection of Gaussian
• marginal and conditional of Gaussian
• products of Gaussians
7. Marginal likelihood
• Bayesian model selection
• MCMC-based explanation of how the marginal likelihood works
• Average the likelihood over the prior: example
8. Distributions over parameters and over functions
• Concept of prior over functions and over parameters
• nuisance parameters
• Could we sidestep parameters, and work directly with functions?
9. Gaussian process
• From scalar Gaussians to multivariate Gaussians to Gaussian processes
• Functions are like infinitely long vectors, GPs are distributions over functions
• Marginal and conditional Gaussian
• GP definition
• Conditional generation and joint generation
10. Gaussian processes and data (see the sketch after this list)
• In pictures: prior and posterior
• In algebra: prior and posterior
• An analytic marginal likelihood, and some intuition
11. Gaussian process marginal likelihood and hyperparameters
• the GP marginal likelihood, and its interpretation
• hyperparameters can control the properties of functions
• example: finding hyperparameters by maximizing the marginal likelihood
• Occam's Razor
12. Correspondence between Linear in the parameters models and GPs
• From linear in the parameters models to GPs
• From GPs to linear in the parameters models
• Computational considerations: which is more efficient?
13. Covariance functions
• Stationary covariance functions, squared exponential, rational quadratic, Matérn covariance function
• periodic covariance
• neural network covariance function
• Combining simple covariance functions into more interesting ones
14. The gpml toolbox
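
To make chunks 9–11 concrete, here is a minimal numpy sketch of GP regression (the gpml toolbox, for MATLAB/Octave, provides the full-featured implementation). It computes the posterior mean and variance at test inputs and the log marginal likelihood used to set hyperparameters; the squared exponential covariance, the toy data and the hyperparameter values are illustrative assumptions rather than course material.

```python
import numpy as np

def sq_exp_cov(x1, x2, ell=1.0, sf=1.0):
    """Squared exponential covariance k(x, x') = sf^2 exp(-(x - x')^2 / (2 ell^2))."""
    d = x1[:, None] - x2[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell)**2)

def gp_regression(x, y, xs, ell=1.0, sf=1.0, sn=0.1):
    """Posterior mean/variance at test inputs xs, and the log marginal likelihood."""
    K = sq_exp_cov(x, x, ell, sf) + sn**2 * np.eye(len(x))   # noisy training covariance
    Ks = sq_exp_cov(x, xs, ell, sf)                          # train-test covariance
    Kss = sq_exp_cov(xs, xs, ell, sf)                        # test covariance
    L = np.linalg.cholesky(K)                                # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))      # K^{-1} y
    mean = Ks.T @ alpha                                      # posterior mean
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss - v.T @ v)                             # posterior marginal variances
    # log p(y | x, hyperparameters) = data fit - complexity penalty - constant
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(x) * np.log(2 * np.pi)
    return mean, var, lml

# toy data (illustrative only)
x = np.linspace(-3, 3, 20)
y = np.sin(x) + 0.1 * np.random.randn(20)
xs = np.linspace(-4, 4, 101)
mean, var, lml = gp_regression(x, y, xs)
print(f"log marginal likelihood: {lml:.2f}")
```

Maximising the returned log marginal likelihood with respect to the hyperparameters (here `ell`, `sf` and `sn`) is the hyperparameter selection procedure of chunk 11.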

## Part II: Ranking

1. Ranking: motivation and tennis example
• Competition in sports and games (TrueSkill problem, match making)
• Tennis: the ATP ranking system explained
• Shortcomings: what does one need to make actual predictions? (who wins?)
• The TrueSkill ranking model
2. Gibbs sampling (a minimal sketch follows this list)
• Calculating integrals using sampling
• Markov chains and invariant distributions
• Gibbs sampling
3. Gibbs sampling in TrueSkill
• Conditional distributions in TrueSkill are tractable
4. Representing distributions using factor graphs
• the cost of computing marginal distributions
• algebraic and graphical representations
• local computations on the graph
• message passing: the sum-product rules
5. Message passing in TrueSkill
• messages are not all tractable
6. Approximate messages using moment matching
• How to approximate a step function by a Gaussian?
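
A minimal sketch of the idea in chunk 2, assuming a correlated bivariate Gaussian as the target distribution: both full conditionals are tractable Gaussians, which is the same property that makes Gibbs sampling applicable to TrueSkill (chunk 3). The correlation value and sample count below are illustrative.

```python
import numpy as np

# Target: zero-mean bivariate Gaussian with correlation rho (illustrative choice).
# Full conditionals are Gaussian: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically.
rho = 0.8
n_samples = 10_000
rng = np.random.default_rng(0)

samples = np.zeros((n_samples, 2))
x1, x2 = 0.0, 0.0
for i in range(n_samples):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))  # sample x1 from p(x1 | x2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))  # sample x2 from p(x2 | x1)
    samples[i] = x1, x2

# Monte Carlo estimate of a moment (cf. "calculating integrals using sampling")
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])  # should be close to 0.8
```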

## Part III: Modeling text

1. Modeling text
• Modeling collections of documents
• probabilistic models of text
• Bag of words models
• Zipf's law
2. Discrete distributions on binary variables (tossing coins)
• Binary variables and the Bernoulli distribution
• Sequences, the binomial and discrete distributions
• Inference and the Beta distribution: probabilities over probabilities
3. Discrete distributions over multiple outcomes
• multinomials, categorical and discrete distributions
• inference and the Dirichlet prior
4. Document models
• Categorical model
• Mixture of categoricals model
• Training mixture models with EM (see the sketch after this list)
• A Bayesian mixture model
5. The Expectation Maximization (EM) algorithm
• Maximum likelihood in models with latent variables
6. Gibbs sampling for Bayesian mixture model
• Gibbs sampling
• Collapsed Gibbs sampling
7. Latent Dirichlet Allocation topic models
• A more interesting topic model
• Inference using Gibbs sampling
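
As a rough illustration of chunk 4 ("Training mixture models with EM"), here is a numpy sketch of EM for a mixture of categoricals on a bag-of-words count matrix. The toy corpus, the number of components and the smoothing constant are illustrative assumptions; the Bayesian variants in chunks 6 and 7 would instead sample component assignments with (collapsed) Gibbs updates under Dirichlet priors.

```python
import numpy as np

def em_mixture_of_categoricals(counts, K, n_iter=50, seed=0):
    """EM for a mixture of categoricals on a document-word count matrix (D x V)."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    pi = np.full(K, 1.0 / K)                       # mixing proportions
    beta = rng.dirichlet(np.ones(V), size=K)       # per-component word distributions (K x V)
    for _ in range(n_iter):
        # E-step: responsibilities r[d, k] proportional to pi[k] * prod_v beta[k, v]^counts[d, v]
        log_r = np.log(pi)[None, :] + counts @ np.log(beta).T
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing proportions and word distributions
        pi = r.mean(axis=0)
        beta = r.T @ counts + 1e-10                # small constant avoids zero probabilities
        beta /= beta.sum(axis=1, keepdims=True)
    return pi, beta

# toy corpus: 6 documents over a vocabulary of 4 words (counts are illustrative)
counts = np.array([[5, 3, 0, 0],
                   [4, 4, 1, 0],
                   [6, 2, 0, 1],
                   [0, 1, 5, 4],
                   [1, 0, 4, 5],
                   [0, 0, 6, 3]], dtype=float)
pi, beta = em_mixture_of_categoricals(counts, K=2)
print("mixing proportions:", np.round(pi, 2))
```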