The course falls naturally into three parts: Gaussian processes, probabilistic ranking, and text modeling.

In a nutshell, part I is concerned with...

- Modelling data
- goals of building a model
- requirements for good models
- data, parameters and latent variables

- Linear in the parameters regression
- making predictions, concept of a model
- least squares fit, and the Normal equations
- requires linear algebra
- model complexity: underfitting and overfitting
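The least-squares fit via the normal equations can be sketched in a few lines; this is a minimal illustration with a made-up polynomial toy dataset, not code from the course.

```python
import numpy as np

# Toy data: noisy samples from a quadratic (illustrative values only)
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(x.size)

# Linear-in-the-parameters model: y ~= Phi @ w with polynomial basis functions
Phi = np.vander(x, 3, increasing=True)          # columns: 1, x, x^2

# Least-squares fit via the normal equations: (Phi^T Phi) w = Phi^T y
w = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Predictions at new inputs reuse the same basis expansion
x_star = np.array([0.0, 0.5])
y_star = np.vander(x_star, 3, increasing=True) @ w
```

Raising the polynomial degree makes the same code underfit less but overfit more, which is the model-complexity trade-off above.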

- Likelihood and the concept of noise
- Gaussian independent and identically distributed (iid) noise
- Maximum likelihood fitting
- Equivalence to least squares
- Motivation of inference with multiple hypotheses

- Probability fundamentals
- Medical example
- Joint, conditional and marginal probabilities
- The two rules of probability: sum and product
- Bayes' rule
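The two rules of probability and Bayes' rule can be exercised on a screening-test calculation of the kind the medical example uses; the numbers below are hypothetical, not the lecture's.

```python
# Hypothetical numbers for a screening test (illustrative, not from the lectures)
p_disease = 0.01                  # prior P(D)
p_pos_given_d = 0.99              # P(+ | D), sensitivity
p_pos_given_not_d = 0.05          # P(+ | not D), false-positive rate

# Product rule gives the joints; sum rule marginalizes them into P(+)
p_pos = p_pos_given_d * p_disease + p_pos_given_not_d * (1 - p_disease)

# Bayes' rule: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
```

Despite the accurate test, the posterior probability of disease given a positive result stays below 20% because the prior is so small.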

- Bayesian inference and prediction with finite regression models
- Likelihood and prior
- Posterior and predictive distribution, with algebra and pictorially
- the marginal likelihood
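The algebra of posterior and predictive distributions for a finite linear model can be sketched as follows; prior and noise variances are assumed values for illustration.

```python
import numpy as np

# Bayesian linear-in-the-parameters model:
# prior w ~ N(0, s2_w I), likelihood y | w ~ N(Phi w, s2_n I)
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 15)
Phi = np.vander(x, 2, increasing=True)          # basis: 1, x
y = Phi @ np.array([0.5, -1.0]) + 0.1 * rng.standard_normal(x.size)

s2_w, s2_n = 1.0, 0.01                          # assumed prior and noise variances

# Posterior over weights is Gaussian:
# Sigma = (Phi^T Phi / s2_n + I / s2_w)^-1,  mu = Sigma Phi^T y / s2_n
Sigma = np.linalg.inv(Phi.T @ Phi / s2_n + np.eye(2) / s2_w)
mu = Sigma @ Phi.T @ y / s2_n

# Predictive distribution at a test input:
# mean phi*^T mu, variance phi*^T Sigma phi* + s2_n
phi_star = np.array([1.0, 0.5])
pred_mean = phi_star @ mu
pred_var = phi_star @ Sigma @ phi_star + s2_n
```

The predictive variance always exceeds the noise variance, since parameter uncertainty adds to it.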

- Background: some useful Gaussian and matrix equations
- matrix inversion lemma
- mean and variance of Gaussian
- mean and variance of projection of Gaussian
- marginal and conditional of Gaussian
- products of Gaussians
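The matrix inversion lemma (Woodbury identity) is easy to check numerically; this is a sanity-check sketch with random matrices, not derivation code from the notes.

```python
import numpy as np

# Numerical check of the matrix inversion lemma:
# (Z + U W V^T)^-1 = Z^-1 - Z^-1 U (W^-1 + V^T Z^-1 U)^-1 V^T Z^-1
rng = np.random.default_rng(2)
n, m = 5, 2
Z = np.diag(rng.uniform(1.0, 2.0, n))           # easy-to-invert n x n matrix
W = np.eye(m)
U = rng.standard_normal((n, m))
V = rng.standard_normal((n, m))

lhs = np.linalg.inv(Z + U @ W @ V.T)
Zi = np.linalg.inv(Z)
rhs = Zi - Zi @ U @ np.linalg.inv(np.linalg.inv(W) + V.T @ Zi @ U) @ V.T @ Zi
```

The identity turns one n-by-n inversion into an m-by-m one, which matters later when comparing GP and weight-space computations.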

- Marginal likelihood
- Bayesian model selection
- MCMC based explanation of how the marginal likelihood works
- Average the likelihood over the prior: example
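"Average the likelihood over the prior" can be demonstrated by simple Monte Carlo on a one-parameter Gaussian toy model where the answer is also available in closed form; all numbers are illustrative.

```python
import numpy as np

# Marginal likelihood by Monte Carlo: draw parameters from the prior and
# average the likelihood, p(y) ~= (1/S) sum_s p(y | w_s), with w_s ~ p(w)
rng = np.random.default_rng(3)

# One observation y of an unknown mean w, with known noise (toy setup)
y_obs, s2_n, s2_w = 0.3, 0.5, 1.0

w_samples = rng.normal(0.0, np.sqrt(s2_w), size=100_000)
lik = np.exp(-(y_obs - w_samples) ** 2 / (2 * s2_n)) / np.sqrt(2 * np.pi * s2_n)
ml_mc = lik.mean()

# For this Gaussian toy model the marginal likelihood is analytic:
# y ~ N(0, s2_w + s2_n)
ml_exact = np.exp(-y_obs**2 / (2 * (s2_w + s2_n))) / np.sqrt(2 * np.pi * (s2_w + s2_n))
```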

- Distributions over parameters and over functions
- Concept of prior over functions and over parameters
- nuisance parameters
- Could we sidestep parameters, and work directly with functions?

- Gaussian process
- From scalar Gaussians to multivariate Gaussians to Gaussian processes
- Functions are like infinitely long vectors, GPs are distributions over functions
- Marginal and conditional Gaussian
- GP definition
- Conditional generation and joint generation

- Gaussian processes and data
- In pictures: prior and posterior
- In algebra: prior and posterior
- An analytic marginal likelihood, and some intuition
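The prior-to-posterior algebra for a GP with Gaussian noise is just Gaussian conditioning; the data and noise level below are toy values, not the lecture's.

```python
import numpy as np

def sq_exp(a, b, ell=1.0):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * ell**2))

# Toy training data and test inputs
x = np.array([-2.0, -1.0, 0.0, 1.5])
y = np.array([-0.5, 0.2, 0.8, -0.1])
x_star = np.linspace(-3, 3, 7)
s2_n = 0.01                                     # assumed noise variance

# Standard GP posterior equations:
# mean = K*^T (K + s2_n I)^-1 y,  cov = K** - K*^T (K + s2_n I)^-1 K*
K = sq_exp(x, x) + s2_n * np.eye(x.size)
Ks = sq_exp(x, x_star)
Kss = sq_exp(x_star, x_star)

alpha = np.linalg.solve(K, y)
post_mean = Ks.T @ alpha
post_cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
```

The posterior variance collapses near observed inputs and returns to the prior variance far from the data, which is the picture the lectures draw.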

- Gaussian process marginal likelihood and hyperparameters
- the GP marginal likelihood, and its interpretation
- hyperparameters can control the properties of functions
- example: finding hyperparameters by maximizing the marginal likelihood
- Occam's Razor
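The analytic GP log marginal likelihood can be evaluated for competing hyperparameters; the sketch below compares two lengthscales on toy data generated from a smooth function (values are illustrative).

```python
import numpy as np

# GP log marginal likelihood:
# log p(y|X) = -1/2 y^T Ky^-1 y - 1/2 log|Ky| - n/2 log(2 pi)
def log_marglik(x, y, ell, s2_n=0.01):
    K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * ell**2)) + s2_n * np.eye(x.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()          # equals -1/2 log|K|
            - 0.5 * x.size * np.log(2 * np.pi))

x = np.linspace(0, 6, 20)
y = np.sin(x)                                   # smooth toy data

lml_smooth = log_marglik(x, y, ell=1.0)         # plausible lengthscale
lml_rough = log_marglik(x, y, ell=0.05)         # treats points as nearly independent
```

The moderate lengthscale wins by a wide margin, the kind of automatic Occam's Razor effect the marginal likelihood provides.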

- Correspondence between Linear in the parameters models and GPs
- From linear in the parameters models to GPs
- From GPs to linear in the parameters models
- Computational considerations: which is more efficient?
- covariance functions
- Stationary covariance functions, squared exponential, rational quadratic, Matérn covariance function
- periodic covariance
- neural network covariance function
- Combining simple covariance functions into more interesting ones
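Combining covariance functions by sums and products preserves validity; this sketch builds a locally periodic covariance and a trend-plus-seasonal covariance from two simple stationary kernels (hyperparameter values are arbitrary).

```python
import numpy as np

# Two simple stationary covariance functions of the lag tau = x - x'
def sq_exp(tau, ell=1.0):
    return np.exp(-tau**2 / (2 * ell**2))

def periodic(tau, period=1.0, ell=1.0):
    return np.exp(-2 * np.sin(np.pi * tau / period) ** 2 / ell**2)

x = np.linspace(-2, 2, 40)
tau = x[:, None] - x[None, :]

# Products and sums of valid covariances are again valid covariances
K_locally_periodic = periodic(tau, period=0.5) * sq_exp(tau, ell=2.0)
K_trend_plus_season = sq_exp(tau, ell=3.0) + periodic(tau, period=0.5)

# Positive semi-definiteness can be checked via the eigenvalues
eig = np.linalg.eigvalsh(K_locally_periodic)
```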

- The gpml toolbox

- Ranking: motivation and tennis example
- Competition in sports and games (TrueSkill problem, match making)
- Tennis: the ATP ranking system explained
- Shortcomings: what does one need to make actual predictions? (who wins?)
- The TrueSkill ranking model

- Gibbs sampling
- Calculating integrals using sampling
- Markov chains and invariant distributions
- Gibbs sampling
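A Gibbs sampler in its simplest setting, a bivariate Gaussian where both conditionals are exact, shows the Markov chain converging to the right invariant distribution; the correlation value is a toy choice.

```python
import numpy as np

# Gibbs sampling for a bivariate Gaussian with correlation rho: alternately
# sample each coordinate from its conditional, x1 | x2 ~ N(rho x2, 1 - rho^2)
rng = np.random.default_rng(5)
rho = 0.8
n_samples = 20_000

x1, x2 = 0.0, 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples[i] = x1, x2

# The invariant distribution is the joint N(0, [[1, rho], [rho, 1]]);
# discard an initial burn-in and check the empirical correlation
emp_corr = np.corrcoef(samples[1000:].T)[0, 1]
```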

- Gibbs sampling in TrueSkill
- Conditional distributions in TrueSkill are tractable

- Representing distributions using factor graphs
- the cost of computing marginal distributions
- algebraic and graphical representations
- local computations on the graph
- message passing: the sum-product rules

- Message passing in TrueSkill
- messages are not all tractable

- Approximate messages using moment matching
- How to approximate a step function by a Gaussian?
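Moment matching replaces the intractable product of a Gaussian with a step function by the Gaussian that has the truncated distribution's mean and variance; the sketch below uses the standard truncated-normal formulas, written with stdlib functions only.

```python
import math

# Moment-matching N(mu, s2) * I(x > 0) by a Gaussian with the truncated
# distribution's first two moments (standard truncated-normal formulas)
def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncate_moments(mu, s2):
    s = math.sqrt(s2)
    z = mu / s
    lam = phi(z) / Phi(z)                 # correction factor
    mean = mu + s * lam
    var = s2 * (1.0 - lam * (lam + z))
    return mean, var

# Example: a standard normal truncated to the positive half-line
m, v = truncate_moments(0.0, 1.0)
```

For the standard normal this gives mean sqrt(2/pi) and variance 1 - 2/pi, matching the known half-normal moments.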

- Modeling text
- Modeling collections of documents
- probabilistic models of text
- Bag of words models
- Zipf's law
- Discrete distributions on binary variables (tossing coins)
- Binary variables and the Bernoulli distribution
- Sequences, the binomial and discrete distributions
- Inference and the Beta distribution: probabilities over probabilities
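Beta-Bernoulli conjugacy makes coin-tossing inference a matter of adding counts; the prior and the observed counts below are toy choices.

```python
from fractions import Fraction

# Beta-Bernoulli inference: a Beta(a, b) prior over the coin's bias p is
# conjugate to Bernoulli observations, so the posterior is again a Beta
a, b = 1, 1                       # uniform prior Beta(1, 1)
heads, tails = 7, 3               # observed coin tosses (toy counts)

# Posterior: Beta(a + heads, b + tails); posterior mean (a + heads)/(a + b + n)
post_a, post_b = a + heads, b + tails
post_mean = Fraction(post_a, post_a + post_b)

# Compare with the maximum-likelihood estimate heads / n
ml_estimate = Fraction(heads, heads + tails)
```

The posterior mean (2/3) is pulled toward the prior relative to the maximum-likelihood estimate (7/10), a probability over a probability in action.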

- Discrete distributions over multiple outcomes
- multinomials, categorical and discrete distributions
- inference and the Dirichlet prior
- Document models
- Categorical model
- Mixture of categoricals model
- Training mixture models with EM
- A Bayesian mixture model
- The Expectation Maximization (EM) algorithm
- Maximum likelihood in models with latent variables
- Gibbs sampling for Bayesian mixture model
- Gibbs sampling
- Collapsed Gibbs sampling
- Latent Dirichlet Allocation topic models
- A more interesting topic model
- Inference using Gibbs sampling
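Training a mixture of categoricals with EM can be sketched on a tiny bag-of-words corpus; the word-count matrix and the number of components are invented for illustration, and the notation here need not match the lectures'.

```python
import numpy as np

# EM for a mixture of categoricals over a toy bag-of-words corpus:
# E-step computes responsibilities, M-step re-estimates mixing weights
# and per-component word probabilities from soft counts
rng = np.random.default_rng(6)

# Toy word-count matrix: 6 documents x 4 vocabulary words
X = np.array([[5, 4, 0, 1],
              [6, 3, 1, 0],
              [4, 5, 0, 0],
              [0, 1, 5, 4],
              [1, 0, 6, 5],
              [0, 0, 4, 6]], dtype=float)
K = 2                                            # number of mixture components

pi = np.full(K, 1.0 / K)                         # mixing proportions
theta = rng.dirichlet(np.ones(X.shape[1]), K)    # word distributions per component

for _ in range(50):
    # E-step: responsibilities r[n, k] proportional to
    # pi_k * prod_w theta[k, w]^X[n, w], computed in log space
    log_r = np.log(pi) + X @ np.log(theta).T
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: update mixing proportions and word probabilities
    pi = r.mean(axis=0)
    theta = r.T @ X
    theta /= theta.sum(axis=1, keepdims=True)
```

On this corpus EM separates the two groups of documents; replacing the point estimates with Dirichlet posteriors and the E-step with sampling gives the Bayesian mixture and Gibbs versions listed above.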