The purpose of this web page is to provide some links for people interested in the application of Bayesian ideas to Machine Learning.

Bayes rule states:

P(M|D) = P(D|M) P(M) / P(D)

We can read this in the following way: *"the probability of the model
given the data (P(M|D)) is the probability of the data given the
model (P(D|M)) times the prior probability of the model (P(M))
divided by the probability of the data (P(D))".*

Bayesian statistics, or more precisely the Cox theorems, tell us that
we should use Bayes rule to represent and manipulate our degree of
belief in some model or hypothesis. In other words, we should treat
degrees of belief in exactly the same way as we treat
probabilities. Thus, the prior P(M) above represents numerically how
much we believe model M to be the true model of the data
*before* we actually observe the data, and the posterior P(M|D)
represents how much we believe model M *after* observing the
data. See Chapters 1 and 2 of E T Jaynes' book.

We can think of machine learning as learning models of data. The Bayesian framework for machine learning states that you start out by enumerating all reasonable models of the data and assigning your prior belief P(M) to each of these models. Then, upon observing the data D, you evaluate how probable the data was under each of these models to compute P(D|M). Multiplying this likelihood by the prior and renormalizing results in the posterior probability over models P(M|D) which encapsulates everything that you have learned from the data regarding the possible models under consideration. Thus, to compare two models M and M', we need to compute their relative probability given the data: P(M)P(D|M) / P(M')P(D|M').
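The recipe above can be sketched in a few lines of code. This is a made-up toy example, assuming three hypothetical coin-bias models and binomial data; the model names and numbers are illustrative, not from any particular dataset.

```python
# Hypothetical candidate models of a coin: each model is a fixed
# probability of heads. Names and values are illustrative only.
models = {"fair": 0.5, "biased_heads": 0.8, "biased_tails": 0.2}
prior = {m: 1.0 / 3.0 for m in models}          # P(M): uniform prior belief

# Observed data D: 8 heads out of 10 flips.
heads, flips = 8, 10

# Likelihood P(D|M) under each model (the binomial coefficient is the
# same for all models, so it cancels when we renormalize).
likelihood = {m: p**heads * (1 - p)**(flips - heads)
              for m, p in models.items()}

# Posterior P(M|D) is proportional to P(D|M) P(M), renormalized.
unnorm = {m: likelihood[m] * prior[m] for m in models}
evidence = sum(unnorm.values())                  # P(D)
posterior = {m: u / evidence for m, u in unnorm.items()}
```

The ratio posterior["fair"] / posterior["biased_heads"] is exactly the relative probability P(fair)P(D|fair) / P(biased_heads)P(D|biased_heads) used to compare the two models.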

Incidentally, if our beliefs are not coherent, in other words, if they
violate the rules of probability which include Bayes rule, then the Dutch
Book theorem says that if we are willing to accept bets with odds
based on the strength of our beliefs, there always exists a set of
bets (called a "Dutch book") which we will accept but which is
*guaranteed to lose us money no matter what the outcome.* The
only way to avoid being swindled by a Dutch book is to be
Bayesian. This has important implications for Machine
Learning. If our goal is to design an ideally rational agent,
then this agent must represent and manipulate its beliefs using
the rules of probability.

In practice, for real world problem domains, applying Bayes rule
exactly is usually impractical because it involves summing or
integrating over too large a space of models. These computationally
intractable sums or integrals can be avoided by using **approximate
Bayesian methods**. There is a very large body of current research
on ways of doing approximate Bayesian machine learning. Some examples
of approximate Bayesian methods include Laplace's approximation,
variational approximations, expectation propagation, and Markov chain
Monte Carlo methods (many papers on MCMC can be found in this
repository).
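To give a flavour of the last of these, here is a minimal random-walk Metropolis sampler. The target density (a standard Gaussian, chosen here only as a placeholder) stands in for the unnormalized posterior P(D|M)P(M) that would arise in a real problem; in practice one would use far more care with proposals, burn-in, and convergence diagnostics.

```python
import random, math

def log_target(x):
    # Placeholder unnormalized log-density: a standard Gaussian.
    # In a real application this would be log P(D|M) + log P(M).
    return -0.5 * x * x

def metropolis(n_samples, step=1.0, x0=0.0, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)   # symmetric random-walk proposal
        # Accept with probability min(1, target(proposal) / target(x)),
        # computed in log space for numerical stability.
        if math.log(rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)                     # keep current state either way
    return samples

samples = metropolis(20000)
```

The empirical mean and variance of the samples approximate the corresponding integrals under the target, which is exactly the kind of intractable sum or integral the methods above are designed to sidestep.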

**Bayesian decision theory** deals with the problem of making
optimal decisions -- that is, decisions or actions that minimize our
expected loss. Let's say we have a choice of taking one of k possible
actions A_{1} ... A_{k} and we are considering m
possible hypotheses for what the true model of the data is:
M_{1} ... M_{m}. Assume that if the true model of the
data is M_{i} and we take action A_{j} we incur a loss
of L_{ij} dollars. Then the optimal action A^{*} given the data
is the one that minimizes the expected loss. In other words,
A^{*} is the action A_{j} with the smallest value of
Σ_{i} L_{ij} P(M_{i}|D).
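This rule is a one-liner in code. The posterior and loss matrix below are made-up numbers, assuming m = 3 models and k = 3 actions, purely to illustrate the computation.

```python
# P(M_i | D) for three hypothetical models (illustrative numbers).
posterior = [0.6, 0.3, 0.1]

# L[i][j]: loss in dollars if the true model is M_i and we take action A_j.
L = [
    [0.0, 5.0, 2.0],
    [4.0, 0.0, 2.0],
    [9.0, 8.0, 2.0],
]

k = len(L[0])
# Expected loss of action A_j is the posterior-weighted sum over models.
expected_loss = [sum(posterior[i] * L[i][j] for i in range(len(L)))
                 for j in range(k)]
# The optimal action A* minimizes the expected loss.
best = min(range(k), key=lambda j: expected_loss[j])
```

Note that with these numbers the "safe" third action (constant loss of 2 dollars) beats gambling on the most probable model, which is the sort of trade-off expected loss captures.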

We can derive the fundamentals of the branch of machine learning known as reinforcement learning from Bayesian sequential decision theory. See, for example, Michael Duff's PhD Thesis.

For a description of the debate between Bayesians and frequentists see Chapter 37 of David MacKay's excellent textbook.

Tom Minka provides a short but excellent description of some nuances in the use of probability, especially as it relates to machine learning and pattern recognition.

Zoubin Ghahramani
Last modified: Thu Nov 11 12:29:51 GMT 2004