Metropolis-Hastings algorithms are a class of Markov chain Monte Carlo methods which are commonly
used to perform large scale calculations and simulations in Physics and
Statistics. The button below opens a separate browser window
containing a demonstration of some of the most common chains used for this
purpose. The window is resizable, and you may need to adjust its
dimensions depending on your system configuration.
MCMC methods in a nutshell
Two common problems which are approached via M-H algorithms are simulation
and numerical integration.
The former problem consists in generating a random variable with a prescribed distribution,
p say, which is typically known only to within a constant factor. The MCMC answer is to
construct a Markov chain, that is a random process X_0, X_1, X_2, ..., in such a way that its equilibrium
distribution is the "target" p.
By generating successive values X_1, X_2, X_3, etc., it can be shown
that the distribution of X_n when n is large is close to p.
We may therefore say that for n sufficiently large, the random variable X_n
is approximately the variable we are looking for.
Moreover, after that, X_{n+1}, X_{n+2}, etc. are all distributed close to p also.
The latter problem consists in calculating the value of a (typically complicated) integral which
we write as

    I = ∫ f(x) p(x) dx.

It is very rare that integrals which arise in real problems can be
evaluated explicitly in closed form, and the best one can hope for in general is an approximate
numerical value generated by a computer. The MCMC approach consists in defining, as before,
a Markov chain with equilibrium distribution p, and using the theorem that

    (1/n) [ f(X_1) + f(X_2) + ... + f(X_n) ]  →  I    as n → ∞.

In other words, when n is large, the average value of f(X_i) up to time n
is approximately equal to I, the value of the integral we were looking for.
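To make this concrete, here is a small Python sketch (the applet itself is a Java program, so this is purely
illustrative): it runs a simple symmetric random-walk chain of the kind described later, targeting a density
on [0,1] known only up to a constant, and uses the time average of f(X_i) to approximate the integral. The
density, the function f, the step size and the run length are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_unnorm(x):
    # Unnormalized target density on [0, 1]: a bump around 0.3.
    # (Any density known only up to a constant factor will do.)
    return np.exp(-50.0 * (x - 0.3) ** 2) if 0.0 <= x <= 1.0 else 0.0

def f(x):
    # Function whose integral against p we want: here the mean of p.
    return x

n = 100_000
x = 0.5                      # arbitrary starting point
samples = np.empty(n)
for i in range(n):
    y = x + 0.1 * rng.standard_normal()                # symmetric random-walk proposal
    if rng.random() < min(1.0, p_unnorm(y) / p_unnorm(x)):
        x = y                                          # accept; otherwise stay put
    samples[i] = x

# Ergodic average: (1/n) * sum f(X_i) approximates the integral of f(x) p(x) dx.
print("estimated integral:", samples.mean())
```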
There are many practical problems associated with the above ideas, whose discussion is
beyond the scope of this document. Suffice it to say that the main question of deciding how large
n must be taken for any given problem is far from solved, although many individual results are known.
You can read more about MCMC methods in Bayesian statistics in the following papers (or read
any of the growing number of books on the subject - see your local library for details)
- Tierney, L. (1994) Markov chains for exploring posterior distributions.
The Annals of Statistics, Vol. 22, No. 4, 1701-1762.
- Smith, A.F.M. and Roberts, G.O. (1993) Bayesian computation via the Gibbs sampler and related
Markov chain Monte Carlo methods. J. Roy. Stat. Soc. B 55, 1, 3-23.
For up-to-date information and the latest techniques, check out the MCMC preprint server, the
McDiag Home Page and David Wilson's Perfect Simulation Home Page.
Description of the applet
The applet you are seeing consists of a window which is subdivided into a number of regions.
The left half of the display consists of some animated graphs, while the right half consists of
some input fields and choice lists, which allow you to influence what you see on the left.
Choosing the target distribution
The top third of the screen consists of a graph of some distribution p defined on the interval
[0,1]. The first list box on the right allows you to choose the type of distribution p.
By default, p is Gaussian. As you make a selection, some input fields with various parameters
will be displayed just below, and you can change these too.
One of the choices in the list is the "user defined" distribution. When you select this, the box with
the graph of p will be initially blank. You should see four buttons on the right, marked "New",
"Accept", "Load" and "Save". Press the "New" button, and the graph should turn orange. Click
somewhere in the orange box, and start dragging the mouse to draw a density. Dragging to the
left erases part of the graph. Once you are happy with the distribution, press the button
"Accept". The drawing screen should turn white again, and the graph you have just drawn is
rescaled and displayed. To draw another, press "New" again etc. The "Load" and "Save"
buttons allow you to load and save distributions which you have drawn, but for Java security
reasons, these will probably not function properly over the web.
Choosing the algorithm
The middle third of the screen consists of a moving red line on the left just below the graph of p,
and some controls on the right. The red line represents the path of the Markov chain X_n, while
the controls on the right allow you to choose the type of chain that is running.
By clicking anywhere in the region where the red line is, you can place the chain at a new
starting point and see how it evolves from there. Try selecting the Trimodal distribution in the top
list box, and placing the Markov chain in each of the three bell curves.
Each of the algorithms featured in this applet functions in basically the same way. Suppose that
the random variable X_n = x is known, and generate X_{n+1} as follows:
- propose a move to a new location y according to some prescribed rule (we must be able
to write down a probability density q(x, y) of going from x to y);
- accept or reject the proposed move according to the probability

    min( 1, [ p(y) q(y, x) ] / [ p(x) q(x, y) ] ),

which preserves the equilibrium distribution p.
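As a hedged illustration of this generic step (not the applet's actual Java code), the following Python
sketch takes the target density p, a proposal sampler and its density q as inputs; all names and the
example at the bottom are choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, p, propose, q):
    """One Metropolis-Hastings transition from the current state x.

    p(x)       : target density, known up to a constant factor
    propose(x) : draws a candidate y from the proposal rule
    q(x, y)    : proposal density of going from x to y
    """
    y = propose(x)
    ratio = (p(y) * q(y, x)) / (p(x) * q(x, y))
    if rng.random() < min(1.0, ratio):
        return y          # move accepted
    return x              # move rejected: the chain stays where it is

# Example: Gaussian random walk proposal with step size sigma = 0.2,
# targeting an (unnormalized) Gaussian bump.
sigma = 0.2
p_target = lambda x: np.exp(-0.5 * ((x - 0.3) / 0.1) ** 2)
propose = lambda x: x + sigma * rng.standard_normal()
q = lambda x, y: np.exp(-0.5 * ((y - x) / sigma) ** 2)   # symmetric, so it cancels in the ratio

x = 0.5
for _ in range(1000):
    x = mh_step(x, p_target, propose, q)
print("current state:", x)
```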
The algorithms below differ only in the proposal rule. We now give brief descriptions of all the
algorithms you can run.
Gaussian Random Walk Metropolis
Given X_n = x, this algorithm proposes a move to y = x + σ Z, where Z
is a standard Gaussian random variable; σ therefore regulates the size of the proposed jump.
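In the style of the sketch above, this proposal might be written as below; `sigma` is a stand-in for
whatever jump-size parameter the applet exposes. Since this q(x, y) is symmetric in x and y, the
acceptance probability reduces to min(1, p(y)/p(x)).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.2   # jump-size parameter (placeholder value)

def propose_gaussian_rw(x):
    # Gaussian random walk proposal: y = x + sigma * Z, with Z ~ N(0, 1).
    return x + sigma * rng.standard_normal()
```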
Much is known about this algorithm. See for example the following papers:
- Mengersen, K.L. and Tweedie, R.L. (1996) Rates of convergence of the Hastings and
Metropolis algorithms. Ann. Statist. 24, No. 1, 101-121.
- Gelman, A., Roberts, G.O. and Gilks, W.R. (1996) Efficient Metropolis jumping rules.
Bayesian Statistics, 5 (Alicante, 1994), 599-607, Oxford Sci. Publ., Oxford Univ. Press, New
York.
Uniform Random Walk Metropolis
Given X_n = x, this algorithm proposes a move to y = x + σ U, where U
is uniformly distributed over
the range [-1,1].
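A corresponding sketch (again with a placeholder `sigma`); the proposal is symmetric, so the
acceptance probability is again min(1, p(y)/p(x)).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.2   # jump-size parameter (placeholder value)

def propose_uniform_rw(x):
    # Uniform random walk proposal: y = x + sigma * U, with U ~ Uniform[-1, 1].
    return x + sigma * rng.uniform(-1.0, 1.0)
```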
Langevin
Given X_n = x, the proposed move is to y = x + (σ²/2) (log p)'(x) + σ Z, where Z is a standard Gaussian.
Unlike the Random Walk Metropolis, this algorithm uses some information about the shape of p to
make better proposals, which have a higher chance of being accepted.
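A sketch of this proposal in Python, assuming the derivative of log p is available (here hard-coded
for an example Gaussian-shaped target). Because the proposal is not symmetric, the full Hastings
ratio with q(x, y) must be used in the acceptance probability.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.2   # step-size parameter (placeholder value)

def grad_log_p(x):
    # Derivative of log p for an example target
    # p(x) proportional to exp(-0.5 * ((x - 0.3) / 0.1)**2).
    return -(x - 0.3) / 0.1**2

def propose_langevin(x):
    # Langevin proposal: drift towards higher density, plus Gaussian noise.
    return x + 0.5 * sigma**2 * grad_log_p(x) + sigma * rng.standard_normal()

def q_langevin(x, y):
    # Proposal density of moving from x to y: Gaussian centred at the drifted point.
    mean = x + 0.5 * sigma**2 * grad_log_p(x)
    return np.exp(-0.5 * ((y - mean) / sigma) ** 2)
```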
Some of the things known about this algorithm can be found in the following papers:
- Roberts, G.O. and Tweedie, R.L. (1996) Exponential convergence of Langevin
distributions and their discrete approximations. Bernoulli 2, No. 4, 341-363.
- Roberts, G.O. and Rosenthal, J.S. (1995) Optimal Scaling of Discrete Approximations
to Langevin Diffusions. University of Cambridge Research Report.
Lévy Flight Metropolis
This is a variant of the Gaussian random walk Metropolis algorithm, where the proposal is still
y = X_n + σ Z, but now the random variable Z has a symmetric stable distribution with index α.
The great French probabilist Paul Lévy pioneered the study of these random variables.
When α = 2, the variable Z is simply Gaussian. If α = 1, Z
has a Cauchy distribution. The smaller α
is, the heavier the tails of the distribution of Z.
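Symmetric stable increments can be generated with the Chambers-Mallows-Stuck construction, as in
the sketch below; the scale convention and parameter names are assumptions of the sketch, not
necessarily those of the applet.

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_stable(alpha, size=1):
    """Draw symmetric alpha-stable variates, 0 < alpha <= 2
    (Chambers-Mallows-Stuck construction, symmetric case).

    alpha = 2 gives a Gaussian (with variance 2 under this scale
    convention), alpha = 1 gives a standard Cauchy; smaller alpha
    means heavier tails.
    """
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)   # V ~ Uniform(-pi/2, pi/2)
    w = rng.exponential(1.0, size)                 # W ~ Exponential(1)
    term1 = np.sin(alpha * v) / np.cos(v) ** (1.0 / alpha)
    term2 = (np.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha)
    return term1 * term2

sigma = 0.2   # jump-size parameter (placeholder value)

def propose_levy_flight(x, alpha=1.5):
    # Levy flight proposal: y = x + sigma * Z with Z symmetric alpha-stable.
    return x + sigma * symmetric_stable(alpha)[0]
```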
Student t random walk
Like the Lévy Flight Metropolis algorithm, but this time the proposal increment has a Student
t distribution with ν = 1, 2, 3, etc. degrees of freedom.
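A corresponding sketch using numpy's Student t sampler; the degrees of freedom and the scale are
placeholders for whatever the applet's input fields set.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.2   # jump-size parameter (placeholder value)

def propose_student_t(x, df=1):
    # Student t random walk proposal: smaller df means heavier tails.
    return x + sigma * rng.standard_t(df)
```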
U[0,1] Independence Sampler
The proposal is uniformly distributed over the interval [0,1].
This is independent of the current
position of the algorithm, and that is why it is called an Independence Sampler.
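Since the proposal density is constant (equal to 1 on [0,1]), the Hastings ratio reduces to p(y)/p(x).
A minimal sketch of one such step, assuming `p` is the target density on [0,1]:

```python
import numpy as np

rng = np.random.default_rng(0)

def independence_step(x, p):
    # Propose y ~ Uniform[0, 1], independently of the current state x.
    y = rng.random()
    # q(x, y) = q(y, x) = 1, so the acceptance probability is min(1, p(y)/p(x)).
    if rng.random() < min(1.0, p(y) / p(x)):
        return y
    return x
```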
Some of the known properties of this type of algorithm can be found in the following paper:
- Mengersen, K.L. and Tweedie, R.L. (1996) Rates of convergence of the Hastings and
Metropolis algorithms. Ann. Statist. 24, No. 1, 101-121.
General Independence Sampler
As above, the proposal y is independent of the current value X_n. However this time, you can draw the
distribution of y yourself!
Gareth's diffusion algorithm
This was suggested by Gareth Roberts as a "mode hopping" algorithm. The proposal is
a Gaussian step away from the current point whose variance depends on the ratio of p(X_n) to the maximum
value of p (since p enters only through this ratio, we only need to know the function p
to within a constant factor).
The idea is that whenever the chain is in a valley with low density, it should increase the
variance of the jump so as to find the next mode more quickly.
Gaussian Autoregressive(1)
The proposal is y = a + b (X_n - a) + σ Z, where Z is a standard Gaussian.
This is a generalization of both the Independence Sampler and the Random Walk Metropolis chain:
taking b = 0 gives an independence sampler, while b = 1 gives a random walk.
If 0 < b < 1, this strategy shrinks the current state toward the point a before adding
the increment σ Z. If b < 0, then the current state is first reflected about the
point a.
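A sketch of this proposal under the parameterization given above (the applet's exact symbols may
differ); note that for b ≠ 1 the proposal is not symmetric, so the full Hastings ratio with q(x, y)
is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 0.5, 0.7, 0.2   # placeholder values for the applet's parameters

def propose_ar1(x):
    # Autoregressive(1) proposal: shrink (or reflect) about a, then add Gaussian noise.
    return a + b * (x - a) + sigma * rng.standard_normal()

def q_ar1(x, y):
    # Proposal density of moving from x to y: Gaussian centred at a + b*(x - a).
    return np.exp(-0.5 * ((y - (a + b * (x - a))) / sigma) ** 2)
```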
Hamiltonian (Hybrid) Monte Carlo
This algorithm is an example of the use of auxiliary variables, and operates a little differently
from the other algorithms above. First, we augment the number of variables which p depends on,
by taking the product of p with a standard Gaussian. This gives a two dimensional state space,
consisting of "position" and "momentum" variables. The algorithm proceeds from a value
in phase space to a new value
by first randomizing according to the
autoregressive formula where
is a standard Gaussian. After that, we use an Euler-type "leapfrog" scheme to move from the
value to the new value
along the contour curves of the Hamiltonian .
This last value is
accepted or rejected according to the usual Metropolis-Hastings rule.
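Below is a compact Python sketch of one such transition, using a chain of leapfrog steps and the
partial momentum refreshment described above; the step size, number of leapfrog steps and parameter
names are assumptions of the sketch, not values taken from the applet.

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc_step(x, w, grad_U, U, eps=0.1, n_leap=10, alpha=0.5):
    """One Hamiltonian (hybrid) Monte Carlo transition.

    U(x)        : potential energy, -log p(x) (p known up to a constant)
    grad_U(x)   : derivative of U
    eps, n_leap : leapfrog step size and number of steps (placeholders)
    alpha       : momentum refreshment parameter; alpha = 0 refreshes the
                  momentum completely, alpha = 1 never randomizes it.
    """
    # Partial momentum refreshment keeps w marginally N(0, 1).
    w = alpha * w + np.sqrt(1.0 - alpha**2) * rng.standard_normal()
    h_old = U(x) + 0.5 * w**2

    # Leapfrog integration approximately follows a contour curve of H.
    x_new, w_new = x, w
    w_new -= 0.5 * eps * grad_U(x_new)
    for _ in range(n_leap):
        x_new += eps * w_new
        w_new -= eps * grad_U(x_new)
    w_new += 0.5 * eps * grad_U(x_new)   # convert the last full step into a half step
    w_new = -w_new                        # negate the momentum for reversibility

    h_new = U(x_new) + 0.5 * w_new**2
    if rng.random() < min(1.0, np.exp(h_old - h_new)):
        x, w = x_new, w_new               # accept
    # Flip the momentum again so accepted moves keep travelling forward.
    return x, -w

# Example usage with a Gaussian-shaped target p proportional to exp(-x**2 / (2 * 0.01)).
U = lambda x: 0.5 * x**2 / 0.01
grad_U = lambda x: x / 0.01
x, w = 0.5, rng.standard_normal()
for _ in range(200):
    x, w = hmc_step(x, w, grad_U, U)
print("current position:", x)
```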
The orange and yellow bars which you can see scrolling in step with the red path of the chain
represent the current values of kinetic energy (orange) and potential energy (yellow) of the
chain in phase space.
When α = 1, there is no randomization of momentum. In that case, the total energy will vary
very little (due to errors in the leapfrog scheme) and the random variables X_n do not converge to
p. When α = 0, the algorithm behaves like the Langevin algorithm.
Diagnostics
In the lower third of the applet window, you will see on the left a box with a continuously
updated histogram. On the right, you should see a plot of the estimated autocorrelation function
(ACF) for the position of the chain. Just above that, there is a reset button, and two input fields
which allow you to increase the number of bins in the histogram and the maximum lag used for
computing the ACF.
Histogram
On the left side of the window, you can see a continually updated histogram of the position of the
chain. As described earlier, for any function f, the time averaged value of f(X_i)
should converge
to the value ∫ f(x) p(x) dx. If we pick a point x_0 and let f(x) = 1
for all x in a narrow bin around the
value x_0, but set f(x) = 0 otherwise, we can get an estimate of p(x_0) (after rescaling by the bin width)
as the bin gets ever narrower. The theoretical value of this estimate is shown by the height of the grey bins, whereas
the blue bins represent the current computed average. As the number of iterations increases,
the blue bins converge to the corresponding grey bins.
A measure of the distance of the blue histogram to its theoretical limit (grey) is given by the total
variation (TV) norm, which is a number between 0 and 2. The number computed is not the total
variation distance of the law of X_n to p.
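Under the convention used here (no factor of 1/2, so the distance lies between 0 and 2), the displayed
quantity can be sketched as follows; the bin count and the assumption that the target is supported on
[0,1] mirror the applet's display, but the function itself is only illustrative.

```python
import numpy as np

def tv_distance(samples, p, n_bins=50):
    """Total variation distance between the empirical histogram of the
    chain positions (blue bins) and the binned target density (grey bins).

    p must accept numpy arrays. Returns a number between 0 and 2
    (no factor of 1/2 in this convention).
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])

    empirical, _ = np.histogram(samples, bins=edges)
    empirical = empirical / empirical.sum()          # blue bins, normalized

    theoretical = p(centres)
    theoretical = theoretical / theoretical.sum()    # grey bins, normalized

    return np.abs(empirical - theoretical).sum()
```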
On the histogram display, you can see also the proportion of all moves of the chain where the
proposal was accepted. By setting the iteration delay on the right side to zero, the chain may be
speeded up a little. Please remember that Java is an interpreted language, so these algorithms
will run much faster if implemented in a language such as C.
ACF plot
On top of the histogram, you can also see two vertical lines, one green and one red (the red one
moves).
These represent respectively the location of the mean of p, and its estimate via the time
averaged value of the position of the chain. Eventually, the two lines should coincide.
It is possible to measure the correlation between the successive values of the position X_n used
when estimating the mean. This is done by the autocorrelation function plotted to the right of the
histogram. When X_{n+1} tends to stay very close to X_n, the ACF decays very slowly.
It is possible to estimate the "efficiency" of the Markov chain in estimating the value
∫ f(x) p(x) dx for any integrable function f. This is done for the position function
f(x) = x, which
estimates the mean of p (green vertical line). A value close to 1 means that the chain is
"efficient" for that function f. The estimated efficiency in this applet is computed as follows,
where n denotes the current number of iterations, and K is the maximum
autocorrelation lag plotted on screen: from the first n positions compute the sample autocorrelations
r(1), ..., r(K), form τ = 1 + 2 [ r(1) + ... + r(K) ], and then report the efficiency 1/τ.
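A numpy sketch of this computation, under the reconstruction above (the applet's exact estimator may
differ in detail): sample autocorrelations up to the maximum lag K, then the efficiency as the
reciprocal of 1 + 2 times their sum.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of the chain positions x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k], x[k:]) / (len(x) * var)
                     for k in range(1, max_lag + 1)])

def efficiency(x, max_lag):
    # tau = 1 + 2 * (sum of autocorrelations up to max_lag); efficiency = 1 / tau.
    rho = acf(x, max_lag)
    tau = 1.0 + 2.0 * rho.sum()
    return 1.0 / tau
```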