Reversibility and Detailed Balance
Running the Chain Backwards
Imagine recording a Markov chain in steady state and playing the recording backwards. For some chains, the reversed sequence is statistically indistinguishable from the forward chain: these are reversible chains. Reversibility has a simple algebraic characterization (detailed balance) that makes stationary distributions easy to find by inspection. It also provides the theoretical foundation for Markov chain Monte Carlo (MCMC), one of the most important computational tools in modern science and engineering.
Definition: Detailed Balance and Reversibility
A probability distribution $\pi$ on the state space satisfies detailed balance with respect to the transition matrix $P$ if $\pi_i P_{ij} = \pi_j P_{ji}$ for all states $i, j$.
A DTMC is reversible if its stationary distribution satisfies detailed balance.
Detailed balance says: in steady state, the probability flux from $i$ to $j$ equals the flux from $j$ to $i$, for every pair of states. This is a stronger condition than stationarity (which only requires balance of total flow into and out of each state).
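To make the distinction concrete, here is a small Python sketch (the two transition matrices are illustrative examples, not from the text). A symmetric walk on a triangle satisfies both conditions; a directed 3-cycle is stationary under the uniform distribution but all flux flows one way, so detailed balance fails.

```python
def is_stationary(pi, P, tol=1e-9):
    """Check the global balance condition: pi P = pi."""
    n = len(pi)
    return all(abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) < tol
               for j in range(n))

def satisfies_detailed_balance(pi, P, tol=1e-9):
    """Check pairwise balance: pi_i P_ij = pi_j P_ji for every pair."""
    n = len(pi)
    return all(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < tol
               for i in range(n) for j in range(n))

# Symmetric random walk on a triangle: reversible.
P_sym = [[0.0, 0.5, 0.5],
         [0.5, 0.0, 0.5],
         [0.5, 0.5, 0.0]]

# Directed cycle 0 -> 1 -> 2 -> 0: uniform pi is stationary,
# but every reverse transition has probability 0.
P_cycle = [[0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0]]

pi = [1/3, 1/3, 1/3]
print(is_stationary(pi, P_sym), satisfies_detailed_balance(pi, P_sym))      # True True
print(is_stationary(pi, P_cycle), satisfies_detailed_balance(pi, P_cycle))  # True False
```

The directed cycle is exactly the kind of chain that stationarity alone cannot distinguish from a reversible one.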
Theorem: Detailed Balance Implies Stationarity
If a probability distribution $\pi$ satisfies detailed balance with $P$, then $\pi$ is a stationary distribution of $P$.
Detailed balance is a sufficient (but not necessary) condition for stationarity. It decomposes the global balance equation into pairwise balance equations, one for each edge, a much easier system to verify.
Sum the detailed balance equations
Fix any state $j$. Sum the detailed balance equation $\pi_i P_{ij} = \pi_j P_{ji}$ over all $i$:

$$\sum_i \pi_i P_{ij} = \sum_i \pi_j P_{ji} = \pi_j \sum_i P_{ji} = \pi_j,$$

where the last step uses that each row of $P$ sums to 1.
Recognize stationarity
The equation $\sum_i \pi_i P_{ij} = \pi_j$ for all $j$ is precisely $\pi P = \pi$. Since $\pi_j \geq 0$ and $\sum_j \pi_j = 1$ by assumption, $\pi$ is indeed a stationary distribution.
Birth-Death Chains Are Always Reversible
A birth-death chain on $\{0, 1, \dots, N\}$ has transitions only between neighboring states: $P_{i,i+1} = p_i$, $P_{i,i-1} = q_i$, $P_{i,i} = 1 - p_i - q_i$. Such chains are always reversible because the detailed balance equations reduce to a telescoping product: $\pi_i p_i = \pi_{i+1} q_{i+1}$ for each edge $\{i, i+1\}$.
This gives $\pi_k = \pi_0 \prod_{i=1}^{k} \frac{p_{i-1}}{q_i}$, and $\pi_0$ is set by normalization. Birth-death chains model many queueing systems (for example, $M/M/1$-type queues) and simple communication protocols.
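The telescoping solution can be sketched as a short helper (the function name and the example rates below are illustrative):

```python
def birth_death_stationary(p, q):
    """Stationary distribution of a birth-death chain on {0, ..., N}.

    p[i] = P(i -> i+1) for i = 0..N-1, and q[i] = P(i+1 -> i).
    Uses the telescoping product pi_k = pi_0 * prod_{i<k} p[i] / q[i].
    """
    w = [1.0]                      # unnormalized weights, w[0] = 1
    for up, down in zip(p, q):
        w.append(w[-1] * up / down)
    total = sum(w)
    return [x / total for x in w]  # normalize so the weights sum to 1

# Example: a 5-state chain that steps up w.p. 0.4 and down w.p. 0.6.
pi = birth_death_stationary([0.4] * 4, [0.6] * 4)

# Detailed balance holds edge by edge: pi_k * p_k == pi_{k+1} * q_{k+1}.
for k in range(4):
    assert abs(pi[k] * 0.4 - pi[k + 1] * 0.6) < 1e-12
```

The weights form a geometric sequence here (ratio $0.4/0.6$), so the mass concentrates at state 0, as expected when downward steps are more likely.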
Example: Ehrenfest Diffusion Model
Consider $N$ particles distributed between two containers A and B. At each step, a particle is chosen uniformly at random and moved to the other container. Let $X_n$ = the number of particles in container A. Find the stationary distribution and verify reversibility.
Identify the chain
The state space is $\{0, 1, \dots, N\}$. From state $k$:
- With probability $k/N$, a particle moves from A to B: $k \to k - 1$.
- With probability $(N-k)/N$, a particle moves from B to A: $k \to k + 1$.
This is a birth-death chain with $p_k = (N-k)/N$ and $q_k = k/N$.
Solve via detailed balance
Detailed balance requires $\pi_k p_k = \pi_{k+1} q_{k+1}$, i.e. $\pi_k \frac{N-k}{N} = \pi_{k+1} \frac{k+1}{N}$. Telescoping gives $\pi_k = \pi_0 \frac{N(N-1)\cdots(N-k+1)}{1 \cdot 2 \cdots k} = \pi_0 \binom{N}{k}$.
Normalize
Summing, $\sum_{k=0}^{N} \pi_0 \binom{N}{k} = \pi_0 2^N = 1$, so $\pi_0 = 2^{-N}$ and $\pi_k = \binom{N}{k} 2^{-N}$. The stationary distribution is $\text{Binomial}(N, 1/2)$: in steady state each particle is in container A independently with probability $1/2$, as if assigned by $N$ independent fair coin flips.
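A quick numerical check of this result (a sketch; the choice $N = 6$ is arbitrary):

```python
from math import comb

N = 6
pi = [comb(N, k) / 2 ** N for k in range(N + 1)]  # claimed Binomial(N, 1/2)

# Detailed balance on each edge: pi_k * (N-k)/N == pi_{k+1} * (k+1)/N.
for k in range(N):
    assert abs(pi[k] * (N - k) / N - pi[k + 1] * (k + 1) / N) < 1e-12

# Stationarity: inflow to k comes only from the two neighbors,
# since a particle moves at every step (P(k, k) = 0).
for k in range(N + 1):
    inflow = 0.0
    if k > 0:
        inflow += pi[k - 1] * (N - (k - 1)) / N   # from k-1 via a B -> A move
    if k < N:
        inflow += pi[k + 1] * (k + 1) / N         # from k+1 via an A -> B move
    assert abs(inflow - pi[k]) < 1e-12

print("detailed balance and stationarity verified")
```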
Metropolis-Hastings Algorithm
Given a target distribution $\pi$ and a proposal distribution $q(x, y)$, the Metropolis-Hastings chain proposes a move from $x$ to $y$ drawn from $q$ and accepts it with probability $\alpha(x, y) = \min\left(1, \frac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)}\right)$. The acceptance ratio is designed so that the resulting chain satisfies detailed balance with respect to $\pi$: $\pi(x)\, q(x, y)\, \alpha(x, y) = \pi(y)\, q(y, x)\, \alpha(y, x)$. When $q$ is symmetric ($q(x, y) = q(y, x)$), the ratio simplifies to $\min(1, \pi(y)/\pi(x))$; this is the original Metropolis algorithm.

Complexity: each step requires $O(1)$ work (one proposal, one acceptance check). The total cost for $n$ samples is $O(n)$, but the effective sample size depends on the mixing time of the chain.
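A minimal random-walk Metropolis sketch (symmetric uniform proposal; the target, step size, and chain length below are illustrative assumptions). It samples a standard normal known only through its unnormalized log density:

```python
import math
import random

def metropolis(log_target, x0, step, n_steps, rng):
    """Random-walk Metropolis: propose y = x + Uniform(-step, step),
    accept with probability min(1, pi(y) / pi(x))."""
    x = x0
    samples = []
    for _ in range(n_steps):
        y = x + rng.uniform(-step, step)              # symmetric proposal
        if math.log(rng.random()) < log_target(y) - log_target(x):
            x = y                                     # accept; else keep x
        samples.append(x)
    return samples

rng = random.Random(0)
# Target: standard normal, specified only up to a constant via log pi(x).
samples = metropolis(lambda x: -0.5 * x * x, 0.0, 2.5, 50_000, rng)

kept = samples[5_000:]                                # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)  # should be near 0 and 1
```

Because the proposal is symmetric, the Hastings correction $q(y, x)/q(x, y)$ cancels, and the target only ever enters through ratios, so its normalizing constant is never needed.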
Historical Note: From Los Alamos to Modern Machine Learning
Mid 20th century to present
The Metropolis algorithm was invented at Los Alamos in 1953 by Nicholas Metropolis, Arianna Rosenbluth, Marshall Rosenbluth, Augusta Teller, and Edward Teller, originally to simulate the equation of state of hard-sphere liquids. W. K. Hastings generalized it in 1970 to asymmetric proposals. MCMC remained a niche tool until the 1990s, when Gibbs sampling (introduced by Geman & Geman, 1984) revolutionized Bayesian statistics. Today MCMC is ubiquitous: Bayesian inference, statistical physics, combinatorial optimization, phylogenetics, and the training of certain machine learning models all rely on it.
Mixing Time and MCMC in Practice
In practice, the convergence theorem guarantees that MCMC eventually produces samples from the target distribution, but the mixing time (how many steps until the chain's distribution is close to $\pi$) can be enormous. Poorly chosen proposal distributions lead to slow mixing: the chain gets "stuck" in a region of state space. Diagnostics include:
- Trace plots: visual inspection for stationarity
- Autocorrelation: high autocorrelation means slow mixing
- Effective sample size: accounts for correlation between samples
- Multiple chains (Gelman-Rubin diagnostic): compare within-chain to between-chain variance
- Burn-in: the initial period must be discarded (typically 10-50% of total samples)
- Thinning (keeping every $k$-th sample) reduces storage but not computation
- Proposal variance should be tuned so the acceptance rate is roughly 20-50%
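The autocorrelation and effective-sample-size diagnostics above can be sketched as follows (a simple truncation heuristic, not a production estimator; the AR(1) series stands in for a slowly mixing chain):

```python
import random

def autocorr(xs, lag):
    """Sample autocorrelation of the sequence xs at the given lag."""
    n = len(xs)
    m = sum(xs) / n
    c0 = sum((x - m) ** 2 for x in xs) / n
    ck = sum((xs[i] - m) * (xs[i + lag] - m) for i in range(n - lag)) / n
    return ck / c0

def effective_sample_size(xs, max_lag=200):
    """ESS = n / (1 + 2 * sum of autocorrelations), truncated at the
    first non-positive lag (a common simple heuristic)."""
    tau = 1.0
    for lag in range(1, max_lag + 1):
        rho = autocorr(xs, lag)
        if rho <= 0:
            break
        tau += 2 * rho
    return len(xs) / tau

rng = random.Random(1)
iid = [rng.gauss(0, 1) for _ in range(5_000)]   # well-mixed "chain"
ar = [0.0]                                      # slowly mixing AR(1) chain
for _ in range(4_999):
    ar.append(0.9 * ar[-1] + rng.gauss(0, 1))

# High autocorrelation means far fewer effectively independent samples.
print(effective_sample_size(iid) > effective_sample_size(ar))  # True
```

For the AR(1) series with coefficient $0.9$, the integrated autocorrelation time is roughly $(1 + 0.9)/(1 - 0.9) \approx 19$, so only about 1 in 19 samples is effectively independent.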
Quick Check
Which of the following chains is NOT reversible?
- A birth-death chain on $\{0, 1, \dots, N\}$
- A random walk on a cycle with asymmetric probabilities
- The Ehrenfest diffusion model
- A symmetric random walk on a complete graph
Answer: the asymmetric random walk on a cycle. In the extreme case of a directed cycle on $n$ states with $P_{i,i+1} = 1$, the stationary distribution is uniform, yet detailed balance is violated: $\pi_i P_{i,i+1} = 1/n > 0$ but $\pi_{i+1} P_{i+1,i} = 0$.
Key Takeaway
Detailed balance ($\pi_i P_{ij} = \pi_j P_{ji}$) is a sufficient condition for stationarity that decomposes the global balance equation into pairwise conditions. All birth-death chains are reversible. The Metropolis-Hastings algorithm exploits detailed balance to construct chains that sample from any desired target distribution.