Reversibility and Detailed Balance

Running the Chain Backwards

Imagine recording a Markov chain in steady state and playing the recording backwards. For some chains, the reversed sequence is statistically indistinguishable from the forward chain; these are reversible chains. Reversibility has a simple algebraic characterization (detailed balance) that makes stationary distributions easy to find by inspection. It also provides the theoretical foundation for Markov chain Monte Carlo (MCMC), one of the most important computational tools in modern science and engineering.

Definition: Detailed Balance and Reversibility

A probability distribution $\boldsymbol{\pi}$ on $\mathcal{S}$ satisfies detailed balance with respect to the transition matrix $\mathbf{P}$ if

$$\pi_i \, p_{ij} = \pi_j \, p_{ji}, \quad \text{for all } i, j \in \mathcal{S}.$$

A DTMC is reversible if its stationary distribution $\boldsymbol{\pi}$ satisfies detailed balance.

Detailed balance says: in steady state, the probability flux from $i$ to $j$ equals the flux from $j$ to $i$, for every pair of states. This is a stronger condition than stationarity (which only requires that the total flow into each state balance the total flow out of it).
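As a quick numerical illustration (a sketch; the two-state chain below uses arbitrary illustrative probabilities, not values from the text), detailed balance amounts to symmetry of the flux matrix $[\pi_i p_{ij}]$:

```python
import numpy as np

# A two-state chain with illustrative transition probabilities
P = np.array([[0.7, 0.3],
              [0.6, 0.4]])

# For a two-state chain the stationary distribution is proportional to (p10, p01)
pi = np.array([0.6, 0.3])
pi = pi / pi.sum()                 # pi = (2/3, 1/3)

# Detailed balance: pi_i * p_ij == pi_j * p_ji for every pair (i, j),
# i.e. the flux matrix is symmetric
flux = pi[:, None] * P             # flux[i, j] = pi_i * p_ij
print(np.allclose(flux, flux.T))   # True: this chain is reversible
```

Every two-state chain is reversible; the flux-symmetry check, however, works for chains of any size.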


Theorem: Detailed Balance Implies Stationarity

If a probability distribution $\boldsymbol{\pi}$ satisfies detailed balance with $\mathbf{P}$, then $\boldsymbol{\pi}$ is a stationary distribution of $\mathbf{P}$.

Detailed balance is a sufficient (but not necessary) condition for stationarity. It decomposes the global balance equation $\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}$ into pairwise balance equations, one for each edge; these form a much easier system to verify.
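The proof is a one-line computation: sum the detailed balance equations over $i$ and use the fact that each row of $\mathbf{P}$ sums to one.

```latex
(\boldsymbol{\pi}\mathbf{P})_j
  = \sum_{i \in \mathcal{S}} \pi_i \, p_{ij}
  = \sum_{i \in \mathcal{S}} \pi_j \, p_{ji}
  = \pi_j \sum_{i \in \mathcal{S}} p_{ji}
  = \pi_j .
```

Since this holds for every $j$, we get $\boldsymbol{\pi}\mathbf{P} = \boldsymbol{\pi}$, i.e. stationarity.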


Birth-Death Chains Are Always Reversible

A birth-death chain on $\{0, 1, 2, \ldots\}$ has transitions only between neighboring states: $p_{i,i+1} = b_i$, $p_{i,i-1} = d_i$, $p_{ii} = 1 - b_i - d_i$. Such chains are always reversible because the detailed balance equations reduce to a telescoping product:

$$\pi_{k+1} = \pi_k \frac{b_k}{d_{k+1}}, \quad k = 0, 1, 2, \ldots$$

This gives $\pi_k = \pi_0 \prod_{m=0}^{k-1} \frac{b_m}{d_{m+1}}$, and $\pi_0$ is set by normalization. Birth-death chains model many queueing systems ($M/M/1$, $M/M/c$) and simple communication protocols.
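The telescoping product can be sketched numerically on a truncated birth-death chain (the state-space cutoff $K = 5$ and the birth/death rates below are arbitrary illustrative choices):

```python
import numpy as np

# Truncated birth-death chain on {0, ..., K} with illustrative rates
K = 5
b = np.array([0.4, 0.3, 0.3, 0.2, 0.2])   # b[k] = birth prob from state k
d = np.array([0.1, 0.2, 0.3, 0.3, 0.4])   # d[k] = death prob from state k+1

# pi_k proportional to prod_{m=0}^{k-1} b_m / d_{m+1} (telescoping product)
pi = np.concatenate(([1.0], np.cumprod(b / d)))
pi /= pi.sum()                             # normalize

# Build the transition matrix and confirm that pi is stationary
P = np.zeros((K + 1, K + 1))
for k in range(K):
    P[k, k + 1] = b[k]                     # birth
    P[k + 1, k] = d[k]                     # death
P += np.diag(1 - P.sum(axis=1))            # holding probabilities p_kk

print(np.allclose(pi @ P, pi))             # True
```

Because $\pi$ is built directly from the pairwise balance equations, it satisfies detailed balance by construction, and stationarity follows from the theorem above.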


Example: Ehrenfest Diffusion Model

Consider $N$ particles distributed between two containers A and B. At each step, a particle is chosen uniformly at random and moved to the other container. Let $X_n$ = number of particles in container A. Find the stationary distribution and verify reversibility.
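The Ehrenfest chain is a birth-death chain with $b_k = (N-k)/N$ and $d_k = k/N$, so the telescoping product applies and yields the Binomial$(N, 1/2)$ distribution. A numerical check (a sketch; $N = 10$ is chosen arbitrarily):

```python
import numpy as np
from math import comb

N = 10

# Transition probabilities: a uniformly chosen particle switches container
P = np.zeros((N + 1, N + 1))
for k in range(N + 1):
    if k < N:
        P[k, k + 1] = (N - k) / N   # a particle in B moves to A
    if k > 0:
        P[k, k - 1] = k / N         # a particle in A moves to B

# Candidate stationary distribution: Binomial(N, 1/2)
pi = np.array([comb(N, k) for k in range(N + 1)]) / 2**N

flux = pi[:, None] * P
print(np.allclose(pi @ P, pi), np.allclose(flux, flux.T))  # True True
```

The symmetric flux matrix confirms detailed balance, hence reversibility, and stationarity of the binomial distribution follows.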


Metropolis-Hastings Algorithm

Complexity: Each step requires $O(1)$ computation (one proposal, one acceptance check). The total cost for $N$ samples is $O(N)$, but the effective sample size depends on the mixing time of the chain.
Input: Target distribution $\boldsymbol{\pi}$, proposal matrix $Q = [q_{ij}]$
Output: Samples $X_0, X_1, X_2, \ldots$ from a chain with stationary distribution $\boldsymbol{\pi}$
1. Initialize $X_0 = x_0$ (arbitrary starting state)
2. For $n = 0, 1, 2, \ldots$:
a. Given $X_n = i$, propose $Y \sim q_{i,\cdot}$ (sample from row $i$ of $Q$)
b. Compute the acceptance ratio:
$$\alpha(i, Y) = \min\!\left(1,\; \frac{\pi_Y \, q_{Y,i}}{\pi_i \, q_{i,Y}}\right)$$
c. With probability $\alpha(i, Y)$: accept, set $X_{n+1} = Y$
Otherwise: reject, set $X_{n+1} = i$
3. Return the sequence $\{X_n\}$

The acceptance ratio is designed so that the resulting chain satisfies detailed balance with respect to $\boldsymbol{\pi}$: $\pi_i \, p_{ij} = \pi_i \, q_{ij} \, \alpha(i,j) = \min(\pi_i q_{ij}, \pi_j q_{ji}) = \pi_j \, q_{ji} \, \alpha(j,i) = \pi_j \, p_{ji}$. When $Q$ is symmetric ($q_{ij} = q_{ji}$), the ratio simplifies to $\alpha(i,j) = \min(1, \pi_j / \pi_i)$; this is the original Metropolis algorithm.
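The algorithm above can be sketched for a discrete target (the target weights and the nearest-neighbor random-walk proposal are illustrative choices; note that only ratios of $\pi$ appear, so the target need not be normalized):

```python
import random

# Unnormalized target weights on states 0..4 (illustrative values)
w = [1.0, 2.0, 4.0, 2.0, 1.0]
n_states = len(w)

def metropolis(n_steps, x0=0, seed=0):
    """Metropolis sampler with a symmetric nearest-neighbor proposal on a cycle."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        y = (x + rng.choice([-1, 1])) % n_states   # symmetric proposal: q_xy = q_yx
        # Symmetric Q, so alpha = min(1, pi_y / pi_x); unnormalized weights suffice
        if rng.random() < min(1.0, w[y] / w[x]):
            x = y                                  # accept the proposal
        samples.append(x)                          # on rejection, x stays put
    return samples

samples = metropolis(200_000)
# Empirical frequencies should approach w / sum(w) = [0.1, 0.2, 0.4, 0.2, 0.1]
freqs = [samples.count(k) / len(samples) for k in range(n_states)]
```

Rejected proposals still emit the current state; dropping them would bias the sample toward low-probability states.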


Historical Note: From Los Alamos to Modern Machine Learning

Mid 20th century to present

The Metropolis algorithm was invented at Los Alamos in 1953 by Nicholas Metropolis, Arianna Rosenbluth, Marshall Rosenbluth, Augusta Teller, and Edward Teller, originally to simulate the equation of state of hard-sphere liquids. W. K. Hastings generalized it in 1970 to asymmetric proposals. MCMC remained a niche tool until the 1990s, when the Gibbs sampler (Geman & Geman, 1984) revolutionized Bayesian statistics. Today MCMC is ubiquitous: Bayesian inference, statistical physics, combinatorial optimization, phylogenetics, and training of certain machine learning models all rely on it.

⚠️ Engineering Note

Mixing Time and MCMC in Practice

In practice, the convergence theorem guarantees that MCMC eventually produces samples from the target distribution, but the mixing time (how many steps until the chain's distribution is close to $\boldsymbol{\pi}$) can be enormous. Poorly chosen proposal distributions lead to slow mixing: the chain gets "stuck" in a region of state space. Diagnostics include:

  • Trace plots: visual inspection for stationarity
  • Autocorrelation: high autocorrelation means slow mixing
  • Effective sample size: accounts for correlation between samples
  • Multiple chains (Gelman-Rubin diagnostic): compare within-chain to between-chain variance
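Autocorrelation and effective sample size can be estimated directly from a chain's output. A sketch, using one common heuristic (summing autocorrelations until they first drop below zero; production diagnostics use more careful truncation rules):

```python
import numpy as np

def effective_sample_size(x):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]          # autocorrelation function, rho[0] == 1
    total = 0.0
    for r in rho[1:]:
        if r < 0:                 # heuristic cutoff at the first negative lag
            break
        total += r
    return n / (1 + 2 * total)

# For i.i.d. draws the ESS should be close to the sample size
rng = np.random.default_rng(0)
iid = rng.normal(size=5000)
print(effective_sample_size(iid))
```

A highly autocorrelated MCMC trace fed to the same function yields an ESS far below its length, quantifying the "slow mixing" described above.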
Practical Constraints
  • Burn-in period must be discarded (typically 10-50% of total samples)
  • Thinning (keeping every k-th sample) reduces storage but not computation
  • Proposal variance should be tuned so the acceptance rate is roughly 20-50%

Quick Check

Which of the following chains is NOT reversible?

  • A birth-death chain on $\{0, 1, 2, 3\}$
  • A random walk on a cycle $1 \to 2 \to 3 \to 1$ with asymmetric probabilities
  • The Ehrenfest diffusion model
  • A symmetric random walk on a complete graph

Key Takeaway

Detailed balance ($\pi_i p_{ij} = \pi_j p_{ji}$) is a sufficient condition for stationarity that decomposes the global balance equation into pairwise conditions. All birth-death chains are reversible. The Metropolis-Hastings algorithm exploits detailed balance to construct chains that sample from any desired target distribution.