Reversibility and Detailed Balance

Running the Chain Backwards

Imagine recording a Markov chain in steady state and playing the recording backwards. For some chains, the reversed sequence is statistically indistinguishable from the forward chain; these are reversible chains. Reversibility has a simple algebraic characterization (detailed balance) that makes stationary distributions easy to find by inspection. It also provides the theoretical foundation for Markov chain Monte Carlo (MCMC), one of the most important computational tools in modern science and engineering.

Definition: Detailed Balance and Reversibility

A probability distribution $\boldsymbol{\pi}$ on $\mathcal{S}$ satisfies detailed balance with respect to the transition matrix $\mathbf{P}$ if

$$\pi_i \, p_{ij} = \pi_j \, p_{ji}, \quad \text{for all } i, j \in \mathcal{S}.$$

A DTMC is reversible if its stationary distribution $\boldsymbol{\pi}$ satisfies detailed balance.

Detailed balance says: in steady state, the probability flux from $i$ to $j$ equals the flux from $j$ to $i$, for every pair of states. This is a stronger condition than stationarity (which only requires that the total flow into each state balance the total flow out of it).
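As a quick numerical illustration (a sketch; the two-state chain below uses arbitrary illustrative probabilities, not values from the text), detailed balance amounts to symmetry of the flux matrix $[\pi_i p_{ij}]$:

```python
import numpy as np

# A two-state chain with illustrative transition probabilities
P = np.array([[0.7, 0.3],
              [0.6, 0.4]])

# For a two-state chain the stationary distribution is proportional to (p10, p01)
pi = np.array([0.6, 0.3])
pi = pi / pi.sum()                 # pi = (2/3, 1/3)

# Detailed balance: pi_i * p_ij == pi_j * p_ji for every pair (i, j),
# i.e. the flux matrix is symmetric
flux = pi[:, None] * P             # flux[i, j] = pi_i * p_ij
print(np.allclose(flux, flux.T))   # True: this chain is reversible
```

Every two-state chain is reversible; the flux-symmetry check, however, works for chains of any size.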


Theorem: Detailed Balance Implies Stationarity

If a probability distribution $\boldsymbol{\pi}$ satisfies detailed balance with $\mathbf{P}$, then $\boldsymbol{\pi}$ is a stationary distribution of $\mathbf{P}$.

Detailed balance is a sufficient (but not necessary) condition for stationarity. It decomposes the global balance equation $\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}$ into pairwise balance equations, one for each edge; these form a much easier system to verify.
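The proof is a one-line computation: sum the detailed balance equations over $i$ and use the fact that each row of $\mathbf{P}$ sums to one.

```latex
(\boldsymbol{\pi}\mathbf{P})_j
  = \sum_{i \in \mathcal{S}} \pi_i \, p_{ij}
  = \sum_{i \in \mathcal{S}} \pi_j \, p_{ji}
  = \pi_j \sum_{i \in \mathcal{S}} p_{ji}
  = \pi_j .
```

Since this holds for every $j$, we get $\boldsymbol{\pi}\mathbf{P} = \boldsymbol{\pi}$, i.e. stationarity.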


Birth-Death Chains Are Always Reversible

A birth-death chain on $\{0, 1, 2, \ldots\}$ has transitions only between neighboring states: $p_{i,i+1} = b_i$, $p_{i,i-1} = d_i$, $p_{ii} = 1 - b_i - d_i$. Such chains are always reversible because the detailed balance equations reduce to a telescoping product:

$$\pi_{k+1} = \pi_k \frac{b_k}{d_{k+1}}, \quad k = 0, 1, 2, \ldots$$

This gives $\pi_k = \pi_0 \prod_{m=0}^{k-1} \frac{b_m}{d_{m+1}}$, and $\pi_0$ is set by normalization. Birth-death chains model many queueing systems ($M/M/1$, $M/M/c$) and simple communication protocols.
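The telescoping product can be sketched numerically on a truncated birth-death chain (the state-space cutoff $K = 5$ and the birth/death rates below are arbitrary illustrative choices):

```python
import numpy as np

# Truncated birth-death chain on {0, ..., K} with illustrative rates
K = 5
b = np.array([0.4, 0.3, 0.3, 0.2, 0.2])   # b[k] = birth prob from state k
d = np.array([0.1, 0.2, 0.3, 0.3, 0.4])   # d[k] = death prob from state k+1

# pi_k proportional to prod_{m=0}^{k-1} b_m / d_{m+1} (telescoping product)
pi = np.concatenate(([1.0], np.cumprod(b / d)))
pi /= pi.sum()                             # normalize

# Build the transition matrix and confirm that pi is stationary
P = np.zeros((K + 1, K + 1))
for k in range(K):
    P[k, k + 1] = b[k]                     # birth
    P[k + 1, k] = d[k]                     # death
P += np.diag(1 - P.sum(axis=1))            # holding probabilities p_kk

print(np.allclose(pi @ P, pi))             # True
```

Because $\pi$ is built directly from the pairwise balance equations, it satisfies detailed balance by construction, and stationarity follows from the theorem above.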


Example: Ehrenfest Diffusion Model

Consider $N$ particles distributed between two containers A and B. At each step, a particle is chosen uniformly at random and moved to the other container. Let $X_n$ = number of particles in container A. Find the stationary distribution and verify reversibility.
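The Ehrenfest chain is a birth-death chain with $b_k = (N-k)/N$ and $d_k = k/N$, so the telescoping product applies and yields the Binomial$(N, 1/2)$ distribution. A numerical check (a sketch; $N = 10$ is chosen arbitrarily):

```python
import numpy as np
from math import comb

N = 10

# Transition probabilities: a uniformly chosen particle switches container
P = np.zeros((N + 1, N + 1))
for k in range(N + 1):
    if k < N:
        P[k, k + 1] = (N - k) / N   # a particle in B moves to A
    if k > 0:
        P[k, k - 1] = k / N         # a particle in A moves to B

# Candidate stationary distribution: Binomial(N, 1/2)
pi = np.array([comb(N, k) for k in range(N + 1)]) / 2**N

flux = pi[:, None] * P
print(np.allclose(pi @ P, pi), np.allclose(flux, flux.T))  # True True
```

The symmetric flux matrix confirms detailed balance, hence reversibility, and stationarity of the binomial distribution follows.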


Metropolis-Hastings Algorithm

Complexity: Each step requires $O(1)$ computation (one proposal, one acceptance check). The total cost for $N$ samples is $O(N)$, but the effective sample size depends on the mixing time of the chain.
Input: Target distribution $\boldsymbol{\pi}$, proposal matrix $Q = [q_{ij}]$
Output: Samples $X_0, X_1, X_2, \ldots$ from a chain with stationary distribution $\boldsymbol{\pi}$
1. Initialize $X_0 = x_0$ (arbitrary starting state)
2. For $n = 0, 1, 2, \ldots$:
a. Given $X_n = i$, propose $Y \sim q_{i,\cdot}$ (sample from row $i$ of $Q$)
b. Compute the acceptance ratio:
$$\alpha(i, Y) = \min\!\left(1,\; \frac{\pi_Y \, q_{Y,i}}{\pi_i \, q_{i,Y}}\right)$$
c. With probability $\alpha(i, Y)$: accept, set $X_{n+1} = Y$
Otherwise: reject, set $X_{n+1} = i$
3. Return the sequence $\{X_n\}$

The acceptance ratio is designed so that the resulting chain satisfies detailed balance with respect to $\boldsymbol{\pi}$: $\pi_i \, p_{ij} = \pi_i \, q_{ij} \, \alpha(i,j) = \min(\pi_i q_{ij}, \pi_j q_{ji}) = \pi_j \, q_{ji} \, \alpha(j,i) = \pi_j \, p_{ji}$. When $Q$ is symmetric ($q_{ij} = q_{ji}$), the ratio simplifies to $\alpha(i,j) = \min(1, \pi_j / \pi_i)$; this is the original Metropolis algorithm.
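The algorithm above can be sketched for a discrete target (the target weights and the nearest-neighbor random-walk proposal are illustrative choices; note that only ratios of $\pi$ appear, so the target need not be normalized):

```python
import random

# Unnormalized target weights on states 0..4 (illustrative values)
w = [1.0, 2.0, 4.0, 2.0, 1.0]
n_states = len(w)

def metropolis(n_steps, x0=0, seed=0):
    """Metropolis sampler with a symmetric nearest-neighbor proposal on a cycle."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        y = (x + rng.choice([-1, 1])) % n_states   # symmetric proposal: q_xy = q_yx
        # Symmetric Q, so alpha = min(1, pi_y / pi_x); unnormalized weights suffice
        if rng.random() < min(1.0, w[y] / w[x]):
            x = y                                  # accept the proposal
        samples.append(x)                          # on rejection, x stays put
    return samples

samples = metropolis(200_000)
# Empirical frequencies should approach w / sum(w) = [0.1, 0.2, 0.4, 0.2, 0.1]
freqs = [samples.count(k) / len(samples) for k in range(n_states)]
```

Rejected proposals still emit the current state; dropping them would bias the sample toward low-probability states.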


Historical Note: From Los Alamos to Modern Machine Learning

Mid 20th century to present

The Metropolis algorithm was invented at Los Alamos in 1953 by Nicholas Metropolis, Arianna Rosenbluth, Marshall Rosenbluth, Augusta Teller, and Edward Teller, originally to simulate the equation of state of hard-sphere liquids. W. K. Hastings generalized it in 1970 to asymmetric proposals. MCMC remained a niche tool until the 1990s, when the Gibbs sampler (Geman & Geman, 1984) revolutionized Bayesian statistics. Today MCMC is ubiquitous: Bayesian inference, statistical physics, combinatorial optimization, phylogenetics, and training of certain machine learning models all rely on it.

⚠️ Engineering Note

Mixing Time and MCMC in Practice

In practice, the convergence theorem guarantees that MCMC eventually produces samples from the target distribution, but the mixing time (how many steps until the chain's distribution is close to $\boldsymbol{\pi}$) can be enormous. Poorly chosen proposal distributions lead to slow mixing: the chain gets "stuck" in a region of state space. Diagnostics include:

  • Trace plots: visual inspection for stationarity
  • Autocorrelation: high autocorrelation means slow mixing
  • Effective sample size: accounts for correlation between samples
  • Multiple chains (Gelman-Rubin diagnostic): compare within-chain to between-chain variance
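Autocorrelation and effective sample size can be estimated directly from a chain's output. A sketch, using one common heuristic (summing autocorrelations until they first drop below zero; production diagnostics use more careful truncation rules):

```python
import numpy as np

def effective_sample_size(x):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]          # autocorrelation function, rho[0] == 1
    total = 0.0
    for r in rho[1:]:
        if r < 0:                 # heuristic cutoff at the first negative lag
            break
        total += r
    return n / (1 + 2 * total)

# For i.i.d. draws the ESS should be close to the sample size
rng = np.random.default_rng(0)
iid = rng.normal(size=5000)
print(effective_sample_size(iid))
```

A highly autocorrelated MCMC trace fed to the same function yields an ESS far below its length, quantifying the "slow mixing" described above.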
Practical Constraints
  • Burn-in period must be discarded (typically 10-50% of total samples)
  • Thinning (keeping every k-th sample) reduces storage but not computation
  • Proposal variance should be tuned so the acceptance rate is roughly 20-50%

Quick Check

Which of the following chains is NOT reversible?

  • A birth-death chain on $\{0, 1, 2, 3\}$
  • A random walk on a cycle $1 \to 2 \to 3 \to 1$ with asymmetric probabilities
  • The Ehrenfest diffusion model
  • A symmetric random walk on a complete graph

Key Takeaway

Detailed balance ($\pi_i p_{ij} = \pi_j p_{ji}$) is a sufficient condition for stationarity that decomposes the global balance equation into pairwise conditions. All birth-death chains are reversible. The Metropolis-Hastings algorithm exploits detailed balance to construct chains that sample from any desired target distribution.