Trees, Cycles, and Loopy Graphs

Trees Are Easy, Cycles Are Not

On a tree, inference is a finite recursion: eliminate a leaf, shrink the graph, repeat. There is no approximation, and the running time is linear in the number of nodes. This is why forward-backward, Viterbi, and Kalman filtering are textbook algorithms: the underlying graph is a tree.

Most graphs of practical interest are not trees. LDPC codes have cycles of length 6 or more. MIMO detection has very short cycles. Turbo codes have cycles determined by the interleaver. The point of this section is to understand exactly why cycles break the clean recursion, why loopy belief propagation (applying tree algorithms to loopy graphs) works anyway in many cases, and when it fails.

Definition:

Tree (in a Bipartite Graph)

A connected bipartite graph is a tree if it has no cycles. Equivalently, a connected factor graph with $V$ variable nodes, $F$ factor nodes, and $E$ edges is a tree iff $E = V + F - 1$.

A tree of $n = V + F$ nodes has exactly $n - 1$ edges. Adding an extra edge creates exactly one cycle. The cyclomatic number $E - V - F + 1$ counts independent cycles.
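As a quick sanity check, the edge-count characterization can be coded directly. A minimal sketch; the toy chain and ring below are illustrative examples, not from the text:

```python
# Sketch: verify the tree condition E = V + F - 1 on small factor graphs.
# Edges are (variable, factor) pairs; the graphs are assumed connected.

def is_tree(num_vars, num_factors, edges):
    """A connected bipartite factor graph is a tree iff E = V + F - 1."""
    return len(edges) == num_vars + num_factors - 1

def cyclomatic_number(num_vars, num_factors, edges):
    """Number of independent cycles: E - V - F + 1 (connected graph)."""
    return len(edges) - num_vars - num_factors + 1

# A 3-variable chain x0 - f0 - x1 - f1 - x2: V=3, F=2, E=4, so a tree.
chain_edges = [(0, 0), (1, 0), (1, 1), (2, 1)]
print(is_tree(3, 2, chain_edges))            # True
# Closing the chain into a ring adds one factor and two edges: one cycle.
ring_edges = chain_edges + [(2, 2), (0, 2)]
print(cyclomatic_number(3, 3, ring_edges))   # 1
```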

Theorem: Exactness of Sum-Product on Trees

Let $p(\mathbf{x}) = \frac{1}{Z}\prod_a f_a(\mathbf{x}_{\partial a})$ be a distribution whose factor graph is a tree. Then the sum-product algorithm (Chapter 18), initialized at the leaves and propagated to any root node, computes the exact marginals $p(x_i)$ in one outward pass plus one inward pass, in time $O\left(\sum_a |\mathcal{X}|^{|\partial a|}\right)$.

On a tree, any variable $x_i$ separates the graph into disjoint subtrees rooted at $i$. The incoming messages at $i$ from different subtrees are statistically independent given $x_i$, because cutting $i$ disconnects the subtrees. Multiplying them therefore gives the exact marginal. Cycles destroy this independence: messages arriving at a node from different paths are correlated by the cycle, and multiplying them double-counts information.
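A minimal numerical sketch of the two-pass computation on a three-variable binary chain with symmetric pairwise factors; the chain and the value of `theta` are illustrative, not from the text:

```python
import numpy as np

# Sketch: sum-product on the binary chain x0 - f_a - x1 - f_b - x2.
# Pairwise factor f(x, y) = exp(theta * s(x) * s(y)), s maps {0,1} -> {-1,+1}.
theta = 0.8
s = np.array([-1.0, 1.0])
f = np.exp(theta * np.outer(s, s))   # 2x2 factor table

# Outward pass (leaves toward x1): messages are vectors over states.
mu_x0_to_a = np.ones(2)              # leaf variable sends a uniform message
mu_a_to_x1 = f.T @ mu_x0_to_a        # sum out x0
mu_x2_to_b = np.ones(2)
mu_b_to_x1 = f @ mu_x2_to_b          # sum out x2

# Marginal at x1 = product of incoming messages, normalized.
p_x1 = mu_a_to_x1 * mu_b_to_x1
p_x1 /= p_x1.sum()

# Brute-force check over all 8 configurations.
joint = np.einsum('ij,jk->ijk', f, f)        # f(x0,x1) * f(x1,x2)
p_x1_exact = joint.sum(axis=(0, 2))
p_x1_exact /= p_x1_exact.sum()
print(np.allclose(p_x1, p_x1_exact))         # True: exact on a tree
```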


Definition:

Girth of a Factor Graph

The girth of a factor graph is the length (number of edges) of its shortest cycle. Factor graphs without cycles have infinite girth. LDPC codes typically have girth 6 or 8 (short cycles are avoided during code construction).

Large girth is desirable for loopy belief propagation: if the graph locally looks like a tree within a radius of $t$ steps, BP behaves nearly as if the graph were a tree for $t$ iterations. Short cycles cause early divergence from the tree analysis.

Theorem: Local Tree-Like Property

Let $G$ be a random bipartite graph with bounded degrees and $N$ variable nodes, drawn from the $(d_v, d_c)$-regular ensemble. For any fixed depth $t$, the probability that the depth-$t$ neighborhood of a uniformly random variable node is not a tree tends to zero as $N \to \infty$.

Random regular graphs are locally tree-like: zoom in on any node and the first few layers look like a branching process. Loopy BP run for $t < \text{girth}(G)/2$ iterations computes exactly what it would on an actual tree. This is why density evolution (Chapter 18) gives tight asymptotic predictions for LDPC codes.
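The tree-based analysis becomes concrete on the binary erasure channel, where density evolution reduces to a one-dimensional recursion. A sketch for the $(3,6)$-regular ensemble, whose BEC threshold is known to be $\varepsilon^* \approx 0.429$:

```python
# Sketch: density evolution for a (dv, dc)-regular LDPC ensemble on the BEC.
# x_t is the erasure probability of a variable-to-check message; the recursion
#   x_{t+1} = eps * (1 - (1 - x_t)**(dc - 1))**(dv - 1)
# tracks BP on the infinite tree that a locally tree-like graph mimics.

def de_fixed_point(eps, dv=3, dc=6, iters=1000):
    x = eps  # channel erasure probability initializes the recursion
    for _ in range(iters):
        x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
    return x

print(de_fixed_point(0.40))   # below threshold: erasures die out (~0)
print(de_fixed_point(0.45))   # above threshold: stuck at a positive fixed point
```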

Definition:

Loopy Belief Propagation

Loopy belief propagation (loopy BP) applies the tree message-passing equations iteratively to a graph with cycles, simply ignoring the cycles. Messages are updated in rounds:

$$\mu_{i \to a}^{(t+1)}(x_i) = \prod_{b \in \partial i \setminus a} \mu_{b \to i}^{(t)}(x_i), \qquad \mu_{a \to i}^{(t+1)}(x_i) = \sum_{\mathbf{x}_{\partial a \setminus i}} f_a(\mathbf{x}_{\partial a}) \prod_{j \in \partial a \setminus i} \mu_{j \to a}^{(t)}(x_j).$$

Fixed points of this iteration are the Bethe stationary points. They approximate the true marginals; the quality depends on the graph's cycle structure.
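The two update rules can be implemented directly. The sketch below uses a plain flooding schedule for binary variables; the `loopy_bp` function and the triangle example are illustrative, not an optimized decoder:

```python
import numpy as np
from itertools import product

def loopy_bp(variables, factors, iters=50):
    """Flooding loopy BP for binary variables.

    `factors` maps a factor name to (list of neighbor variables, numpy table).
    """
    # Messages: mu[(i, a)] is variable->factor, mu[(a, i)] is factor->variable.
    mu = {}
    for a, (nbrs, _) in factors.items():
        for i in nbrs:
            mu[(i, a)] = np.ones(2)
            mu[(a, i)] = np.ones(2)
    for _ in range(iters):
        new = {}
        for a, (nbrs, table) in factors.items():
            # Variable -> factor: product of the other factors' messages.
            for i in nbrs:
                m = np.ones(2)
                for b, (nbrs_b, _) in factors.items():
                    if b != a and i in nbrs_b:
                        m = m * mu[(b, i)]
                new[(i, a)] = m / m.sum()
            # Factor -> variable: sum out the factor's other variables.
            for pos, i in enumerate(nbrs):
                m = np.zeros(2)
                for config in product([0, 1], repeat=len(nbrs)):
                    w = table[config]
                    for q, j in enumerate(nbrs):
                        if j != i:
                            w = w * mu[(j, a)][config[q]]
                    m[config[pos]] += w
                new[(a, i)] = m / m.sum()
        mu = new
    # Beliefs: product of all incoming factor messages at each variable.
    beliefs = {}
    for i in variables:
        b = np.ones(2)
        for a, (nbrs, _) in factors.items():
            if i in nbrs:
                b = b * mu[(a, i)]
        beliefs[i] = b / b.sum()
    return beliefs

# Tiny loopy example: three binary variables on a triangle of pairwise factors.
t = np.exp(0.3 * np.outer([-1.0, 1.0], [-1.0, 1.0]))
factors = {'a': (['x0', 'x1'], t), 'b': (['x1', 'x2'], t), 'c': (['x2', 'x0'], t)}
beliefs = loopy_bp(['x0', 'x1', 'x2'], factors)
print(beliefs['x0'])   # ~[0.5, 0.5] by symmetry (no local fields)
```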

Loopy BP is a heuristic extension of an exact tree algorithm. Its success is one of the surprises of modern information theory: on many practically important graphs (LDPC, turbo codes), loopy BP is empirically near-optimal.

Theorem: Fixed Points of Loopy BP Are Bethe Free Energy Stationary Points

The fixed points of loopy belief propagation on a factor graph are in one-to-one correspondence with the stationary points of the Bethe free energy

$$\mathcal{F}_{\text{Bethe}}[\{b_a, b_i\}] = \sum_a \sum_{\mathbf{x}_{\partial a}} b_a(\mathbf{x}_{\partial a}) \log\frac{b_a(\mathbf{x}_{\partial a})}{f_a(\mathbf{x}_{\partial a})} - \sum_i (|\partial i| - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i),$$

subject to the local marginalization constraints $\sum_{\mathbf{x}_{\partial a \setminus i}} b_a(\mathbf{x}_{\partial a}) = b_i(x_i)$.

On trees, the Bethe free energy coincides with the exact (Gibbs) free energy, and the unique minimizer gives the true marginals. On loopy graphs, the Bethe approximation replaces the global entropy by a sum of local entropies corrected for double-counting. The loopy BP updates are a reparameterization of the Lagrangian stationarity (KKT) conditions. This identifies loopy BP as a specific variational approximation, with predictable failure modes.
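A small numerical check of the tree case: evaluating $\mathcal{F}_{\text{Bethe}}$ at the exact marginals of a three-variable chain recovers $-\log Z$. The chain and the value of `theta` are illustrative:

```python
import numpy as np

# Sketch: on the tree x0 - f_a - x1 - f_b - x2, the Bethe free energy at the
# exact marginals equals -log Z. Binary states, symmetric pairwise factors.
theta = 0.7
s = np.array([-1.0, 1.0])
f = np.exp(theta * np.outer(s, s))

joint = np.einsum('ij,jk->ijk', f, f)   # unnormalized p(x0, x1, x2)
Z = joint.sum()
p = joint / Z

# Exact factor and variable marginals (beliefs).
b_a = p.sum(axis=2)        # b_a(x0, x1)
b_b = p.sum(axis=0)        # b_b(x1, x2)
b1 = p.sum(axis=(0, 2))    # b_1(x1); x1 has degree 2, x0 and x2 degree 1,
                           # so only x1 contributes to the entropy correction.

F_bethe = (np.sum(b_a * np.log(b_a / f)) + np.sum(b_b * np.log(b_b / f))
           - (2 - 1) * np.sum(b1 * np.log(b1)))
print(np.isclose(F_bethe, -np.log(Z)))   # True
```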


When Does Loopy BP Work?

Loopy BP is guaranteed to be correct on trees. It is provably correct on a few special loopy graph families: single-cycle graphs, Gaussian graphical models with diagonally dominant precision matrices, and attractive pairwise binary models at high temperature. For most other loopy graphs, correctness is empirical: LDPC decoding, turbo decoding, and compressed sensing via BP/AMP all work remarkably well, but without closed-form guarantees.

The point is that loopy BP is a heuristic that shines under certain structural conditions: (i) the graph is locally tree-like, (ii) the factors are "weak" (low mutual information between distant variables), and (iii) iteration is combined with reasonable scheduling.

Common Mistake: Short Cycles Destroy Loopy BP

Mistake:

Running loopy BP on a graph with many cycles of length 4 (the shortest possible cycle in a bipartite graph).

Correction:

A 4-cycle $(x_i - f_a - x_j - f_b - x_i)$ causes the message $\mu_{f_a \to x_i}$ to depend on itself through two hops. Information flows back to the origin and gets double-counted almost immediately. Loopy BP on graphs with many 4-cycles typically diverges or converges to useless fixed points. When building LDPC codes, use progressive edge growth or similar algorithms to ensure girth $\geq 6$ (ideally $\geq 8$).
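The 4-cycle condition is easy to test: two rows of a parity-check matrix $H$ that share ones in two or more columns close a 4-cycle in the Tanner graph. A sketch; the matrices below are toy examples, not real codes:

```python
import numpy as np

# Sketch: a Tanner graph has a 4-cycle iff two rows of H share ones in at
# least two columns, i.e. H @ H.T has an off-diagonal entry >= 2.

def has_4_cycle(H):
    overlap = H @ H.T                 # overlap[a, b] = columns shared by rows a, b
    np.fill_diagonal(overlap, 0)
    return bool((overlap >= 2).any())

H_bad = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 1]])      # rows share columns 0 and 1: 4-cycle
H_ok  = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 1]])      # rows share only column 0: no 4-cycle
print(has_4_cycle(H_bad), has_4_cycle(H_ok))   # True False
```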

Effect of Cycle Length on Loopy BP

Interactive figure: marginal error of loopy BP on a ring graph (single cycle) of various lengths. Shorter cycles give larger error.


Example: Loopy BP on a Single Cycle

A binary model with $n$ variables on a cycle has pairwise factors $f_{i,i+1}(x_i, x_{i+1}) = \exp(\theta x_i x_{i+1})$, $x_i \in \{-1, +1\}$. Show that loopy BP converges and find the resulting marginal approximation.
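A numerical sketch of this example, with small random local fields added so that the marginals are non-trivial; the values of $n$, $\theta$, and the fields are illustrative:

```python
import numpy as np
from itertools import product

# Loopy BP on a binary ring: pairwise factors exp(theta * x_i * x_{i+1})
# plus local field factors exp(h_i * x_i), x_i in {-1, +1}.
n, theta = 6, 0.5
rng = np.random.default_rng(0)
h = rng.normal(0, 0.3, n)
s = np.array([-1.0, 1.0])
pair = np.exp(theta * np.outer(s, s))    # f(x_i, x_{i+1})
local = np.exp(np.outer(h, s))           # field factor per variable

# Directed messages around the ring: cw[i] enters x_i from factor f_{i-1,i},
# acw[i] enters x_i from factor f_{i,i+1}.
cw = np.ones((n, 2))
acw = np.ones((n, 2))
for _ in range(100):
    new_cw = np.empty((n, 2))
    new_acw = np.empty((n, 2))
    for i in range(n):
        new_cw[i] = pair.T @ (local[i - 1] * cw[i - 1])   # sum out x_{i-1}
        j = (i + 1) % n
        new_acw[i] = pair @ (local[j] * acw[j])           # sum out x_{i+1}
    cw = new_cw / new_cw.sum(axis=1, keepdims=True)
    acw = new_acw / new_acw.sum(axis=1, keepdims=True)

bp = local * cw * acw                    # beliefs: product of incoming messages
bp /= bp.sum(axis=1, keepdims=True)

# Exact marginals by brute force over 2^n configurations.
exact = np.zeros((n, 2))
for cfg in product([0, 1], repeat=n):
    w = np.prod([pair[cfg[i], cfg[(i + 1) % n]] * local[i, cfg[i]]
                 for i in range(n)])
    for i in range(n):
        exact[i, cfg[i]] += w
exact /= exact.sum(axis=1, keepdims=True)
print(np.max(np.abs(bp - exact)))   # small but nonzero: the cycle biases BP
```

The messages converge because each directed update is a positive linear map around the ring, so iterating it is a power iteration on the cycle's transfer matrix.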

πŸ”§Engineering Note

Message Scheduling Matters

In practice, loopy BP can be run with flooding (all messages updated simultaneously) or serial scheduling (messages updated one at a time). Serial scheduling typically converges twice as fast but requires careful ordering. For LDPC codes, the layered decoder interleaves check node and variable node updates and is the standard choice for hardware implementation.

Practical Constraints
• Flooding BP doubles the iteration count vs. serial.
• Serial scheduling requires an update ordering, a non-trivial choice on arbitrary graphs.
• Parallelism vs. convergence speed is a hardware design tradeoff.

Why This Matters: 5G LDPC Decoders in Hardware

The LDPC codes in 5G NR use a structured parity-check matrix called a base graph, expanded with circulant permutation matrices (lifting) to obtain the full Tanner graph. This structure enables layered BP decoders in silicon with throughput above 10 Gb/s. The graph's girth is $\geq 6$ by construction, which ensures loopy BP converges well within 15-30 iterations.

Loopy BP on a Cycle: Message Flow

Animated visualization of message propagation around a cycle. Messages arriving at a node have already been "touched" by the target node; this is the double-counting that causes loopy BP to deviate from the true marginal.

Historical Note: Loopy BP: A Happy Accident

1988-2005

When Pearl formulated belief propagation in the 1980s, he proved its correctness on trees and warned that it would fail on loopy graphs. Then in 1993, Berrou, Glavieux, and Thitimajshima introduced turbo codes, effectively running loopy BP on a coupled convolutional code graph with spectacular empirical performance. Soon after, McEliece, MacKay, and Cheng (1998) recognized turbo decoding as loopy BP. This was a surprise: Pearl's pessimism was overturned by engineering practice. The theoretical understanding, via density evolution and Bethe free energy, followed in the 2000s.

Key Takeaway

On trees, sum-product message passing computes exact marginals in linear time. On loopy graphs, the same updates applied iteratively yield loopy BP, a heuristic whose fixed points are stationary points of the Bethe free energy. Loopy BP works well when the graph is locally tree-like and the factors are not too strong; it fails when short cycles dominate.