Trees, Cycles, and Loopy Graphs
Trees Are Easy, Cycles Are Not
On a tree, inference is a finite recursion: eliminate a leaf, shrink the graph, repeat. There is no approximation, and the running time is linear in the number of nodes. This is why forward-backward, Viterbi, and Kalman filtering are textbook algorithms: the underlying graph is a tree.
Most graphs of practical interest are not trees. LDPC codes have cycles of length 6 or more. MIMO detection has very short cycles. Turbo codes have cycles determined by the interleaver. The point of this section is to understand exactly why cycles break the clean recursion, why loopy belief propagation (applying tree algorithms to loopy graphs) works anyway in many cases, and when it fails.
Definition: Tree (in a Bipartite Graph)
A connected bipartite graph is a tree if it has no cycles. Equivalently, a connected factor graph with $n$ variable nodes, $m$ factor nodes, and $E$ edges is a tree iff $E = n + m - 1$.
A tree of $N$ nodes has exactly $N - 1$ edges. Adding an extra edge creates exactly one cycle. The cyclomatic number $E - N + C$ (where $C$ is the number of connected components) counts independent cycles.
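These counts are easy to verify programmatically. Below is a minimal Python sketch (the helper name cyclomatic_number and the toy edge lists are illustrative, not from the text) that counts independent cycles as $E - N + C$ using union-find:

```python
def cyclomatic_number(num_nodes, edges):
    """Number of independent cycles, E - N + C, where C counts
    connected components (found via union-find)."""
    parent = list(range(num_nodes))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u
    components = num_nodes
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - num_nodes + components

path = [(0, 1), (1, 2), (2, 3)]               # a 4-node tree: N - 1 edges
print(cyclomatic_number(4, path))             # 0
print(cyclomatic_number(4, path + [(3, 0)]))  # 1: one extra edge, one cycle
```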
Theorem: Exactness of Sum-Product on Trees
Let $p(x_1, \dots, x_n) \propto \prod_a f_a(x_{\partial a})$ be a distribution whose factor graph is a tree. Then the sum-product algorithm (Chapter 18), initialized at the leaves and propagated to any root node, computes the exact marginals in one outward pass plus one inward pass, in time linear in the number of edges (with constants depending on factor arity and alphabet size).
On a tree, any variable $x_i$ separates the graph into disjoint subtrees rooted at the neighbors of $x_i$. The incoming messages at $x_i$ from different subtrees are statistically independent given $x_i$, because cutting at $x_i$ disconnects the subtrees. Multiplying them therefore gives the exact marginal. Cycles destroy this independence: messages arriving at a node from different paths are correlated by the cycle, and multiplying them double-counts information.
Message definition
For each oriented edge $i \to a$ (variable to factor), define the message $\mu_{i \to a}(x_i) = \prod_{b \in \partial i \setminus a} \mu_{b \to i}(x_i)$, and for each oriented edge $a \to i$ (factor to variable), $\mu_{a \to i}(x_i) = \sum_{x_{\partial a \setminus i}} f_a(x_{\partial a}) \prod_{j \in \partial a \setminus i} \mu_{j \to a}(x_j)$.
Base case (leaves)
A leaf variable node $i$ sends $\mu_{i \to a}(x_i) = 1$ (no incoming messages). A leaf factor node $a$ sends $\mu_{a \to i}(x_i) = f_a(x_i)$ directly.
Propagation terminates on trees
Each edge is used exactly twice (once in each direction). Messages on any directed edge depend only on messages on edges farther from the target, so the recursion computes finitely many unique messages.
Marginals are exact
The product $\prod_{a \in \partial i} \mu_{a \to i}(x_i)$ equals the true marginal $p(x_i)$ up to normalization. The proof is by induction on subtree size: messages from disjoint subtrees factor the joint distribution cleanly.
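To see the recursion at work, here is a minimal numerical check on a three-variable chain (the smallest interesting tree); the factor tables below are random placeholders, not from the text:

```python
import numpy as np

# Chain x1 -- f12 -- x2 -- f23 -- x3, binary variables, positive factors.
rng = np.random.default_rng(0)
f12 = rng.uniform(0.1, 1.0, size=(2, 2))    # f12[x1, x2]
f23 = rng.uniform(0.1, 1.0, size=(2, 2))    # f23[x2, x3]

# Base case: leaf variables x1, x3 send all-ones messages; each factor
# then sums out its far variable, exactly the propagation rule above.
mu_f12_to_x2 = f12.T @ np.ones(2)           # sum over x1
mu_f23_to_x2 = f23 @ np.ones(2)             # sum over x3

belief = mu_f12_to_x2 * mu_f23_to_x2        # product of subtree messages
belief /= belief.sum()

# Brute-force marginal of x2 for comparison.
joint = np.einsum('ij,jk->ijk', f12, f23)   # joint[x1, x2, x3]
exact = joint.sum(axis=(0, 2))
exact /= exact.sum()
print(np.allclose(belief, exact))           # True: exact on trees
```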
Definition: Girth of a Factor Graph
The girth of a factor graph is the length (number of edges) of its shortest cycle. Factor graphs without cycles have infinite girth. LDPC codes typically have girth 6 or 8 (short cycles are avoided during code construction).
Large girth is desirable for loopy belief propagation: if the girth is $g$, the depth-$\lfloor g/2 \rfloor$ neighborhood of every node is cycle-free, so BP behaves as if the graph were a tree for the first $g/2$ iterations. Short cycles cause early divergence from the tree analysis.
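Girth can be computed by running a breadth-first search from every node and recording the shortest cycle closed by a non-tree edge; the minimum over all sources is exact. A Python sketch (the girth helper is illustrative, not a reference implementation):

```python
from collections import deque

def girth(adj):
    """Shortest cycle length in an undirected graph given as adjacency
    lists; float('inf') for a forest."""
    best = float('inf')
    for src in range(len(adj)):
        dist = {src: 0}
        parent = {src: -1}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v], parent[v] = dist[u] + 1, u
                    q.append(v)
                elif v != parent[u]:
                    # Non-tree edge closes a cycle through src.
                    best = min(best, dist[u] + dist[v] + 1)
    return best

# Variables {0, 1} and factors {2, 3}, fully connected: the classic 4-cycle.
adj = [[2, 3], [2, 3], [0, 1], [0, 1]]
print(girth(adj))   # 4
```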
Theorem: Local Tree-Like Property
Let $G_n$ be a random bipartite graph with $n$ variable nodes and bounded degrees, drawn from the $(d_v, d_c)$-regular ensemble. For any fixed depth $t$, the probability that the depth-$t$ neighborhood of a uniformly random variable node is not a tree tends to zero as $n \to \infty$.
Random regular graphs are locally tree-like: zoom in on any node and the first few layers look like a branching process. Loopy BP run for $t$ iterations then computes exactly what it would on an actual tree. This is why density evolution (Chapter 18) gives tight asymptotic predictions for LDPC codes.
Branching process construction
Start at a variable node and grow its neighborhood breadth-first. At each step, choose a random partner for each open edge.
Cycle-generating probability
A cycle of length at most $2t$ appears only if two growing branches meet within $t$ hops. For fixed $t$ and bounded degrees, this probability is $O(1/n)$: vanishing as $n \to \infty$.
Tree approximation
Conditioned on the depth-$t$ neighborhood being a tree, BP for $t$ iterations computes the same message distribution as on a true tree.
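A quick simulation makes the theorem tangible: sample a $(d_v, d_c)$-regular configuration-model graph and measure the fraction of variable nodes whose bounded-depth neighborhood is a tree. All names and parameters below are illustrative assumptions:

```python
import random
from collections import deque

def tree_like_fraction(n_var, dv, dc, depth, seed=0):
    """Fraction of variable nodes whose depth-`depth` neighborhood is a
    tree, in a configuration-model (dv, dc)-regular bipartite graph."""
    n_chk = n_var * dv // dc
    rng = random.Random(seed)
    var_sockets = [v for v in range(n_var) for _ in range(dv)]
    chk_sockets = [n_var + c for c in range(n_chk) for _ in range(dc)]
    rng.shuffle(chk_sockets)
    adj = [[] for _ in range(n_var + n_chk)]
    for v, c in zip(var_sockets, chk_sockets):
        adj[v].append(c)
        adj[c].append(v)

    def is_tree_neighborhood(src):
        dist, parent = {src: 0}, {src: None}
        q = deque([src])
        while q:
            u = q.popleft()
            if dist[u] == depth:
                continue               # do not expand past the radius
            for w in adj[u]:
                if w == parent[u]:
                    parent[u] = None   # consume the one edge back to parent
                    continue
                if w in dist:
                    return False       # revisit => cycle in the neighborhood
                dist[w], parent[w] = dist[u] + 1, u
                q.append(w)
        return True

    return sum(is_tree_neighborhood(v) for v in range(n_var)) / n_var

# Tree-like fraction climbs toward 1 as n grows (depth counts edges,
# so depth=4 spans two variable layers of the bipartite graph).
for n in (120, 1200, 12000):
    print(n, tree_like_fraction(n, dv=3, dc=6, depth=4, seed=1))
```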
Definition: Loopy Belief Propagation
Loopy belief propagation (loopy BP) applies the tree message-passing equations iteratively to a graph with cycles, regardless of the cycles. Messages are updated in rounds: $\mu^{(t+1)}_{i \to a}(x_i) = \prod_{b \in \partial i \setminus a} \mu^{(t)}_{b \to i}(x_i)$ and $\mu^{(t+1)}_{a \to i}(x_i) = \sum_{x_{\partial a \setminus i}} f_a(x_{\partial a}) \prod_{j \in \partial a \setminus i} \mu^{(t)}_{j \to a}(x_j)$. Fixed points of this iteration are the Bethe stationary points. They approximate the true marginals; the quality depends on the graph's cycle structure.
Loopy BP is a heuristic extension of an exact tree algorithm. Its success is one of the surprises of modern information theory: on many practically important graphs (LDPC, turbo codes), loopy BP is empirically near-optimal.
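Here is a minimal flooding loopy BP for binary pairwise models, written to mirror the update equations above. This is a sketch, not a production decoder; the function name and data layout are our own choices:

```python
import numpy as np

def loopy_bp_pairwise(n, edges, psi, iters=100, tol=1e-9):
    """Flooding loopy BP for a binary pairwise model
    p(x) ∝ prod_{(i,j) in edges} psi[(i,j)][x_i, x_j]."""
    directed = [(i, j) for (i, j) in edges] + [(j, i) for (i, j) in edges]
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    pot = {}
    for (i, j), table in psi.items():
        pot[(i, j)] = np.asarray(table, dtype=float)
        pot[(j, i)] = pot[(i, j)].T
    mu = {e: np.full(2, 0.5) for e in directed}      # uniform init
    for _ in range(iters):
        new = {}
        for i, j in directed:
            incoming = np.ones(2)                    # into i, except from j
            for k in nbrs[i]:
                if k != j:
                    incoming *= mu[(k, i)]
            m = pot[(i, j)].T @ incoming             # sum out x_i
            new[(i, j)] = m / m.sum()
        delta = max(np.abs(new[e] - mu[e]).max() for e in directed)
        mu = new
        if delta < tol:
            break
    beliefs = np.ones((n, 2))
    for i in range(n):
        for k in nbrs[i]:
            beliefs[i] *= mu[(k, i)]
        beliefs[i] /= beliefs[i].sum()
    return beliefs

# 4-cycle with attractive coupling J: by symmetry the beliefs stay
# uniform, matching the single-cycle example later in this section.
J = 0.5
table = np.exp(J * np.array([[1.0, -1.0], [-1.0, 1.0]]))
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(loopy_bp_pairwise(4, ring, {e: table for e in ring}))
```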
Theorem: Fixed Points of Loopy BP Are Bethe Free Energy Stationary Points
The fixed points of loopy belief propagation on a factor graph are in one-to-one correspondence with the stationary points of the Bethe free energy $F_{\text{Bethe}}[\{b_i\}, \{b_a\}] = \sum_a \sum_{x_{\partial a}} b_a(x_{\partial a}) \ln \frac{b_a(x_{\partial a})}{f_a(x_{\partial a})} - \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \ln b_i(x_i)$, subject to the local marginalization constraints $\sum_{x_{\partial a \setminus i}} b_a(x_{\partial a}) = b_i(x_i)$.
On trees, the Bethe free energy coincides with the exact (Gibbs) free energy, and the unique minimizer gives the true marginals. On loopy graphs, the Bethe approximation replaces the global entropy by a sum of local entropies adjusted for double-counting (the $(d_i - 1)$ weights above). The loopy BP update is a reparameterization of the Lagrangian KKT conditions. This identifies loopy BP as a specific variational approximation, with predictable failure modes.
Lagrangian
Form the Lagrangian $\mathcal{L}$ of $F_{\text{Bethe}}$ with Lagrange multipliers $\lambda_{ai}(x_i)$ enforcing the marginalization constraints.
Stationarity conditions
Setting $\partial \mathcal{L} / \partial b_a(x_{\partial a}) = 0$ and $\partial \mathcal{L} / \partial b_i(x_i) = 0$ gives an exponential-family form for the beliefs, with the messages playing the role of the Lagrange multipliers.
Identify BP equations
Reparameterizing $\mu_{a \to i}(x_i) \propto e^{\lambda_{ai}(x_i)}$, the stationarity conditions reduce to the loopy BP fixed-point equations.
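The correspondence can be checked numerically: evaluate the Bethe free energy at a BP fixed point and recover the Bethe partition function. A hedged Python sketch for pairwise models (the dict-based argument layout is an assumption), checked against the uniform fixed point of the single-cycle example below:

```python
import numpy as np

def bethe_free_energy(node_beliefs, edge_beliefs, psi, degrees):
    """Bethe free energy of a pairwise model:
    F = sum_edges b_ij ln(b_ij / psi_ij) - sum_i (d_i - 1) b_i ln b_i."""
    F = 0.0
    for e, b in edge_beliefs.items():
        F += float(np.sum(b * np.log(b / psi[e])))
    for i, b in enumerate(node_beliefs):
        F -= (degrees[i] - 1) * float(np.sum(b * np.log(b)))
    return F

# At the uniform BP fixed point of the length-N ring, exp(-F) reproduces
# the Bethe partition function (2 cosh J)^N derived in the example below.
J, N = 0.7, 10
table = np.exp(J * np.array([[1.0, -1.0], [-1.0, 1.0]]))
b_edge = table / table.sum()                        # pairwise belief ∝ factor
F = bethe_free_energy([np.array([0.5, 0.5])] * N,   # uniform node beliefs
                      {i: b_edge for i in range(N)},
                      {i: table for i in range(N)},
                      degrees=[2] * N)
print(np.exp(-F), (2 * np.cosh(J)) ** N)            # the two values match
```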
When Does Loopy BP Work?
Loopy BP is guaranteed to be correct on trees. Its behavior is provably characterized on a few special loopy graph families: single-cycle graphs, Gaussian graphical models with diagonally dominant precision matrices (where the BP means are exact), and attractive pairwise binary models at high temperature. For most other loopy graphs, correctness is empirical: LDPC decoding, turbo decoding, and compressed sensing via BP/AMP all work remarkably well, but without closed-form guarantees.
The point is that loopy BP is a heuristic that shines under certain structural conditions: (i) the graph is locally tree-like, (ii) the factors are "weak" (low mutual information between distant variables), and (iii) iteration is combined with reasonable scheduling.
Common Mistake: Short Cycles Destroy Loopy BP
Mistake:
Running loopy BP on a graph with many cycles of length 4 (the shortest possible cycle in a bipartite graph).
Correction:
A 4-cycle causes the message $\mu_{i \to a}$ to depend on itself after only two update rounds (four edge traversals around the cycle). Information flows back to its origin and gets double-counted almost immediately. Loopy BP on graphs with many 4-cycles typically diverges or converges to useless fixed points. When building LDPC codes, use progressive edge growth or similar algorithms to ensure girth $\geq 6$ (ideally $\geq 8$).
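Screening for 4-cycles is cheap: in a parity-check matrix $H$, a 4-cycle exists exactly when two columns share support in two or more rows. A small Python check (H_bad is a made-up example):

```python
import numpy as np

def has_4_cycles(H):
    """True iff the Tanner graph of parity-check matrix H contains a
    4-cycle, i.e., two columns share support in >= 2 rows."""
    H = np.asarray(H, dtype=int)
    overlap = H.T @ H            # overlap[i, j] = checks shared by cols i, j
    np.fill_diagonal(overlap, 0)
    return bool((overlap >= 2).any())

# Columns 0 and 1 are both checked by rows 0 and 1: a 4-cycle.
H_bad = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 0, 1]])
print(has_4_cycles(H_bad))       # True
```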
Effect of Cycle Length on Loopy BP
[Figure: marginal error of loopy BP on a ring graph (single cycle) of various lengths; shorter cycles give larger error.]
Example: Loopy BP on a Single Cycle
A binary model with variables $x_1, \dots, x_N \in \{\pm 1\}$ on a cycle has pairwise factors $f_i(x_i, x_{i+1}) = e^{J x_i x_{i+1}}$, $i = 1, \dots, N$ (indices mod $N$). Show that loopy BP converges and find the resulting marginal approximation.
Set up the transfer matrix
The exact partition function is $Z = \operatorname{tr}(T^N)$, where $T = \begin{pmatrix} e^{J} & e^{-J} \\ e^{-J} & e^{J} \end{pmatrix}$ with eigenvalues $\lambda_1 = 2\cosh J$ and $\lambda_2 = 2\sinh J$ (for $J > 0$). Hence $Z = (2\cosh J)^N + (2\sinh J)^N$.
Loopy BP fixed point
By symmetry, all messages are equal: parameterize $\mu(x) = \tfrac{1}{2}(1 + mx)$ for some bias $m$. The BP fixed-point equation is $m = \tanh(J)\, m$. Solving: $m = 0$ is the unique fixed point, and the iteration contracts at rate $\tanh J < 1$, so loopy BP converges to uniform messages; the resulting pairwise belief matches the pairwise factor, $b(x_i, x_{i+1}) \propto e^{J x_i x_{i+1}}$.
Marginal approximation
Loopy BP marginal: $b_i(x_i) = \tfrac{1}{2}$. True marginal: by symmetry also $\tfrac{1}{2}$, so the single-site beliefs happen to be exact here. The cycle's effect shows up in the correlations: BP's pairwise belief gives $\langle x_i x_{i+1} \rangle = \tanh J$, while the true ring value is $(\tanh J + \tanh^{N-1} J)/(1 + \tanh^N J)$. The $O(\tanh^{N-1} J)$ discrepancy is the signature of the cycle.
Moral
For this small example, loopy BP's partition function is $Z_{\text{Bethe}} = (2\cosh J)^N$, missing the $(2\sinh J)^N$ correction. In the large-$N$ limit the relative error vanishes, since $(\tanh J)^N \to 0$. Loopy BP becomes exact as cycles become long, consistent with the locally tree-like argument.
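The vanishing error is easy to verify numerically with the transfer matrix from the first step (J = 0.8 is an arbitrary choice):

```python
import numpy as np

# Exact vs. Bethe partition function for the length-N ring.
J = 0.8
T = np.array([[np.exp(J), np.exp(-J)],
              [np.exp(-J), np.exp(J)]])
for N in (4, 8, 16, 32):
    Z_exact = np.trace(np.linalg.matrix_power(T, N))  # lambda1^N + lambda2^N
    Z_bethe = (2 * np.cosh(J)) ** N                   # lambda1^N only
    print(N, (Z_exact - Z_bethe) / Z_exact)           # ~ tanh(J)^N, shrinking
```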
Message Scheduling Matters
In practice, loopy BP can be run with flooding (all messages updated simultaneously from the previous round) or serial scheduling (messages updated one at a time, each immediately visible to later updates). Serial scheduling typically converges in about half as many iterations but requires careful ordering. For LDPC codes, the layered decoder interleaves check-node and variable-node updates and is the standard choice for hardware implementation; a schematic contrast follows the list below.
- Flooding BP needs roughly twice the iterations of serial scheduling.
- Serial scheduling requires an update ordering, a non-trivial choice on arbitrary graphs.
- Parallelism vs. convergence speed is a hardware design tradeoff.
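To make the contrast concrete, here is one serial sweep written against the message and potential dictionaries from the flooding loopy BP sketch earlier in this section; the only change from flooding is the immediate write-back:

```python
import numpy as np

def serial_sweep(mu, directed, nbrs, pot):
    """One serial sweep over the message dictionary of the flooding
    loopy BP sketch above: each message is written back immediately,
    so later updates in the same sweep already see the fresh value."""
    for i, j in directed:
        incoming = np.ones(2)
        for k in nbrs[i]:
            if k != j:
                incoming *= mu[(k, i)]
        m = pot[(i, j)].T @ incoming
        mu[(i, j)] = m / m.sum()   # in-place write-back: the serial difference
    return mu
```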
Why This Matters: 5G LDPC Decoders in Hardware
The LDPC codes in 5G NR use a structured parity-check matrix called a base graph, expanded by circulant permutations (a lifting) to obtain the full Tanner graph. This structure enables layered BP decoders in silicon with throughput above 10 Gb/s. The lifted graph's girth is at least 6 by construction, which ensures loopy BP converges well within 15-30 iterations.
[Figure: Loopy BP on a Cycle: Message Flow]
Historical Note: Loopy BP: A Happy Accident
1988-2005. When Pearl formulated belief propagation in the 1980s, he proved its correctness on trees and warned that it would fail on loopy graphs. Then in 1993, Berrou, Glavieux, and Thitimajshima introduced turbo codes, effectively running loopy BP on a coupled convolutional code graph with spectacular empirical performance. Soon after, McEliece, MacKay, and Cheng (1998) recognized turbo decoding as loopy BP. This was a surprise: Pearl's pessimism was overturned by engineering practice. The theoretical understanding, via density evolution and the Bethe free energy, followed in the 2000s.
Key Takeaway
On trees, sum-product message passing computes exact marginals in linear time. On loopy graphs, the same updates applied iteratively yield loopy BP, a heuristic whose fixed points are stationary points of the Bethe free energy. Loopy BP works well when the graph is locally tree-like and the factors are not too strong; it fails when short cycles dominate.