Trees, Cycles, and Loopy Graphs

Trees Are Easy, Cycles Are Not

On a tree, inference is a finite recursion: eliminate a leaf, shrink the graph, repeat. There is no approximation, and the running time is linear in the number of nodes. This is why forward-backward, Viterbi, and Kalman filtering are textbook algorithms: the underlying graph is a tree.

Most graphs of practical interest are not trees. LDPC codes have cycles of length 6 or more. MIMO detection has very short cycles. Turbo codes have cycles determined by the interleaver. The point of this section is to understand exactly why cycles break the clean recursion, why loopy belief propagation (applying tree algorithms to loopy graphs) works anyway in many cases, and when it fails.

Definition:

Tree (in a Bipartite Graph)

A connected bipartite graph is a tree if it has no cycles. Equivalently, a connected factor graph with $V$ variable nodes, $F$ factor nodes, and $E$ edges is a tree iff $E = V + F - 1$.

A tree of $n = V + F$ nodes has exactly $n - 1$ edges. Adding an extra edge creates exactly one cycle. The cyclomatic number $E - V - F + 1$ counts independent cycles.
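As a quick sanity check, the edge-count characterization can be coded directly. A minimal sketch; the toy chain and ring below are illustrative examples, not from the text:

```python
# Sketch: verify the tree condition E = V + F - 1 on small factor graphs.
# Edges are (variable, factor) pairs; the graphs are assumed connected.

def is_tree(num_vars, num_factors, edges):
    """A connected bipartite factor graph is a tree iff E = V + F - 1."""
    return len(edges) == num_vars + num_factors - 1

def cyclomatic_number(num_vars, num_factors, edges):
    """Number of independent cycles: E - V - F + 1 (connected graph)."""
    return len(edges) - num_vars - num_factors + 1

# A 3-variable chain x0 - f0 - x1 - f1 - x2: V=3, F=2, E=4, so a tree.
chain_edges = [(0, 0), (1, 0), (1, 1), (2, 1)]
print(is_tree(3, 2, chain_edges))            # True
# Closing the chain into a ring adds one factor and two edges: one cycle.
ring_edges = chain_edges + [(2, 2), (0, 2)]
print(cyclomatic_number(3, 3, ring_edges))   # 1
```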

Theorem: Exactness of Sum-Product on Trees

Let $p(\mathbf{x}) = \frac{1}{Z}\prod_a f_a(\mathbf{x}_{\partial a})$ be a distribution whose factor graph is a tree. Then the sum-product algorithm (Chapter 18), initialized at the leaves and propagated to any root node, computes the exact marginals $p(x_i)$ in one outward pass plus one inward pass, in time $O\left(\sum_a |\mathcal{X}|^{|\partial a|}\right)$.

On a tree, any variable $x_i$ separates the graph into disjoint subtrees rooted at $i$. The incoming messages at $i$ from different subtrees are statistically independent given $x_i$, because cutting $i$ disconnects the subtrees. Multiplying them therefore gives the exact marginal. Cycles destroy this independence: messages arriving at a node from different paths are correlated by the cycle, and multiplying them double-counts information.
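A minimal numerical sketch of the two-pass computation on a three-variable binary chain with symmetric pairwise factors; the chain and the value of `theta` are illustrative, not from the text:

```python
import numpy as np

# Sketch: sum-product on the binary chain x0 - f_a - x1 - f_b - x2.
# Pairwise factor f(x, y) = exp(theta * s(x) * s(y)), s maps {0,1} -> {-1,+1}.
theta = 0.8
s = np.array([-1.0, 1.0])
f = np.exp(theta * np.outer(s, s))   # 2x2 factor table

# Outward pass (leaves toward x1): messages are vectors over states.
mu_x0_to_a = np.ones(2)              # leaf variable sends a uniform message
mu_a_to_x1 = f.T @ mu_x0_to_a        # sum out x0
mu_x2_to_b = np.ones(2)
mu_b_to_x1 = f @ mu_x2_to_b          # sum out x2

# Marginal at x1 = product of incoming messages, normalized.
p_x1 = mu_a_to_x1 * mu_b_to_x1
p_x1 /= p_x1.sum()

# Brute-force check over all 8 configurations.
joint = np.einsum('ij,jk->ijk', f, f)        # f(x0,x1) * f(x1,x2)
p_x1_exact = joint.sum(axis=(0, 2))
p_x1_exact /= p_x1_exact.sum()
print(np.allclose(p_x1, p_x1_exact))         # True: exact on a tree
```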


Definition:

Girth of a Factor Graph

The girth of a factor graph is the length (number of edges) of its shortest cycle. Factor graphs without cycles have infinite girth. LDPC codes typically have girth 6 or 8 (short cycles are avoided during code construction).

Large girth is desirable for loopy belief propagation: if the graph locally looks like a tree within a radius of $t$ steps, BP behaves nearly as if the graph were a tree for $t$ iterations. Short cycles cause early divergence from the tree analysis.

Theorem: Local Tree-Like Property

Let $G$ be a random bipartite graph with bounded degrees and $N$ variable nodes, drawn from the $(d_v, d_c)$-regular ensemble. For any fixed depth $t$, the probability that the depth-$t$ neighborhood of a uniformly random variable node is not a tree tends to zero as $N \to \infty$.

Random regular graphs are locally tree-like: zoom in on any node and the first few layers look like a branching process. Loopy BP run for $t < \text{girth}(G)/2$ iterations computes exactly what it would on an actual tree. This is why density evolution (Chapter 18) gives tight asymptotic predictions for LDPC codes.
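The tree-based analysis becomes concrete on the binary erasure channel, where density evolution reduces to a one-dimensional recursion. A sketch for the $(3,6)$-regular ensemble, whose BEC threshold is known to be $\varepsilon^* \approx 0.429$:

```python
# Sketch: density evolution for a (dv, dc)-regular LDPC ensemble on the BEC.
# x_t is the erasure probability of a variable-to-check message; the recursion
#   x_{t+1} = eps * (1 - (1 - x_t)**(dc - 1))**(dv - 1)
# tracks BP on the infinite tree that a locally tree-like graph mimics.

def de_fixed_point(eps, dv=3, dc=6, iters=1000):
    x = eps  # channel erasure probability initializes the recursion
    for _ in range(iters):
        x = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
    return x

print(de_fixed_point(0.40))   # below threshold: erasures die out (~0)
print(de_fixed_point(0.45))   # above threshold: stuck at a positive fixed point
```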

Definition:

Loopy Belief Propagation

Loopy belief propagation (loopy BP) applies the tree message-passing equations iteratively to a graph with cycles, simply ignoring the cycles. Messages are updated in rounds:

$$\mu_{i \to a}^{(t+1)}(x_i) = \prod_{b \in \partial i \setminus a} \mu_{b \to i}^{(t)}(x_i), \qquad \mu_{a \to i}^{(t+1)}(x_i) = \sum_{\mathbf{x}_{\partial a \setminus i}} f_a(\mathbf{x}_{\partial a}) \prod_{j \in \partial a \setminus i} \mu_{j \to a}^{(t)}(x_j).$$

Fixed points of this iteration are the Bethe stationary points. They approximate the true marginals; the quality depends on the graph's cycle structure.
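The two update rules can be implemented directly. The sketch below uses a plain flooding schedule for binary variables; the `loopy_bp` function and the triangle example are illustrative, not an optimized decoder:

```python
import numpy as np
from itertools import product

def loopy_bp(variables, factors, iters=50):
    """Flooding loopy BP for binary variables.

    `factors` maps a factor name to (list of neighbor variables, numpy table).
    """
    # Messages: mu[(i, a)] is variable->factor, mu[(a, i)] is factor->variable.
    mu = {}
    for a, (nbrs, _) in factors.items():
        for i in nbrs:
            mu[(i, a)] = np.ones(2)
            mu[(a, i)] = np.ones(2)
    for _ in range(iters):
        new = {}
        for a, (nbrs, table) in factors.items():
            # Variable -> factor: product of the other factors' messages.
            for i in nbrs:
                m = np.ones(2)
                for b, (nbrs_b, _) in factors.items():
                    if b != a and i in nbrs_b:
                        m = m * mu[(b, i)]
                new[(i, a)] = m / m.sum()
            # Factor -> variable: sum out the factor's other variables.
            for pos, i in enumerate(nbrs):
                m = np.zeros(2)
                for config in product([0, 1], repeat=len(nbrs)):
                    w = table[config]
                    for q, j in enumerate(nbrs):
                        if j != i:
                            w = w * mu[(j, a)][config[q]]
                    m[config[pos]] += w
                new[(a, i)] = m / m.sum()
        mu = new
    # Beliefs: product of all incoming factor messages at each variable.
    beliefs = {}
    for i in variables:
        b = np.ones(2)
        for a, (nbrs, _) in factors.items():
            if i in nbrs:
                b = b * mu[(a, i)]
        beliefs[i] = b / b.sum()
    return beliefs

# Tiny loopy example: three binary variables on a triangle of pairwise factors.
t = np.exp(0.3 * np.outer([-1.0, 1.0], [-1.0, 1.0]))
factors = {'a': (['x0', 'x1'], t), 'b': (['x1', 'x2'], t), 'c': (['x2', 'x0'], t)}
beliefs = loopy_bp(['x0', 'x1', 'x2'], factors)
print(beliefs['x0'])   # ~[0.5, 0.5] by symmetry (no local fields)
```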

Loopy BP is a heuristic extension of an exact tree algorithm. Its success is one of the surprises of modern information theory: on many practically important graphs (LDPC, turbo codes), loopy BP is empirically near-optimal.

Theorem: Fixed Points of Loopy BP Are Bethe Free Energy Stationary Points

The fixed points of loopy belief propagation on a factor graph are in one-to-one correspondence with the stationary points of the Bethe free energy

$$\mathcal{F}_{\text{Bethe}}[\{b_a, b_i\}] = \sum_a \sum_{\mathbf{x}_{\partial a}} b_a(\mathbf{x}_{\partial a}) \log\frac{b_a(\mathbf{x}_{\partial a})}{f_a(\mathbf{x}_{\partial a})} - \sum_i (|\partial i| - 1) \sum_{x_i} b_i(x_i) \log b_i(x_i),$$

subject to the local marginalization constraints $\sum_{\mathbf{x}_{\partial a \setminus i}} b_a(\mathbf{x}_{\partial a}) = b_i(x_i)$.

On trees, the Bethe free energy coincides with the exact (Gibbs) free energy, and the unique minimizer gives the true marginals. On loopy graphs, the Bethe approximation replaces the global entropy by a sum of local entropies corrected for double-counting. The loopy BP updates are a reparameterization of the Lagrangian stationarity (KKT) conditions. This identifies loopy BP as a specific variational approximation, with predictable failure modes.
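A small numerical check of the tree case: evaluating $\mathcal{F}_{\text{Bethe}}$ at the exact marginals of a three-variable chain recovers $-\log Z$. The chain and the value of `theta` are illustrative:

```python
import numpy as np

# Sketch: on the tree x0 - f_a - x1 - f_b - x2, the Bethe free energy at the
# exact marginals equals -log Z. Binary states, symmetric pairwise factors.
theta = 0.7
s = np.array([-1.0, 1.0])
f = np.exp(theta * np.outer(s, s))

joint = np.einsum('ij,jk->ijk', f, f)   # unnormalized p(x0, x1, x2)
Z = joint.sum()
p = joint / Z

# Exact factor and variable marginals (beliefs).
b_a = p.sum(axis=2)        # b_a(x0, x1)
b_b = p.sum(axis=0)        # b_b(x1, x2)
b1 = p.sum(axis=(0, 2))    # b_1(x1); x1 has degree 2, x0 and x2 degree 1,
                           # so only x1 contributes to the entropy correction.

F_bethe = (np.sum(b_a * np.log(b_a / f)) + np.sum(b_b * np.log(b_b / f))
           - (2 - 1) * np.sum(b1 * np.log(b1)))
print(np.isclose(F_bethe, -np.log(Z)))   # True
```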


When Does Loopy BP Work?

Loopy BP is guaranteed to be correct on trees. It is provably correct on a few special loopy graph families: single-cycle graphs, Gaussian graphical models with diagonally dominant precision matrices, and attractive pairwise binary models at high temperature. For most other loopy graphs, correctness is empirical: LDPC decoding, turbo decoding, and compressed sensing via BP/AMP all work remarkably well, but without closed-form guarantees.

The point is that loopy BP is a heuristic that shines under certain structural conditions: (i) the graph is locally tree-like, (ii) the factors are "weak" (low mutual information between distant variables), and (iii) iteration is combined with reasonable scheduling.

Common Mistake: Short Cycles Destroy Loopy BP

Mistake:

Running loopy BP on a graph with many cycles of length 4 (the shortest possible cycle in a bipartite graph).

Correction:

A 4-cycle $(x_i - f_a - x_j - f_b - x_i)$ causes the message $\mu_{f_a \to x_i}$ to depend on itself through two hops. Information flows back to the origin and gets double-counted almost immediately. Loopy BP on graphs with many 4-cycles typically diverges or converges to useless fixed points. When building LDPC codes, use progressive edge growth or similar algorithms to ensure girth $\geq 6$ (ideally $\geq 8$).
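The 4-cycle condition is easy to test: two rows of a parity-check matrix $H$ that share ones in two or more columns close a 4-cycle in the Tanner graph. A sketch; the matrices below are toy examples, not real codes:

```python
import numpy as np

# Sketch: a Tanner graph has a 4-cycle iff two rows of H share ones in at
# least two columns, i.e. H @ H.T has an off-diagonal entry >= 2.

def has_4_cycle(H):
    overlap = H @ H.T                 # overlap[a, b] = columns shared by rows a, b
    np.fill_diagonal(overlap, 0)
    return bool((overlap >= 2).any())

H_bad = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 1]])      # rows share columns 0 and 1: 4-cycle
H_ok  = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 1]])      # rows share only column 0: no 4-cycle
print(has_4_cycle(H_bad), has_4_cycle(H_ok))   # True False
```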

Effect of Cycle Length on Loopy BP

Interactive figure: marginal error of loopy BP on a ring graph (single cycle) of various lengths. Shorter cycles give larger error.


Example: Loopy BP on a Single Cycle

A binary model with $n$ variables on a cycle has pairwise factors $f_{i,i+1}(x_i, x_{i+1}) = \exp(\theta x_i x_{i+1})$, $x_i \in \{-1, +1\}$. Show that loopy BP converges and find the resulting marginal approximation.
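A numerical sketch of this example, with small random local fields added so that the marginals are non-trivial; the values of $n$, $\theta$, and the fields are illustrative:

```python
import numpy as np
from itertools import product

# Loopy BP on a binary ring: pairwise factors exp(theta * x_i * x_{i+1})
# plus local field factors exp(h_i * x_i), x_i in {-1, +1}.
n, theta = 6, 0.5
rng = np.random.default_rng(0)
h = rng.normal(0, 0.3, n)
s = np.array([-1.0, 1.0])
pair = np.exp(theta * np.outer(s, s))    # f(x_i, x_{i+1})
local = np.exp(np.outer(h, s))           # field factor per variable

# Directed messages around the ring: cw[i] enters x_i from factor f_{i-1,i},
# acw[i] enters x_i from factor f_{i,i+1}.
cw = np.ones((n, 2))
acw = np.ones((n, 2))
for _ in range(100):
    new_cw = np.empty((n, 2))
    new_acw = np.empty((n, 2))
    for i in range(n):
        new_cw[i] = pair.T @ (local[i - 1] * cw[i - 1])   # sum out x_{i-1}
        j = (i + 1) % n
        new_acw[i] = pair @ (local[j] * acw[j])           # sum out x_{i+1}
    cw = new_cw / new_cw.sum(axis=1, keepdims=True)
    acw = new_acw / new_acw.sum(axis=1, keepdims=True)

bp = local * cw * acw                    # beliefs: product of incoming messages
bp /= bp.sum(axis=1, keepdims=True)

# Exact marginals by brute force over 2^n configurations.
exact = np.zeros((n, 2))
for cfg in product([0, 1], repeat=n):
    w = np.prod([pair[cfg[i], cfg[(i + 1) % n]] * local[i, cfg[i]]
                 for i in range(n)])
    for i in range(n):
        exact[i, cfg[i]] += w
exact /= exact.sum(axis=1, keepdims=True)
print(np.max(np.abs(bp - exact)))   # small but nonzero: the cycle biases BP
```

The messages converge because each directed update is a positive linear map around the ring, so iterating it is a power iteration on the cycle's transfer matrix.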

πŸ”§Engineering Note

Message Scheduling Matters

In practice, loopy BP can be run with flooding (all messages updated simultaneously) or serial scheduling (messages updated one at a time). Serial scheduling typically converges twice as fast but requires careful ordering. For LDPC codes, the layered decoder interleaves check node and variable node updates and is the standard choice for hardware implementation.

Practical Constraints
• Flooding BP doubles the iteration count vs. serial.
• Serial scheduling requires an update ordering, a non-trivial choice on arbitrary graphs.
• Parallelism vs. convergence speed is a hardware design tradeoff.

Why This Matters: 5G LDPC Decoders in Hardware

The LDPC codes in 5G NR use a structured parity-check matrix called a base graph, expanded with circulant permutation matrices (lifting) to obtain the full Tanner graph. This structure enables layered BP decoders in silicon with throughput above 10 Gb/s. The graph's girth is $\geq 6$ by construction, which ensures loopy BP converges well within 15-30 iterations.

Loopy BP on a Cycle: Message Flow

Animated visualization of message propagation around a cycle. Messages arriving at a node have already been "touched" by the target node; this is the double-counting that causes loopy BP to deviate from the true marginal.

Historical Note: Loopy BP: A Happy Accident

1988-2005

When Pearl formulated belief propagation in the 1980s, he proved its correctness on trees and warned that it would fail on loopy graphs. Then in 1993, Berrou, Glavieux, and Thitimajshima introduced turbo codes, effectively running loopy BP on a coupled convolutional code graph with spectacular empirical performance. Soon after, McEliece, MacKay, and Cheng (1998) recognized turbo decoding as loopy BP. This was a surprise: Pearl's pessimism was overturned by engineering practice. The theoretical understanding, via density evolution and Bethe free energy, followed in the 2000s.

Key Takeaway

On trees, sum-product message passing computes exact marginals in linear time. On loopy graphs, the same updates applied iteratively yield loopy BP, a heuristic whose fixed points are stationary points of the Bethe free energy. Loopy BP works well when the graph is locally tree-like and the factors are not too strong; it fails when short cycles dominate.