The Message-Passing Rules

From Factor Graph to Algorithm

In Chapter 17 we saw that marginalization on a tree factor graph distributes sums inside products. The sum-product algorithm makes this distribution mechanical: every variable and factor node repeatedly applies the same local rule, and after two passes through a tree the exact marginals emerge at every node.

The point is that this algorithm is universal: the update rules depend only on the factor graph and the factor values, not on the problem domain. The same code runs on LDPC decoding, HMM filtering, and compressed sensing; only the factors change. This section establishes the rules once and for all.

Definition: Sum-Product Messages

Let $p(\mathbf{x}) = \frac{1}{Z}\prod_a f_a(\mathbf{x}_{\partial a})$ be a distribution on a factor graph. For each directed edge, define:

  • Variable-to-factor message: $\mu_{i \to a}(x_i) = \prod_{b \in \partial i \setminus a} \mu_{b \to i}(x_i)$.
  • Factor-to-variable message: $\mu_{a \to i}(x_i) = \sum_{\mathbf{x}_{\partial a \setminus i}} f_a(\mathbf{x}_{\partial a}) \prod_{j \in \partial a \setminus i} \mu_{j \to a}(x_j)$.

At convergence (or after the tree passes), the belief (approximate marginal) at variable $i$ is $b_i(x_i) \propto \prod_{a \in \partial i} \mu_{a \to i}(x_i)$.

The variable-to-factor message combines information from all other factors touching $i$. The factor-to-variable message marginalizes out everything except $x_i$ from $f_a$ while weighting each neighbor by its incoming message. Messages are defined up to a scalar; normalization is imposed for numerical reasons only.

Theorem: Sum-Product Computes Exact Marginals on Trees

If the factor graph is a tree, the sum-product algorithm initialized at the leaves and run inward-then-outward terminates in $O(|\mathcal{E}|)$ operations (where $|\mathcal{E}|$ is the number of directed edges) and produces beliefs $b_i(x_i) = p(x_i)$, the true marginals, at every node.

The tree structure ensures that the messages arriving at node $i$ from different subtrees are statistically independent given $x_i$. Their product equals the joint posterior, correctly marginalizing all other variables. This is the tree intuition: remove $i$ and the tree decomposes into disjoint subtrees; the messages bring the subtree contributions together.


Sum-Product Algorithm (Generic)

Complexity: per iteration, $O\left(\sum_a |\mathcal{X}|^{|\partial a|}\right)$ time. Memory: $O(|\mathcal{E}|\,|\mathcal{X}|)$ for the message table.
Input: factor graph with factors {f_a} and variables {x_i}
Output: beliefs b_i(x_i) approximating marginals p(x_i)

Initialize all messages to uniform:
    mu_{i->a}(x_i) = 1 for all edges
    mu_{a->i}(x_i) = 1 for all edges
repeat until convergence (or for T iterations):
    // Update variable-to-factor messages
    for each edge (i, a):
        mu_{i->a}(x_i) = product over b in N(i) \ {a} of mu_{b->i}(x_i)
        normalize mu_{i->a}
    // Update factor-to-variable messages
    for each edge (a, i):
        mu_{a->i}(x_i) = sum over x_{N(a) \ {i}} of
            f_a(x_{N(a)}) * product over j in N(a) \ {i} of mu_{j->a}(x_j)
        normalize mu_{a->i}
Compute beliefs:
    for each variable i:
        b_i(x_i) = product over a in N(i) of mu_{a->i}(x_i)
        normalize b_i
return {b_i}

The flooding schedule is shown. Serial scheduling updates one edge at a time using the most recent messages, typically halving the iteration count.
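The pseudocode translates almost line for line into NumPy. The following is a minimal sketch of the flooding schedule for discrete variables; the data layout (dicts keyed by variable and factor names) and the name sum_product are our choices, not notation from this book:

import numpy as np

def sum_product(factors, card, neighbors, T=50, tol=1e-9):
    # factors:   name -> (scope, table); scope is a tuple of variable names,
    #            table a numpy array with one axis per scope variable
    # card:      variable name -> alphabet size
    # neighbors: variable name -> list of factor names touching it
    edges = [(i, a) for i, facs in neighbors.items() for a in facs]
    m_va = {(i, a): np.full(card[i], 1.0 / card[i]) for (i, a) in edges}
    m_av = {(a, i): np.full(card[i], 1.0 / card[i]) for (i, a) in edges}
    for _ in range(T):
        # variable -> factor: product of the OTHER incoming factor messages
        new_va = {}
        for (i, a) in edges:
            msg = np.ones(card[i])
            for b in neighbors[i]:
                if b != a:                        # exclude the target factor
                    msg = msg * m_av[(b, i)]
            new_va[(i, a)] = msg / msg.sum()
        # factor -> variable: weight the table, then marginalize out the rest
        new_av = {}
        for (i, a) in edges:
            scope, table = factors[a]
            t = table.astype(float)
            for k, j in enumerate(scope):
                if j != i:
                    shape = [1] * t.ndim
                    shape[k] = card[j]            # broadcast message on axis k
                    t = t * new_va[(j, a)].reshape(shape)
            msg = t.sum(axis=tuple(k for k, j in enumerate(scope) if j != i))
            new_av[(a, i)] = msg / msg.sum()
        delta = max(np.abs(new_av[e] - m_av[e]).max() for e in new_av)
        m_va, m_av = new_va, new_av
        if delta < tol:                           # messages stopped changing
            break
    beliefs = {}
    for i, facs in neighbors.items():
        b = np.ones(card[i])
        for a in facs:
            b = b * m_av[(a, i)]
        beliefs[i] = b / b.sum()
    return beliefs

On a tree, the loop settles after about a diameter's worth of iterations, at which point the beliefs are the exact marginals. The example that follows exercises it on a three-variable chain.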

Example: Sum-Product on a Three-Variable Chain

Three binary variables $x_1, x_2, x_3 \in \{0, 1\}$ with joint $p(\mathbf{x}) \propto f_1(x_1, x_2)\, f_2(x_2, x_3)$, where $f_1(x_1, x_2) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $f_2(x_2, x_3) = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}$ (entries indexed as $[x_a, x_b]$). Compute $p(x_2)$ by sum-product.
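Working the messages by hand: the leaf variables send uniform messages, so the two factor-to-variable messages into $x_2$ are the marginal sums $\mu_{f_1 \to x_2}(x_2) = \sum_{x_1} f_1(x_1, x_2) = (3, 3)$ and $\mu_{f_2 \to x_2}(x_2) = \sum_{x_3} f_2(x_2, x_3) = (4, 4)$. The belief is $b_2(x_2) \propto (12, 12)$, so $p(x_2) = (1/2, 1/2)$: both factor matrices have equal row and column sums, so nothing breaks the tie at $x_2$. With the sum_product sketch above in scope, the same answer drops out numerically:

f1 = np.array([[2., 1.], [1., 2.]])               # indexed [x1, x2]
f2 = np.array([[1., 3.], [3., 1.]])               # indexed [x2, x3]
beliefs = sum_product(
    factors={"f1": (("x1", "x2"), f1), "f2": (("x2", "x3"), f2)},
    card={"x1": 2, "x2": 2, "x3": 2},
    neighbors={"x1": ["f1"], "x2": ["f1", "f2"], "x3": ["f2"]})
print(beliefs["x2"])                              # -> [0.5 0.5]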

Sum-Product Message Propagation on a Tree

[Interactive animation: message passing on a tree factor graph, with messages flowing inward from the leaves and outward to all nodes.]


Loopy BP: Same Rules, Different Guarantees

On a loopy graph, we apply the same message updates iteratively until convergence (or for a fixed number of iterations). Fixed points correspond to stationary points of the Bethe free energy (Chapter 17). For many practical graphs (LDPC codes, turbo codes, MIMO with sparse channels), loopy BP converges to beliefs close to the true marginals. The quality of the approximation depends on graph girth, factor strength, and scheduling.
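A practical note that goes beyond this section: when loopy BP oscillates, a standard remedy is damping, which blends each new message with the previous one before normalizing. A minimal sketch, with alpha as the damping weight:

import numpy as np

def damp(mu_new, mu_old, alpha=0.5):
    # alpha = 0 recovers the undamped update; larger alpha slows the
    # iteration but suppresses oscillation on loopy graphs.
    mu = (1.0 - alpha) * mu_new + alpha * mu_old
    return mu / mu.sum()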

Theorem: Factor-to-Variable Message for Hard Constraints

Let $f_a(\mathbf{x}_{\partial a}) = \mathbb{1}\{g(\mathbf{x}_{\partial a}) = 0\}$ be a hard constraint (the indicator of a feasible set). Then $\mu_{a \to i}(x_i) = \sum_{\mathbf{x}_{\partial a \setminus i}:\, g(\mathbf{x}_{\partial a}) = 0} \prod_{j \in \partial a \setminus i} \mu_{j \to a}(x_j)$. In particular, if the constraint is a parity check $g(\mathbf{x}) = \sum_j x_j \pmod 2$, the message admits a closed form in terms of the incoming LLRs (see Section 18.2).

When the factor is a hard constraint, only configurations satisfying it contribute to the marginal. The sum-product rule prunes all infeasible configurations.
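To make the pruning concrete for the parity-check case, the sum over feasible configurations can be evaluated by brute force and checked against the standard tanh-rule closed form (Section 18.2 derives it; the function names here are ours):

import numpy as np
from itertools import product

def check_msg_bruteforce(llrs):
    # Prune infeasible configs by hand: enumerate the OTHER neighbors of
    # the check; their parity determines the value x_i must take.
    p0 = 1.0 / (1.0 + np.exp(-np.asarray(llrs)))    # p_j(x_j = 0) from LLRs
    mu = [0.0, 0.0]
    for xs in product([0, 1], repeat=len(llrs)):
        w = np.prod([p0[j] if x == 0 else 1.0 - p0[j]
                     for j, x in enumerate(xs)])
        mu[sum(xs) % 2] += w
    return np.log(mu[0] / mu[1])                    # outgoing LLR

def check_msg_tanh(llrs):
    # Closed form: tanh of half the outgoing LLR is the product of
    # tanh of half of each incoming LLR.
    return 2.0 * np.arctanh(np.prod(np.tanh(np.asarray(llrs) / 2.0)))

print(check_msg_bruteforce([1.2, -0.7, 2.5]))       # approx -0.309
print(check_msg_tanh([1.2, -0.7, 2.5]))             # same value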

Common Mistake: Do Not Include Yourself

Mistake:

Computing $\mu_{i \to a}(x_i) = \prod_{b \in \partial i} \mu_{b \to i}(x_i)$, i.e., including the target factor $a$ in the product.

Correction:

The variable-to-factor message must exclude the target factor: $\mu_{i \to a}(x_i) = \prod_{b \in \partial i \setminus a} \mu_{b \to i}(x_i)$. Including $a$ creates a self-loop of information and typically makes BP diverge even on trees. This mistake is the most common bug in homework implementations.
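A common implementation pattern makes the exclusion both correct and cheap: in the log domain, compute the full sum of incoming messages once, then subtract the recipient's own contribution, instead of re-multiplying over $\partial i \setminus a$ for each target. A minimal sketch (the helper name is ours):

import numpy as np

def var_to_factor_logs(incoming):
    # incoming: factor name -> log-message array mu_{b->i} at variable i
    total = sum(incoming.values())        # elementwise sum of log-messages
    # total - m drops factor a's own contribution: the extrinsic message.
    # Caution: if messages are clipped to +/-inf, the subtraction breaks.
    return {a: total - m for a, m in incoming.items()}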

Extrinsic message

A message carrying information excluding the recipient's own prior contribution. Variable-to-factor messages in sum-product are extrinsic with respect to the target factor. This extrinsic structure is what enables iterative refinement: each factor receives "fresh" information at each iteration.

Related: Turbo principle, Iterative decoding

⚠️ Engineering Note

Numerical Stability: Log-Domain BP

Messages are products of many small numbers and quickly underflow to zero in naive implementations. Always work in the log domain: replace products by sums, and use the log-sum-exp trick for marginalization: $\log \sum_i e^{a_i} = a_{\max} + \log \sum_i e^{a_i - a_{\max}}$. For binary variables, LLRs $L(x) = \log\frac{p(x=0)}{p(x=1)}$ are the natural representation. (A sketch of these helpers follows the constraints below.)

Practical Constraints
  • Never multiply raw probabilities: underflow within 20 iterations.
  • Use LLRs for binary variables; log-densities for continuous ones.
  • Clip extreme LLRs to a finite range (e.g., $\pm 50$) in hardware to prevent overflow.
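A minimal sketch of the two helpers these constraints call for (the names are ours):

import numpy as np

def logsumexp(a):
    # Stable log-domain marginalization: factor out the max first.
    amax = np.max(a)
    return amax + np.log(np.sum(np.exp(a - amax)))

def clip_llr(L, lim=50.0):
    # Mirror the hardware constraint: keep LLRs in a finite range.
    return np.clip(L, -lim, lim)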

Sum-Product on a Tree: Two-Pass Algorithm

[Animated illustration: sum-product on a small tree. Messages flow inward from the leaves to the root, then outward to all nodes, after which every belief equals the true marginal.]

Key Takeaway

Sum-product is a local algorithm: every message depends only on its immediate neighbors. On a tree, two passes compute the exact marginals; on a loopy graph, iteration yields a principled approximation. The algorithm is universal: the same code solves LDPC decoding, HMM filtering, and compressed sensing.