The Message-Passing Rules

From Factor Graph to Algorithm

In Chapter 17 we saw that marginalization on a tree factor graph distributes sums inside products. The sum-product algorithm makes this distribution mechanical: every variable and factor node repeatedly applies the same local rule, and after two passes through a tree the exact marginals emerge at every node.

The point is that this algorithm is universal: the update rules depend only on the factor graph and the factor values, not on the problem domain. The same code runs on LDPC decoding, HMM filtering, and compressed sensing; only the factors change. This section establishes the rules once and for all.

Definition: Sum-Product Messages

Let $p(\mathbf{x}) = \frac{1}{Z}\prod_a f_a(\mathbf{x}_{\partial a})$ be a distribution on a factor graph. For each directed edge, define:

  • Variable-to-factor message: $\mu_{i \to a}(x_i) = \prod_{b \in \partial i \setminus a} \mu_{b \to i}(x_i)$.
  • Factor-to-variable message: $\mu_{a \to i}(x_i) = \sum_{\mathbf{x}_{\partial a \setminus i}} f_a(\mathbf{x}_{\partial a}) \prod_{j \in \partial a \setminus i} \mu_{j \to a}(x_j)$.

At convergence (or after the tree passes), the belief (approximate marginal) at variable $i$ is $b_i(x_i) \propto \prod_{a \in \partial i} \mu_{a \to i}(x_i)$.

The variable-to-factor message combines information from all other factors touching $i$. The factor-to-variable message marginalizes out everything except $x_i$ from $f_a$ while weighting each neighbor by its incoming message. Messages are defined up to a scalar; normalization is imposed for numerical reasons only.

Theorem: Sum-Product Computes Exact Marginals on Trees

If the factor graph is a tree, the sum-product algorithm initialized at the leaves and run inward-then-outward terminates in $O(|\mathcal{E}|)$ operations (where $|\mathcal{E}|$ is the number of directed edges) and produces beliefs $b_i(x_i) = p(x_i)$, the true marginals, at every node.

The tree structure ensures that the messages arriving at node $i$ from different subtrees are statistically independent given $x_i$. Their product equals the joint posterior, correctly marginalizing all other variables. This is the tree intuition: remove $i$ and the tree decomposes into disjoint subtrees; the messages bring the subtree contributions together.


Sum-Product Algorithm (Generic)

Complexity: per iteration, $O\left(\sum_a |\mathcal{X}|^{|\partial a|}\right)$ time. Memory: $O(|\mathcal{E}|\,|\mathcal{X}|)$ for the message table.
Input: factor graph with factors {f_a} and variables {x_i}
Output: beliefs b_i(x_i) approximating marginals p(x_i)

Initialize all messages to uniform:
    mu_{i->a}(x_i) = 1 for all edges
    mu_{a->i}(x_i) = 1 for all edges
repeat until convergence (or for T iterations):
    // Update variable-to-factor messages
    for each edge (i, a):
        mu_{i->a}(x_i) = product over b in N(i) \ {a} of mu_{b->i}(x_i)
        normalize mu_{i->a}
    // Update factor-to-variable messages
    for each edge (a, i):
        mu_{a->i}(x_i) = sum over x_{N(a) \ {i}} of
            f_a(x_{N(a)}) * product over j in N(a) \ {i} of mu_{j->a}(x_j)
        normalize mu_{a->i}
Compute beliefs:
    for each variable i:
        b_i(x_i) = product over a in N(i) of mu_{a->i}(x_i)
        normalize b_i
return {b_i}

The flooding schedule is shown. Serial scheduling updates one edge at a time using the most recent messages, typically halving the iteration count.
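The pseudocode translates almost line for line into NumPy. The following is a minimal sketch of the flooding schedule for discrete variables; the data layout (dicts keyed by variable and factor names) and the name sum_product are our choices, not notation from this book:

import numpy as np

def sum_product(factors, card, neighbors, T=50, tol=1e-9):
    # factors:   name -> (scope, table); scope is a tuple of variable names,
    #            table a numpy array with one axis per scope variable
    # card:      variable name -> alphabet size
    # neighbors: variable name -> list of factor names touching it
    edges = [(i, a) for i, facs in neighbors.items() for a in facs]
    m_va = {(i, a): np.full(card[i], 1.0 / card[i]) for (i, a) in edges}
    m_av = {(a, i): np.full(card[i], 1.0 / card[i]) for (i, a) in edges}
    for _ in range(T):
        # variable -> factor: product of the OTHER incoming factor messages
        new_va = {}
        for (i, a) in edges:
            msg = np.ones(card[i])
            for b in neighbors[i]:
                if b != a:                        # exclude the target factor
                    msg = msg * m_av[(b, i)]
            new_va[(i, a)] = msg / msg.sum()
        # factor -> variable: weight the table, then marginalize out the rest
        new_av = {}
        for (i, a) in edges:
            scope, table = factors[a]
            t = table.astype(float)
            for k, j in enumerate(scope):
                if j != i:
                    shape = [1] * t.ndim
                    shape[k] = card[j]            # broadcast message on axis k
                    t = t * new_va[(j, a)].reshape(shape)
            msg = t.sum(axis=tuple(k for k, j in enumerate(scope) if j != i))
            new_av[(a, i)] = msg / msg.sum()
        delta = max(np.abs(new_av[e] - m_av[e]).max() for e in new_av)
        m_va, m_av = new_va, new_av
        if delta < tol:                           # messages stopped changing
            break
    beliefs = {}
    for i, facs in neighbors.items():
        b = np.ones(card[i])
        for a in facs:
            b = b * m_av[(a, i)]
        beliefs[i] = b / b.sum()
    return beliefs

On a tree, the loop settles after about a diameter's worth of iterations, at which point the beliefs are the exact marginals. The example that follows exercises it on a three-variable chain.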

Example: Sum-Product on a Three-Variable Chain

Three binary variables $x_1, x_2, x_3 \in \{0, 1\}$ with joint $p(\mathbf{x}) \propto f_1(x_1, x_2)\, f_2(x_2, x_3)$, where $f_1(x_1, x_2) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $f_2(x_2, x_3) = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}$ (entries indexed as $[x_a, x_b]$). Compute $p(x_2)$ by sum-product.
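Working the messages by hand: the leaf variables send uniform messages, so the two factor-to-variable messages into $x_2$ are the marginal sums $\mu_{f_1 \to x_2}(x_2) = \sum_{x_1} f_1(x_1, x_2) = (3, 3)$ and $\mu_{f_2 \to x_2}(x_2) = \sum_{x_3} f_2(x_2, x_3) = (4, 4)$. The belief is $b_2(x_2) \propto (12, 12)$, so $p(x_2) = (1/2, 1/2)$: both factor matrices have equal row and column sums, so nothing breaks the tie at $x_2$. With the sum_product sketch above in scope, the same answer drops out numerically:

f1 = np.array([[2., 1.], [1., 2.]])               # indexed [x1, x2]
f2 = np.array([[1., 3.], [3., 1.]])               # indexed [x2, x3]
beliefs = sum_product(
    factors={"f1": (("x1", "x2"), f1), "f2": (("x2", "x3"), f2)},
    card={"x1": 2, "x2": 2, "x3": 2},
    neighbors={"x1": ["f1"], "x2": ["f1", "f2"], "x3": ["f2"]})
print(beliefs["x2"])                              # -> [0.5 0.5]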

Sum-Product Message Propagation on a Tree

[Interactive animation: message passing on a tree factor graph, with messages flowing inward from the leaves and outward to all nodes.]


Loopy BP: Same Rules, Different Guarantees

On a loopy graph, we apply the same message updates iteratively until convergence (or for a fixed number of iterations). Fixed points correspond to stationary points of the Bethe free energy (Chapter 17). For many practical graphs (LDPC codes, turbo codes, MIMO with sparse channels), loopy BP converges to beliefs close to the true marginals. The quality of the approximation depends on graph girth, factor strength, and scheduling.
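A practical note that goes beyond this section: when loopy BP oscillates, a standard remedy is damping, which blends each new message with the previous one before normalizing. A minimal sketch, with alpha as the damping weight:

import numpy as np

def damp(mu_new, mu_old, alpha=0.5):
    # alpha = 0 recovers the undamped update; larger alpha slows the
    # iteration but suppresses oscillation on loopy graphs.
    mu = (1.0 - alpha) * mu_new + alpha * mu_old
    return mu / mu.sum()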

Theorem: Factor-to-Variable Message for Hard Constraints

Let $f_a(\mathbf{x}_{\partial a}) = \mathbb{1}\{g(\mathbf{x}_{\partial a}) = 0\}$ be a hard constraint (the indicator of a feasible set). Then $\mu_{a \to i}(x_i) = \sum_{\mathbf{x}_{\partial a \setminus i}:\, g(\mathbf{x}_{\partial a}) = 0} \prod_{j \in \partial a \setminus i} \mu_{j \to a}(x_j)$. In particular, if the constraint is a parity check $g(\mathbf{x}) = \sum_j x_j \pmod 2$, the message admits a closed form in terms of the incoming LLRs (see Section 18.2).

When the factor is a hard constraint, only configurations satisfying it contribute to the marginal. The sum-product rule prunes all infeasible configurations.
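To make the pruning concrete for the parity-check case, the sum over feasible configurations can be evaluated by brute force and checked against the standard tanh-rule closed form (Section 18.2 derives it; the function names here are ours):

import numpy as np
from itertools import product

def check_msg_bruteforce(llrs):
    # Prune infeasible configs by hand: enumerate the OTHER neighbors of
    # the check; their parity determines the value x_i must take.
    p0 = 1.0 / (1.0 + np.exp(-np.asarray(llrs)))    # p_j(x_j = 0) from LLRs
    mu = [0.0, 0.0]
    for xs in product([0, 1], repeat=len(llrs)):
        w = np.prod([p0[j] if x == 0 else 1.0 - p0[j]
                     for j, x in enumerate(xs)])
        mu[sum(xs) % 2] += w
    return np.log(mu[0] / mu[1])                    # outgoing LLR

def check_msg_tanh(llrs):
    # Closed form: tanh of half the outgoing LLR is the product of
    # tanh of half of each incoming LLR.
    return 2.0 * np.arctanh(np.prod(np.tanh(np.asarray(llrs) / 2.0)))

print(check_msg_bruteforce([1.2, -0.7, 2.5]))       # approx -0.309
print(check_msg_tanh([1.2, -0.7, 2.5]))             # same value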

Common Mistake: Do Not Include Yourself

Mistake:

Computing $\mu_{i \to a}(x_i) = \prod_{b \in \partial i} \mu_{b \to i}(x_i)$, i.e., including the target factor $a$ in the product.

Correction:

The variable-to-factor message must exclude the target factor: $\mu_{i \to a}(x_i) = \prod_{b \in \partial i \setminus a} \mu_{b \to i}(x_i)$. Including $a$ creates a self-loop of information and typically makes BP diverge even on trees. This mistake is the most common bug in homework implementations.
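A common implementation pattern makes the exclusion both correct and cheap: in the log domain, compute the full sum of incoming messages once, then subtract the recipient's own contribution, instead of re-multiplying over $\partial i \setminus a$ for each target. A minimal sketch (the helper name is ours):

import numpy as np

def var_to_factor_logs(incoming):
    # incoming: factor name -> log-message array mu_{b->i} at variable i
    total = sum(incoming.values())        # elementwise sum of log-messages
    # total - m drops factor a's own contribution: the extrinsic message.
    # Caution: if messages are clipped to +/-inf, the subtraction breaks.
    return {a: total - m for a, m in incoming.items()}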

Extrinsic message

A message carrying information excluding the recipient's own prior contribution. Variable-to-factor messages in sum-product are extrinsic with respect to the target factor. This extrinsic structure is what enables iterative refinement: each factor receives "fresh" information at each iteration.

Related: Turbo principle, Iterative decoding

⚠️ Engineering Note

Numerical Stability: Log-Domain BP

Messages are products of many small numbers and quickly underflow to zero in naive implementations. Always work in the log domain: replace products by sums, and use the log-sum-exp trick for marginalization: $\log \sum_i e^{a_i} = a_{\max} + \log \sum_i e^{a_i - a_{\max}}$. For binary variables, LLRs $L(x) = \log\frac{p(x=0)}{p(x=1)}$ are the natural representation. (A sketch of these helpers follows the constraints below.)

Practical Constraints
  • Never multiply raw probabilities: underflow within 20 iterations.
  • Use LLRs for binary variables; log-densities for continuous ones.
  • Clip extreme LLRs to a finite range (e.g., $\pm 50$) in hardware to prevent overflow.
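A minimal sketch of the two helpers these constraints call for (the names are ours):

import numpy as np

def logsumexp(a):
    # Stable log-domain marginalization: factor out the max first.
    amax = np.max(a)
    return amax + np.log(np.sum(np.exp(a - amax)))

def clip_llr(L, lim=50.0):
    # Mirror the hardware constraint: keep LLRs in a finite range.
    return np.clip(L, -lim, lim)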

Sum-Product on a Tree: Two-Pass Algorithm

[Animated illustration: sum-product on a small tree. Messages flow inward from the leaves to the root, then outward to all nodes, after which every belief equals the true marginal.]

Key Takeaway

Sum-product is a local algorithm: every message depends only on its immediate neighbors. On a tree, two passes compute the exact marginals; on a loopy graph, iteration yields a principled approximation. The algorithm is universal: the same code solves LDPC decoding, HMM filtering, and compressed sensing.