Examples of Factor Graphs
One Language, Many Models
The power of factor graphs is their universality. In this section we draw the factor graphs of six canonical problems from communications and signal processing. The reader will notice a striking pattern: the algorithms we developed separately for each (Viterbi for ISI, iterative decoding for LDPC, Kalman filtering for state-space models, MMSE for MIMO) are all the same algorithm applied to different factor graphs.
This is the payoff of the abstraction. Once we understand message passing on a factor graph (Chapter 18), we have understood all of these classical algorithms.
Definition: Markov Chain as a Factor Graph
Markov Chain as a Factor Graph
A Markov chain with transition probabilities $p(x_t \mid x_{t-1})$ and initial distribution $p(x_1)$ has joint
$$p(x_1, \dots, x_n) = p(x_1) \prod_{t=2}^{n} p(x_t \mid x_{t-1}).$$
Its factor graph is a chain: variable nodes $x_1, \dots, x_n$ alternating with factor nodes
- $f_1(x_1) = p(x_1)$ (prior),
- $f_t(x_{t-1}, x_t) = p(x_t \mid x_{t-1})$ for $t = 2, \dots, n$.
A chain is a tree, so exact marginals can be computed in $O(nS^2)$ time, where $S$ is the number of states. The two-pass algorithm is the forward-backward algorithm: message passing from left to right, then from right to left.
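As a concrete sketch, the two-pass computation on a small chain can be written in a few lines and checked against brute-force enumeration. The chain length, state count, and distributions below are all illustrative choices, not values from the text.

```python
import numpy as np
from itertools import product

# Hypothetical 3-state Markov chain of length 4; the prior and transition
# matrix are randomly generated purely for illustration.
rng = np.random.default_rng(0)
n, S = 4, 3
prior = rng.dirichlet(np.ones(S))          # f_1(x_1) = p(x_1)
T = rng.dirichlet(np.ones(S), size=S)      # T[i, j] = p(x_t = j | x_{t-1} = i)

# Forward pass: left-to-right messages along the chain.
alpha = np.zeros((n, S))
alpha[0] = prior
for t in range(1, n):
    alpha[t] = alpha[t - 1] @ T

# Backward pass: right-to-left messages.
beta = np.ones((n, S))
for t in range(n - 2, -1, -1):
    beta[t] = T @ beta[t + 1]

# The marginal at each variable is the normalized product of the two messages.
marginals = alpha * beta
marginals /= marginals.sum(axis=1, keepdims=True)

# Brute-force check: sum the joint over all S^n configurations.
brute = np.zeros((n, S))
for x in product(range(S), repeat=n):
    p = prior[x[0]] * np.prod([T[x[t - 1], x[t]] for t in range(1, n)])
    for t in range(n):
        brute[t, x[t]] += p
assert np.allclose(marginals, brute / brute.sum(axis=1, keepdims=True))
```

The two passes cost $O(nS^2)$, while the brute-force check costs $O(S^n)$; the assertion confirms they agree on this toy chain.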
Definition: Hidden Markov Model as a Factor Graph
Hidden Markov Model as a Factor Graph
An HMM has latent states $x_t$ (a Markov chain) and observations $y_t$ generated via emission distributions $p(y_t \mid x_t)$. Joint:
$$p(x_{1:n}, y_{1:n}) = p(x_1) \prod_{t=2}^{n} p(x_t \mid x_{t-1}) \prod_{t=1}^{n} p(y_t \mid x_t).$$
Factor graph: a horizontal chain of state variables and transition factors, with an emission factor (and observation variable) hanging off each state.
Clamping each $y_t$ to its observed value turns the emission factors into unary factors on the states. The graph is still a tree; forward-backward computes the posterior marginals exactly.
Factor Graph of a Hidden Markov Model
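A minimal numeric sketch of this clamping, with made-up transition, emission, and observation values: the observed sequence turns each emission factor into a unary factor `E[:, y[t]]`, and the same two-pass recursion as for the plain chain yields the posterior state marginals.

```python
import numpy as np

# Hypothetical 2-state HMM with binary observations; all numbers illustrative.
prior = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3],
              [0.2, 0.8]])                  # T[i, j] = p(x_t = j | x_{t-1} = i)
E = np.array([[0.9, 0.1],
              [0.3, 0.7]])                  # E[i, y] = p(y | x = i)
y = [0, 0, 1, 0]                            # observed sequence (clamped)

n, S = len(y), 2

# Forward pass: each step multiplies in the unary factor E[:, y[t]].
alpha = np.zeros((n, S))
alpha[0] = prior * E[:, y[0]]
for t in range(1, n):
    alpha[t] = (alpha[t - 1] @ T) * E[:, y[t]]

# Backward pass.
beta = np.ones((n, S))
for t in range(n - 2, -1, -1):
    beta[t] = T @ (E[:, y[t + 1]] * beta[t + 1])

# Posterior marginals p(x_t | y_1, ..., y_n).
posterior = alpha * beta
posterior /= posterior.sum(axis=1, keepdims=True)
print(posterior)
```

The only change from the plain Markov chain is the extra elementwise product with the clamped emission factor at each step, which is exactly what the factor graph picture predicts.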
Example: LDPC Code Tanner Graph
Construct the factor graph of a rate-1/2 LDPC code with a $3 \times 6$ parity-check matrix $H$. Interpret the role of the variable and factor nodes.
Variable nodes
Six variable nodes $x_1, \dots, x_6$, one per code bit.
Factor nodes
Three factor nodes $f_1, f_2, f_3$, one per parity check:
- Each factor is an indicator that its parity check is satisfied: $f_j(\mathbf{x}) = \mathbb{1}\big[\textstyle\bigoplus_{i \in N(j)} x_i = 0\big]$, where $N(j)$ is the set of bits participating in check $j$.
Code distribution
For a uniform prior over codewords, $p(\mathbf{x}) \propto \prod_j f_j(\mathbf{x})$. Decoding from channel observations multiplies these factors by channel likelihoods $p(y_i \mid x_i)$; these become additional unary factors on each $x_i$.
The Tanner graph
This bipartite graph is called the Tanner graph of the code. Variable nodes = code bits; check nodes = parity constraints. Each bit participates in multiple checks and each check covers only a handful of bits; hence "low-density".
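The structure is easy to make concrete in code. The matrix below is one illustrative choice of a full-rank $3 \times 6$ parity-check matrix, not the specific one from the example; the edges of the Tanner graph are read directly off its nonzero entries.

```python
import numpy as np
from itertools import product

# Illustrative 3x6 parity-check matrix of a rate-1/2 code (full rank over GF(2)).
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])

def check_factors(x):
    """Each indicator factor f_j(x) = 1[bits in check j sum to 0 mod 2]."""
    return (H @ x) % 2 == 0

def is_codeword(x):
    # p(x) > 0 iff every parity-check factor is satisfied.
    return bool(check_factors(x).all())

# Tanner graph edges: (check j, bit i) whenever H[j, i] = 1.
edges = [(j, i) for j in range(H.shape[0]) for i in range(H.shape[1]) if H[j, i]]

# Enumerate the code: a full-rank 3x6 H leaves 2^(6-3) = 8 codewords.
codewords = [x for x in product([0, 1], repeat=6) if is_codeword(np.array(x))]
print(len(codewords))  # 8
```

Each edge list entry corresponds to one line drawn between a check node and a bit node in the Tanner graph; the low-density property means `edges` stays short relative to the full $3 \times 6 = 18$ possible edges.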
Tanner Graph of a Random $(d_v, d_c)$-Regular LDPC Code
[Interactive figure: Tanner graph of a random $(d_v, d_c)$-regular LDPC code; the variable- and check-node degrees control the graph topology.]
Definition: Convolutional Code Trellis as a Factor Graph
Convolutional Code Trellis as a Factor Graph
A rate-$k/n$ convolutional code with memory $m$ has state $\sigma_t$ collecting the last $m$ input blocks. The joint distribution over state and output sequences factorizes as
$$p(\sigma_{0:N}, y_{1:N}) \propto \prod_{t=1}^{N} T_t(\sigma_{t-1}, \sigma_t, y_t),$$
where each trellis factor encodes the allowed state transitions and the likelihood of the corresponding outputs. Factor graph: a chain of state variables $\sigma_t$ and output variables $y_t$, connected by trellis factors $T_t(\sigma_{t-1}, \sigma_t, y_t)$.
The Viterbi algorithm is max-product message passing on this factor graph. The BCJR algorithm is sum-product message passing on the same graph.
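The only difference between the two algorithms is the semiring: sum-product marginalizes, max-product maximizes. A sketch of max-product on a generic chain of trellis factors, with made-up unary and pairwise weights, checked against exhaustive search:

```python
import numpy as np
from itertools import product

# Max-product (Viterbi) on a chain factor graph; all weights are illustrative
# random values standing in for channel likelihoods and trellis factors.
rng = np.random.default_rng(1)
n, S = 5, 3
g = rng.random((n, S))       # unary factors (e.g. per-step likelihoods)
T = rng.random((S, S))       # pairwise trellis factors

# Forward max-product pass with back-pointers.
m = g[0].copy()
back = np.zeros((n, S), dtype=int)
for t in range(1, n):
    scores = m[:, None] * T * g[t][None, :]   # scores[i, j]: from state i to j
    back[t] = scores.argmax(axis=0)
    m = scores.max(axis=0)

# Backtrack the highest-weight path.
path = [int(m.argmax())]
for t in range(n - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
path.reverse()

# Brute-force check over all S^n paths.
def weight(x):
    w = np.prod([g[t, x[t]] for t in range(n)])
    return w * np.prod([T[x[t - 1], x[t]] for t in range(1, n)])

best = max(product(range(S), repeat=n), key=weight)
assert list(best) == path
```

Replacing `max`/`argmax` with sums (and dropping the back-pointers) turns this exact code skeleton into the sum-product BCJR recursion on the same graph.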
Example: MIMO Detection as a Factor Graph
Construct the factor graph for the detection problem $\mathbf{y} = H\mathbf{x} + \mathbf{n}$ with $x_k \in \{-1, +1\}$ (BPSK), $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)$, and uniform prior on $\mathbf{x}$. What is the structure when $H$ is full (all entries nonzero)?
Factorize the posterior
$$p(\mathbf{x} \mid \mathbf{y}) \propto \prod_{m=1}^{M} f_m(\mathbf{x}),$$
where $f_m(\mathbf{x}) = \exp\!\big(-(y_m - \mathbf{h}_m^\top \mathbf{x})^2 / (2\sigma^2)\big)$ and $\mathbf{h}_m^\top$ is the $m$-th row of $H$, for $m = 1, \dots, M$.
Factor graph structure
- Variable nodes: $x_1, \dots, x_K$ (each binary)
- Factor nodes: $f_1, \dots, f_M$ (observation likelihoods)
- Edges: an edge joins $f_m$ and $x_k$ iff $H_{mk} \neq 0$
Dense case
When $H$ is full, every factor connects to every variable: a complete bipartite graph $K_{M,K}$. This is extremely loopy; short cycles of length 4 abound. Loopy BP on this graph does not converge well, which is why MIMO detection uses Gaussian BP, expectation propagation, or AMP (Chapters 18-20).
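What the approximations are trying to match is exact MAP detection, which for tiny systems can still be done by enumerating all $2^K$ hypotheses. The sizes, noise level, and channel below are illustrative; the point is the exponential cost that rules this out at scale.

```python
import numpy as np
from itertools import product

# Exact MAP detection by enumeration for a small dense BPSK MIMO system.
# M, K, sigma, and the random H are illustrative choices.
rng = np.random.default_rng(2)
M, K, sigma = 4, 4, 0.5
H = rng.standard_normal((M, K))              # full matrix: every factor touches every bit
x_true = rng.choice([-1, 1], size=K)
y = H @ x_true + sigma * rng.standard_normal(M)

# With a uniform prior, MAP detection minimizes ||y - Hx||^2 over 2^K hypotheses.
candidates = [np.array(c) for c in product([-1, 1], repeat=K)]
x_map = min(candidates, key=lambda x: float(np.sum((y - H @ x) ** 2)))
print(x_map, x_true)
```

For $K = 4$ this is 16 hypotheses; for $K = 64$ it is $2^{64}$, which is the gap that Gaussian BP, EP, and AMP are designed to close.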
Sparse (ISI) case
For an ISI channel, $H$ is banded Toeplitz with bandwidth $L$ (the channel memory). Each $f_m$ touches only $L+1$ consecutive variables, so the graph is a chain-like structure with tree-width $L$. This is what makes BCJR exact and efficient.
Definition: ISI Channel as a Factor Graph
ISI Channel as a Factor Graph
An ISI channel $y_t = \sum_{l=0}^{L} h_l x_{t-l} + n_t$ corresponds to a factor graph where each observation $y_t$ imposes a factor connecting the $L+1$ consecutive transmitted symbols $x_{t-L}, \dots, x_t$. Prior factors enforce constellation constraints.
The tree-width of this graph is $L$. When $L$ is small (say, 2-5), exact BCJR detection is feasible ($|\mathcal{X}|^L$ states). For larger $L$, turbo equalization with soft approximations takes over.
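The sliding-window structure can be verified mechanically: build the banded Toeplitz channel matrix and check that each observation factor touches at most $L+1$ consecutive variables. The taps and block length below are made-up values for illustration.

```python
import numpy as np

# Banded Toeplitz channel matrix for an ISI channel with memory L = 2;
# the taps h are illustrative, not from the text.
h = np.array([1.0, 0.5, 0.25])   # h_0 .. h_L, so L = 2
n = 8
H = np.zeros((n, n))
for t in range(n):
    for l in range(len(h)):
        if t - l >= 0:
            H[t, t - l] = h[l]   # y_t = sum_l h_l * x_{t-l} + noise

# Each row of H is one observation factor; its nonzeros are the variables
# it connects to in the factor graph.
for t in range(n):
    touched = np.nonzero(H[t])[0]
    assert touched.size <= len(h)                        # at most L+1 variables
    assert touched.max() - touched.min() <= len(h) - 1   # and they are consecutive
```

Because every factor's neighborhood is a window of at most $L+1$ consecutive symbols, eliminating variables left to right never couples more than $L$ of them at once, which is exactly the tree-width-$L$ statement above.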
Canonical Factor Graph Structures
| Model | Factor graph topology | Tree-width | Exact inference cost |
|---|---|---|---|
| Markov chain / HMM | Chain | 1 | $O(nS^2)$, $S$ = number of states |
| Convolutional code | Trellis (chain of states) | $m$ (memory) | $O(n \cdot 2^m)$ |
| LDPC code | Sparse bipartite (Tanner graph) | Large (loopy) | Intractable; use loopy BP |
| ISI channel, memory $L$ | Sliding window | $L$ | $O(n S^{L+1})$ |
| MIMO detection ($H$ full) | Complete bipartite | Large | Intractable; use approximations |
| Kalman filter (state space) | Chain (linear Gaussian) | 1 | $O(n d^3)$, $d$ = state dimension |
| Compressed sensing (sparse $A$) | Sparse bipartite | Depends on sparsity | AMP / loopy BP |
Why This Matters: Turbo Codes as Coupled Factor Graphs
The turbo principle from Book 1 Chapter 16 is a direct factor graph statement: two convolutional code graphs share variable nodes through an interleaver. Decoding iterates message passing in each sub-graph, passing soft information across. The turbo principle is message passing on a factor graph with two coupled sub-graphs; the coupling creates long cycles whose length grows with the interleaver size.
Historical Note: Tanner, Wiberg, and the Unification
1981-2009. Michael Tanner (1981) introduced bipartite graphs to describe low-density codes, giving the first geometric view of decoding. Niclas Wiberg's 1996 PhD thesis formalized factor graphs and derived message-passing algorithms for them in full generality. Kschischang, Frey, and Loeliger (2001) consolidated the language in a landmark tutorial, after which factor graphs became the lingua franca for coding, signal processing, and machine learning. The modern generalization to probabilistic graphical models (Koller-Friedman 2009) extended the framework to massive-scale inference.
Factor Graphs Across Disciplines
The same language is used in:
- Coding theory: Tanner graphs, LDPC/turbo decoders
- Signal processing: Kalman filters, HMMs, ISI equalizers
- Machine learning: Bayesian networks, Markov random fields, variational inference
- Statistics: Bayesian hierarchical models
- Physics: Statistical mechanics on lattices, spin glasses

The connections are not metaphorical; they are literal. The same algorithm, rewritten in domain-appropriate notation, solves problems in each of these fields.
Quick Check
What is the tree-width of an HMM factor graph with $S$ states and arbitrary emission alphabets?
1
The HMM graph is a chain plus leaves; it is still a tree, so its tree-width is 1.
Key Takeaway
HMMs, LDPC codes, ISI channels, MIMO detection, and Kalman filters all admit natural factor graph representations. The graph topology (chain, trellis, bipartite, loopy) determines whether inference is tractable exactly or requires approximations. Message passing on the graph unifies the algorithms.