From Distributions to Graphs

The Unifying Idea of Graphical Models

In Chapter 8 we studied EM, an iterative algorithm for inferring latent variables. In Chapter 14 we studied LASSO, an algorithm for sparse recovery. In Book ITA Chapter 12 we studied LDPC decoding, an iterative algorithm for channel coding. These problems look completely different, yet their solution methods all share a common structure: repeated local updates on a graph.

The point is that these problems are all instances of a single task: inference in a factorized probability distribution. And the algorithms that solve them (belief propagation, approximate message passing, Viterbi) are all instances of a single computational recipe: message passing on a graph.

Factor graphs are the language that makes this unity visible. Once we have the graph, the algorithm writes itself. The same code, applied to different factor graphs, decodes LDPC codes, equalizes ISI channels, performs MIMO detection, and runs the Kalman filter.

Definition:

Factor Graph

Let $p(\mathbf{x}) = \frac{1}{Z}\prod_{a \in \mathcal{F}} f_a(\mathbf{x}_{\partial a})$ be a factorized distribution on $\mathbf{x} = (x_1, \ldots, x_n)$, where each factor $f_a$ depends on a subset $\mathbf{x}_{\partial a}$ of the variables. The factor graph is a bipartite graph with:

  • Variable nodes $\{1, \ldots, n\}$, drawn as circles $\bigcirc$;
  • Factor nodes $\mathcal{F}$, drawn as squares $\square$;
  • Edges: $i \sim a$ iff $x_i$ appears in $\mathbf{x}_{\partial a}$. The neighborhood of variable node $i$ is $\partial i = \{a \in \mathcal{F} : i \sim a\}$; the neighborhood of factor node $a$ is $\partial a = \{i : i \sim a\}$.

A factor graph is more explicit than a Bayesian network or Markov random field: each factor function $f_a$ is represented by its own node, and the graph shows exactly which variables enter each factor. This explicitness is what makes message-passing algorithms easy to describe.

Example: From Distribution to Factor Graph

Draw the factor graph of the distribution $p(x_1, x_2, x_3, x_4) = \frac{1}{Z} f_A(x_1, x_2) f_B(x_2, x_3) f_C(x_3, x_4) f_D(x_1, x_4)$. Is the graph a tree?

A Basic Factor Graph

Bipartite factor graph: circles are variable nodes, squares are factor nodes. Each factor node is connected to the variables it depends on.
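A quick way to check tree-ness is to count: a connected graph is a tree exactly when its edge count is one less than its node count. The Python sketch below applies this test to the example above, using a made-up dict-of-scopes representation of the factor graph.

```python
# Factor graph of the example, as a dict mapping each factor node
# to the tuple of variable nodes in its scope (assumed representation).
factors = {"A": (1, 2), "B": (2, 3), "C": (3, 4), "D": (1, 4)}

variables = sorted({v for scope in factors.values() for v in scope})
num_nodes = len(variables) + len(factors)          # variable + factor nodes
num_edges = sum(len(scope) for scope in factors.values())

# This graph is connected, and a connected graph is a tree
# iff it has exactly |V| - 1 edges.
print(f"nodes={num_nodes}, edges={num_edges}, tree? {num_edges == num_nodes - 1}")
# nodes=8, edges=8 -> not a tree: the four factors close a cycle.
```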

Theorem: Marginalization via Factor-Graph Pushforward

Let $p(\mathbf{x}) = \frac{1}{Z}\prod_a f_a(\mathbf{x}_{\partial a})$. For any variable $x_i$, its marginal is $p(x_i) = \frac{1}{Z} \sum_{\mathbf{x}_{\sim i}} \prod_a f_a(\mathbf{x}_{\partial a})$, where $\mathbf{x}_{\sim i}$ denotes all variables except $x_i$. When the factor graph is a tree, this marginal can be computed in time $O(n \cdot \max_a |\mathcal{X}|^{|\partial a|})$ via the sum-product algorithm.

The factorization of $p$ means that marginalization distributes over the product: pushing sums inside products splits the global computation into local ones. On a tree this distribution is lossless, and we get an exact polynomial-time algorithm. On a loopy graph it introduces approximations; that is the subject of Chapter 18.
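To see the sums-inside-products idea concretely, here is a minimal Python sketch (the alphabet size and random factor tables are made up) that computes a chain marginal twice: once by brute force over the full joint, and once by sum-product messages that push each sum inside the product.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # alphabet size |X|, chosen arbitrarily for the demo

# Chain factorization p(x1, x2, x3) ∝ f1(x1) f12(x1, x2) f23(x2, x3)
f1 = rng.random(K)
f12 = rng.random((K, K))
f23 = rng.random((K, K))

# Brute force: build the full joint and sum over x2, x3 (exponential in n).
joint = f1[:, None, None] * f12[:, :, None] * f23[None, :, :]
brute = joint.sum(axis=(1, 2))
brute /= brute.sum()

# Sum-product on the tree: local sums, passed leaf-to-root as messages.
m_23_to_2 = f23.sum(axis=1)        # message from f23 into x2 (sum over x3)
m_12_to_1 = f12 @ m_23_to_2        # message from f12 into x1 (sum over x2)
marginal = f1 * m_12_to_1
marginal /= marginal.sum()

assert np.allclose(brute, marginal)  # same answer, linear-time schedule
```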


Bipartite graph

A graph with two disjoint node sets and edges only between the two sets. Factor graphs are bipartite, with variable nodes and factor nodes as the two sides.

Related: Tanner graph, LDPC code

Partition function

The normalization constant $Z = \sum_{\mathbf{x}} \prod_a f_a(\mathbf{x}_{\partial a})$ that makes $p(\mathbf{x})$ sum to one. Computing $Z$ is #P-hard in general but polynomial on trees.

Related: Evidence Lower Bound (ELBO) / Free Energy, Bethe approximation


Relation to Bayesian Networks and Markov Random Fields

Every Bayesian network (a directed acyclic graph with conditional probability tables) can be converted to a factor graph: each conditional $p(x_i \mid \text{parents}(x_i))$ becomes one factor node with the corresponding neighborhood. Every Markov random field (an undirected graph with clique potentials) can similarly be converted: each clique potential becomes a factor.

The factor graph is usually more informative because it disambiguates factorizations that share the same undirected skeleton. For instance, the factorizations $f(x_1, x_2, x_3)$ and $f_A(x_1, x_2) f_B(x_2, x_3) f_C(x_1, x_3)$ produce the same pairwise MRF graph (a triangle on three variables) but have distinct factor graphs. Message-passing algorithms operate on the factor graph, not the undirected skeleton, so the distinction matters.
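As a sketch of the conversion rule, assume a Bayesian network given as a parent map (the chain below is a made-up example). Each conditional $p(x_i \mid \text{parents}(i))$ becomes one factor whose neighborhood is the child together with its parents:

```python
# Hypothetical Bayes net x1 -> x2 -> x3, encoded as a parent map.
parents = {1: (), 2: (1,), 3: (2,)}

# Conversion rule: one factor node per conditional, scope = {i} ∪ parents(i).
factor_graph = {f"f{i}": (i, *parents[i]) for i in parents}

for factor, scope in sorted(factor_graph.items()):
    print(f"{factor}: neighborhood = {scope}")
# f1: (1,)   f2: (2, 1)   f3: (3, 2)  -- a chain-shaped factor graph
```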

Encoding Conditional Distributions

The classical modeling task, "write down the posterior $p(\mathbf{x} \mid \mathbf{y})$", maps to factor graphs via observations. An observation $\mathbf{y}$ is clamped: each factor touching it becomes $f_a(\mathbf{x}_{\partial a}, \mathbf{y}_{\partial a})$ evaluated at the observed value. The resulting graph has only $\mathbf{x}$ as free variables, and inference on it computes $p(\mathbf{x} \mid \mathbf{y})$.
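As a minimal sketch of clamping for a discrete factor (the table and observed value are made up): fixing the observed coordinate slices the factor down to a function of the free variables only.

```python
import numpy as np

rng = np.random.default_rng(1)
f_xy = rng.random((4, 4))   # hypothetical factor f(x, y) over 4-valued x, y

y_observed = 2
f_clamped = f_xy[:, y_observed]   # clamp y: evaluate f at the observed value

# f_clamped depends on x alone; in the factor graph, the y node disappears
# and the factor's neighborhood shrinks to {x}.
print(f_clamped)
```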

Example: Posterior in a Linear Gaussian Model

The model is $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$, with $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \sigma_x^2 \mathbf{I})$ and $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. Construct the factor graph for the posterior $p(\mathbf{x} \mid \mathbf{y})$.
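Every factor in this model is Gaussian, so the posterior is Gaussian and can be written in closed form by completing the square. The numpy sketch below (dimensions and noise levels are made up) computes the posterior mean and covariance that the clamped factor graph encodes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 5                  # made-up signal and observation dimensions
sigma_x, sigma = 1.0, 0.5    # made-up prior and noise standard deviations

H = rng.standard_normal((m, n))
x_true = sigma_x * rng.standard_normal(n)
y = H @ x_true + sigma * rng.standard_normal(m)

# Completing the square in log p(x) + log p(y|x) gives a Gaussian posterior:
#   precision = H^T H / sigma^2 + I / sigma_x^2,   mean = cov H^T y / sigma^2
precision = H.T @ H / sigma**2 + np.eye(n) / sigma_x**2
cov = np.linalg.inv(precision)
mean = cov @ H.T @ y / sigma**2

print("posterior mean:", mean)
```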

⚠️ Engineering Note

Sparsity Matters for Tractability

The computational cost of message passing scales with the factor sizes $|\partial a|$. Dense factors are the enemy of tractability. In practice, choose models whose factors depend on few variables: LDPC parity checks involve only $d_c$ (the check degree) variables; HMMs have pairwise transition factors; ISI channels have $L$-variable factors, where $L$ is the channel length. When models lack natural sparsity (e.g., dense MIMO), message passing is combined with approximations (Gaussian BP, AMP) that replace high-dimensional integrals by Gaussian moments.

Practical Constraints
  • Exact inference costs $O(|\mathcal{X}|^{|\partial a|})$ per factor, i.e., exponential in the factor size (see the back-of-envelope sketch below).
  • A factor over $d$ variables forms a size-$d$ clique in the moralized graph, so it forces tree-width $\geq d - 1$.
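To make the scaling concrete, this back-of-envelope sketch tabulates the per-factor cost $|\mathcal{X}|^{|\partial a|}$ for the model families mentioned above; the degrees and alphabet sizes are illustrative, not prescriptive.

```python
# Per-factor marginalization cost |X|^|∂a| for some illustrative models.
examples = [
    (2,  6, "LDPC parity check, d_c = 6"),
    (4,  2, "HMM pairwise transition"),
    (2,  5, "ISI channel, L = 5"),
    (16, 8, "dense MIMO, 8 streams, 16-QAM"),
]
for alphabet, degree, model in examples:
    print(f"{model:32s} |X|^|∂a| = {alphabet**degree:,}")
# The last line (~4.3 billion) is why dense models need Gaussian BP or AMP.
```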

Common Mistake: Factor Graphs Are Undirected

Mistake:

Drawing arrows on factor graph edges to represent 'causal' relationships between variables and factors.

Correction:

Factor graphs are undirected bipartite graphs. They represent symmetric dependence through factors; the direction of message flow in algorithms is a property of the algorithm, not the graph. If directed causal structure matters, use a Bayesian network and convert it to a factor graph when running inference.

Quick Check

A joint distribution $p(x_1, x_2, x_3) = f_1(x_1) f_{12}(x_1, x_2) f_{23}(x_2, x_3)$ has a factor graph structure that is:

  • A tree
  • A 3-cycle
  • Complete bipartite
  • Cannot be determined

Key Takeaway

A factor graph is a bipartite graph that makes the factorization of a distribution explicit. It is the natural data structure for message-passing inference: on trees, exact marginals can be computed in linear time; on loopy graphs, approximate inference becomes possible through the same local update rules.