From Distributions to Graphs
The Unifying Idea of Graphical Models
In Chapter 8 we studied EM, an iterative algorithm for inferring latent variables. In Chapter 14 we studied LASSO, an algorithm for sparse recovery. In Book ITA Chapter 12 we studied LDPC decoding, an iterative algorithm for channel coding. These problems look completely different, yet their solution methods all share a common structure: repeated local updates on a graph.
The point is that these problems are all instances of a single task: inference in a factorized probability distribution. And the algorithms that solve them (belief propagation, approximate message passing, Viterbi) are all instances of a single computational recipe: message passing on a graph.
Factor graphs are the language that makes this unity visible. Once we have the graph, the algorithm writes itself. The same code, applied to different factor graphs, decodes LDPC codes, equalizes ISI channels, performs MIMO detection, and runs the Kalman filter.
Definition: Factor Graph
Factor Graph
Let $p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{a=1}^{m} f_a(x_{\partial a})$ be a factorized distribution on $\mathcal{X}^n$, where each factor $f_a$ depends on a subset $x_{\partial a}$ of variables. The factor graph is a bipartite graph with:
- Variable nodes $i \in \{1, \dots, n\}$, drawn as circles;
- Factor nodes $a \in \{1, \dots, m\}$, drawn as squares;
- Edges: $(i, a) \in E$ iff $x_i$ appears in $f_a$. The neighborhood of variable node $i$ is $\partial i = \{a : (i, a) \in E\}$; the neighborhood of factor node $a$ is $\partial a = \{i : (i, a) \in E\}$.
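As a concrete illustration of the definition, the bipartite structure can be held in a small data structure. The following Python sketch (class and method names are my own, not from the text) stores variables with finite alphabets, tabulated factors, and the two neighborhood maps $\partial i$ and $\partial a$:

```python
from itertools import product

# Minimal factor-graph container (illustrative sketch). Variables carry
# finite alphabets; factors are tabulated over their scope's alphabets.
class FactorGraph:
    def __init__(self):
        self.variables = {}  # name -> alphabet (list of values)
        self.factors = {}    # name -> (scope tuple, table dict)

    def add_variable(self, name, alphabet):
        self.variables[name] = list(alphabet)

    def add_factor(self, name, scope, fn):
        # Tabulate fn over the product of the scope's alphabets.
        alphabets = [self.variables[v] for v in scope]
        table = {vals: fn(*vals) for vals in product(*alphabets)}
        self.factors[name] = (tuple(scope), table)

    def neighbors_of_variable(self, v):
        # Neighborhood of a variable node: factors whose scope contains it.
        return [a for a, (scope, _) in self.factors.items() if v in scope]

    def neighbors_of_factor(self, a):
        # Neighborhood of a factor node: the variables in its scope.
        return list(self.factors[a][0])
```

Adding two binary variables and one pairwise factor then yields exactly the edge structure described above: the factor is the sole neighbor of each variable, and its neighborhood is its scope.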
A factor graph is more explicit than a Bayesian network or Markov random field: each factor function is represented by its own node, and the graph shows exactly which variables enter each factor. This explicitness is what makes message-passing algorithms easy to describe.
Example: From Distribution to Factor Graph
Draw the factor graph of the distribution $p(x_1, x_2, x_3, x_4) \propto f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_3, x_4)\, f_d(x_4, x_1)$. Is the graph a tree?
Enumerate factors and neighborhoods
The four factors are pairwise: $\partial a = \{1, 2\}$, $\partial b = \{2, 3\}$, $\partial c = \{3, 4\}$, $\partial d = \{4, 1\}$. Each variable appears in exactly two factors.
Draw the graph
Place variable nodes $x_1, \dots, x_4$ and factor nodes $f_a, f_b, f_c, f_d$. Connect each factor to the two variables in its scope. The underlying structure is a cycle: $x_1 - f_a - x_2 - f_b - x_3 - f_c - x_4 - f_d - x_1$.
Identify the cycle
The graph contains a 4-cycle (in terms of variable nodes) or equivalently an 8-edge cycle in the bipartite graph. Not a tree. Exact message passing will not compute the marginals in one pass; iterative (loopy) message passing is needed.
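The cycle check can be automated: a connected graph is a tree exactly when it has one fewer edge than nodes. A minimal sketch of the example above, with illustrative names:

```python
# The 4-cycle example as an edge list. A connected bipartite factor
# graph is a tree iff #edges == #nodes - 1.
variables = ["x1", "x2", "x3", "x4"]
factor_scopes = {
    "fa": ("x1", "x2"),
    "fb": ("x2", "x3"),
    "fc": ("x3", "x4"),
    "fd": ("x4", "x1"),
}
edges = [(v, a) for a, scope in factor_scopes.items() for v in scope]

def is_tree(num_nodes, num_edges):
    # Assumes connectivity, which holds for this example.
    return num_edges == num_nodes - 1

num_nodes = len(variables) + len(factor_scopes)
print(len(edges), is_tree(num_nodes, len(edges)))  # 8 False: 8 edges on 8 nodes
```

Eight edges on eight nodes means one edge too many, which is precisely the 8-edge bipartite cycle identified above.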
A Basic Factor Graph
Theorem: Marginalization via Factor-Graph Pushforward
Let $p(x) = \frac{1}{Z} \prod_{a=1}^{m} f_a(x_{\partial a})$ with $x \in \mathcal{X}^n$. For any variable $x_i$, its marginal is $p(x_i) = \sum_{x_{\sim i}} p(x)$, where $x_{\sim i}$ denotes all variables except $x_i$. When the factor graph is a tree, this marginal can be computed in $O\big(\sum_{a=1}^{m} |\mathcal{X}|^{|\partial a|}\big)$ time via the sum-product algorithm.
The factorization of $p$ means that marginalization distributes over the product: pushing sums inside products splits the global computation into local ones. On a tree this distribution is lossless and we get an exact polynomial-time algorithm. On a loopy graph, distributing sums inside products introduces approximations; that is the subject of Chapter 18.
Distributive law
$\sum_{x_j} \prod_{a} f_a(x_{\partial a}) = \Big(\prod_{a :\, j \notin \partial a} f_a(x_{\partial a})\Big) \sum_{x_j} \prod_{a :\, j \in \partial a} f_a(x_{\partial a})$. Factors not containing $x_j$ pull out of the sum.
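The distributive law is easy to sanity-check numerically on binary variables; the factor definitions below are illustrative:

```python
# Numeric check of the distributive law with binary variables.
X = [0, 1]
f = lambda x1, x2: 1.0 + x1 + 2 * x2  # depends on x2
g = lambda x1: 2.0 + x1               # does not depend on x2

for x1 in X:
    lhs = sum(g(x1) * f(x1, x2) for x2 in X)  # sum over the full product
    rhs = g(x1) * sum(f(x1, x2) for x2 in X)  # g pulled out of the sum
    assert abs(lhs - rhs) < 1e-12
```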
Iterative elimination on a tree
A tree has a leaf variable node $x_\ell$, attached to a single factor. Eliminate it by summing out: the result is a factor graph with one fewer node (the factor that connected $x_\ell$ becomes a function of its remaining neighbors only). Repeat.
Complexity counting
Each elimination step processes one factor, costing $O(|\mathcal{X}|^{|\partial a|})$ per factor. With $m$ factors, the total cost is $O\big(\sum_{a=1}^{m} |\mathcal{X}|^{|\partial a|}\big)$, as stated.
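The elimination procedure can be traced on a small chain $p(x_1, x_2, x_3) \propto f_{12}(x_1, x_2)\, f_{23}(x_2, x_3)$ and compared against brute-force marginalization; the factor values below are illustrative:

```python
from itertools import product

# Leaf elimination on a 3-variable chain vs. brute-force marginalization.
X = [0, 1]
f12 = lambda a, b: 1.0 + a + b
f23 = lambda b, c: 2.0 + b * c

# Brute force: sum over all 2^3 assignments.
Z = sum(f12(a, b) * f23(b, c) for a, b, c in product(X, X, X))
brute = [sum(f12(a, b) * f23(b, c) for b, c in product(X, X)) / Z for a in X]

# Elimination: sum out the leaf x3 (f23 becomes a unary message on x2),
# then sum out x2 — one factor processed per step.
m3 = {b: sum(f23(b, c) for c in X) for b in X}
unnorm = [sum(f12(a, b) * m3[b] for b in X) for a in X]
elim = [u / sum(unnorm) for u in unnorm]

assert all(abs(p - q) < 1e-12 for p, q in zip(brute, elim))
```

On this tree the two answers agree exactly, and the elimination order never creates a factor larger than the ones we started with.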
Cycles obstruct the distributive law
In a loopy graph, after eliminating some variables, the remaining factors may share multiple variables, creating higher-arity factors. Successive elimination blows up in complexity, exponentially in the tree-width of the graph.
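The blow-up mechanism is visible on the 4-cycle example: summing out $x_1$ merges the two factors touching it into a new factor over $(x_2, x_4)$ that does not factorize, so the remaining variables stay coupled. A sketch with illustrative factor values:

```python
# Eliminating x1 from the 4-cycle merges fa and fd into a NEW factor
# over (x2, x4).
X = [0, 1]
fa = lambda x1, x2: 1.0 + 2 * x1 * x2
fd = lambda x4, x1: 1.0 + x4 + x1

g = {(x2, x4): sum(fa(x1, x2) * fd(x4, x1) for x1 in X)
     for x2 in X for x4 in X}

# g does not split as u(x2) * v(x4): its 2x2 table has nonzero
# "determinant", so x2 and x4 remain coupled after eliminating x1.
det = g[(0, 0)] * g[(1, 1)] - g[(0, 1)] * g[(1, 0)]
assert abs(det) > 1e-9
```

Here the merged factor is still pairwise, but on denser graphs the same mechanism stacks up, producing factors over ever-larger variable sets.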
Bipartite graph
A graph with two disjoint node sets and edges only between the two sets. Factor graphs are bipartite, with variable nodes and factor nodes as the two sides.
Related: Tanner graph, LDPC code
Partition function
The normalization constant $Z = \sum_{x} \prod_{a} f_a(x_{\partial a})$ that makes $p$ sum to one. Computing $Z$ is #P-hard in general but polynomial on trees.
Related: Evidence Lower Bound (ELBO) / Free Energy, Bethe approximation
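A toy check that brute-force enumeration and tree elimination agree on $Z$ (illustrative chain factors, binary alphabet):

```python
from itertools import product

# Z computed two ways on a chain p ∝ f12(x1,x2) f23(x2,x3).
X = [0, 1]
f12 = lambda a, b: 1.0 + a + b
f23 = lambda b, c: 1.0 + b + c

# Brute force: exponential enumeration over all assignments.
Z_brute = sum(f12(a, b) * f23(b, c) for a, b, c in product(X, X, X))

# Tree elimination: sum out the leaf x3 first, then the rest —
# polynomial in the number of factors, not 2^n.
m = {b: sum(f23(b, c) for c in X) for b in X}
Z_elim = sum(f12(a, b) * m[b] for a in X for b in X)

assert abs(Z_brute - Z_elim) < 1e-12
```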
Definition: Relation to Bayesian Networks and Markov Random Fields
Relation to Bayesian Networks and Markov Random Fields
Every Bayesian network (directed acyclic graph with conditional probability tables) can be converted to a factor graph: each conditional becomes one factor node with the corresponding neighborhood. Every Markov random field (undirected graph with clique potentials) can similarly be converted: each clique potential becomes a factor.
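As a minimal sketch of the conversion (the CPT numbers are illustrative), a two-node network $p(a)\, p(b \mid a)$ becomes two factor tables whose product is the joint, with $Z = 1$ automatically:

```python
from itertools import product

# Two-node Bayesian network p(a) p(b|a) converted to factor tables.
p_a = {0: 0.6, 1: 0.4}                    # factor with scope {a}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1,  # factor with scope {a, b},
               (0, 1): 0.2, (1, 1): 0.8}  # keyed as (b, a)

# The factor-graph joint is just the product of the two factors.
joint = {(a, b): p_a[a] * p_b_given_a[(b, a)]
         for a, b in product([0, 1], [0, 1])}

# Because every factor is a normalized conditional, Z = 1 automatically.
assert abs(sum(joint.values()) - 1.0) < 1e-12
```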
The factor graph is usually more informative because it disambiguates factorizations that share the same undirected skeleton. For instance, the distributions $p(x_1, x_2, x_3) \propto f(x_1, x_2, x_3)$ and $p(x_1, x_2, x_3) \propto f_{12}(x_1, x_2)\, f_{23}(x_2, x_3)\, f_{13}(x_1, x_3)$ share the same pairwise MRF graph (a triangle) but have distinct factor graphs. Since message-passing algorithms operate on the factor graph, not the undirected skeleton, the distinction matters.
Encoding Conditional Distributions
The classical modeling task, "write down the posterior $p(x \mid y)$", maps to factor graphs via observations. An observed variable is clamped: every factor containing it is evaluated at the observed value, becoming a factor over its remaining variables. The resulting graph has only the unobserved $x_i$ as free variables. Inference on it computes $p(x_i \mid y)$.
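Clamping can be verified on a toy two-variable model: evaluating every factor containing the observed variable at its observed value and normalizing reproduces the brute-force conditional. Names and factor values below are illustrative:

```python
# Clamping an observation on a toy model p(x, y) ∝ f(x, y) g(y).
X = [0, 1]
f = lambda x, y: 1.0 + x + 2 * y
g = lambda y: 1.0 + y
y_obs = 1

# Clamp: factors containing y are evaluated at y = y_obs.
clamped = [f(x, y_obs) * g(y_obs) for x in X]
posterior = [c / sum(clamped) for c in clamped]

# Brute-force conditional p(x | y = y_obs) for comparison.
joint = {(x, y): f(x, y) * g(y) for x in X for y in X}
cond = [joint[(x, y_obs)] / sum(joint[(xp, y_obs)] for xp in X) for x in X]

assert all(abs(p - q) < 1e-12 for p, q in zip(posterior, cond))
```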
Example: Posterior in a Linear Gaussian Model
The model is $y = Ax + w$, with $A \in \mathbb{R}^{m \times n}$, prior $x_i \sim \mathcal{N}(0, \sigma_x^2)$ i.i.d., and noise $w \sim \mathcal{N}(0, \sigma^2 I_m)$. Construct the factor graph for the posterior $p(x \mid y)$.
Write the posterior up to a constant
$p(x \mid y) \propto \prod_{i=1}^{n} \exp\!\big(-x_i^2 / 2\sigma_x^2\big) \prod_{j=1}^{m} \exp\!\big(-(y_j - a_j^\top x)^2 / 2\sigma^2\big)$.
Identify factors
- Prior factors: $\psi_i(x_i) = \exp(-x_i^2 / 2\sigma_x^2)$ for each $i = 1, \dots, n$;
- Likelihood factors: $\phi_j(x_{\partial j}) = \exp\!\big(-(y_j - a_j^\top x)^2 / 2\sigma^2\big)$ for each $j = 1, \dots, m$, where $a_j^\top$ is the $j$-th row of $A$.
Draw the graph
Variable nodes $x_1, \dots, x_n$. Unary factor $\psi_i$ connected to $x_i$ only. Each likelihood factor $\phi_j$ connected to the variables $x_i$ that appear with nonzero coefficient $A_{ji}$. The graph structure depends on the sparsity pattern of $A$. If $A$ is dense, every $\phi_j$ touches all $x_i$: the graph is a single hub connecting all variables, dense and loopy.
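Reading the factor neighborhoods off the sparsity pattern of $A$ is a one-liner; the matrix below is illustrative:

```python
# Neighborhood of each likelihood factor, read off the sparsity
# pattern of A (illustrative 3x4 matrix).
A = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 5.0],
]

# Row j of A determines which variables factor j touches.
neighborhoods = [[i for i, a_ji in enumerate(row) if a_ji != 0.0]
                 for row in A]
print(neighborhoods)  # [[0, 2], [1], [0, 1, 3]]
```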
Sparsity Matters for Tractability
The computational cost of message passing scales with the factor sizes $|\partial a|$. Dense factors are the enemy of tractability. In practice, choose models whose factors depend on few variables: LDPC parity checks involve only $d_c$ (the check degree) variables; HMMs have pairwise transitions; ISI channels have $L$-variable factors, where $L$ is the channel length. When models lack natural sparsity (e.g., dense MIMO), message passing is combined with approximations (Gaussian BP, AMP) that replace high-dimensional integrals by Gaussian moments.
- Exact inference scales as $O(|\mathcal{X}|^{|\partial a|})$ per factor: exponential in the factor size.
- Tree-width is at least the largest clique size minus one, so cliques of degree $d$ force tree-width $\geq d - 1$.
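The per-factor cost accounting makes the hub problem concrete; the factor scopes below are illustrative:

```python
# Message-passing cost as the sum over factors of |X|^{|scope|}
# (binary alphabet; factor degrees are illustrative).
alphabet_size = 2
factor_degrees = {"check1": 3, "check2": 3, "hub": 10}

cost = sum(alphabet_size ** d for d in factor_degrees.values())
print(cost)  # 8 + 8 + 1024 = 1040: the dense 'hub' factor dominates
```

Two degree-3 checks cost 16 table entries between them; one degree-10 hub costs 1024 on its own, which is why dense factors dominate the runtime.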
Common Mistake: Factor Graphs Are Undirected
Mistake:
Drawing arrows on factor graph edges to represent 'causal' relationships between variables and factors.
Correction:
Factor graphs are undirected bipartite graphs. They represent symmetric dependence through factors; the direction of message flow in algorithms is a property of the algorithm, not the graph. If directed causal structure matters, use a Bayesian network and convert it to a factor graph when running inference.
Quick Check
A joint distribution $p(x_1, x_2, x_3) \propto f_a(x_1)\, f_b(x_1, x_2)\, f_c(x_2, x_3)$ has a factor graph structure that is:
A tree
A 3-cycle
Complete bipartite
Cannot be determined
Three variable nodes and three factor nodes connected in a path, so there are no cycles: the graph is a tree.
Key Takeaway
A factor graph is a bipartite graph that makes the factorization of a distribution explicit. It is the natural data structure for message-passing inference: on trees, exact marginals compute in linear time; on loopy graphs, approximate inference becomes possible through the same local update rules.