Exercises
ch17-ex01
EasyDraw the factor graph of the joint distribution $p(x_1,\dots,x_4) \propto f_a(x_1)\,f_b(x_1,x_2)\,f_c(x_2,x_3)\,f_d(x_3,x_4)$. Is it a tree?
List the neighborhood of each factor.
Count edges vs. nodes.
Neighborhoods
$N(f_a) = \{x_1\}$, $N(f_b) = \{x_1, x_2\}$, $N(f_c) = \{x_2, x_3\}$, $N(f_d) = \{x_3, x_4\}$.
Tree check
4 variable nodes + 4 factor nodes = 8 nodes. Edges: $1 + 2 + 2 + 2 = 7 = 8 - 1$, so the graph is connected with $|E| = |V| - 1$. This is a tree (in fact a chain with a leaf factor at node 1).
ch17-ex02
EasyThe joint distribution $p(x_1, x_2, x_3) = f(x_1, x_2, x_3)$ (a single factor touching all three variables) has what factor graph structure?
One factor, three variables.
Draw
A single factor node connected to all three variable nodes. This is a star with the factor at the center. It is a tree (no cycles).
Cost
Inference cost is $O(|\mathcal{X}|^3)$, the size of the factor's own table. Factor graphs do not make dense factors cheap; they just make the structure explicit.
ch17-ex03
MediumConsider a Markov chain with binary states and all transition probabilities equal to $1/2$. Run sum-product message passing to compute $p(x_2 \mid x_0, x_4)$. Initial distribution at state 0 (clamped).
Clamp the observations at $x_0$ and $x_4$.
Forward messages from $x_0$, backward messages from $x_4$.
Transition kernel
$T(x' \mid x) = 1/2$ for states $x, x' \in \{0, 1\}$ (a $2 \times 2$ stochastic matrix with all entries equal must have entries $1/2$).
Forward message at $x_2$
Starting from $x_0 = 0$: $\mu_{\to}(x_1) = (1/2, 1/2)$, hence $\mu_{\to}(x_2) = (1/2, 1/2)$.
Backward message at $x_2$
Backward from the clamped $x_4$: compute $\mu_{\leftarrow}(x_3) = (1/2, 1/2)$ (unnormalized). Then $\mu_{\leftarrow}(x_2) = (1/2, 1/2)$.
Combine and normalize
$\mu_{\to}(x_2)\,\mu_{\leftarrow}(x_2) = (1/4, 1/4)$, so $Z = 1/2$. $p(x_2 = 0 \mid x_0, x_4) = p(x_2 = 1 \mid x_0, x_4) = 1/2$: with a uniform kernel the observations carry no information about $x_2$.
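The forward-backward computation above is easy to check in a few lines; a minimal sketch, assuming the uniform kernel $T(x' \mid x) = 1/2$ with $x_0$ clamped to state 0:

```python
# Sum-product on the binary chain x0 - x1 - ... - x4 with the
# all-equal (uniform) transition kernel, x0 clamped to state 0 and
# x4 clamped to an arbitrary observed value.
T = [[0.5, 0.5],
     [0.5, 0.5]]          # T[x][xp] = p(x' = xp | x)

def forward(msg):
    # mu_out(x') = sum_x mu_in(x) * T(x -> x')
    return [sum(msg[x] * T[x][xp] for x in range(2)) for xp in range(2)]

def backward(msg):
    # mu_out(x) = sum_x' T(x -> x') * mu_in(x')
    return [sum(T[x][xp] * msg[xp] for xp in range(2)) for x in range(2)]

fwd = [1.0, 0.0]          # x0 clamped to state 0
for _ in range(2):        # two hops: x0 -> x1 -> x2
    fwd = forward(fwd)

bwd = [1.0, 0.0]          # x4 clamped (its value does not matter here)
for _ in range(2):        # two hops back: x4 -> x3 -> x2
    bwd = backward(bwd)

unnorm = [f * b for f, b in zip(fwd, bwd)]
Z = sum(unnorm)
posterior = [u / Z for u in unnorm]
print(posterior)          # [0.5, 0.5]
```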
ch17-ex04
MediumShow that a 2D grid factor graph ($n \times n$ variables with pairwise neighbor factors) has tree-width at least $n$. Why does this make exact inference intractable?
Tree-width equals the minimum over elimination orderings of the largest clique created, minus one.
Any elimination leaves a 'frontier' of at least $n$ variables.
Frontier argument
Regardless of elimination order, at some point a horizontal or vertical line of $n$ variables must be active (it connects the upper and lower halves of the grid). Eliminating any variable on that line creates a clique among the others.
Exact cost
Exact inference cost is $O(n^2 \cdot 2^n)$ for binary variables. For $n = 10$: $\approx 10^5$. For $n = 20$: $\approx 4 \times 10^8$. For $n = 30$: $\approx 10^{12}$. Exact inference on a $30 \times 30$ Ising grid is already at the edge of feasibility.
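These numbers are quick to reproduce; a sketch assuming the $O(n^2 \cdot 2^n)$ bound with binary variables (the function name is ours):

```python
# Cost bound for exact inference on an n x n binary grid: about n^2
# variables to eliminate, each elimination touching a frontier of
# roughly n binary variables (a table with 2^n entries).
def exact_cost(n: int) -> int:
    return n * n * 2 ** n

for n in (10, 20, 30):
    print(n, f"{exact_cost(n):.2e}")
```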
ch17-ex05
MediumFor the Tanner graph in Example ch17-ex-ldpc-tanner, check for 4-cycles. A 4-cycle in a Tanner graph is a pair of variable nodes sharing two common check nodes. Identify any.
Two rows of $H$ whose 1's share two common columns form a 4-cycle.
List the 1s per row
Row 1: columns . Row 2: . Row 3: .
Find shared pairs
Rows 1&2: share column 3 only. Rows 1&3: share column 1 only. Rows 2&3: share column 5 only. No two rows share 2 or more columns.
Conclusion
No 4-cycles. Girth $\geq 6$. This matrix would be considered acceptable for loopy BP decoding (absent longer cycles that degrade performance).
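The row-pair check is easy to automate. Since the matrix from Example ch17-ex-ldpc-tanner is not reproduced here, the sketch below uses a hypothetical $H$ with the same sharing pattern (rows 1&2 share only column 3, rows 1&3 only column 1, rows 2&3 only column 5):

```python
from itertools import combinations

# Hypothetical parity-check matrix with the same pairwise-sharing
# pattern as in the solution above (not the actual example matrix).
H = [
    [1, 0, 1, 0, 0, 0],   # check 1: variables 1, 3
    [0, 0, 1, 0, 1, 0],   # check 2: variables 3, 5
    [1, 0, 0, 0, 1, 0],   # check 3: variables 1, 5
]

def has_4cycle(H):
    """A 4-cycle exists iff two rows share 1's in >= 2 columns."""
    supports = [{j for j, v in enumerate(row) if v} for row in H]
    return any(len(a & b) >= 2 for a, b in combinations(supports, 2))

print(has_4cycle(H))   # False: no 4-cycles, girth >= 6
```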
ch17-ex06
MediumDraw the factor graph of the ISI channel $y_n = h_0 x_n + h_1 x_{n-1} + w_n$, $n = 1, \dots, N$, with clamped observations $y_n$. Compute the tree-width.
Each factor $f_n$ touches $(x_{n-1}, x_n)$.
Factor graph
Variable nodes $x_1, \dots, x_N$ (treating $x_0$ as a boundary). Factors: $f_n(x_{n-1}, x_n) = p(y_n \mid x_{n-1}, x_n)$ with $y_n$ clamped. Prior factors on each variable.
Topology and tree-width
Chain structure, tree-width 1 (pairwise factors on a chain). BCJR runs in $O(N\,|\mathcal{X}|^2)$.
ch17-ex07
HardProve that the sum-product algorithm on a tree computes the exact marginal up to normalization.
Induct on the number of nodes.
Use the factorization-by-subtrees structure.
Induction hypothesis
For any tree with fewer than $n$ nodes, sum-product computes exact marginals. Base case: a single variable node is trivial.
Root at $i$
Rooting the tree at variable node $i$, each neighboring factor $a \in N(i)$ is the root of a subtree $T_a$ containing a variable set $V_a$; distinct subtrees share only $x_i$.
Factorization
$p(x_i) \propto \sum_{x \setminus x_i} \prod_a f_a(x_a)$. Since the subtrees share only $x_i$, the sum factorizes: $p(x_i) \propto \prod_{a \in N(i)} \big[ \sum_{x_{V_a \setminus \{i\}}} \prod_{b \in T_a} f_b(x_b) \big] = \prod_{a \in N(i)} \mu_{a \to i}(x_i)$, by definition of the message. Each inner sum is exact by the induction hypothesis applied to the smaller subtree $T_a$.
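The factorization-by-subtrees argument can be sanity-checked numerically; a sketch on a tiny chain-shaped tree with made-up factor tables, comparing the message product at $x_2$ against brute-force marginalization:

```python
from itertools import product

# Tiny tree factor graph: x1 - f_a - x2 - f_b - x3, binary variables,
# arbitrary made-up factor tables.
f_a = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 2.0, (1, 1): 0.3}
f_b = {(0, 0): 0.7, (0, 1): 1.2, (1, 0): 0.4, (1, 1): 2.5}

# Messages into x2: each neighboring factor sums out its own subtree.
mu_a = [sum(f_a[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)]
mu_b = [sum(f_b[(x2, x3)] for x3 in (0, 1)) for x2 in (0, 1)]
bp = [a * b for a, b in zip(mu_a, mu_b)]
bp = [v / sum(bp) for v in bp]

# Brute force: sum the full joint over x1 and x3.
brute = [sum(f_a[(x1, x2)] * f_b[(x2, x3)]
             for x1, x3 in product((0, 1), repeat=2)) for x2 in (0, 1)]
brute = [v / sum(brute) for v in brute]

assert all(abs(p - q) < 1e-12 for p, q in zip(bp, brute))
print(bp)
```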
ch17-ex08
EasyA factor graph has 12 variable nodes, 8 factor nodes, and 24 edges. Is it a tree? If not, how many independent cycles does it have?
Tree: $|E| = |V| - 1$.
Check
$|V| = 12 + 8 = 20$. $|E| = 24 \neq 19$. Not a tree.
Cyclomatic number
$|E| - |V| + 1 = 24 - 20 + 1 = 5$. The graph has 5 independent cycles (the dimension of its cycle space).
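As a one-line check (the function name is ours):

```python
def independent_cycles(num_vars: int, num_factors: int, num_edges: int) -> int:
    """Cyclomatic number |E| - |V| + 1 of a connected factor graph."""
    return num_edges - (num_vars + num_factors) + 1

print(independent_cycles(12, 8, 24))   # 5
```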
ch17-ex09
MediumConvert the Bayesian network to a factor graph.
Each conditional becomes a factor.
Identify factors
.
Neighborhoods
.
Cycle?
The path closes a cycle. Not a tree. This is the classical "V-structure" (common child) causing a loop in the factor graph.
ch17-ex10
MediumShow that loopy BP on a tree is equivalent to one forward pass plus one backward pass β iterating further does nothing.
After both passes, the messages are at their fixed-point values.
Pass structure
Leaves have trivial incoming messages; their outgoing messages depend only on themselves. After the first pass (outward from leaves to root), all messages from leaves toward root are set.
Backward pass
After root-to-leaves pass, all messages from root toward leaves are set. Now every directed edge has a message value.
Stability
The next iteration recomputes each message from inputs that are unchanged, so iterating further changes nothing. A fixed point is reached in two passes.
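A small demonstration of the two-pass fixed point; this sketch runs flooding updates on the chain $x_1 - x_2 - x_3$ with made-up pairwise potentials and checks that a third sweep changes nothing:

```python
from math import prod

# On a tree, BP reaches a fixed point after one outward and one inward
# sweep. Chain x1 - x2 - x3, binary variables, made-up potentials;
# two flooding sweeps cover the diameter.
psi = {(1, 2): [[1.0, 0.2], [0.4, 2.0]],
       (2, 3): [[0.5, 1.5], [2.0, 0.1]]}
edges = [(1, 2), (2, 1), (2, 3), (3, 2)]
nbrs = {1: [2], 2: [1, 3], 3: [2]}

def pot(i, j, xi, xj):
    return psi[(i, j)][xi][xj] if (i, j) in psi else psi[(j, i)][xj][xi]

def sweep(m):
    """One flooding iteration: recompute every directed message."""
    new = {}
    for i, j in edges:
        raw = [sum(pot(i, j, xi, xj)
                   * prod(m[(k, i)][xi] for k in nbrs[i] if k != j)
                   for xi in (0, 1))
               for xj in (0, 1)]
        z = sum(raw)
        new[(i, j)] = [v / z for v in raw]
    return new

m = {e: [0.5, 0.5] for e in edges}
m2 = sweep(sweep(m))        # two passes: forward + backward
assert sweep(m2) == m2      # a third pass changes nothing
print("fixed point after two sweeps")
```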
ch17-ex11
MediumArgue why loopy BP on a graph with girth $g$ is equivalent to exact inference on a tree for the first $\sim g/2$ iterations.
Information travels one hop per iteration.
A cycle has length at least $g$, so influence spreading one hop per iteration wraps around a cycle only after about $g/2$ iterations.
Depth of influence
After $t$ iterations, the message into node $i$ depends only on observations within a $t$-hop neighborhood of $i$. If $t < g/2$, this neighborhood is a tree (no cycle fits inside it).
Tree interpretation
The computation is identical to sum-product on the $t$-hop unrolled (computation) tree around $i$. For $t < g/2$ iterations, loopy BP = exact BP on a finite tree.
Consequence
Loopy BP makes its first mistake only after $\sim g/2$ iterations. Large girth delays the onset of approximation errors.
ch17-ex12
HardFor a $(d_v, d_c)$-regular Tanner graph with girth $g$, prove that the depth-$2\ell$ neighborhood of any variable node (with $4\ell < g$) is a tree with $d_v (d_c - 1) \big[ (d_v - 1)(d_c - 1) \big]^{\ell - 1}$ variable nodes at depth $2\ell$.
Count children at each level of a branching process.
Level 0: origin
1 variable node at depth 0.
Level 1: neighbors
$d_v$ check nodes are adjacent to the origin. Each check has $d_c - 1$ other variable neighbors: $d_v (d_c - 1)$ variables at depth 2.
General level
Each variable at depth $2\ell$ has $d_v - 1$ new check neighbors (one check is its parent), each leading to $d_c - 1$ new variable nodes. So each level multiplies the count by $(d_v - 1)(d_c - 1)$, and by induction the number of variables at depth $2\ell$ is $d_v (d_c - 1) \big[ (d_v - 1)(d_c - 1) \big]^{\ell - 1}$.
Tree property
Girth $g > 4\ell$ ensures no two distinct paths of length $\leq 2\ell$ from the origin meet (they would close a cycle of length $\leq 4\ell$), so the enumeration is exact (no double-counting).
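The level counts can be tabulated directly; a sketch (the function name is ours), e.g. for a $(3, 6)$-regular graph:

```python
# Number of variable nodes at depth 2*l in the computation tree of a
# (dv, dc)-regular Tanner graph (valid while 4*l < girth).
def vars_at_depth(dv: int, dc: int, l: int) -> int:
    if l == 0:
        return 1
    return dv * (dc - 1) * ((dv - 1) * (dc - 1)) ** (l - 1)

# A (3,6)-regular graph: counts grow by a factor (dv-1)(dc-1) = 10.
print([vars_at_depth(3, 6, l) for l in range(4)])   # [1, 15, 150, 1500]
```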
ch17-ex13
MediumGive an example of a factor graph where loopy BP has multiple fixed points. What does this imply about convergence behavior?
Strong coupling can create bistability.
Consider a binary Ising ring with negative coupling.
Construction
A ring of $n$ binary variables with pairwise factors $\psi(x_i, x_{i+1}) = e^{J x_i x_{i+1}}$, $x_i \in \{-1, +1\}$ (antiferromagnetic, $J < 0$). For $n$ odd and $|J|$ large, the ring is frustrated and BP has multiple fixed points.
Implication
With multiple fixed points, different initializations lead to different outputs. Convergence is not guaranteed to the "best" fixed point; damping and random restarts may be needed.
Diagnostic
In practice, check BP by running from multiple initializations and comparing outputs. Divergence between runs is a red flag.
ch17-ex14
EasyThe Bethe free energy equals the exact Gibbs free energy on which class of factor graphs?
On trees.
Answer
On trees. The Bethe approximation treats factor marginals as if they were independent except for the consistency constraints. On a tree, the global distribution literally factorizes in this way. On loopy graphs, there is residual dependence not captured.
ch17-ex15
HardA code with parity-check matrix $H$ has 4 variable nodes and 2 check nodes, each check of degree 3 (so $H$ is $2 \times 4$ with three 1s per row). What is the maximum possible girth? Construct an $H$ achieving it.
Short cycles: girth 4 requires two variables in both checks.
Girth 6 forbids such sharing.
Girth 4 requires
Two rows sharing $\geq 2$ column indices. Each row's support is a 3-subset of the 4 columns, and any two 3-subsets of a 4-set intersect in at least $3 + 3 - 4 = 2$ elements. Hence any two rows always share $\geq 2$ columns, and the girth is exactly 4 for any such code.
Conclusion
Maximum girth is 4 for this size; any $H$ with three 1s per row achieves it (e.g. rows $1110$ and $0111$). To avoid 4-cycles we would need either more variable nodes ($n > 4$) or smaller check degree.
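The "always" in the argument can be verified exhaustively over row supports:

```python
from itertools import combinations

# Every 2 x 4 parity-check matrix with three 1s per row has two rows
# sharing >= 2 columns, so a 4-cycle is unavoidable.
supports = list(combinations(range(4), 3))   # all 3-subsets of 4 columns
worst = min(len(set(a) & set(b)) for a in supports for b in supports)
print(worst)   # 2 -> every pair of rows shares at least 2 columns
```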
ch17-ex16
MediumConvert the posterior of the Kalman filter problem to a factor graph, and identify the message-passing cost per time step.
State transition and observation factors.
Factor graph
Variable nodes: $x_0, \dots, x_T$ (continuous). Factors:
- transitions $f_t(x_{t-1}, x_t) = p(x_t \mid x_{t-1})$.
- observations $g_t(x_t) = p(y_t \mid x_t)$, with $y_t$ clamped.
Structure
Chain of state variables with transition factors and unary observation factors. Tree-width 1.
Per-step cost
Each message is a Gaussian (mean and covariance of dimension $d$). A message update involves matrix multiplies and inversions: $O(d^3)$ per step. Total $O(T d^3)$, the Kalman filter complexity.
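A scalar ($d = 1$) sketch of one forward Gaussian message, predict plus update; the parameters A, Q, C, R and the observations are made up for illustration:

```python
# One forward (predict + update) Gaussian message on the chain, scalar
# case d = 1. Model parameters are illustrative only.
A, Q = 0.9, 0.5     # transition x_t = A x_{t-1} + N(0, Q)
C, R = 1.0, 1.0     # observation y_t = C x_t + N(0, R)

def kalman_step(mean, var, y):
    # Predict: push the Gaussian message through the transition factor.
    m_pred, v_pred = A * mean, A * A * var + Q
    # Update: multiply in the clamped observation factor, renormalize.
    k = v_pred * C / (C * C * v_pred + R)       # Kalman gain
    mean_new = m_pred + k * (y - C * m_pred)
    var_new = (1 - k * C) * v_pred
    return mean_new, var_new

m, v = 0.0, 1.0                  # prior message on x_0
for y in [0.8, 1.1, 0.9]:
    m, v = kalman_step(m, v, y)
print(round(m, 3), round(v, 3))
```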
ch17-ex17
MediumWhat is the relation between a factor graph with only pairwise factors and a Markov random field?
Both represent pairwise interactions.
Direct correspondence
A pairwise factor graph converts to a Markov random field with one edge per pairwise factor. Conversely, every pairwise MRF has a unique factor graph (one factor per edge). The graphs are isomorphic after collapsing factor nodes.
Why factor graphs are still useful
Pairwise MRFs are a restricted family. Real-world models with ternary or higher-arity factors (e.g., parity checks) are more naturally expressed as factor graphs.
ch17-ex18
HardShow that flooding loopy BP and serial loopy BP have the same fixed points, but may have different convergence rates and stability.
Fixed points are characterized algebraically, not by update order.
Fixed points
A fixed point satisfies $m^* = F(m^*)$ for the message update operator $F$. This condition is independent of update order.
Dynamics differ
Flooding: all messages update simultaneously, $m^{(t+1)} = F(m^{(t)})$. Serial: messages update one at a time in some order, each update seeing the freshest values. The Jacobian of the effective update map differs: serial can converge when flooding oscillates, because it uses freshly updated messages.
Practical takeaway
Serial scheduling typically converges faster and more reliably. The standard LDPC decoder uses a layered (block serial) schedule.
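The shared-fixed-point claim can be illustrated numerically; a sketch on a binary 3-cycle with weak coupling (the spin encoding, coupling J, and fields h are made up), running both schedules to convergence and comparing messages:

```python
from math import exp, prod

# Flooding vs. serial BP on a binary 3-cycle (spins +-1): with weak
# coupling both schedules converge to the same fixed point.
J, h = 0.3, [0.2, -0.1, 0.05]
nodes = [0, 1, 2]
edges = [(i, j) for i in nodes for j in nodes if i != j]
spin = [-1.0, 1.0]

def new_msg(m, i, j):
    # m_{i->j}(x_j) = sum_{x_i} phi_i psi_ij * prod of other messages.
    raw = [sum(exp(h[i] * spin[a] + J * spin[a] * spin[b])
               * prod(m[(k, i)][a] for k in nodes if k not in (i, j))
               for a in range(2))
           for b in range(2)]
    z = sum(raw)
    return [v / z for v in raw]

def run(serial, iters=500):
    m = {e: [0.5, 0.5] for e in edges}
    for _ in range(iters):
        if serial:
            for e in edges:                            # in-place, fresh
                m[e] = new_msg(m, *e)
        else:
            m = {e: new_msg(m, *e) for e in edges}     # all at once
    return m

mf, ms = run(False), run(True)
gap = max(abs(a - b) for e in edges for a, b in zip(mf[e], ms[e]))
print(gap)
```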
ch17-ex19
EasyName three problem domains where factor graphs are the standard framework for inference.
Coding, signal processing, statistics/ML.
Examples
(1) Channel decoding (LDPC, turbo): Tanner graph + loopy BP. (2) Signal processing (Kalman filter, HMM): chain graphs + forward-backward. (3) Machine learning (Bayesian networks, MRFs): message passing, variational inference, sampling. Also: robotics (SLAM), statistical physics (Ising models), compressed sensing.
ch17-ex20
ChallengeConsider a cluster graph obtained by grouping variables in a factor graph into overlapping clusters. Explain how generalized belief propagation (Yedidia et al.) uses a region graph to improve over loopy BP.
Higher-order corrections via inclusion-exclusion.
Cluster = super-node; region = subgraph.
Region graph
Choose overlapping regions (clusters) that cover each factor. Form a region graph: larger regions at top, smaller (intersections) below.
Generalized BP
Pass messages between regions. The resulting fixed points minimize a Kikuchi free energy, a generalization of Bethe that accounts for multi-node correlations.
Tradeoff
Larger regions give a better approximation, but at exponential cost in region size. Generalized BP bridges loopy BP (regions = factors) and exact inference (regions = whole graph).