The Max-Product Algorithm
From Marginals to MAP
Sum-product computes marginals, the probability distribution $p(x_i) = \sum_{\mathbf{x} \setminus x_i} p(\mathbf{x})$ of a single variable obtained by summing over all others. But many applications ask a different question: what is the most likely joint configuration $\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} p(\mathbf{x})$?
The point is that replacing $\sum$ with $\max$ in the sum-product rules gives a new algorithm, max-product, that solves the MAP problem. The Viterbi algorithm is a famous instance: max-product on a trellis. Same factor graph, same algorithm skeleton, different operator. That is the elegance of the framework.
Definition: Max-Product Messages
Max-Product Messages
The max-product algorithm replaces the marginalization sum by a maximization:
$$\mu_{f \to x}(x) = \max_{\mathbf{x}_{\partial f \setminus x}} f(\mathbf{x}_{\partial f}) \prod_{y \in \partial f \setminus x} \mu_{y \to f}(y),$$
while the variable-to-factor update is unchanged ($\mu_{x \to f}(x) = \prod_{g \in \partial x \setminus f} \mu_{g \to x}(x)$). The final max-marginal is
$$b(x) = \prod_{f \in \partial x} \mu_{f \to x}(x) \propto \max_{\mathbf{x} \setminus x} p(\mathbf{x}).$$
In the log domain (to avoid underflow), max-product becomes max-sum: products become sums and $\max$ stays $\max$. This is the form used in Viterbi.
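The equivalence is easy to check numerically. A minimal sketch on a made-up two-variable model (the potentials below are illustrative, not from the text), comparing a max-product message with its max-sum counterpart:

```python
import math

# Toy model p(x1, x2) ∝ f1(x1) * f2(x1, x2); the potentials are made-up
# numbers. Max-sum is max-product on log-potentials: products become
# sums, max stays max.
f1 = [0.6, 0.4]                # unary potential on x1
f2 = [[0.9, 0.1], [0.2, 0.8]]  # pairwise potential, indexed [x1][x2]

# Max-product message into x2: mu(x2) = max_{x1} f1(x1) * f2(x1, x2)
mu = [max(f1[a] * f2[a][b] for a in range(2)) for b in range(2)]

# Max-sum: the same recursion on log-potentials.
log_mu = [max(math.log(f1[a]) + math.log(f2[a][b]) for a in range(2))
          for b in range(2)]

# Exponentiating the max-sum message recovers the max-product message.
for m, lm in zip(mu, log_mu):
    assert abs(math.exp(lm) - m) < 1e-12
print(mu)  # ≈ [0.54, 0.32]
```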
Theorem: Max-Product Computes MAP on Trees
If the factor graph is a tree, the max-product algorithm computes the max-marginals $b_i(x_i) \propto \max_{\mathbf{x} \setminus x_i} p(\mathbf{x})$, up to normalization, for every variable $x_i$. Traceback (selecting $\hat{x}_i = \arg\max_{x_i} b_i(x_i)$ and following stored backpointers) yields the MAP configuration, provided the MAP is unique.
Max-product is the MAP analog of sum-product. Max interacts with products the same way sum does: $\max_x a\, g(x) = a \max_x g(x)$ for $a \ge 0$, just as $\sum_x a\, g(x) = a \sum_x g(x)$. On a tree, the same subtree-factorization argument applies, with max replacing sum.
Distributive law for max
For nonnegative factors, $\max_x a\, g(x) = a \max_x g(x)$, and the maximizer of $a\, g(x)$ is the maximizer of $g$. The identity holds when $a$ does not depend on $x$. This is the 'distributivity' of max over products.
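A quick numerical sanity check of this identity, using random nonnegative values (purely illustrative):

```python
import random

# Check max_x [a * g(x)] = a * max_x g(x) for a >= 0 not depending on x.
random.seed(0)
for _ in range(1000):
    a = random.uniform(0, 10)                       # nonnegative constant
    g = [random.uniform(0, 10) for _ in range(5)]   # arbitrary g values
    assert abs(max(a * gi for gi in g) - a * max(g)) < 1e-9
print("distributive law verified on 1000 random trials")
```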
Subtree argument
On a tree, eliminating a variable by maximization decouples the problem into independent subtree maxima. The recursion gives exactly the max-product definition of the messages.
Correct backpointers
For reconstruction, each $\max$ operation stores its argmax as a backpointer. Following backpointers from the root yields the joint MAP.
Example: Viterbi as Max-Product on a Trellis
A convolutional code has state $s_t$, with transition $s_{t+1} = g(s_t, u_t)$ and output $c_t = h(s_t, u_t)$. We observe the outputs through an AWGN channel, $y_t = c_t + n_t$. Show that the Viterbi decoder is max-product on the trellis factor graph.
Construct factor graph
Variable nodes: the states $s_t$ (and inputs $u_t$). Factors: $f_t(s_t, u_t, s_{t+1}) = \mathbb{1}[s_{t+1} = g(s_t, u_t)]\; p(y_t \mid c_t)$ with $c_t = h(s_t, u_t)$, encoding both the trellis transitions and the observation likelihoods.
Max-product messages
Forward messages: $\mu_{t+1}(s_{t+1}) = \max_{s_t, u_t} f_t(s_t, u_t, s_{t+1})\, \mu_t(s_t)$. These are the path metrics in Viterbi. Backpointer: $\phi_{t+1}(s_{t+1}) = \arg\max_{s_t, u_t} f_t(s_t, u_t, s_{t+1})\, \mu_t(s_t)$.
Identify with Viterbi
Taking logs: $M_{t+1}(s_{t+1}) = \max_{s_t, u_t\,:\, s_{t+1} = g(s_t, u_t)} \big[ M_t(s_t) - \tfrac{1}{2\sigma^2} \lVert y_t - c_t \rVert^2 \big]$, up to additive constants. This is exactly the Viterbi recursion, with the squared-distance terms as branch metrics.
Traceback recovers MAP sequence
After reaching $t = T$, pick the best final state and trace back through the stored backpointers to recover the most likely state sequence, and hence the most likely information bit sequence.
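The whole example can be sketched in code. Everything concrete here is an assumption for illustration: a toy 2-state rate-1/2 encoder (state = previous input bit, outputs $(u_t, u_t \oplus s_t)$), BPSK mapping, and hand-picked noise values. The forward max-sum recursion with backpointers then reproduces the brute-force joint MAP:

```python
import itertools
import math

# Assumed toy encoder: state s_t = previous input bit, two coded bits
# (u_t, u_t XOR s_t) per input, sent as BPSK (0 -> +1, 1 -> -1) over AWGN.
def encode(bits):
    s, out = 0, []
    for u in bits:
        out += [u, u ^ s]
        s = u  # next state is the current input bit
    return out

def branch_metric(y_pair, c_pair):
    # Log-likelihood up to constants: -(y - x)^2 with x = BPSK(c).
    return -sum((y - (1 - 2 * c)) ** 2 for y, c in zip(y_pair, c_pair))

def viterbi(ys):
    M = {0: 0.0, 1: -math.inf}  # path metrics; start in state 0
    back = []                   # back[t][s_next] = (previous state, input)
    for y in ys:
        newM, bp = {}, {}
        for s_next in (0, 1):
            u = s_next          # in this encoder, next state = input bit
            cand = [(M[s] + branch_metric(y, (u, u ^ s)), s) for s in (0, 1)]
            newM[s_next], prev = max(cand)
            bp[s_next] = (prev, u)
        back.append(bp)
        M = newM
    # Traceback from the best final state (path metrics = max-sum messages).
    s = max(M, key=M.get)
    bits = []
    for bp in reversed(back):
        prev, u = bp[s]
        bits.append(u)
        s = prev
    return bits[::-1]

# Hand-picked noise on an assumed bit sequence, then decode.
bits = [1, 0, 1, 1]
tx = [1 - 2 * c for c in encode(bits)]
noise = [0.3, -0.4, 0.2, 0.1, -0.2, 0.35, 0.1, -0.15]
ys = [(tx[2 * t] + noise[2 * t], tx[2 * t + 1] + noise[2 * t + 1])
      for t in range(len(bits))]
decoded = viterbi(ys)

# Cross-check: Viterbi equals the brute-force joint MAP over all 2^4 inputs.
def seq_metric(b):
    c = encode(list(b))
    return sum(branch_metric(ys[t], (c[2 * t], c[2 * t + 1]))
               for t in range(len(b)))

assert decoded == list(max(itertools.product((0, 1), repeat=4),
                           key=seq_metric))
print("decoded bits:", decoded)
```

With this mild noise the decoder recovers the transmitted bits; crank up the noise amplitudes and the max-sum recursion still returns the exact joint MAP, which may then differ from what was sent.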
Viterbi Is Not New
Viterbi (1967) predates the factor graph formalism, but in retrospect his algorithm is nothing more than max-product on a trellis factor graph. Similarly, the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm is sum-product on the same trellis. Viterbi and BCJR are not just related: they are the same algorithm with $\max$ and $\sum$ swapped.
Max-Product Algorithm (Log-Domain)
Complexity: per iteration, $O\big(\sum_f |\mathcal{X}|^{|\partial f|}\big)$, the same as sum-product. On trees, one pair of passes (leaves-to-root, root-to-leaves) is exact. On loopy graphs, max-product may not converge; damping and careful handling of ties are needed.
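A minimal sketch of the log-domain algorithm on a chain-structured model (the log-potentials are made-up numbers; a chain is a tree, so a single forward pass plus traceback is exact):

```python
import itertools

# Max-sum on a chain MRF p(x) ∝ exp(Σ_t θ_t(x_t, x_{t+1})),
# with illustrative log-potentials theta[t][a][b] for x_t=a, x_{t+1}=b.
theta = [
    [[0.2, 1.1], [0.7, 0.1]],
    [[0.5, 0.3], [1.4, 0.2]],
    [[0.1, 0.9], [0.6, 0.4]],
]
n = len(theta) + 1  # number of variables

# Forward sweep: max-sum messages plus backpointers (stored argmaxes).
M = [0.0, 0.0]
back = []
for t in range(n - 1):
    newM, bp = [], []
    for b in (0, 1):
        scores = [M[a] + theta[t][a][b] for a in (0, 1)]
        best = 0 if scores[0] >= scores[1] else 1
        newM.append(scores[best])
        bp.append(best)
    back.append(bp)
    M = newM

# Traceback from the best value of the last variable.
x = [0 if M[0] >= M[1] else 1]
for bp in reversed(back):
    x.append(bp[x[-1]])
x.reverse()

# On a tree, max-sum is exact: compare against brute-force enumeration.
def total(xs):
    return sum(theta[t][xs[t]][xs[t + 1]] for t in range(n - 1))

assert tuple(x) == max(itertools.product((0, 1), repeat=n), key=total)
print("MAP assignment:", x)
```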
Theorem: Loopy Max-Product: Local Optimality
Let $\hat{\mathbf{x}}$ be a fixed point of loopy max-product on a graph with girth $g$. Then $\hat{\mathbf{x}}$ is locally optimal: any alternative assignment differing from $\hat{\mathbf{x}}$ on a subset $S$ of variables with $|S| < g/2$ has lower joint probability.
Loopy max-product does not always converge, but when it does, the returned configuration beats any 'local' perturbation. This is a weaker guarantee than global optimality, but it is nontrivial and is the theoretical basis of why max-product is useful in practice.
Reduction to tree
Consider a subset $S$ of variables to perturb. The boundary of $S$ consists of the edges leaving $S$. If the boundary has fewer than $g$ edges, the local change does not create a cycle.
Tree bound
On the unrolled computation tree of depth $g/2$, max-product is exact and $\hat{\mathbf{x}}$ is optimal. Hence no such local perturbation decreases the joint cost.
Sum-Product vs. Max-Product
| Aspect | Sum-product (BP) | Max-product (MAP-BP) |
|---|---|---|
| Computes | Marginals | MAP |
| Message operation at factor | Sum over neighbors | Max over neighbors |
| Log-domain equivalent | Log-sum-exp | Max-sum (Viterbi-style) |
| Classical instance | BCJR, forward-backward | Viterbi |
| Estimation criterion | MMSE / posterior mean | MAP |
| Exact on trees | Yes | Yes (with traceback) |
| Loopy behavior | Converges empirically; Bethe approximation | May oscillate; locally optimal fixed points |
| Bit vs. sequence error | Minimizes bit errors (marginal MAP) | Minimizes sequence errors (joint MAP) |
Common Mistake: Marginal MAP ≠ Joint MAP
Mistake:
Computing marginal beliefs $b_i(x_i)$ via sum-product and then setting $\hat{x}_i = \arg\max_{x_i} b_i(x_i)$, expecting this to equal the joint MAP.
Correction:
The vector of marginal-MAP choices is the bitwise (symbol-wise) MAP decoder, which minimizes bit error rate, not the joint MAP, which minimizes word error rate. The two differ whenever the posterior has competing modes. If you want the joint MAP, use max-product, not sum-product. LDPC decoding uses sum-product because bit errors are what count; Viterbi uses max-product because sequence errors are what count.
Example: When Marginal MAP Differs from Joint MAP
A joint distribution on $(x_1, x_2) \in \{0,1\}^2$ has $p(0,0) = p(1,1) = 0.4$ and $p(0,1) = p(1,0) = 0.1$. Find the joint MAP and the marginal-MAP assignments. Do they agree?
Joint MAP
$(x_1, x_2) = (0,0)$ or $(1,1)$, with mass $0.4$ each. Either is a valid joint MAP.
Marginals
$p(x_1 = 0) = p(x_1 = 1) = 0.5$ (same for $x_2$).
Marginal MAP
Ties in both marginals: any of $(0,0), (0,1), (1,0), (1,1)$ could be picked. If ties are broken by lexicographic order, marginal-MAP returns $(0,0)$, which happens to agree with a joint-MAP configuration. But in less symmetric problems the two disagree (e.g., with $p(0,1) = p(1,0) = p(1,1) = 1/3$ and $p(0,0) = 0$, the joint MAP ties between three configurations while the marginal MAP uniquely returns $(1,1)$).
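The bookkeeping above is easy to verify by enumeration, assuming the masses $p(0,0) = p(1,1) = 0.4$ and $p(0,1) = p(1,0) = 0.1$:

```python
# Tiny worked check of joint MAP vs marginal MAP on an assumed
# two-variable distribution: p(0,0) = p(1,1) = 0.4, p(0,1) = p(1,0) = 0.1.
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Joint MAP: ties between (0,0) and (1,1).
best = max(p.values())
joint_maps = [x for x in p if p[x] == best]
assert set(joint_maps) == {(0, 0), (1, 1)}

# Marginals: both uniform, so the marginal MAP ties as well.
m1 = {a: sum(p[(a, b)] for b in (0, 1)) for a in (0, 1)}
m2 = {b: sum(p[(a, b)] for a in (0, 1)) for b in (0, 1)}
assert all(abs(v - 0.5) < 1e-12 for v in list(m1.values()) + list(m2.values()))
print("joint MAP ties:", joint_maps, "| marginals:", m1, m2)
```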
Why This Matters: Viterbi in Cellular Control Channels
While 5G data channels use LDPC (sum-product decoding), LTE control channels (e.g., PDCCH) use tail-biting convolutional codes decoded with Viterbi (max-product), because they are short and Viterbi's low-latency, deterministic decoding is preferred over iterative BP. The same max-product algorithm, applied to a tiny trellis, decodes billions of control messages per day in deployed networks.
Viterbi Trellis Decoding Visualization
Visualize the Viterbi (max-product) algorithm on a convolutional code trellis. Watch the path metrics evolve and the survivor paths emerge.
Quick Check
You want the estimator that minimizes the bit error rate over a long codeword. Which algorithm should you use?
- Sum-product BP (marginal beliefs)
- Max-product (joint MAP)
- Both are equivalent
- Neither; use least squares
Sum-product computes marginals; thresholding marginals minimizes bit error rate.
Key Takeaway
Max-product replaces $\sum$ with $\max$ in sum-product, computing the MAP instead of marginals. The Viterbi algorithm is max-product on a trellis. On trees, max-product is exact with traceback; on loopy graphs, fixed points are locally optimal. Use sum-product when you want bit-level decisions that minimize bit error rate; use max-product when you want the sequence-level MAP.