The Viterbi Algorithm for MLSE

From Brute Force to Dynamic Programming

The theorem Markov Structure of the ISI Likelihood established that the ML metric is additive along paths through a finite state graph. Whenever an optimization objective decomposes this way, dynamic programming applies: at each stage, discard any partial path that cannot possibly be extended to the global optimum. For finite-memory channels the state graph is called a trellis, and the dynamic-programming procedure is the Viterbi algorithm. The same machinery appears in convolutional decoding, hidden Markov models, and speech recognition; it is one of the most widely applied algorithms in all of engineering.

The payoff is dramatic. Brute-force MLSE examines $M^T$ sequences. Viterbi examines $O(M^{L+1} T)$ state transitions. For $M = 2$, $L = 3$, $T = 100$ the comparison is $16 \cdot 100$ versus $2^{100}$ operations: about a factor of $10^{27}$ fewer, and the answer is identical.
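These counts are easy to verify numerically; a quick sketch (variable names are ours, chosen for this illustration):

```python
# Sanity check of the complexity comparison for M = 2, L = 3, T = 100.
M, L, T = 2, 3, 100

brute_force = M ** T          # candidate sequences examined by exhaustive MLSE
viterbi = M ** (L + 1) * T    # trellis edges examined by the Viterbi sweep

print(f"brute force: {brute_force:.3e}")            # about 1.3e30
print(f"viterbi:     {viterbi}")                    # 1600
print(f"ratio:       {brute_force / viterbi:.3e}")  # roughly 1e27
```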

Definition:

Trellis

The trellis of an ISI channel with alphabet $\mathcal{A}$ and memory $L$ is the directed graph whose vertices are pairs $(k, s) \in \{0, 1, \ldots, T\} \times \mathcal{S}$ with $\mathcal{S} = \mathcal{A}^L$, and whose edges connect $(k, s_k)$ to $(k+1, s_{k+1})$ whenever the transition is consistent with shifting in a new symbol $x[k] \in \mathcal{A}$:

$$s_{k+1} = (x[k], x[k-1], \ldots, x[k-L+1]) \quad\text{if}\quad s_k = (x[k-1], \ldots, x[k-L]).$$

Each vertex has in-degree $M$ (one predecessor per choice of $x[k-L]$) and out-degree $M$ (one successor per choice of $x[k]$). An edge $(s_k, s_{k+1})$ carries the branch metric $\gamma_k(s_k, s_{k+1}) = |y[k] - \mu_k(s_k, s_{k+1})|^2$.
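To make the definition concrete, here is a small sketch that enumerates the states and shift-register transitions for an arbitrary alphabet and memory (the function name `build_trellis` is ours):

```python
from itertools import product

def build_trellis(alphabet, L):
    """States are tuples (x[k-1], ..., x[k-L]); shifting in a new
    symbol x produces the successor state (x, x[k-1], ..., x[k-L+1])."""
    states = list(product(alphabet, repeat=L))
    successors = {}                      # (state, new symbol) -> next state
    for s in states:
        for x in alphabet:
            successors[(s, x)] = (x,) + s[:-1]   # shift the new symbol in
    return states, successors

# BPSK (M = 2) with memory L = 2: 4 states, each with out-degree 2.
states, successors = build_trellis([+1, -1], L=2)
```

Counting how often each state appears as a successor confirms the in-degree claim: each of the $M^L$ states is reached by exactly $M$ edges.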

Definition:

Path Metric and Survivor

The path metric at node $(k, s)$ is the minimum cumulative branch cost along any path from the known initial state $s_0$ to state $s$ at time $k$:

$$\lambda_k(s) \;=\; \min_{\substack{(s_0, s_1, \ldots, s_k) \\ s_k = s}} \sum_{j=0}^{k-1} \gamma_j(s_j, s_{j+1}).$$

The survivor at $(k, s)$ is the unique path attaining this minimum (ties broken arbitrarily). The surviving predecessor of $(k, s)$ is the state $s' \in \mathcal{S}$ such that $\lambda_k(s) = \lambda_{k-1}(s') + \gamma_{k-1}(s', s)$.

Only one survivor is kept per state per time step. The $M$ edges arriving at each $(k, s)$ are compared, the best is retained, and the other $M - 1$ are discarded. This pruning is what keeps the algorithm's memory bounded: at any instant only $|\mathcal{S}| = M^L$ survivors are alive, regardless of how far back the trellis extends.

Theorem: Viterbi Correctness: Dynamic-Programming Recursion

Given the additive branch metrics of the theorem Markov Structure of the ISI Likelihood, the path metrics satisfy the recursion

$$\lambda_{k+1}(s) \;=\; \min_{s' \in \mathcal{S}(s)}\; \big\{ \lambda_k(s') \;+\; \gamma_k(s', s) \big\}, \qquad \lambda_0(s_0) = 0,$$

where $\mathcal{S}(s)$ is the set of predecessors of state $s$. The state $\hat{s}_T = \arg\min_{s} \lambda_T(s)$ together with the chain of surviving predecessors yields $\hat{\mathbf{x}}_{\text{ML}}$, the exact solution of the MLSE problem.

The proof is Bellman's principle of optimality: an optimal path through the trellis has the property that every sub-path (from the start to any intermediate node) is itself optimal for reaching that node. Hence if two partial paths end at the same state, only the one with smaller cumulative metric can ever be a prefix of the global optimum; the other can be discarded without loss.
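In symbols, the principle amounts to splitting the minimization over full paths at the last transition; the inner minimum is, by definition, $\lambda_k(s')$:

$$\lambda_{k+1}(s) = \min_{\substack{(s_0,\ldots,s_{k+1}) \\ s_{k+1}=s}} \sum_{j=0}^{k} \gamma_j(s_j, s_{j+1}) = \min_{s' \in \mathcal{S}(s)} \Big\{ \min_{\substack{(s_0,\ldots,s_k) \\ s_k = s'}} \sum_{j=0}^{k-1} \gamma_j(s_j, s_{j+1}) \;+\; \gamma_k(s', s) \Big\} = \min_{s' \in \mathcal{S}(s)} \big\{ \lambda_k(s') + \gamma_k(s', s) \big\}.$$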

Key Takeaway

The Viterbi algorithm converts the exponential-in-$T$ MLSE search into a forward sweep with $M \cdot |\mathcal{S}|$ additions and comparisons per time step. Its correctness follows directly from the Markov factorization of the log-likelihood and Bellman's principle.

Viterbi Algorithm for MLSE

Complexity: $O(M^{L+1} \cdot T)$ operations, $O(|\mathcal{S}| \cdot T)$ memory (or $O(|\mathcal{S}|)$ with sliding-window traceback)

Input  : received block y[0..T-1], channel taps h[0..L], alphabet A
Output : ML symbol sequence x_hat[0..T-1]

// ---- Forward recursion ----
for every state s in S:
    lambda[0][s] <- +infinity
lambda[0][s0] <- 0                          // known initial state
for k = 0 to T - 1:
    for every state s in S:
        lambda[k+1][s] <- +infinity
        for every predecessor s' in S(s):
            x_k   <- symbol entering state s from s'
            mu    <- sum over ell of h[ell] * (symbols read from s', x_k)
            gamma <- | y[k] - mu |^2
            cand  <- lambda[k][s'] + gamma
            if cand < lambda[k+1][s]:
                lambda[k+1][s] <- cand
                psi[k+1][s]   <- s'         // surviving predecessor
// ---- Traceback ----
s_hat <- argmin over s of lambda[T][s]
for k = T down to 1:
    s_prev     <- psi[k][s_hat]
    x_hat[k-1] <- symbol entering s_hat from s_prev
    s_hat      <- s_prev
return x_hat
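The pseudocode translates almost line for line into Python. The following is a minimal sketch for real-valued alphabets and squared-error branch metrics, assuming the initial state is known; the names `viterbi_mlse`, `lam`, and `psi` are ours:

```python
import math
from itertools import product

def viterbi_mlse(y, h, alphabet, s0):
    """Exact MLSE for y[k] = sum_l h[l] x[k-l] + noise.
    A state is the tuple (x[k-1], ..., x[k-L]); s0 is the known start state."""
    L = len(h) - 1
    states = list(product(alphabet, repeat=L))
    lam = {s: math.inf for s in states}      # path metrics at the current time
    lam[s0] = 0.0
    psi = []                                 # psi[k][s] = (predecessor, symbol)
    for k in range(len(y)):
        new_lam = {s: math.inf for s in states}
        back = {}
        for s_prev in states:
            if lam[s_prev] == math.inf:      # state not yet reachable
                continue
            for x in alphabet:
                s_next = (x,) + s_prev[:-1]
                recent = (x,) + s_prev       # (x[k], x[k-1], ..., x[k-L])
                mu = sum(h[l] * recent[l] for l in range(L + 1))
                cand = lam[s_prev] + (y[k] - mu) ** 2
                if cand < new_lam[s_next]:   # keep only the survivor
                    new_lam[s_next] = cand
                    back[s_next] = (s_prev, x)
        lam = new_lam
        psi.append(back)
    # ---- Traceback from the best terminal state ----
    s_hat = min(lam, key=lam.get)
    best_metric = lam[s_hat]
    x_hat = []
    for k in range(len(y) - 1, -1, -1):
        s_hat, x_k = psi[k][s_hat]
        x_hat.append(x_k)
    x_hat.reverse()
    return x_hat, best_metric
```

On the two-tap channel used in this chapter ($\mathbf{h} = [1, 0.5]^T$, BPSK, $s_0 = +1$), calling `viterbi_mlse([1.2, -0.7, -1.4], [1, 0.5], [+1, -1], (+1,))` returns the ML symbol sequence together with its cumulative metric.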

In practice the full $T \times |\mathcal{S}|$ traceback array is prohibitive for long blocks. A sliding window of $D \approx 5L$ columns is maintained; paths older than $D$ are deemed to have merged (an empirical fact for most channels). This converts the memory cost to $O(|\mathcal{S}| \cdot D)$ with negligible loss in performance.

Viterbi Trellis Traversal on a Two-Tap BPSK Channel

Watch the algorithm build path metrics and highlight the maximum-likelihood path through a two-state trellis. Adjust the SNR and the channel tap $h[1]$ to see how noise and ISI strength shape the surviving paths.


The Viterbi Algorithm Step by Step

A short animation that walks through Viterbi on a small 2-state trellis, adding branches, pruning dominated partial paths, and performing the traceback at the end.
At each time step, the two candidate branches arriving at each state are compared; the survivor (shown in green) is kept, the discarded branch (red, faded) is pruned. The final traceback highlights the maximum-likelihood path.

Example: Complete Viterbi Trace for $T = 3$

Using the same channel $\mathbf{h} = [1, 0.5]^T$, BPSK alphabet, and initial state $s_0 = +1$ as in the example Branch Metrics for a Two-Tap Channel with BPSK, now suppose the receiver observes $y = (1.2, -0.7, -1.4)$. Run the Viterbi forward recursion to completion, record all surviving path metrics, and perform the traceback to recover $\hat{\mathbf{x}}_{\text{ML}}$.
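For readers who want to check their work, one pass through the recursion (states labeled by the previous symbol, branch metric $\gamma = (y[k] - x[k] - 0.5\,x[k-1])^2$) gives

$$\lambda_1(+1) = 0.09, \qquad \lambda_1(-1) = 2.89,$$
$$\lambda_2(+1) = 4.33 \;(\text{pred } -1), \qquad \lambda_2(-1) = 0.13 \;(\text{pred } +1),$$
$$\lambda_3(+1) = 3.74 \;(\text{pred } -1), \qquad \lambda_3(-1) = 0.14 \;(\text{pred } -1).$$

The terminal minimum is $\lambda_3(-1) = 0.14$; tracing the surviving predecessors backward yields $\hat{\mathbf{x}}_{\text{ML}} = (+1, -1, -1)$.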

Brute-Force MLSE vs Viterbi

| Quantity | Brute-force MLSE | Viterbi |
|---|---|---|
| Sequences / paths examined | $M^T$ | $M^{L+1} \cdot T$ |
| Memory | $O(T)$ per path, $M^T$ paths | $O(M^L \cdot D)$ for traceback window $D$ |
| Output | Exact ML sequence | Exact ML sequence |
| Dependence on block length | Exponential | Linear |
| Dependence on channel memory | None (in complexity) | Exponential |
| Dependence on alphabet size | Exponential ($M^T$) | Linear in $M$, exponential in $L$ |

Common Mistake: Infinite Traceback Is Unnecessary, But Don't Cut It Too Short

Mistake:

Readers sometimes either (a) store the entire $T \times |\mathcal{S}|$ traceback matrix, wasting memory for long blocks, or (b) pick a traceback depth that is too short, causing noisy or incorrect decisions near the current time.

Correction:

A rule of thumb is to use a sliding traceback window of $D \approx 5L$ to $7L$. Empirically this is enough for surviving paths to have merged into a common prefix with probability close to one, so the decision at time $k - D$ is nearly as good as with full traceback. Cutting below about $4L$ produces noticeable BER degradation.

Quick Check

A system uses 16-QAM modulation ($M = 16$) over a channel with memory $L = 3$. Roughly how many multiply-accumulates does the Viterbi algorithm require per transmitted symbol?

$16 \cdot 3 = 48$

$16^3 \cdot 16 = 65{,}536$

$16^{100}$ (for block length 100)

$\log_2 16 = 4$

Historical Note: Ungerboeck's Whitened-Matched-Filter Receiver (1974)

1970s

Two years after Forney, Gottfried Ungerboeck showed in a 1974 paper that the Viterbi algorithm can be applied after a whitened matched filter front end, producing branch metrics of a particularly simple form that directly expose the channel's minimum-phase equivalent. Ungerboeck's formulation is the one implemented in most wireline modem receivers and in the GSM/EDGE equalizer; it also provides the conceptual bridge from MLSE to the MMSE-DFE that closes this chapter. Together, Forney (1972) and Ungerboeck (1974) defined the modern textbook treatment of sequence estimation over ISI channels.

Viterbi algorithm

A dynamic-programming procedure that computes the minimum-cost path through a trellis by propagating surviving path metrics forward one stage at a time, pruning dominated partial paths. Applied to the ISI trellis, it produces the maximum-likelihood symbol sequence in time linear in the block length.

Related: Maximum-likelihood sequence estimation (MLSE), Trellis

Trellis

The time-unrolled state graph of a finite-memory channel or code. Vertices are (time, state) pairs; edges correspond to state transitions driven by the input symbol; each edge carries a branch metric used by the Viterbi algorithm to rank candidate paths.

Related: Viterbi algorithm

Why This Matters: Viterbi-MLSE in GSM

The GSM receiver runs a Viterbi equalizer on a channel of nominal memory L=4L = 4 with a binary-like Gaussian MSK alphabet. Because the alphabet is effectively M=2M = 2 after pre-processing, the trellis has 24=162^4 = 16 states β€” very small. This is exactly the regime where MLSE is cheap and optimal. When LTE moved to wider bandwidths and much longer delay spreads, the trellis became infeasibly large and OFDM was adopted precisely because its per-subcarrier processing side-steps the trellis entirely (see Book 1, Chapter 14).