The Viterbi Algorithm for MLSE

From Brute Force to Dynamic Programming

The theorem Markov Structure of the ISI Likelihood established that the ML metric is additive along paths through a finite state graph. Whenever an optimization objective decomposes this way, dynamic programming applies: at each stage, discard any partial path that cannot possibly be extended to the global optimum. For finite-memory channels the state graph is called a trellis, and the dynamic-programming procedure is the Viterbi algorithm. The same machinery appears in convolutional decoding, hidden Markov models, and speech recognition; it is one of the most widely applied algorithms in all of engineering.

The payoff is dramatic. Brute-force MLSE examines $M^T$ sequences. Viterbi examines $O(M^{L+1} T)$ state transitions. For $M = 2$, $L = 3$, $T = 100$ the comparison is $16 \cdot 100$ versus $2^{100}$ operations: about a factor of $10^{27}$ fewer, and the answer is identical.
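These counts are easy to verify numerically; a quick sketch (variable names are ours, chosen for this illustration):

```python
# Sanity check of the complexity comparison for M = 2, L = 3, T = 100.
M, L, T = 2, 3, 100

brute_force = M ** T          # candidate sequences examined by exhaustive MLSE
viterbi = M ** (L + 1) * T    # trellis edges examined by the Viterbi sweep

print(f"brute force: {brute_force:.3e}")            # about 1.3e30
print(f"viterbi:     {viterbi}")                    # 1600
print(f"ratio:       {brute_force / viterbi:.3e}")  # roughly 1e27
```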

Definition:

Trellis

The trellis of an ISI channel with alphabet $\mathcal{A}$ and memory $L$ is the directed graph whose vertices are pairs $(k, s) \in \{0, 1, \ldots, T\} \times \mathcal{S}$ with $\mathcal{S} = \mathcal{A}^L$, and whose edges connect $(k, s_k)$ to $(k+1, s_{k+1})$ whenever the transition is consistent with shifting in a new symbol $x[k] \in \mathcal{A}$:

$$s_{k+1} = (x[k], x[k-1], \ldots, x[k-L+1]) \quad\text{if}\quad s_k = (x[k-1], \ldots, x[k-L]).$$

Each vertex has in-degree $M$ (one predecessor per choice of $x[k-L]$) and out-degree $M$ (one successor per choice of $x[k]$). An edge $(s_k, s_{k+1})$ carries the branch metric $\gamma_k(s_k, s_{k+1}) = |y[k] - \mu_k(s_k, s_{k+1})|^2$.
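To make the definition concrete, here is a small sketch that enumerates the states and shift-register transitions for an arbitrary alphabet and memory (the function name `build_trellis` is ours):

```python
from itertools import product

def build_trellis(alphabet, L):
    """States are tuples (x[k-1], ..., x[k-L]); shifting in a new
    symbol x produces the successor state (x, x[k-1], ..., x[k-L+1])."""
    states = list(product(alphabet, repeat=L))
    successors = {}                      # (state, new symbol) -> next state
    for s in states:
        for x in alphabet:
            successors[(s, x)] = (x,) + s[:-1]   # shift the new symbol in
    return states, successors

# BPSK (M = 2) with memory L = 2: 4 states, each with out-degree 2.
states, successors = build_trellis([+1, -1], L=2)
```

Counting how often each state appears as a successor confirms the in-degree claim: each of the $M^L$ states is reached by exactly $M$ edges.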

Definition:

Path Metric and Survivor

The path metric at node $(k, s)$ is the minimum cumulative branch cost along any path from the known initial state $s_0$ to state $s$ at time $k$:

$$\lambda_k(s) \;=\; \min_{\substack{(s_0, s_1, \ldots, s_k) \\ s_k = s}} \sum_{j=0}^{k-1} \gamma_j(s_j, s_{j+1}).$$

The survivor at $(k, s)$ is the unique path attaining this minimum (ties broken arbitrarily). The surviving predecessor of $(k, s)$ is the state $s' \in \mathcal{S}$ such that $\lambda_k(s) = \lambda_{k-1}(s') + \gamma_{k-1}(s', s)$.

Only one survivor is kept per state per time step. The $M$ edges arriving at each $(k, s)$ are compared, the best is retained, and the other $M - 1$ are discarded. This pruning is what keeps the algorithm's memory bounded: at any instant only $|\mathcal{S}| = M^L$ survivors are alive, regardless of how far back the trellis extends.

Theorem: Viterbi Correctness: Dynamic-Programming Recursion

Given the additive branch metrics of the theorem Markov Structure of the ISI Likelihood, the path metrics satisfy the recursion

$$\lambda_{k+1}(s) \;=\; \min_{s' \in \mathcal{S}(s)}\; \big\{ \lambda_k(s') \;+\; \gamma_k(s', s) \big\}, \qquad \lambda_0(s_0) = 0,$$

where $\mathcal{S}(s)$ is the set of predecessors of state $s$. The state $\hat{s}_T = \arg\min_{s} \lambda_T(s)$ together with the chain of surviving predecessors yields $\hat{\mathbf{x}}_{\text{ML}}$, the exact solution of the MLSE problem.

The proof is Bellman's principle of optimality: an optimal path through the trellis has the property that every sub-path (from the start to any intermediate node) is itself optimal for reaching that node. Hence if two partial paths end at the same state, only the one with smaller cumulative metric can ever be a prefix of the global optimum; the other can be discarded without loss.
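In symbols, the principle amounts to splitting the minimization over full paths at the last transition; the inner minimum is, by definition, $\lambda_k(s')$:

$$\lambda_{k+1}(s) = \min_{\substack{(s_0,\ldots,s_{k+1}) \\ s_{k+1}=s}} \sum_{j=0}^{k} \gamma_j(s_j, s_{j+1}) = \min_{s' \in \mathcal{S}(s)} \Big\{ \min_{\substack{(s_0,\ldots,s_k) \\ s_k = s'}} \sum_{j=0}^{k-1} \gamma_j(s_j, s_{j+1}) \;+\; \gamma_k(s', s) \Big\} = \min_{s' \in \mathcal{S}(s)} \big\{ \lambda_k(s') + \gamma_k(s', s) \big\}.$$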

Key Takeaway

The Viterbi algorithm converts the exponential-in-$T$ MLSE search into a forward sweep with $M \cdot |\mathcal{S}|$ additions and comparisons per time step. Its correctness follows directly from the Markov factorization of the log-likelihood and Bellman's principle.

Viterbi Algorithm for MLSE

Complexity: $O(M^{L+1} \cdot T)$ operations, $O(|\mathcal{S}| \cdot T)$ memory (or $O(|\mathcal{S}|)$ with sliding-window traceback)

Input  : received block y[0..T-1], channel taps h[0..L], alphabet A
Output : ML symbol sequence x_hat[0..T-1]

// ---- Forward recursion ----
for every state s in S:
    lambda[0][s] <- +infinity
lambda[0][s0] <- 0                          // known initial state
for k = 0 to T - 1:
    for every state s in S:
        lambda[k+1][s] <- +infinity
        for every predecessor s' in S(s):
            x_k   <- symbol entering state s from s'
            mu    <- sum over ell of h[ell] * (symbols read from s', x_k)
            gamma <- | y[k] - mu |^2
            cand  <- lambda[k][s'] + gamma
            if cand < lambda[k+1][s]:
                lambda[k+1][s] <- cand
                psi[k+1][s]   <- s'         // surviving predecessor
// ---- Traceback ----
s_hat <- argmin over s of lambda[T][s]
for k = T down to 1:
    s_prev     <- psi[k][s_hat]
    x_hat[k-1] <- symbol entering s_hat from s_prev
    s_hat      <- s_prev
return x_hat
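The pseudocode translates almost line for line into Python. The following is a minimal sketch for real-valued alphabets and squared-error branch metrics, assuming the initial state is known; the names `viterbi_mlse`, `lam`, and `psi` are ours:

```python
import math
from itertools import product

def viterbi_mlse(y, h, alphabet, s0):
    """Exact MLSE for y[k] = sum_l h[l] x[k-l] + noise.
    A state is the tuple (x[k-1], ..., x[k-L]); s0 is the known start state."""
    L = len(h) - 1
    states = list(product(alphabet, repeat=L))
    lam = {s: math.inf for s in states}      # path metrics at the current time
    lam[s0] = 0.0
    psi = []                                 # psi[k][s] = (predecessor, symbol)
    for k in range(len(y)):
        new_lam = {s: math.inf for s in states}
        back = {}
        for s_prev in states:
            if lam[s_prev] == math.inf:      # state not yet reachable
                continue
            for x in alphabet:
                s_next = (x,) + s_prev[:-1]
                recent = (x,) + s_prev       # (x[k], x[k-1], ..., x[k-L])
                mu = sum(h[l] * recent[l] for l in range(L + 1))
                cand = lam[s_prev] + (y[k] - mu) ** 2
                if cand < new_lam[s_next]:   # keep only the survivor
                    new_lam[s_next] = cand
                    back[s_next] = (s_prev, x)
        lam = new_lam
        psi.append(back)
    # ---- Traceback from the best terminal state ----
    s_hat = min(lam, key=lam.get)
    best_metric = lam[s_hat]
    x_hat = []
    for k in range(len(y) - 1, -1, -1):
        s_hat, x_k = psi[k][s_hat]
        x_hat.append(x_k)
    x_hat.reverse()
    return x_hat, best_metric
```

On the two-tap channel used in this chapter ($\mathbf{h} = [1, 0.5]^T$, BPSK, $s_0 = +1$), calling `viterbi_mlse([1.2, -0.7, -1.4], [1, 0.5], [+1, -1], (+1,))` returns the ML symbol sequence together with its cumulative metric.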

In practice the full $T \times |\mathcal{S}|$ traceback array is prohibitive for long blocks. A sliding window of $D \approx 5L$ columns is maintained; paths older than $D$ are deemed to have merged (an empirical fact for most channels). This converts the memory cost to $O(|\mathcal{S}| \cdot D)$ with negligible loss in performance.

Viterbi Trellis Traversal on a Two-Tap BPSK Channel

Watch the algorithm build path metrics and highlight the maximum-likelihood path through a two-state trellis. Adjust the SNR and the channel tap $h[1]$ to see how noise and ISI strength shape the surviving paths.


The Viterbi Algorithm Step by Step

A short animation that walks through Viterbi on a small 2-state trellis, adding branches, pruning dominated partial paths, and performing the traceback at the end.
At each time step, the two candidate branches arriving at each state are compared; the survivor (shown in green) is kept, the discarded branch (red, faded) is pruned. The final traceback highlights the maximum-likelihood path.

Example: Complete Viterbi Trace for $T = 3$

Using the same channel $\mathbf{h} = [1, 0.5]^T$, BPSK alphabet, and initial state $s_0 = +1$ as in the example Branch Metrics for a Two-Tap Channel with BPSK, now suppose the receiver observes $y = (1.2, -0.7, -1.4)$. Run the Viterbi forward recursion to completion, record all surviving path metrics, and perform the traceback to recover $\hat{\mathbf{x}}_{\text{ML}}$.
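For readers who want to check their work, one pass through the recursion (states labeled by the previous symbol, branch metric $\gamma = (y[k] - x[k] - 0.5\,x[k-1])^2$) gives

$$\lambda_1(+1) = 0.09, \qquad \lambda_1(-1) = 2.89,$$
$$\lambda_2(+1) = 4.33 \;(\text{pred } -1), \qquad \lambda_2(-1) = 0.13 \;(\text{pred } +1),$$
$$\lambda_3(+1) = 3.74 \;(\text{pred } -1), \qquad \lambda_3(-1) = 0.14 \;(\text{pred } -1).$$

The terminal minimum is $\lambda_3(-1) = 0.14$; tracing the surviving predecessors backward yields $\hat{\mathbf{x}}_{\text{ML}} = (+1, -1, -1)$.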

Brute-Force MLSE vs Viterbi

| Quantity | Brute-force MLSE | Viterbi |
|---|---|---|
| Sequences / paths examined | $M^T$ | $M^{L+1} \cdot T$ |
| Memory | $O(T)$ per path, $M^T$ paths | $O(M^L \cdot D)$ for traceback window $D$ |
| Output | Exact ML sequence | Exact ML sequence |
| Dependence on block length | Exponential | Linear |
| Dependence on channel memory | None (in complexity) | Exponential |
| Dependence on alphabet size | Exponential ($M^T$) | Linear in $M$, exponential in $L$ |

Common Mistake: Infinite Traceback Is Unnecessary, But Don't Cut It Too Short

Mistake:

Readers sometimes either (a) store the entire $T \times |\mathcal{S}|$ traceback matrix, wasting memory for long blocks, or (b) pick a traceback depth that is too short, causing noisy or incorrect decisions near the current time.

Correction:

A rule of thumb is to use a sliding traceback window of $D \approx 5L$ to $7L$. Empirically this is enough for surviving paths to have merged into a common prefix with probability close to one, so the decision at time $k - D$ is nearly as good as with full traceback. Cutting below about $4L$ produces noticeable BER degradation.

Quick Check

A system uses 16-QAM modulation ($M = 16$) over a channel with memory $L = 3$. Roughly how many multiply-accumulates does the Viterbi algorithm require per transmitted symbol?

$16 \cdot 3 = 48$

$16^3 \cdot 16 = 65{,}536$

$16^{100}$ (for block length 100)

$\log_2 16 = 4$

Historical Note: Ungerboeck's Whitened-Matched-Filter Receiver (1974)

1970s

Two years after Forney, Gottfried Ungerboeck showed in a 1974 paper that the Viterbi algorithm can be applied after a whitened matched filter front end, producing branch metrics of a particularly simple form that directly expose the channel's minimum-phase equivalent. Ungerboeck's formulation is the one implemented in most wireline modem receivers and in the GSM/EDGE equalizer; it also provides the conceptual bridge from MLSE to the MMSE-DFE that closes this chapter. Together, Forney (1972) and Ungerboeck (1974) defined the modern textbook treatment of sequence estimation over ISI channels.

Viterbi algorithm

A dynamic-programming procedure that computes the minimum-cost path through a trellis by propagating surviving path metrics forward one stage at a time, pruning dominated partial paths. Applied to the ISI trellis, it produces the maximum-likelihood symbol sequence in time linear in the block length.

Related: Maximum-likelihood sequence estimation (MLSE), Trellis

Trellis

The time-unrolled state graph of a finite-memory channel or code. Vertices are (time, state) pairs; edges correspond to state transitions driven by the input symbol; each edge carries a branch metric used by the Viterbi algorithm to rank candidate paths.

Related: Viterbi algorithm

Why This Matters: Viterbi-MLSE in GSM

The GSM receiver runs a Viterbi equalizer on a channel of nominal memory L=4L = 4 with a binary-like Gaussian MSK alphabet. Because the alphabet is effectively M=2M = 2 after pre-processing, the trellis has 24=162^4 = 16 states β€” very small. This is exactly the regime where MLSE is cheap and optimal. When LTE moved to wider bandwidths and much longer delay spreads, the trellis became infeasibly large and OFDM was adopted precisely because its per-subcarrier processing side-steps the trellis entirely (see Book 1, Chapter 14).