Viterbi Decoding for TCM

Why Viterbi is the Right Decoder for TCM

The encoder in a TCM scheme is a finite-state machine: the next state depends on the current state and the input bits. Under AWGN, the maximum-likelihood sequence detector minimizes the total squared Euclidean distance between the received-sample sequence and all admissible transmitted sequences. The Viterbi algorithm does exactly this: it performs dynamic programming on the trellis, pruning all paths that cannot be optimal at each time step, and returns the surviving best path at traceback.

TCM introduces exactly one twist compared to classical binary Viterbi decoding: the parallel transitions (multiple edges between the same pair of states, produced by the $m - \tilde{m}$ uncoded bits) must be collapsed by a subset decoder into a single "best" edge per coset per time step, before the Viterbi recursion runs on the state-pair trellis. This preprocessing step shrinks the effective branch count from $2^m$ per state to $2^{\tilde{m}}$ per state, which is what makes TCM decoding practical.

Everything else β€” path-metric update, add-compare-select (ACS), survivor memory, traceback β€” is identical to binary Viterbi. This section states the algorithm precisely, works one section of the 4-state 8-PSK trellis by hand, and discusses the complexity scaling.

Definition:

Branch Metric (TCM)

For a received sample $y_k \in \mathbb{C}$ at time $k$ and a trellis edge at time $k$ labelled by coset $D$, the branch metric (after subset decoding) is

$$\lambda_k(D) \;\triangleq\; \min_{x \in D} \|y_k - x\|^2.$$

The $x \in D$ achieving the minimum is the subset-decoded point of $D$ at time $k$ and is recorded as the best in-coset decision for that edge; it will be used to recover the uncoded bits at traceback.

For level-$(\tilde{m}+1)$ cosets, $|D| = 2^{m - \tilde{m}}$ candidates must be searched — typically 1, 2, or 4 points. This is a small computation per time step per coset.
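The subset-decoding step can be sketched in a few lines. This is an illustrative sketch, not a specific standard's tables: the `PSK8`/`COSETS` layout assumes the canonical antipodal level-2 partition of 8-PSK into cosets $\{\ell, \ell+4\}$; any other partition plugs into `subset_decode` the same way.

```python
import numpy as np

# Illustrative subset decoder: for one coset, return the nearest in-coset
# point and the branch metric lambda_k(D). The 8-PSK layout and the
# antipodal level-2 cosets {l, l+4} are assumptions matching the canonical
# Ungerboeck partition.
PSK8 = np.exp(2j * np.pi * np.arange(8) / 8)    # unit-energy 8-PSK points
COSETS = [PSK8[[j, j + 4]] for j in range(4)]   # antipodal pairs {j, j+4}

def subset_decode(y, coset):
    """Return (best in-coset point, min squared Euclidean distance)."""
    d2 = np.abs(y - coset) ** 2    # |y - x|^2 for every candidate x in D
    best = int(np.argmin(d2))
    return coset[best], float(d2[best])
```

For $|D| = 2$ candidates this is two complex distance computations and one compare per coset per symbol.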


Definition:

Path Metric and Survivor

The path metric of a trellis path $\pi$ ending at state $s$ at time $k$ is the sum of branch metrics along $\pi$:

$$\Lambda_k(s, \pi) \;=\; \sum_{\ell = 0}^{k-1} \lambda_\ell(D_{\pi, \ell}),$$

where $D_{\pi, \ell}$ is the coset labelling the $\ell$-th edge of $\pi$. The survivor at state $s$ at time $k$ is the path arriving at $s$ with the smallest path metric among all paths ending at $s$:

$$\pi^*_k(s) \;=\; \operatorname*{arg\,min}_{\pi : \pi \text{ ends at } s \text{ at time } k} \Lambda_k(s, \pi).$$

By Bellman's optimality principle, once a path is eliminated at state $s$ at time $k$, it cannot be part of the ML sequence through any later state, so it can be discarded.
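The survivor recursion itself is one min-selection per state. A minimal sketch, assuming a hypothetical predecessor list `preds` of (state, coset-index) pairs for the edges entering one next state:

```python
# One add-compare-select (ACS) update for a single next state s'.
# Lambda_k: survivor metrics at time k; lam_k: coset branch metrics;
# preds: hypothetical list of (predecessor state, coset index) pairs.
def acs(Lambda_k, lam_k, preds):
    cands = [(Lambda_k[s] + lam_k[j], s) for s, j in preds]
    metric, best_pred = min(cands)    # keep the cheapest arrival only
    return metric, best_pred          # best_pred becomes the survivor pointer
```

All other arrivals at that state are discarded, which is exactly the pruning Bellman's principle licenses.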


Definition:

Traceback (Survivor Decoding)

At some time $k_\text{end}$, the decoder selects the state $s^*$ with the smallest path metric $\Lambda_{k_\text{end}}(s^*, \pi^*)$ and reconstructs the entire survivor path by walking backward through the stored survivor pointers. The coset indices along this path, together with the subset-decoded points at each edge, yield the maximum-likelihood sequence of transmitted symbols.

In practice, the decoder does not wait until $k_\text{end}$ but performs rolling traceback with depth $D$ (the traceback depth), declaring a decision for time $k - D$ at every time $k$.

Recommended traceback depth: $D \approx 5\nu$ to $7\nu$ for convolutional memory $\nu$, so that the probability of incorrect traceback from a wrong final-state choice is negligible (see Trellis Depth and Decoder Latency).


Branch Metric

The cost of a single trellis edge, used by the Viterbi algorithm. For TCM, it is the minimum squared Euclidean distance between the received sample and any point in the coset labelling that edge.

Related: Branch Metric (TCM), Subset Decoding (Within-Coset Nearest-Point Rule)

Path Metric

The sum of branch metrics along a trellis path. The Viterbi algorithm maintains, for each state at each time, the minimum path metric among all paths arriving at that state (the survivor path metric).

Related: Path Metric and Survivor

Survivor Path

At each state and time, the trellis path arriving at that state with the smallest path metric. By Bellman's principle, all other paths arriving at that state can be discarded.

Related: Path Metric and Survivor, Traceback (Survivor Decoding)

Viterbi Algorithm for TCM (with Subset Decoding)

Complexity: $\mathcal{O}(N_s \cdot 2^{\tilde{m}} \cdot T)$ time; $\mathcal{O}(N_s \cdot D)$ memory for traceback depth $D$.
Input: Received samples $(y_0, y_1, \ldots, y_{T-1})$; trellis with $N_s$ states; partition cosets $\{D_i^{(j)} : j = 0, \ldots, 2^{\tilde{m}+1} - 1\}$ at level $\tilde{m}+1$; traceback depth $D$.
Output: Decoded symbol sequence $(\hat{x}_0, \ldots, \hat{x}_{T-1})$.
1. Initialization. Set $\Lambda_0(s_0) \leftarrow 0$ for the known start state $s_0 = 0$, and $\Lambda_0(s) \leftarrow +\infty$ for all other $s$. Survivor pointers empty.
2. For $k = 0, 1, \ldots, T - 1$ do
3. \quad Subset decoding. For each coset $D_j$, $j = 0, \ldots, 2^{\tilde{m}+1} - 1$:
4. \qquad $\hat{x}_k(D_j) \leftarrow \operatorname{arg\,min}_{x \in D_j} \|y_k - x\|^2$; $\lambda_k(D_j) \leftarrow \|y_k - \hat{x}_k(D_j)\|^2$.
5. \quad Add-Compare-Select (ACS). For each state $s'$ at time $k+1$:
6. \qquad For each predecessor state $s$ with a valid transition $s \to s'$ labelled by coset $D_{j(s, s')}$:
7. \qquad\quad Compute the candidate metric $\Lambda_k(s) + \lambda_k(D_{j(s, s')})$.
8. \qquad $\Lambda_{k+1}(s') \leftarrow$ smallest such candidate; record the corresponding $s$ and $\hat{x}_k(D_{j(s, s')})$ as the survivor pointer for $(s', k+1)$.
9. \quad Rolling traceback. If $k \geq D$, walk backward $D$ steps from the best state at time $k+1$ and output the symbol at time $k + 1 - D$.
10. end for
11. Final traceback. From the best final state, walk back to the initial state and output any remaining symbols.

The subset-decoding step is embarrassingly parallel (one coset at a time, one sample at a time). The ACS step is the bottleneck: $N_s$ states times $2^{\tilde{m}}$ predecessors per state, each requiring one add and one compare, for $\mathcal{O}(N_s \cdot 2^{\tilde{m}})$ operations per symbol. This is why TCM uses small $\tilde{m}$ (typically 1 or 2) — the complexity grows as $2^{\tilde{m}}$.
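Putting the pseudocode together, here is a compact sketch, assuming a generic transition table `trans[s] = [(next_state, coset_index), ...]` with parallel transitions already collapsed by subset decoding. Names and the data layout are illustrative, not a particular standard's decoder; for brevity it performs a single final traceback rather than the rolling variant of step 9.

```python
import numpy as np

def viterbi_tcm(ys, trans, cosets, n_states):
    """Sketch of Viterbi-for-TCM. trans[s] lists (next_state, coset_index)
    edges out of state s; cosets[j] is an array of the points in coset j."""
    INF = float("inf")
    Lambda = [0.0] + [INF] * (n_states - 1)   # known start state 0
    survivors = []                            # per time: state -> (pred, symbol)
    for y in ys:
        # Subset decoding: nearest in-coset point + metric, per coset.
        best = []
        for D in cosets:
            d2 = np.abs(y - D) ** 2
            i = int(np.argmin(d2))
            best.append((D[i], float(d2[i])))
        # Add-compare-select over all valid transitions.
        new_Lambda = [INF] * n_states
        ptr = [None] * n_states
        for s in range(n_states):
            if Lambda[s] == INF:
                continue                      # unreachable state, skip
            for s_next, j in trans[s]:
                cand = Lambda[s] + best[j][1]
                if cand < new_Lambda[s_next]:
                    new_Lambda[s_next] = cand
                    ptr[s_next] = (s, best[j][0])
        Lambda = new_Lambda
        survivors.append(ptr)
    # Final traceback from the best end state.
    s = int(np.argmin(Lambda))
    out = []
    for ptr in reversed(survivors):
        s, x = ptr[s]
        out.append(x)
    return out[::-1]
```

The rolling-traceback variant replaces the final pass with a depth-$D$ walk at every time step, trading a little extra work for bounded latency.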

Viterbi Decoding on the 4-State 8-PSK TCM Trellis

Animated step-by-step Viterbi decoding over several sections of the canonical 4-state 8-PSK TCM trellis. Watch survivor paths accumulate and non-survivors get pruned at each ACS step, with path metrics $\Lambda_k(s)$ shown numerically at each state. The algorithm's left-to-right sweep plus right-to-left traceback is the structure to remember.
Viterbi at work: survivor paths (bold) and pruned paths (dim) over 6 trellis sections. Each state keeps exactly one survivor per time step — $N_s$ survivors total, independent of how many paths have been considered.

Example: One ACS Step of Viterbi on the 4-State 8-PSK TCM

Consider the 4-state 8-PSK TCM trellis from Verifying the Design Rules on the 4-State 8-PSK TCM. Suppose at time $k$ the path metrics at the four states are $\Lambda_k = (0.0,\; 0.5,\; 1.2,\; 0.8)$ and the received sample is $y_k = 0.9 e^{j\pi/8}$ (on the bisector between the 8-PSK points labelled 0 and 1, equidistant from both). For simplicity take the normalized 8-PSK constellation $\mathcal{X} = \{e^{j 2\pi \ell/8} : \ell = 0, \ldots, 7\}$. The trellis has the standard Ungerboeck structure: each state has 2 outgoing edges; each edge carries 2 parallel transitions, corresponding to antipodal 8-PSK points in a level-2 coset labelled by the coset bit pair (high bit, low bit).

Perform one ACS step: subset-decode the four level-2 cosets, then compute the new path metrics $\Lambda_{k+1}(s')$ for $s' = 0, 1, 2, 3$.
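As a numeric check of the subset-decoding half of this exercise (the antipodal cosets $\{\ell, \ell+4\}$ follow the standard level-2 partition; finishing the ACS half requires the trellis connectivity from the referenced section):

```python
import numpy as np

# Subset-decode the four level-2 cosets {l, l+4} of unit 8-PSK for the
# received sample of the example. By symmetry, y lies on the bisector of
# points 0 and 1, so cosets {0,4} and {1,5} tie at lambda ~ 0.147, while
# {2,6} and {3,7} give lambda ~ 1.121.
y = 0.9 * np.exp(1j * np.pi / 8)
PSK8 = np.exp(2j * np.pi * np.arange(8) / 8)
metrics = {}
for j in range(4):
    D = PSK8[[j, j + 4]]
    d2 = np.abs(y - D) ** 2
    metrics[j] = float(d2.min())
    print(f"coset {{{j},{j + 4}}}: lambda = {metrics[j]:.3f}")
```

These four values are the branch metrics the ACS step then adds to $\Lambda_k = (0.0, 0.5, 1.2, 0.8)$ along the valid transitions.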

⚠️Engineering Note

Viterbi Complexity vs. Trellis Memory

The Viterbi decoder cost per symbol scales as:

  • Branch-metric step: $2^{\tilde{m}+1}$ coset metrics, each of which requires a search over $2^{m - \tilde{m}}$ in-coset candidates. Total: $2^{m+1}$ point-distance computations (the same as full ML on the constellation).
  • ACS step: $N_s \cdot 2^{\tilde{m}}$ adds and $N_s$ compares.
  • Survivor memory: $N_s \cdot D$ bits for traceback depth $D$.

Doubling $N_s$ (one more memory cell, $\nu \to \nu + 1$) doubles ACS cost and survivor memory, and typically gains 0.3–0.5 dB of free-distance coding gain. V.34's 16-state TCM was considered the sweet spot at the time: more states would have exceeded the DSP budget of 1994 modems. Modern 5G NR uses LDPC codes precisely because per-symbol Viterbi complexity scales poorly as $N_s$ grows into the tens of thousands.

Practical Constraints
  • Per-symbol ACS cost: $N_s \cdot 2^{\tilde{m}}$ operations
  • Branch-metric cost: $2^{m+1}$ distance evaluations
  • Survivor memory: $N_s \cdot D$ bits with $D \approx 5\nu$
πŸ“‹ Ref: ITU-T V.34 Section 9

Common Mistake: Using Hamming Distance as the Viterbi Branch Metric for TCM

Mistake:

Applying a binary Viterbi decoder directly to the coset-label output of a TCM encoder, using Hamming distance as the branch metric.

Correction:

The correct TCM branch metric is the squared Euclidean distance in the signal space, after subset decoding β€” not the Hamming distance between received bits and the coset label. Using Hamming distance discards the geometric information (distance-doubling at each partition level) that is the whole point of TCM. The coding gain drops from 3–6 dB to roughly 0 dB β€” you end up decoding as if the underlying binary code were used directly over BPSK.

Common Mistake: Too-Shallow Traceback Depth

Mistake:

Setting the traceback depth $D$ too small (e.g., $D = \nu$ or $D = 2\nu$).

Correction:

With $D$ smaller than about $5\nu$, the probability that the survivor paths at time $k - D$ have not merged to a single common ancestor is non-negligible: different final states disagree on what the symbol at time $k - D$ was. This causes burst errors at the decoder output — not from the channel, but from the decoder itself. Rule of thumb: $D = 5\nu$ for rate-$1/2$ codes, $D = 7\nu$ for higher-rate codes.

Quick Check

The Viterbi decoder of a TCM code with $N_s$ states and $\tilde{m}$ coded bits performs, per symbol:

$N_s$ add-compare-select operations and $2^{\tilde{m}+1}$ branch-metric evaluations.

$N_s \cdot 2^{\tilde{m}}$ add-compare-select operations and $2^{\tilde{m}+1}$ branch-metric (coset) evaluations, each an in-coset minimum over $2^{m-\tilde{m}}$ candidates.

$2^m$ add-compare-select operations and a single branch-metric evaluation.

$N_s^2$ compares and $N_s$ branch-metric evaluations.

Quick Check

A 64-state ($\nu = 6$) TCM code operates at $28\,800$ symbols/s. Using the rule-of-thumb traceback depth $D = 5\nu$, what is the approximate decoder latency?

About $1.0$ ms.

About $30$ ms.

About $100$ $\mu$s.

About $6$ $\mu$s.

Key Takeaway

Two-stage decoding. TCM decoding is binary-code Viterbi wrapped around per-coset subset decoding. First, for every time step and every coset, find the nearest in-coset constellation point and record its squared Euclidean distance to the received sample. Second, run Viterbi over the state-pair trellis using those coset-level branch metrics. The per-symbol cost is $\mathcal{O}(N_s \cdot 2^{\tilde{m}})$ ACS operations plus $\mathcal{O}(2^{m+1})$ distance evaluations — the same as optimal ML, but organized to exploit the trellis structure.