Viterbi Decoding for TCM

Why Viterbi is the Right Decoder for TCM

The encoder in a TCM scheme is a finite-state machine: the next state depends on the current state and the input bits. Under AWGN, the maximum-likelihood sequence detector minimizes the total squared Euclidean distance between the received-sample sequence and all admissible transmitted sequences. The Viterbi algorithm does exactly this: it performs dynamic programming on the trellis, pruning all paths that cannot be optimal at each time step, and returns the surviving best path at traceback.

TCM introduces exactly one twist compared to classical binary Viterbi decoding: the parallel transitions (multiple edges between the same pair of states, produced by the $m - \tilde{m}$ uncoded bits) must be collapsed by a subset decoder into a single "best" edge per coset per time step, before the Viterbi recursion runs on the state-pair trellis. This preprocessing step shrinks the effective branch count from $2^m$ per state to $2^{\tilde{m}}$ per state, which is what makes TCM decoding practical.

Everything else β€” path-metric update, add-compare-select (ACS), survivor memory, traceback β€” is identical to binary Viterbi. This section states the algorithm precisely, works one section of the 4-state 8-PSK trellis by hand, and discusses the complexity scaling.

Definition:

Branch Metric (TCM)

For a received sample $y_k \in \mathbb{C}$ at time $k$ and a trellis edge at time $k$ labelled by coset $D$, the branch metric (after subset decoding) is

$$\lambda_k(D) \;\triangleq\; \min_{x \in D} \|y_k - x\|^2.$$

The $x \in D$ achieving the minimum is the subset-decoded point of $D$ at time $k$ and is recorded as the best in-coset decision for that edge; it will be used to recover the uncoded bits at traceback.

For level-$(\tilde{m}+1)$ cosets, $|D| = 2^{m - \tilde{m}}$ candidates must be searched — typically 1, 2, or 4 points. This is a small computation per time step per coset.
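The subset-decoding step can be sketched in a few lines. This is an illustrative sketch, not a specific standard's tables: the `PSK8`/`COSETS` layout assumes the canonical antipodal level-2 partition of 8-PSK into cosets $\{\ell, \ell+4\}$; any other partition plugs into `subset_decode` the same way.

```python
import numpy as np

# Illustrative subset decoder: for one coset, return the nearest in-coset
# point and the branch metric lambda_k(D). The 8-PSK layout and the
# antipodal level-2 cosets {l, l+4} are assumptions matching the canonical
# Ungerboeck partition.
PSK8 = np.exp(2j * np.pi * np.arange(8) / 8)    # unit-energy 8-PSK points
COSETS = [PSK8[[j, j + 4]] for j in range(4)]   # antipodal pairs {j, j+4}

def subset_decode(y, coset):
    """Return (best in-coset point, min squared Euclidean distance)."""
    d2 = np.abs(y - coset) ** 2    # |y - x|^2 for every candidate x in D
    best = int(np.argmin(d2))
    return coset[best], float(d2[best])
```

For $|D| = 2$ candidates this is two complex distance computations and one compare per coset per symbol.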


Definition:

Path Metric and Survivor

The path metric of a trellis path $\pi$ ending at state $s$ at time $k$ is the sum of branch metrics along $\pi$:

$$\Lambda_k(s, \pi) \;=\; \sum_{\ell = 0}^{k-1} \lambda_\ell(D_{\pi, \ell}),$$

where $D_{\pi, \ell}$ is the coset labelling the $\ell$-th edge of $\pi$. The survivor at state $s$ at time $k$ is the path arriving at $s$ with the smallest path metric among all paths ending at $s$:

$$\pi^*_k(s) \;=\; \operatorname*{arg\,min}_{\pi : \pi \text{ ends at } s \text{ at time } k} \Lambda_k(s, \pi).$$

By Bellman's optimality principle, once a path is eliminated at state $s$ at time $k$, it cannot be part of the ML sequence through any later state, so it can be discarded.
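The survivor recursion itself is one min-selection per state. A minimal sketch, assuming a hypothetical predecessor list `preds` of (state, coset-index) pairs for the edges entering one next state:

```python
# One add-compare-select (ACS) update for a single next state s'.
# Lambda_k: survivor metrics at time k; lam_k: coset branch metrics;
# preds: hypothetical list of (predecessor state, coset index) pairs.
def acs(Lambda_k, lam_k, preds):
    cands = [(Lambda_k[s] + lam_k[j], s) for s, j in preds]
    metric, best_pred = min(cands)    # keep the cheapest arrival only
    return metric, best_pred          # best_pred becomes the survivor pointer
```

All other arrivals at that state are discarded, which is exactly the pruning Bellman's principle licenses.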


Definition:

Traceback (Survivor Decoding)

At some time $k_\text{end}$, the decoder selects the state $s^*$ with the smallest path metric $\Lambda_{k_\text{end}}(s^*, \pi^*)$ and reconstructs the entire survivor path by walking backward through the stored survivor pointers. The coset indices along this path, together with the subset-decoded points at each edge, yield the maximum-likelihood sequence of transmitted symbols.

In practice, the decoder does not wait until $k_\text{end}$ but performs rolling traceback with depth $D$ (the traceback depth), declaring a decision for time $k - D$ at every time $k$.

Recommended traceback depth: $D \approx 5\nu$ to $7\nu$ for convolutional memory $\nu$, so that the probability of incorrect traceback from a wrong final-state choice is negligible (see Trellis Depth and Decoder Latency).


Branch Metric

The cost of a single trellis edge, used by the Viterbi algorithm. For TCM, it is the minimum squared Euclidean distance between the received sample and any point in the coset labelling that edge.

Related: Branch Metric (TCM), Subset Decoding (Within-Coset Nearest-Point Rule)

Path Metric

The sum of branch metrics along a trellis path. The Viterbi algorithm maintains, for each state at each time, the minimum path metric among all paths arriving at that state (the survivor path metric).

Related: Path Metric and Survivor

Survivor Path

At each state and time, the trellis path arriving at that state with the smallest path metric. By Bellman's principle, all other paths arriving at that state can be discarded.

Related: Path Metric and Survivor, Traceback (Survivor Decoding)

Viterbi Algorithm for TCM (with Subset Decoding)

Complexity: $\mathcal{O}(N_s \cdot 2^{\tilde{m}} \cdot T)$ time; $\mathcal{O}(N_s \cdot D)$ memory for traceback depth $D$.
Input: Received samples $(y_0, y_1, \ldots, y_{T-1})$; trellis with $N_s$ states; partition cosets $\{D_i^{(j)} : j = 0, \ldots, 2^{\tilde{m}+1} - 1\}$ at level $\tilde{m}+1$; traceback depth $D$.
Output: Decoded symbol sequence $(\hat{x}_0, \ldots, \hat{x}_{T-1})$.
1. Initialization. Set $\Lambda_0(s_0) \leftarrow 0$ for the known start state $s_0 = 0$, and $\Lambda_0(s) \leftarrow +\infty$ for all other $s$. Survivor pointers empty.
2. For $k = 0, 1, \ldots, T - 1$ do
3. \quad Subset decoding. For each coset $D_j$, $j = 0, \ldots, 2^{\tilde{m}+1} - 1$:
4. \qquad $\hat{x}_k(D_j) \leftarrow \operatorname{arg\,min}_{x \in D_j} \|y_k - x\|^2$; $\lambda_k(D_j) \leftarrow \|y_k - \hat{x}_k(D_j)\|^2$.
5. \quad Add-Compare-Select (ACS). For each state $s'$ at time $k+1$:
6. \qquad For each predecessor state $s$ with a valid transition $s \to s'$ labelled by coset $D_{j(s, s')}$:
7. \qquad\quad Compute the candidate metric $\Lambda_k(s) + \lambda_k(D_{j(s, s')})$.
8. \qquad $\Lambda_{k+1}(s') \leftarrow$ smallest such candidate; record the corresponding $s$ and $\hat{x}_k(D_{j(s, s')})$ as the survivor pointer for $(s', k+1)$.
9. \quad Rolling traceback. If $k \geq D$, walk backward $D$ steps from the best state at time $k+1$ and output the symbol at time $k + 1 - D$.
10. end for
11. Final traceback. From the best final state, walk back to the initial state and output any remaining symbols.

The subset-decoding step is embarrassingly parallel (one coset at a time, one sample at a time). The ACS step is the bottleneck: $N_s$ states times $2^{\tilde{m}}$ predecessors per state, each requiring one add and one compare, for $\mathcal{O}(N_s \cdot 2^{\tilde{m}})$ operations per symbol. This is why TCM uses small $\tilde{m}$ (typically 1 or 2) — the complexity grows as $2^{\tilde{m}}$.
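Putting the pseudocode together, here is a compact sketch, assuming a generic transition table `trans[s] = [(next_state, coset_index), ...]` with parallel transitions already collapsed by subset decoding. Names and the data layout are illustrative, not a particular standard's decoder; for brevity it performs a single final traceback rather than the rolling variant of step 9.

```python
import numpy as np

def viterbi_tcm(ys, trans, cosets, n_states):
    """Sketch of Viterbi-for-TCM. trans[s] lists (next_state, coset_index)
    edges out of state s; cosets[j] is an array of the points in coset j."""
    INF = float("inf")
    Lambda = [0.0] + [INF] * (n_states - 1)   # known start state 0
    survivors = []                            # per time: state -> (pred, symbol)
    for y in ys:
        # Subset decoding: nearest in-coset point + metric, per coset.
        best = []
        for D in cosets:
            d2 = np.abs(y - D) ** 2
            i = int(np.argmin(d2))
            best.append((D[i], float(d2[i])))
        # Add-compare-select over all valid transitions.
        new_Lambda = [INF] * n_states
        ptr = [None] * n_states
        for s in range(n_states):
            if Lambda[s] == INF:
                continue                      # unreachable state, skip
            for s_next, j in trans[s]:
                cand = Lambda[s] + best[j][1]
                if cand < new_Lambda[s_next]:
                    new_Lambda[s_next] = cand
                    ptr[s_next] = (s, best[j][0])
        Lambda = new_Lambda
        survivors.append(ptr)
    # Final traceback from the best end state.
    s = int(np.argmin(Lambda))
    out = []
    for ptr in reversed(survivors):
        s, x = ptr[s]
        out.append(x)
    return out[::-1]
```

The rolling-traceback variant replaces the final pass with a depth-$D$ walk at every time step, trading a little extra work for bounded latency.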

Viterbi Decoding on the 4-State 8-PSK TCM Trellis

Animated step-by-step Viterbi decoding over several sections of the canonical 4-state 8-PSK TCM trellis. Watch survivor paths accumulate and non-survivors get pruned at each ACS step, with path metrics $\Lambda_k(s)$ shown numerically at each state. The algorithm's left-to-right sweep plus right-to-left traceback is the structure to remember.
Viterbi at work: survivor paths (bold) and pruned paths (dim) over 6 trellis sections. Each state keeps exactly one survivor per time step — $N_s$ survivors total, independent of how many paths have been considered.

Example: One ACS Step of Viterbi on the 4-State 8-PSK TCM

Consider the 4-state 8-PSK TCM trellis from Verifying the Design Rules on the 4-State 8-PSK TCM. Suppose at time $k$ the path metrics at the four states are $\Lambda_k = (0.0,\; 0.5,\; 1.2,\; 0.8)$ and the received sample is $y_k = 0.9 e^{j\pi/8}$ (on the bisector between the 8-PSK points labelled 0 and 1, equidistant from both). For simplicity take the normalized 8-PSK constellation $\mathcal{X} = \{e^{j 2\pi \ell/8} : \ell = 0, \ldots, 7\}$. The trellis has the standard Ungerboeck structure: each state has 2 outgoing edges; each edge carries 2 parallel transitions, corresponding to antipodal 8-PSK points in a level-2 coset labelled by the coset bit pair (high bit, low bit).

Perform one ACS step: subset-decode the four level-2 cosets, then compute the new path metrics $\Lambda_{k+1}(s')$ for $s' = 0, 1, 2, 3$.
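As a numeric check of the subset-decoding half of this exercise (the antipodal cosets $\{\ell, \ell+4\}$ follow the standard level-2 partition; finishing the ACS half requires the trellis connectivity from the referenced section):

```python
import numpy as np

# Subset-decode the four level-2 cosets {l, l+4} of unit 8-PSK for the
# received sample of the example. By symmetry, y lies on the bisector of
# points 0 and 1, so cosets {0,4} and {1,5} tie at lambda ~ 0.147, while
# {2,6} and {3,7} give lambda ~ 1.121.
y = 0.9 * np.exp(1j * np.pi / 8)
PSK8 = np.exp(2j * np.pi * np.arange(8) / 8)
metrics = {}
for j in range(4):
    D = PSK8[[j, j + 4]]
    d2 = np.abs(y - D) ** 2
    metrics[j] = float(d2.min())
    print(f"coset {{{j},{j + 4}}}: lambda = {metrics[j]:.3f}")
```

These four values are the branch metrics the ACS step then adds to $\Lambda_k = (0.0, 0.5, 1.2, 0.8)$ along the valid transitions.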

⚠️Engineering Note

Viterbi Complexity vs. Trellis Memory

The Viterbi decoder cost per symbol scales as:

  • Branch-metric step: $2^{\tilde{m}+1}$ coset metrics, each of which requires a search over $2^{m - \tilde{m}}$ in-coset candidates. Total: $2^{m+1}$ point-distance computations (the same as full ML on the constellation).
  • ACS step: $N_s \cdot 2^{\tilde{m}}$ adds and $N_s$ compares.
  • Survivor memory: $N_s \cdot D$ bits for traceback depth $D$.

Doubling $N_s$ (one more memory cell, $\nu \to \nu + 1$) doubles ACS cost and survivor memory, and typically gains 0.3–0.5 dB of free-distance coding gain. V.34's 16-state TCM was considered the sweet spot at the time: more states would have exceeded the DSP budget of 1994 modems. Modern 5G NR uses LDPC codes precisely because per-symbol Viterbi complexity scales poorly as $N_s$ grows into the tens of thousands.

Practical Constraints
  • Per-symbol ACS cost: $N_s \cdot 2^{\tilde{m}}$ operations
  • Branch-metric cost: $2^{m+1}$ distance evaluations
  • Survivor memory: $N_s \cdot D$ bits with $D \approx 5\nu$
πŸ“‹ Ref: ITU-T V.34 Section 9

Common Mistake: Using Hamming Distance as the Viterbi Branch Metric for TCM

Mistake:

Applying a binary Viterbi decoder directly to the coset-label output of a TCM encoder, using Hamming distance as the branch metric.

Correction:

The correct TCM branch metric is the squared Euclidean distance in the signal space, after subset decoding β€” not the Hamming distance between received bits and the coset label. Using Hamming distance discards the geometric information (distance-doubling at each partition level) that is the whole point of TCM. The coding gain drops from 3–6 dB to roughly 0 dB β€” you end up decoding as if the underlying binary code were used directly over BPSK.

Common Mistake: Too-Shallow Traceback Depth

Mistake:

Setting the traceback depth $D$ too small (e.g., $D = \nu$ or $D = 2\nu$).

Correction:

With $D$ smaller than about $5\nu$, the probability that the survivor paths at time $k - D$ have not merged to a single common ancestor is non-negligible: different final states disagree on what the symbol at time $k - D$ was. This causes burst errors at the decoder output — not from the channel, but from the decoder itself. Rule of thumb: $D = 5\nu$ for rate-$1/2$ codes, $D = 7\nu$ for higher-rate codes.

Quick Check

The Viterbi decoder of a TCM code with $N_s$ states and $\tilde{m}$ coded bits performs, per symbol:

$N_s$ add-compare-select operations and $2^{\tilde{m}+1}$ branch-metric evaluations.

$N_s \cdot 2^{\tilde{m}}$ add-compare-select operations and $2^{\tilde{m}+1}$ branch-metric (coset) evaluations, each an in-coset minimum over $2^{m-\tilde{m}}$ candidates.

$2^m$ add-compare-select operations and a single branch-metric evaluation.

$N_s^2$ compares and $N_s$ branch-metric evaluations.

Quick Check

A 64-state ($\nu = 6$) TCM code operates at $28\,800$ symbols/s. Using the rule-of-thumb traceback depth $D = 5\nu$, what is the approximate decoder latency?

About $1.0$ ms.

About $30$ ms.

About $100$ $\mu$s.

About $6$ $\mu$s.

Key Takeaway

Two-stage decoding. TCM decoding is binary-code Viterbi wrapped around per-coset subset decoding. First, for every time step and every coset, find the nearest in-coset constellation point and record its squared Euclidean distance to the received sample. Second, run Viterbi over the state-pair trellis using those coset-level branch metrics. The per-symbol cost is $\mathcal{O}(N_s \cdot 2^{\tilde{m}})$ ACS operations plus $\mathcal{O}(2^{m+1})$ distance evaluations — the same as optimal ML, but organized to exploit the trellis structure.