Trellis Structure and the Ungerboeck Design Rules

The Trellis as the Design Blueprint

The convolutional code inside a TCM scheme is conveniently pictured as a trellis: a directed graph whose nodes are the $N_s = 2^\nu$ encoder states and whose edges are labelled by the (coset index, uncoded bits) pair of each possible transition. One "section" of the trellis corresponds to one channel use; the complete code is obtained by concatenating sections.

Ungerboeck's 1982 paper spends less time on the trellis itself than on the rules for labelling its edges so that the free Euclidean distance is maximized. Those three rules, now known as the Ungerboeck design rules, turn out to be deceptively simple. They are the single most important engineering output of the paper, and they still drive TCM table constructions today.

In this section we make the trellis structure explicit, state the three rules, and explain precisely why each one is needed: each rule eliminates one type of near-minimum-distance error event that would otherwise collapse $d_{\rm free}^2$ back to its Hamming-distance ceiling.

Definition:

Trellis (of a Convolutional Code)

The trellis of a convolutional encoder with $\nu$ memory cells is the time-indexed directed graph whose nodes at time $k$ are the $N_s = 2^\nu$ possible encoder states $s_k \in \{0, 1\}^\nu$, and whose edges from $s_k$ to $s_{k+1}$ correspond to each admissible state transition under one input symbol. Each edge is labelled with the encoder output for that transition.

One section of the trellis shows edges from time $k$ to time $k+1$. A path is a sequence of edges consistent with the state-transition rule; an error event is a pair of paths that share a starting state, diverge at some time, and first remerge to a common state at a later time.

A binary code of rate $\tilde{m}/(\tilde{m}+1)$ has $2^{\tilde{m}}$ edges leaving each state (one per input pattern) and $2^{\tilde{m}}$ edges arriving at each state. For TCM each edge's label is converted via the Ungerboeck partition into a constellation point (plus uncoded-bit choices producing parallel transitions).
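The per-state edge count is easy to see in code. A minimal sketch, assuming the illustrative feedforward generators $(g_0, g_1) = (7, 5)_8$ (a standard textbook pair, not one of Ungerboeck's table entries), builds one trellis section of a rate-$1/2$, $\nu = 2$ encoder:

```python
# One trellis section of a rate-1/2 feedforward convolutional encoder with
# nu = 2 memory cells (N_s = 4 states). The generators (7, 5) in octal are
# an illustrative textbook pair, not an entry from Ungerboeck's tables.
G0, G1 = 0o7, 0o5      # generator polynomials as bitmasks over F_2
NU = 2                 # encoder memory -> N_s = 2**NU states

def parity(x: int) -> int:
    return bin(x).count("1") & 1

# edges[s] = [(input_bit, next_state, (c1, c0))]: 2**m_tilde = 2 edges
# leave each state, one per input pattern, as stated in the definition.
edges = {}
for s in range(1 << NU):
    edges[s] = []
    for u in (0, 1):
        reg = (s << 1) | u                       # bit 0 = newest input bit
        out = (parity(reg & G0), parity(reg & G1))
        edges[s].append((u, reg & ((1 << NU) - 1), out))

for s, branches in edges.items():
    print(s, branches)
```

Every state also turns out to have exactly $2^{\tilde{m}} = 2$ incoming edges, as the definition requires.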


Definition:

Ungerboeck TCM Code

An Ungerboeck TCM code is the pair $(\mathcal{C}, \mathcal{P})$ where

  • $\mathcal{C}$ is a rate-$\tilde{m}/(\tilde{m}+1)$ convolutional code (specified by generator polynomials or a state-transition table), and
  • $\mathcal{P}$ is an Ungerboeck partition of a $2^{m+1}$-point constellation.

The encoder:

  1. Reads $m$ information bits $u_{m-1} \cdots u_0$ per symbol.
  2. Feeds the lowest $\tilde{m}$ bits $u_{\tilde{m}-1} \cdots u_0$ into $\mathcal{C}$; the output is $\tilde{m}+1$ bits labelling a level-$(\tilde{m}+1)$ coset of $\mathcal{P}$.
  3. Uses the remaining $m - \tilde{m}$ uncoded bits to select a point inside the coset.
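The three encoder steps can be sketched directly for 8-PSK with $m = 2$, $\tilde{m} = 1$. The 4-state transition table below is an illustrative labelling consistent with Ungerboeck's 4-state figure, not copied from Table I, and `encode` is a hypothetical helper name:

```python
# Sketch of the Ungerboeck TCM encoder loop for 8-PSK, m = 2, m_tilde = 1:
# one coded bit selects a level-2 coset {i, i+4}; one uncoded bit picks
# the point inside it. The transition table is an illustrative labelling
# consistent with the 4-state figure, not copied from Ungerboeck's Table I.
TRELLIS = {  # state -> {coded_bit: (next_state, (point_u0, point_u1))}
    0: {0: (0, (0, 4)), 1: (1, (2, 6))},
    1: {0: (2, (1, 5)), 1: (3, (3, 7))},
    2: {0: (0, (2, 6)), 1: (1, (0, 4))},
    3: {0: (2, (3, 7)), 1: (3, (1, 5))},
}

def encode(bits, state=0):
    """bits: list of (coded_bit, uncoded_bit) pairs, one per symbol."""
    points = []
    for c, u in bits:
        state, coset = TRELLIS[state][c]   # coded bit picks the coset
        points.append(coset[u])            # uncoded bit picks the point
    return points

print(encode([(0, 0), (1, 1), (0, 0)]))  # -> [0, 6, 1]
```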

The tables in Ungerboeck 1982 (Table I for 8-PSK, Table II for 16-QAM, Table III for 16-PSK) list the optimal $(\mathcal{C}, \mathcal{P})$ pairs for each state count $N_s \in \{4, 8, 16, 32, 64, 128, 256\}$, together with the achieved $d_{\rm free}^2$ and the asymptotic coding gain over the uncoded baseline.


Definition:

Euclidean Distance Profile

The Euclidean distance profile of a TCM code is the enumeration of error events by their squared Euclidean distance:

$\mathcal{D}^2 = \{d_1^2 < d_2^2 < d_3^2 < \cdots\}$

with associated multiplicities $\{N_1, N_2, \ldots\}$. The smallest entry $d_1^2 = d_{\rm free}^2$ dominates the high-SNR BER via the union bound; the next few entries matter at moderate SNR, where the "knee" of the BER curve sits.

In practice, Ungerboeck only reports $(d_{\rm free}^2, N_{\rm free})$ in his tables. Refined analysis (Biglieri 2005, Ch. 10) computes the full distance spectrum via the transfer-function bound, which treats the trellis as a signal flow graph with distance-weighted edges.
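For a small code, the head of the distance profile can also be found by brute force: a shortest-path search over pairs of trellis states, with each step weighted by the cheapest point pairing. The sketch below assumes an illustrative 4-state 8-PSK edge labelling consistent with Ungerboeck's figure and recovers the known $d_{\rm free}^2 = 4$ for that scheme:

```python
import heapq
import math

# Shortest-path search over pairs of trellis states to find d_free^2 of a
# 4-state 8-PSK TCM. The edge labelling is an illustrative assignment
# consistent with Ungerboeck's 4-state figure (an assumption of this sketch).
EDGES = {  # state -> [(next_state, level-2 coset carried by the edge)]
    0: [(0, (0, 4)), (1, (2, 6))],
    1: [(2, (1, 5)), (3, (3, 7))],
    2: [(0, (2, 6)), (1, (0, 4))],
    3: [(2, (3, 7)), (3, (1, 5))],
}

def d2(a, b):
    """Squared Euclidean distance between unit-energy 8-PSK points a, b."""
    return 2 - 2 * math.cos(math.pi * (a - b) / 4)

def min_cross(P, Q):
    """Cheapest pairing of one point from P with one from Q."""
    return min(d2(p, q) for p in P for q in Q)

# Length-1 events: parallel transitions, two distinct points on one edge.
best = min(d2(P[0], P[1]) for br in EDGES.values() for _, P in br)

# Longer events: diverge at a common state, then Dijkstra over state pairs.
# (Here the two edges out of each state reach different states, so a
# diverging pair never remerges on the very first step.)
heap, dist = [], {}
for br in EDGES.values():
    for i, (ta, Pa) in enumerate(br):
        for tb, Pb in br[i + 1:]:
            heapq.heappush(heap, (min_cross(Pa, Pb), ta, tb))
while heap:
    cost, a, b = heapq.heappop(heap)
    if cost >= best or dist.get((a, b), float("inf")) <= cost:
        continue
    dist[(a, b)] = cost
    for ta, Pa in EDGES[a]:
        for tb, Pb in EDGES[b]:
            nc = cost + min_cross(Pa, Pb)
            if ta == tb:                  # paths remerge: event complete
                best = min(best, nc)
            elif nc < best:
                heapq.heappush(heap, (nc, ta, tb))

print("d_free^2 =", round(best, 3))  # -> d_free^2 = 4.0
```

The minimum comes from the antipodal parallel transition ($\Delta_2^2 = 4$); the shortest non-parallel event costs $2 + 0.586 + 2 \approx 4.586$.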


Error Event

A pair of trellis paths that share a common state, diverge at some time, and first remerge to a common state at a later time. Error events are the atomic units of error analysis: the union bound on BER is a sum over all error events weighted by their squared Euclidean distance.

Related: Trellis (of a Convolutional Code), Free Euclidean Distance

Ungerboeck Design Rules

Three heuristic rules, introduced in Ungerboeck (1982), for assigning signal points to trellis branches so that the free Euclidean distance is maximized: (R1) Parallel transitions are labelled with points of the deepest non-trivial partition level; (R2) Transitions originating from or merging into any single state are labelled with points of the next-deepest level; (R3) All signal points are used with equal frequency (uniform input).

Related: Ungerboeck TCM Code, Parallel Transitions

Theorem: Ungerboeck Design Rules

To maximize the free Euclidean distance $d_{\rm free}^2$ of a TCM code built from an Ungerboeck partition of a $2^{m+1}$-point constellation and a rate-$\tilde{m}/(\tilde{m}+1)$ convolutional code, assign signal points to trellis transitions so that the following three rules are satisfied.

(R1) Parallel transitions (transitions between the same pair of states) carry signal points from the same level-$(\tilde{m}+1)$ coset, i.e., a maximally separated antipodal pair (or the equivalent for larger $m - \tilde{m}$).

(R2) Transitions originating from a single state, or merging into a single state, carry signal points from the same level-$\tilde{m}$ coset, i.e., all such edges are labelled with points whose pairwise intra-subset distance is at least $\Delta_{\tilde{m}}$.

(R3) Uniform input. All signal points are used with equal frequency when the input bits are uniform i.i.d.; equivalently, the mapping is a "regular" labeling (a group homomorphism for the constellation's symmetry group).

Any TCM code satisfying (R1)–(R3) achieves a $d_{\rm free}^2$ at least as large as the bound in the theorem Lower Bound on Free Euclidean Distance via Partition Levels.

Each rule eliminates a particular class of "weak" error event.

(R1) Without it, a parallel-transition error could use two points from different level-$(\tilde{m}+1)$ cosets (if the trellis allowed it), which would cost only $\Delta_{\tilde{m}}^2 < \Delta_{\tilde{m}+1}^2$. (R1) forces parallel transitions to cost at least $\Delta_{\tilde{m}+1}^2$.

(R2) Without it, a length-2 error event (diverge, remerge after one step) could use two pairs of points, each with small inter-point distance, giving a length-2 contribution smaller than $2 \Delta_{\tilde{m}}^2$. (R2) forces this to cost at least $2 \Delta_{\tilde{m}}^2$.

(R3) Without it, some signal points would be used more frequently than others, producing a non-uniform average signal energy and no longer saturating the AWGN capacity at the chosen SNR.
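The distance hierarchy these arguments rely on can be checked numerically for unit-energy 8-PSK (a small sketch; point $k$ sits at angle $2\pi k/8$, and the variable names are ours):

```python
import math

# Squared intra-subset distances of the unit-energy 8-PSK partition chain:
# level 0 = all 8 points, level 1 = the QPSK subsets, level 2 = antipodal pairs.
def d2(a, b):
    return 2 - 2 * math.cos(math.pi * (a - b) / 4)

delta0_sq = d2(0, 1)   # nearest neighbours in the full set:    ~0.586
delta1_sq = d2(0, 2)   # nearest neighbours within a QPSK set:   2.0
delta2_sq = d2(0, 4)   # antipodal pair within a level-2 coset:  4.0

# (R1): a parallel-transition error costs at least delta2_sq = 4.
# (R2): with m_tilde = 1, a length-2 event costs at least 2*delta1_sq = 4.
print(delta0_sq, delta1_sq, delta2_sq)
```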


One Section of the Canonical 4-State 8-PSK Ungerboeck Trellis

One time-slice of Ungerboeck's 4-state 8-PSK TCM trellis. The four states on the left connect to the four states on the right via labelled edges: each edge displays the constellation point transmitted on that transition. Because the $\tilde{m} = 1$ coding bit leaves $m - \tilde{m} = 1$ uncoded bit, each transition splits into a pair of parallel edges carrying antipodal 8-PSK points (distance $\Delta_2^2 = 4$). The "nice" structure visible here, transitions into each state forming a rotated-QPSK set, is exactly rule (R2) in action.


Example: Verifying the Design Rules on the 4-State 8-PSK TCM

For the canonical 4-state 8-PSK TCM of the example Free Euclidean Distance of the 4-State 8-PSK TCM (rate-$1/2$ convolutional code, $\tilde{m} = 1$, $m = 2$, $\nu = 2$, $N_s = 4$), verify the three Ungerboeck design rules.
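A sketch of such a verification, assuming an edge labelling consistent with Ungerboeck's 4-state figure (the table below is illustrative, not copied from the paper), reduces rules (R1) and (R2) to set-membership checks:

```python
# Check rules (R1) and (R2) on a 4-state 8-PSK trellis. The labelling is
# an illustrative assignment consistent with Ungerboeck's 4-state figure
# (an assumption of this sketch, not copied from his Table I).
EDGES = {  # state -> [(next_state, coset carried by the edge)]
    0: [(0, {0, 4}), (1, {2, 6})],
    1: [(2, {1, 5}), (3, {3, 7})],
    2: [(0, {2, 6}), (1, {0, 4})],
    3: [(2, {3, 7}), (3, {1, 5})],
}
LEVEL2 = [{i, i + 4} for i in range(4)]   # antipodal pairs
LEVEL1 = [{0, 2, 4, 6}, {1, 3, 5, 7}]     # the two QPSK subsets

# (R1): every edge (hence every parallel pair) carries one level-2 coset.
r1 = all(P in LEVEL2 for br in EDGES.values() for _, P in br)

# (R2): edges leaving a state, and edges entering a state, each stay
# inside a single level-1 coset.
leaving = all(
    any(set().union(*(P for _, P in br)) <= B for B in LEVEL1)
    for br in EDGES.values()
)
entering_sets = {t: set() for t in EDGES}
for br in EDGES.values():
    for t, P in br:
        entering_sets[t] |= P
entering = all(any(S <= B for B in LEVEL1) for S in entering_sets.values())

print(r1, leaving, entering)  # -> True True True
```

(R3) holds by construction here, since every 8-PSK point appears on exactly one edge subset per state and the input bits are taken uniform.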

Historical Note: Viterbi and Forney: From Algorithm to Trellis Concept

1967–1988

Andrew Viterbi introduced what is now called the Viterbi algorithm in a 1967 IEEE Transactions on Information Theory paper as a maximum-likelihood decoder for convolutional codes; the 1971 paper "Convolutional codes and their performance in communication systems" (IEEE Trans. Commun. Tech.) popularized the bit-error analysis via error events. At the time, neither the algorithm nor its complexity was widely appreciated outside a small coding-theory circle.

G. David Forney Jr.'s 1973 Proceedings of the IEEE paper "The Viterbi algorithm" reframed the decoder as a shortest-path search on a trellis, the graph-theoretic picture we take for granted today. This reframing turned a specialized coding technique into a general tool: once the trellis picture was available, it became clear that any finite-state code, including Ungerboeck's TCM nine years later, could be decoded by the same dynamic-programming recursion.

The chain Viterbi $\to$ Forney $\to$ Ungerboeck is the clearest arc in the history of coded modulation. Forney's lattice-theoretic reinterpretation of TCM in the 1988 paper "Coset codes I/II" (cited in our Chapter 4) closed the loop by showing that TCM is a special case of lattice coset coding.

⚠️ Engineering Note

Trellis Depth and Decoder Latency

A Viterbi decoder for a TCM code must store the survivor-path history deep enough that the traceback is reliable: the rule of thumb is $5\nu$ to $7\nu$ trellis sections, where $\nu$ is the convolutional encoder memory. At the V.34 top symbol rate of 3429 symbols/s, a 64-state TCM ($\nu = 6$) needs survivor paths of length 30–40 symbols, giving roughly 10 ms of decoding latency. This was manageable in 1994 with dedicated DSP silicon; scaling to gigasymbol/s modern systems with $\nu = 8$ or more requires pipelined architectures and is why 5G NR adopted LDPC over trellis codes (LDPC decoders parallelize more naturally than Viterbi).

Practical Constraints
  • β€’

    Survivor memory scales linearly in NsN_s and linearly in traceback depth

  • β€’

    Branch-metric computation scales linearly in NsN_s and exponentially in m~\tilde{m}

  • β€’

    Traceback latency is the dominant delay at multi-gigasymbol rates
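The rule-of-thumb arithmetic is worth making explicit. A minimal sketch, with an assumed 1 Gsym/s rate for the modern end of the scale:

```python
# Rule-of-thumb traceback arithmetic: survivor depth of 5*nu to 7*nu
# trellis sections. The 1 Gsym/s symbol rate is an assumed example.
def traceback_latency(nu: int, symbol_rate: float):
    """Return (min, max) traceback latency in seconds."""
    return 5 * nu / symbol_rate, 7 * nu / symbol_rate

lo_s, hi_s = traceback_latency(6, 1e9)   # 64 states (nu = 6) at 1 Gsym/s
print(f"depth {5*6}-{7*6} symbols -> {lo_s*1e9:.0f}-{hi_s*1e9:.0f} ns")
```

At these rates the traceback window itself is only tens of nanoseconds; the serial add-compare-select recursion, not the window length, becomes the bottleneck.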

📋 Ref: ITU-T V.34 Annex C (Viterbi traceback guidelines)

Common Mistake: Catastrophic Convolutional Codes in TCM

Mistake:

Choosing a convolutional code with $(g_0, g_1)$ that produces a catastrophic encoder: one where a finite number of input errors can cause an infinite number of output errors.

Correction:

A catastrophic code has an error event of infinite length with finite Euclidean distance, which completely breaks the free-distance analysis. For binary rate-$1/2$ codes, the encoder is catastrophic iff $\gcd(g_0, g_1) \neq 1$ (as polynomials in $\mathbb{F}_2[D]$).

Always check the $\gcd$ before adopting a generator pair from a table. Ungerboeck's 1982 tables are all non-catastrophic, but if you modify a generator polynomial to simplify hardware, rerun the $\gcd$ test.
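The $\gcd$ test is a few lines of arithmetic in $\mathbb{F}_2[D]$ with polynomials stored as bitmasks. A sketch, using the illustrative pairs $(7,5)_8$ (coprime) and $(6,3)_8$ (common factor $D+1$):

```python
# Catastrophicity test for a binary rate-1/2 code: the encoder is
# catastrophic iff gcd(g0, g1) != 1 in F_2[D]. Polynomials are ints
# (bit i = coefficient of D^i); the generator pairs are illustrative.
def gf2_mod(a: int, b: int) -> int:
    """Remainder of a divided by b in F_2[D]."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a: int, b: int) -> int:
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def is_catastrophic(g0: int, g1: int) -> bool:
    return gf2_gcd(g0, g1) != 1

print(is_catastrophic(0o7, 0o5))  # (7,5): gcd = 1      -> False (safe)
print(is_catastrophic(0o6, 0o3))  # (6,3): share D + 1  -> True
```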

Quick Check

Which of the three Ungerboeck design rules governs the labelling of parallel transitions (multiple edges between the same pair of states)?

Rule (R1): parallel transitions carry points of the deepest non-trivial partition level.

Rule (R2): transitions from or into a single state carry same-coset points at the next level.

Rule (R3): uniform input distribution.

None of the above β€” parallel transitions are not addressed by Ungerboeck's rules.

Quick Check

In Ungerboeck's Table I (1982), the asymptotic coding gain of 8-PSK TCM over uncoded QPSK as a function of trellis states $N_s$ is approximately: 4 states → 3 dB; 8 → 3.6 dB; 16 → 4.1 dB; 32 → 4.6 dB; 64 → 4.8 dB; 128 → 5.0 dB; 256 → 5.4 dB. What is the ultimate ceiling, and why can it not be exceeded by adding states to this family?

$\infty$ – adding states always helps.

$10\log_{10}(\Delta_2^2 / d_{\rm uncoded}^2) = 10\log_{10}(4/2) \approx 3$ dB.

About 6 dB – beyond that, the partition geometry (parallel-transition bound) saturates and no more gain is possible from the $\tilde{m}=1$ coding bit alone.

Exactly $\gamma_c = 10\log_{10}(\pi e/6) \approx 1.53$ dB.

Key Takeaway

Three rules, one optimal code family. Ungerboeck's design rules – (R1) parallel transitions carry deepest-coset points, (R2) outgoing and incoming edges of each state carry next-deepest-coset points, (R3) uniform input – are individually necessary and jointly sufficient to saturate the free-distance lower bound. Every entry in Ungerboeck's 1982 Tables I–III was produced by a computer search that enforced (R1)–(R3) and then maximized $d_{\rm free}^2$ over the remaining freedom. The rules are the bridge between the combinatorial design of the convolutional code and the geometric design of the partition.