Ferkans — Interactive Telecom Tutor

Why Separate Coding and Modulation Leaves Capacity on the Table

The classical textbook view, inherited from Shannon's original constructions, goes like this: take a binary channel code of minimum Hamming distance $d_H$ , then feed its bits into a QAM mapper. The code designer works in GF $(2)$ ; the modulator designer works in $\mathbb{R}^N$ . The two sides are decoupled, and the engineer can shop for each independently in the literature.

Now here is the key idea: this decoupling is a waste. The channel that the binary code "sees" after the QAM mapper is not a memoryless binary channel — it is a higher-order signal-space channel whose error events are Euclidean, not Hamming. A code optimized for Hamming distance does not automatically maximize Euclidean distance of the transmitted signal. This mismatch costs a fixed amount of $E_b/N_0$ , and at rates of 2 bits/2D or higher, the loss is sizeable.

Ungerboeck's 1982 insight — the founding idea of coded modulation — is that the code must be designed directly in signal space: the metric the code tries to maximize is the Euclidean distance of the transmitted vector, not the Hamming distance of the binary label. This chapter closes with a precise statement of why the separated approach is suboptimal, setting up the Ungerboeck construction in Chapter 2 and the BICM framework in Chapter 5.

, ,

Definition:
Separated Coding-and-Modulation vs. Coded Modulation

A separated coding-and-modulation (SC+M) system is one in which:

A binary outer code $\mathcal{C} \subset \{0,1\}^n$ of rate $R_c$ is selected independently of the constellation.
A bit-to-symbol mapping $\mu: \{0,1\}^{\log_2 M} \to \mathcal{X}$ is chosen independently of $\mathcal{C}$ .
The receiver computes bitwise soft information on the binary labels of the received symbols and decodes $\mathcal{C}$ as if the channel were memoryless binary.

A coded modulation (CM) system, by contrast, designs the sequence of transmitted signal vectors directly. The code lives on the signal set $\mathcal{X}$ , not on its binary labels, and the design criterion is the Euclidean distance spectrum of the sequence code, not the Hamming distance spectrum of the underlying binary code.

BICM (Chapter 5) is an interesting hybrid: it uses a binary code, a bit interleaver, and a QAM mapper — formally SC+M — but the Caire-Taricco-Biglieri analysis shows that with Gray labeling BICM comes surprisingly close to CM capacity despite the nominally suboptimal separation. Understanding why BICM works, and when it does not, is the subject of Chapters 5-8.

,

Theorem: SC+M with Independent Labeling Is Strictly Suboptimal

Fix a finite 2D constellation $\mathcal{X}$ with minimum Euclidean distance $d_{\rm E,\min}$ and $M = 2^m$ points. Consider a binary outer code $\mathcal{C}$ of rate $R_c = k/n$ and minimum Hamming distance $d_H$ , concatenated with a fixed bit-to-symbol mapping $\mu$ that is independent of $\mathcal{C}$ . Then the effective minimum Euclidean distance of the SC+M system satisfies

$d_{\rm E,\min}^2(\mathcal{C}, \mu) \;\le\; d_H \cdot d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}},$

where $d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}}$ is the minimum squared Euclidean distance between two constellation points whose labels differ in exactly one bit. For any mapping with $d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} < d_{\rm E,\min}^2(\mathcal{X})$ — which holds for every non-trivial mapping when $M > 2$ — the bound is strictly loose: there exist binary codes and constellation point assignments that attain a strictly larger effective $d_{\rm E, \min}^2$ .

The binary code can guarantee that any two codewords differ in at least $d_H$ positions. In each differing position, the mapped symbol is displaced by at least $d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}}$ in Euclidean distance — which is the distance between two symbols whose labels differ in that one bit position. The product $d_H \cdot d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}}$ is therefore a lower bound on the total Euclidean displacement. For Gray labeling, the one-bit neighbor distance is $d_{\rm E,\min}^2(\mathcal{X})$ itself — not much to save. But other labelings have $d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} \gg d_{\rm E,\min}^2(\mathcal{X})$ , at which point the binary-code design can no longer chase the full Euclidean distance directly, and a matched "joint" construction dominates.

Show Hint

Pick two codewords at Hamming distance $d_H$ ; count the differing bit positions and map each to a symbol pair.

Show that the total squared Euclidean displacement is at least $d_H$ times the minimum over all one-bit neighbor pairs in the constellation.

Argue that for $M > 2$ , some labelings have one-bit-neighbor distance much smaller than $d_{\rm E, \min}^2(\mathcal{X})$ (so this is 'Gray-like') while others have it much larger (so this is 'set-partition-like').

Proof

Lower-bound the Euclidean distance in each differing coordinate

Let $\mathbf{c}, \hat{\mathbf{c}} \in \mathcal{C}$ be two codewords at Hamming distance $d_H$ . Partition the coordinates into $n/m$ symbol blocks of $m = \log_2 M$ bits each. In each block in which $\mathbf{c}$ and $\hat{\mathbf{c}}$ differ in some number $\ell > 0$ of bits, the mapped symbols $\mu(\mathbf{c}_{\rm block})$ and $\mu(\hat{\mathbf{c}}_{\rm block})$ are at Euclidean distance at least $d_{\rm E,\min}^2(\mathcal{X}, \mu)_{\ell\text{-bit}}$ , i.e., the minimum over all pairs of constellation points whose labels differ in exactly $\ell$ bits.

Sum over differing blocks

Denote by $B$ the set of blocks where $\mathbf{c}$ and $\hat{\mathbf{c}}$ differ. For $b \in B$ , let $\ell_b$ be the number of differing bits in block $b$ . Then

$\|\mu(\mathbf{c}) - \mu(\hat{\mathbf{c}})\|^2 \;=\; \sum_{b \in B} \| \mu(\mathbf{c}_b) - \mu(\hat{\mathbf{c}}_b)\|^2 \;\ge\; \sum_{b \in B} d_{\rm E,\min}^2(\mathcal{X}, \mu)_{\ell_b\text{-bit}}.$

A particularly clean bound arises when each block has exactly one differing bit — we can always choose two codewords at Hamming distance exactly $d_H$ spread over $d_H$ distinct blocks:

$\|\mu(\mathbf{c}) - \mu(\hat{\mathbf{c}})\|^2 \;\ge\; d_H \cdot d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}}.$

Strict suboptimality

Taking the minimum over $(\mathbf{c}, \hat{\mathbf{c}})$ at Hamming distance $d_H$ gives

$d_{\rm E,\min}^2(\mathcal{C}, \mu) \;\le\; d_H \cdot d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}}.$

For any labeling $\mu$ of $\mathcal{X}$ with $M > 2$ , we have $d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} < d_{\rm E,\min}^2(\mathcal{X}) \cdot m$ in the generic case (Gray labeling makes one-bit neighbors the closest pairs; set-partition labeling pushes them far apart). Neither choice of $\mu$ is matched to the code $\mathcal{C}$ unless the two are designed together — which is exactly the coded modulation program. $\blacksquare$

,

Example: Gray vs. Set-Partition Labeling on 16-QAM

Consider 16-QAM of energy $E_s$ . Compute the one-bit-neighbor minimum Euclidean distance $d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}}$ for (a) a Gray labeling and (b) a set-partition (Ungerboeck) labeling. Discuss the implication for the SC+M upper bound.

Solution

Geometry of 16-QAM

Normalize so $d_{\rm E,\min}^2(\mathcal{X}) = 4$ for points on a standard $\{-3, -1, 1, 3\}^2$ grid scaled so $E_s = 10$ (since $\overline{|x|^2 + |y|^2} = 2 \cdot \tfrac{1+9}{2} \cdot 2 = 20$ before scaling, and we want $E_s = 10$ , so scale the grid by $1/\sqrt{2}$ to get $d_{\rm E,\min}^2 = 2$ and $E_s = 10/2 = 5$ ). Let us keep the unnormalized geometry for simplicity: neighbors in the grid differ by $2$ in one coordinate, giving $d_{\rm E,\min}^2(\mathcal{X}) = 4$ at raw energy.

(a) Gray labeling

In a Gray labeling, adjacent points in the grid differ in exactly one label bit. Therefore the minimum distance among one-bit neighbors equals the minimum distance of the constellation itself:

$d_{\rm E,\min}^2(\mathcal{X}, \mu_{\rm Gray})_{1\text{-bit}} \;=\; 4 \;=\; d_{\rm E,\min}^2(\mathcal{X}).$

The SC+M bound becomes $d_{\rm E,\min}^2(\mathcal{C}, \mu_{\rm Gray}) \le 4 d_H$ . A rate- $1/2$ code with $d_H = 5$ gives at most $20$ , i.e., a $5\times$ improvement on $d_{\rm E,\min}^2$ over uncoded.

(b) Set-partition (Ungerboeck) labeling

The Ungerboeck labeling partitions 16-QAM into successively farther-apart subsets. The bit that picks the coarsest partition corresponds to a difference of two "super-squares" whose minimum intra-pair distance is $4$ times larger, i.e., $d_{\rm E,\min}^2_{1\text{-bit, label bit 0}} = 16$ . The bit that picks the intermediate partition corresponds to intra-superset distances of $8$ , and the two least-significant bits ride on the standard grid with distance $4$ . Overall,

$d_{\rm E,\min}^2(\mathcal{X}, \mu_{\rm SP})_{1\text{-bit}} \;=\; 4 \;=\; d_{\rm E,\min}^2(\mathcal{X})$

(the smallest of $16, 8, 4, 4$ ), but the distance spectrum across the four labeled bits is very non-uniform.

Implication

The SC+M bound for Gray and for set-partition has the same one-bit-neighbor value ( $=4$ in this normalization), so the simple $d_H \cdot d_{\rm E,\min}^2_{1\text{-bit}}$ bound does not yet distinguish them. The difference shows up in the full distance spectrum: with set-partition labeling, protecting the most significant bit with a strong code amplifies its high 1-bit-neighbor distance ( $16$ ), so a joint code-label design with the outer code protecting label-bit 0 strongly yields $d_{\rm E,\min}^2 \approx d_H \cdot 16 \gg 4 d_H$ . This is the Ungerboeck coding gain, and it is unreachable if the code and labeling are designed independently.

Separated Coding+Modulation vs. Coded Modulation

Aspect	Separated Coding + Modulation (SC+M)	Coded Modulation (CM)
Design variable	Binary code $\mathcal{C}$ ; mapper $\mu$ chosen independently	Sequence code on $\mathcal{X}^n$ ; all components chosen jointly
Minimum-distance criterion	Hamming $d_H$ (of $\mathcal{C}$ )	Euclidean $d_{\rm E, \min}$ (of the transmitted sequence)
Bandwidth expansion	Non-trivial; code rate $R_c < 1$ adds overhead unless QAM order is grown	Zero: constellation is grown by one bit, keeping $\eta$ fixed
Historical example	Reed-Solomon + QAM (1970s modems), convolutional + BPSK (Voyager)	TCM (Ungerboeck 1982), MLC (Imai-Hirasawa 1977)
Modern example	BICM: LDPC + QAM + Gray label (LTE/5G NR, Wi-Fi)	Ungerboeck TCM in V.32/V.34 modems; BICM-ID with matched label
Gap to capacity at $\eta = 4$	About 1-2 dB (BICM with LDPC)	About 0.5-1 dB (TCM at moderate state count)
Decoder	Binary soft-decision decoder with bitwise metrics	Full-state trellis / metric-space decoder (Viterbi on the symbol trellis)

Common Mistake: Gray labeling is a feature, not a coincidence

Mistake:

Concluding from the suboptimality theorem that any SC+M system loses a lot of capacity.

Correction:

Gray labeling has the special property that one-bit neighbors are Euclidean neighbors, so the binary-channel abstraction in BICM sees soft information that is close to the truth. With Gray labeling, BICM capacity is within a fraction of a dB of CM capacity at most spectral efficiencies. The SC+M bound we derived is loose for Gray-labeled BICM; it is tight for set-partition-labeled SC+M without iteration. The moral: labeling matters, and the Ungerboeck argument is that set-partition labeling requires a matched code to pay off.

⚠️Engineering Note

Why BICM, Not TCM, Dominates Modern Standards

Despite the theoretical elegance and provable optimality of Ungerboeck's TCM for the AWGN channel, modern standards (LTE, 5G NR, Wi-Fi 6/7, DVB-S2X) all use BICM — a formally "separated" scheme. Why?

Fading. On a slow-fading channel, TCM's tight signal-space trellis loses diversity when one constellation dimension is deep in a fade. BICM with a long bit interleaver spreads errors and extracts time diversity that TCM cannot.
HARQ. Binary LDPC codes support incremental redundancy and rate matching cleanly; TCM does not.
Flexibility. BICM decouples the code and modulation, letting the same LDPC code family serve QPSK, 16-QAM, 64-QAM, and 256-QAM. TCM needs a different code per constellation order.
Capacity gap. With Gray labeling, BICM capacity is within 0.3-0.5 dB of CM capacity at $\eta \le 6$ , so the theoretical loss is small in the regime of interest.

The right reading of the suboptimality theorem, then, is not "TCM beats BICM always" but "the separation is a real loss that must be quantified and managed." Modern BICM design (choice of labeling, LDPC degree distribution, HARQ protocol) explicitly tracks this loss.

Practical Constraints

•
5G NR LDPC + QAM + Gray label operates at 0.5-1 dB from CM capacity
•
Trellis codes from V.32/V.34 modems achieved similar gaps circa 1990 on AWGN, but did not scale to fading
•
BICM-ID (Chapter 8) can recover much of the BICM loss when signal-space labels are set-partition-like

🎓CommIT Contribution(1998)

BICM — The Caire-Taricco-Biglieri Framework

G. Caire, G. Taricco, E. Biglieri — IEEE Trans. Information Theory, vol. 44, no. 3, pp. 927-946

The foundational analysis of BICM, establishing that a powerful binary code concatenated with a bit interleaver and a higher-order modulator — a formally "separated" scheme — can achieve mutual information within a fraction of a dB of CM capacity on the AWGN channel, provided the labeling is Gray. The paper introduced the now-standard BICM capacity expression,

$C_{\rm BICM}(\mathcal{X}, \mu, \text{SNR}) \;=\; m - \sum_{i=1}^{m} \mathbb{E}\!\left[\log_2 \frac{\sum_{x \in \mathcal{X}} p(Y|x)}{\sum_{x \in \mathcal{X}_{b_i}} p(Y|x)}\right],$

and showed that with Gray labeling the gap $C_{\rm CM} - C_{\rm BICM}$ is small for all reasonable $\eta$ . The paper justified the widespread choice of BICM for standards and launched a 15-year research program on BICM extensions: iterative decoding (Ch. 8), fading-channel analysis (Ch. 6), error exponents (Ch. 7), and matched-to-BICM lattice codes. It is one of the most cited papers in coded modulation and a foundational CommIT contribution.

bicmfoundationalawgncapacityView Paper →

Key Takeaway

The code must live in signal space. Designing a binary code for Hamming distance and then mapping to QAM is an unforced error: the Euclidean-distance design criterion dominates on AWGN, and an independently chosen labeling wastes most of it. Either (a) design the code jointly with the labeling (TCM / MLC / BICM-ID), or (b) use Gray labeling, which is accidentally matched well enough to independent binary coding to yield most of the capacity (BICM). The rest of this book traces both paths.

Coded modulation

A family of techniques that combine channel coding and modulation into a single design problem, with the code living in signal space and its distance criterion expressed in Euclidean rather than Hamming distance. Includes TCM, MLC, and BICM.

Gray labeling

A bit-to-symbol mapping $\mu$ in which any two constellation points whose labels differ in exactly one bit are Euclidean neighbors. Gray labeling minimizes the per-symbol bit-error rate under bitwise ML detection and is the labeling used in all modern BICM standards.

Set-partition (Ungerboeck) labeling

A hierarchical partitioning of a signal constellation into nested subsets of increasing minimum distance, with the label bits indexing the subsets. The bit that selects the coarsest partition has the largest intra-pair one-bit distance, making set-partition labeling the natural match for TCM — but a poor match for independent binary codes.

Historical Note: Ungerboeck 1982 — The Coded-Modulation Revolution

1982

Gottfried Ungerboeck's 1982 paper "Channel Coding with Multilevel/Phase Signals" is arguably the single most influential paper in digital communications history at the practical level. Published in IEEE Transactions on Information Theory, it showed that a rate- $1/2$ convolutional code applied to a 4-state trellis over an 8-PSK constellation with set-partition labeling achieves a 3 dB coding gain over uncoded QPSK at $\eta = 2$ bits/2D — and with a 16-state trellis, 4.5 dB. This hit the desk of every modem designer within a year. The V.32 modem standard (1984, 9600 bps) used Ungerboeck's 8-state 8-PSK TCM; V.34 (1994, up to 33.6 kbps) used a more elaborate 64-state 4D TCM. Nearly every dial-up modem sold between 1985 and 2000 used TCM. The techniques were eventually supplanted by BICM (whose foundations Caire-Taricco-Biglieri established in 1998) — but Ungerboeck's insight that code and modulation must be co-designed remains the starting point of this book.

Quick Check

Consider a system with a binary outer code of minimum Hamming distance $d_H = 6$ , concatenated with a Gray-labeled 16-QAM mapper. The minimum distance of 16-QAM is $d_{\rm E, \min}^2 = 4$ (in some normalization). According to the SC+M bound, what is the largest value of effective $d_{\rm E, \min}^2$ achievable by any choice of codewords in the concatenated scheme, holding the labeling fixed?

$4$ (the constellation's own minimum distance)

$24$ , i.e., $d_H \cdot d_{\rm E, \min}^2 = 6 \cdot 4$

$64$ , i.e., $d_H^2 \cdot d_{\rm E, \min}^2$

Unbounded: any distance is achievable with a long enough code.

Correction:

24

, i.e.,

d_H \cdot d_{\rm E, \min}^2 = 6 \cdot 4

The SC+M theorem gives $d_{\rm E, \min}^2(\mathcal{C}, \mu) \le d_H \cdot d_{\rm E, \min}^2(\mathcal{X}, \mu)_{1\text{-bit}}$ . With Gray labeling, the one-bit-neighbor distance equals the constellation's minimum distance, $d_{\rm E, \min}^2(\mathcal{X}) = 4$ , so the bound is $6 \cdot 4 = 24$ . Achieving the bound requires a codeword pair at Hamming distance exactly $d_H = 6$ spread over six distinct symbol blocks.

Why This Matters: The Coded-Modulation Roadmap From Here

Chapter 1 closes with a theme that threads the rest of Book CM.

Chapter 2 presents Ungerboeck's trellis-coded modulation: the original coded-modulation scheme, where the code is a convolutional code on a set-partition-labeled constellation.
Chapter 3 generalizes to multilevel coding and multistage decoding, introducing the idea of different code rates on different label bits.
Chapter 4 develops lattice-based coded modulation, where coset codes over lattice partitions recover both coding and shaping gains.
Chapters 5-9 build the BICM framework, showing that Gray labeling lets us treat the concatenation as a set of parallel binary channels and that BICM approaches CM capacity while retaining the engineering benefits of binary coding.
Chapters 10-14 extend the Euclidean-distance criterion to fading MIMO channels, producing the rank-and-determinant criterion and the Zheng-Tse diversity-multiplexing tradeoff.
Chapters 15-18 use lattices to build DMT-optimal codes and introduce compute-and-forward.
Chapters 19-22 treat modern extensions: probabilistic shaping, massive MIMO quantization, and pragmatic 5G/6G considerations.

At every step, the design criterion is some version of "maximize the distance (Euclidean, rank+determinant, lattice minimum distance) of the code in the channel's natural metric space." The Chapter 1 insight — that this must be the design criterion, not the Hamming distance of the binary labels — sets the stage for all of it.

Why Separate Coding and Modulation Is Suboptimal

Why Separate Coding and Modulation Leaves Capacity on the Table

Definition: Separated Coding-and-Modulation vs. Coded Modulation

Theorem: SC+M with Independent Labeling Is Strictly Suboptimal

Lower-bound the Euclidean distance in each differing coordinate

Sum over differing blocks

Strict suboptimality

Example: Gray vs. Set-Partition Labeling on 16-QAM

Geometry of 16-QAM

(a) Gray labeling

(b) Set-partition (Ungerboeck) labeling

Implication

Separated Coding+Modulation vs. Coded Modulation

Common Mistake: Gray labeling is a feature, not a coincidence

Why BICM, Not TCM, Dominates Modern Standards

BICM — The Caire-Taricco-Biglieri Framework

Key Takeaway

Coded modulation

Gray labeling

Set-partition (Ungerboeck) labeling

Historical Note: Ungerboeck 1982 — The Coded-Modulation Revolution

Quick Check

Why This Matters: The Coded-Modulation Roadmap From Here

Definition:
Separated Coding-and-Modulation vs. Coded Modulation