Why Separate Coding and Modulation Is Suboptimal
Why Separate Coding and Modulation Leaves Capacity on the Table
The classical textbook view, inherited from Shannon's original constructions, goes like this: take a binary channel code of minimum Hamming distance , then feed its bits into a QAM mapper. The code designer works in GF; the modulator designer works in . The two sides are decoupled, and the engineer can shop for each independently in the literature.
Now here is the key idea: this decoupling is a waste. The channel that the binary code "sees" after the QAM mapper is not a memoryless binary channel β it is a higher-order signal-space channel whose error events are Euclidean, not Hamming. A code optimized for Hamming distance does not automatically maximize Euclidean distance of the transmitted signal. This mismatch costs a fixed amount of , and at rates of 2 bits/2D or higher, the loss is sizeable.
Ungerboeck's 1982 insight β the founding idea of coded modulation β is that the code must be designed directly in signal space: the metric the code tries to maximize is the Euclidean distance of the transmitted vector, not the Hamming distance of the binary label. This chapter closes with a precise statement of why the separated approach is suboptimal, setting up the Ungerboeck construction in Chapter 2 and the BICM framework in Chapter 5.
Definition: Separated Coding-and-Modulation vs. Coded Modulation
Separated Coding-and-Modulation vs. Coded Modulation
A separated coding-and-modulation (SC+M) system is one in which:
- A binary outer code of rate is selected independently of the constellation.
- A bit-to-symbol mapping is chosen independently of .
- The receiver computes bitwise soft information on the binary labels of the received symbols and decodes as if the channel were memoryless binary.
A coded modulation (CM) system, by contrast, designs the sequence of transmitted signal vectors directly. The code lives on the signal set , not on its binary labels, and the design criterion is the Euclidean distance spectrum of the sequence code, not the Hamming distance spectrum of the underlying binary code.
BICM (Chapter 5) is an interesting hybrid: it uses a binary code, a bit interleaver, and a QAM mapper β formally SC+M β but the Caire-Taricco-Biglieri analysis shows that with Gray labeling BICM comes surprisingly close to CM capacity despite the nominally suboptimal separation. Understanding why BICM works, and when it does not, is the subject of Chapters 5-8.
Theorem: SC+M with Independent Labeling Is Strictly Suboptimal
Fix a finite 2D constellation with minimum Euclidean distance and points. Consider a binary outer code of rate and minimum Hamming distance , concatenated with a fixed bit-to-symbol mapping that is independent of . Then the effective minimum Euclidean distance of the SC+M system satisfies
where is the minimum squared Euclidean distance between two constellation points whose labels differ in exactly one bit. For any mapping with β which holds for every non-trivial mapping when β the bound is strictly loose: there exist binary codes and constellation point assignments that attain a strictly larger effective .
The binary code can guarantee that any two codewords differ in at least positions. In each differing position, the mapped symbol is displaced by at least in Euclidean distance β which is the distance between two symbols whose labels differ in that one bit position. The product is therefore a lower bound on the total Euclidean displacement. For Gray labeling, the one-bit neighbor distance is itself β not much to save. But other labelings have , at which point the binary-code design can no longer chase the full Euclidean distance directly, and a matched "joint" construction dominates.
Pick two codewords at Hamming distance ; count the differing bit positions and map each to a symbol pair.
Show that the total squared Euclidean displacement is at least times the minimum over all one-bit neighbor pairs in the constellation.
Argue that for , some labelings have one-bit-neighbor distance much smaller than (so this is 'Gray-like') while others have it much larger (so this is 'set-partition-like').
Lower-bound the Euclidean distance in each differing coordinate
Let be two codewords at Hamming distance . Partition the coordinates into symbol blocks of bits each. In each block in which and differ in some number of bits, the mapped symbols and are at Euclidean distance at least , i.e., the minimum over all pairs of constellation points whose labels differ in exactly bits.
Sum over differing blocks
Denote by the set of blocks where and differ. For , let be the number of differing bits in block . Then
A particularly clean bound arises when each block has exactly one differing bit β we can always choose two codewords at Hamming distance exactly spread over distinct blocks:
Strict suboptimality
Taking the minimum over at Hamming distance gives
For any labeling of with , we have in the generic case (Gray labeling makes one-bit neighbors the closest pairs; set-partition labeling pushes them far apart). Neither choice of is matched to the code unless the two are designed together β which is exactly the coded modulation program.
Example: Gray vs. Set-Partition Labeling on 16-QAM
Consider 16-QAM of energy . Compute the one-bit-neighbor minimum Euclidean distance for (a) a Gray labeling and (b) a set-partition (Ungerboeck) labeling. Discuss the implication for the SC+M upper bound.
Geometry of 16-QAM
Normalize so for points on a standard grid scaled so (since before scaling, and we want , so scale the grid by to get and ). Let us keep the unnormalized geometry for simplicity: neighbors in the grid differ by in one coordinate, giving at raw energy.
(a) Gray labeling
In a Gray labeling, adjacent points in the grid differ in exactly one label bit. Therefore the minimum distance among one-bit neighbors equals the minimum distance of the constellation itself:
The SC+M bound becomes . A rate- code with gives at most , i.e., a improvement on over uncoded.
(b) Set-partition (Ungerboeck) labeling
The Ungerboeck labeling partitions 16-QAM into successively farther-apart subsets. The bit that picks the coarsest partition corresponds to a difference of two "super-squares" whose minimum intra-pair distance is times larger, i.e., d_{\rm E,\min}^2_{1\text{-bit, label bit 0}} = 16. The bit that picks the intermediate partition corresponds to intra-superset distances of , and the two least-significant bits ride on the standard grid with distance . Overall,
(the smallest of ), but the distance spectrum across the four labeled bits is very non-uniform.
Implication
The SC+M bound for Gray and for set-partition has the same one-bit-neighbor value ( in this normalization), so the simple d_H \cdot d_{\rm E,\min}^2_{1\text{-bit}} bound does not yet distinguish them. The difference shows up in the full distance spectrum: with set-partition labeling, protecting the most significant bit with a strong code amplifies its high 1-bit-neighbor distance (), so a joint code-label design with the outer code protecting label-bit 0 strongly yields . This is the Ungerboeck coding gain, and it is unreachable if the code and labeling are designed independently.
Separated Coding+Modulation vs. Coded Modulation
| Aspect | Separated Coding + Modulation (SC+M) | Coded Modulation (CM) |
|---|---|---|
| Design variable | Binary code ; mapper chosen independently | Sequence code on ; all components chosen jointly |
| Minimum-distance criterion | Hamming (of ) | Euclidean (of the transmitted sequence) |
| Bandwidth expansion | Non-trivial; code rate adds overhead unless QAM order is grown | Zero: constellation is grown by one bit, keeping fixed |
| Historical example | Reed-Solomon + QAM (1970s modems), convolutional + BPSK (Voyager) | TCM (Ungerboeck 1982), MLC (Imai-Hirasawa 1977) |
| Modern example | BICM: LDPC + QAM + Gray label (LTE/5G NR, Wi-Fi) | Ungerboeck TCM in V.32/V.34 modems; BICM-ID with matched label |
| Gap to capacity at | About 1-2 dB (BICM with LDPC) | About 0.5-1 dB (TCM at moderate state count) |
| Decoder | Binary soft-decision decoder with bitwise metrics | Full-state trellis / metric-space decoder (Viterbi on the symbol trellis) |
Common Mistake: Gray labeling is a feature, not a coincidence
Mistake:
Concluding from the suboptimality theorem that any SC+M system loses a lot of capacity.
Correction:
Gray labeling has the special property that one-bit neighbors are Euclidean neighbors, so the binary-channel abstraction in BICM sees soft information that is close to the truth. With Gray labeling, BICM capacity is within a fraction of a dB of CM capacity at most spectral efficiencies. The SC+M bound we derived is loose for Gray-labeled BICM; it is tight for set-partition-labeled SC+M without iteration. The moral: labeling matters, and the Ungerboeck argument is that set-partition labeling requires a matched code to pay off.
Why BICM, Not TCM, Dominates Modern Standards
Despite the theoretical elegance and provable optimality of Ungerboeck's TCM for the AWGN channel, modern standards (LTE, 5G NR, Wi-Fi 6/7, DVB-S2X) all use BICM β a formally "separated" scheme. Why?
- Fading. On a slow-fading channel, TCM's tight signal-space trellis loses diversity when one constellation dimension is deep in a fade. BICM with a long bit interleaver spreads errors and extracts time diversity that TCM cannot.
- HARQ. Binary LDPC codes support incremental redundancy and rate matching cleanly; TCM does not.
- Flexibility. BICM decouples the code and modulation, letting the same LDPC code family serve QPSK, 16-QAM, 64-QAM, and 256-QAM. TCM needs a different code per constellation order.
- Capacity gap. With Gray labeling, BICM capacity is within 0.3-0.5 dB of CM capacity at , so the theoretical loss is small in the regime of interest.
The right reading of the suboptimality theorem, then, is not "TCM beats BICM always" but "the separation is a real loss that must be quantified and managed." Modern BICM design (choice of labeling, LDPC degree distribution, HARQ protocol) explicitly tracks this loss.
- β’
5G NR LDPC + QAM + Gray label operates at 0.5-1 dB from CM capacity
- β’
Trellis codes from V.32/V.34 modems achieved similar gaps circa 1990 on AWGN, but did not scale to fading
- β’
BICM-ID (Chapter 8) can recover much of the BICM loss when signal-space labels are set-partition-like
BICM β The Caire-Taricco-Biglieri Framework
The foundational analysis of BICM, establishing that a powerful binary code concatenated with a bit interleaver and a higher-order modulator β a formally "separated" scheme β can achieve mutual information within a fraction of a dB of CM capacity on the AWGN channel, provided the labeling is Gray. The paper introduced the now-standard BICM capacity expression,
and showed that with Gray labeling the gap is small for all reasonable . The paper justified the widespread choice of BICM for standards and launched a 15-year research program on BICM extensions: iterative decoding (Ch. 8), fading-channel analysis (Ch. 6), error exponents (Ch. 7), and matched-to-BICM lattice codes. It is one of the most cited papers in coded modulation and a foundational CommIT contribution.
Key Takeaway
The code must live in signal space. Designing a binary code for Hamming distance and then mapping to QAM is an unforced error: the Euclidean-distance design criterion dominates on AWGN, and an independently chosen labeling wastes most of it. Either (a) design the code jointly with the labeling (TCM / MLC / BICM-ID), or (b) use Gray labeling, which is accidentally matched well enough to independent binary coding to yield most of the capacity (BICM). The rest of this book traces both paths.
Coded modulation
A family of techniques that combine channel coding and modulation into a single design problem, with the code living in signal space and its distance criterion expressed in Euclidean rather than Hamming distance. Includes TCM, MLC, and BICM.
Related: BICM Capacity, Branch Metric (TCM), Gray labeling
Gray labeling
A bit-to-symbol mapping in which any two constellation points whose labels differ in exactly one bit are Euclidean neighbors. Gray labeling minimizes the per-symbol bit-error rate under bitwise ML detection and is the labeling used in all modern BICM standards.
Related: BICM Capacity, Coded modulation, Set-partition (Ungerboeck) labeling
Set-partition (Ungerboeck) labeling
A hierarchical partitioning of a signal constellation into nested subsets of increasing minimum distance, with the label bits indexing the subsets. The bit that selects the coarsest partition has the largest intra-pair one-bit distance, making set-partition labeling the natural match for TCM β but a poor match for independent binary codes.
Related: Branch Metric (TCM), Coded modulation, Gray labeling
Historical Note: Ungerboeck 1982 β The Coded-Modulation Revolution
1982Gottfried Ungerboeck's 1982 paper "Channel Coding with Multilevel/Phase Signals" is arguably the single most influential paper in digital communications history at the practical level. Published in IEEE Transactions on Information Theory, it showed that a rate- convolutional code applied to a 4-state trellis over an 8-PSK constellation with set-partition labeling achieves a 3 dB coding gain over uncoded QPSK at bits/2D β and with a 16-state trellis, 4.5 dB. This hit the desk of every modem designer within a year. The V.32 modem standard (1984, 9600 bps) used Ungerboeck's 8-state 8-PSK TCM; V.34 (1994, up to 33.6 kbps) used a more elaborate 64-state 4D TCM. Nearly every dial-up modem sold between 1985 and 2000 used TCM. The techniques were eventually supplanted by BICM (whose foundations Caire-Taricco-Biglieri established in 1998) β but Ungerboeck's insight that code and modulation must be co-designed remains the starting point of this book.
Quick Check
Consider a system with a binary outer code of minimum Hamming distance , concatenated with a Gray-labeled 16-QAM mapper. The minimum distance of 16-QAM is (in some normalization). According to the SC+M bound, what is the largest value of effective achievable by any choice of codewords in the concatenated scheme, holding the labeling fixed?
(the constellation's own minimum distance)
, i.e.,
, i.e.,
Unbounded: any distance is achievable with a long enough code.
The SC+M theorem gives . With Gray labeling, the one-bit-neighbor distance equals the constellation's minimum distance, , so the bound is . Achieving the bound requires a codeword pair at Hamming distance exactly spread over six distinct symbol blocks.
Why This Matters: The Coded-Modulation Roadmap From Here
Chapter 1 closes with a theme that threads the rest of Book CM.
- Chapter 2 presents Ungerboeck's trellis-coded modulation: the original coded-modulation scheme, where the code is a convolutional code on a set-partition-labeled constellation.
- Chapter 3 generalizes to multilevel coding and multistage decoding, introducing the idea of different code rates on different label bits.
- Chapter 4 develops lattice-based coded modulation, where coset codes over lattice partitions recover both coding and shaping gains.
- Chapters 5-9 build the BICM framework, showing that Gray labeling lets us treat the concatenation as a set of parallel binary channels and that BICM approaches CM capacity while retaining the engineering benefits of binary coding.
- Chapters 10-14 extend the Euclidean-distance criterion to fading MIMO channels, producing the rank-and-determinant criterion and the Zheng-Tse diversity-multiplexing tradeoff.
- Chapters 15-18 use lattices to build DMT-optimal codes and introduce compute-and-forward.
- Chapters 19-22 treat modern extensions: probabilistic shaping, massive MIMO quantization, and pragmatic 5G/6G considerations.
At every step, the design criterion is some version of "maximize the distance (Euclidean, rank+determinant, lattice minimum distance) of the code in the channel's natural metric space." The Chapter 1 insight β that this must be the design criterion, not the Hamming distance of the binary labels β sets the stage for all of it.