Why Separate Coding and Modulation Is Suboptimal

Why Separate Coding and Modulation Leaves Capacity on the Table

The classical textbook view, inherited from Shannon's original constructions, goes like this: take a binary channel code of minimum Hamming distance dHd_H, then feed its bits into a QAM mapper. The code designer works in GF(2)(2); the modulator designer works in RN\mathbb{R}^N. The two sides are decoupled, and the engineer can shop for each independently in the literature.

Now here is the key idea: this decoupling is a waste. The channel that the binary code "sees" after the QAM mapper is not a memoryless binary channel β€” it is a higher-order signal-space channel whose error events are Euclidean, not Hamming. A code optimized for Hamming distance does not automatically maximize Euclidean distance of the transmitted signal. This mismatch costs a fixed amount of Eb/N0E_b/N_0, and at rates of 2 bits/2D or higher, the loss is sizeable.

Ungerboeck's 1982 insight β€” the founding idea of coded modulation β€” is that the code must be designed directly in signal space: the metric the code tries to maximize is the Euclidean distance of the transmitted vector, not the Hamming distance of the binary label. This chapter closes with a precise statement of why the separated approach is suboptimal, setting up the Ungerboeck construction in Chapter 2 and the BICM framework in Chapter 5.

, ,

Definition:

Separated Coding-and-Modulation vs. Coded Modulation

A separated coding-and-modulation (SC+M) system is one in which:

  1. A binary outer code CβŠ‚{0,1}n\mathcal{C} \subset \{0,1\}^n of rate RcR_c is selected independently of the constellation.
  2. A bit-to-symbol mapping ΞΌ:{0,1}log⁑2Mβ†’X\mu: \{0,1\}^{\log_2 M} \to \mathcal{X} is chosen independently of C\mathcal{C}.
  3. The receiver computes bitwise soft information on the binary labels of the received symbols and decodes C\mathcal{C} as if the channel were memoryless binary.

A coded modulation (CM) system, by contrast, designs the sequence of transmitted signal vectors directly. The code lives on the signal set X\mathcal{X}, not on its binary labels, and the design criterion is the Euclidean distance spectrum of the sequence code, not the Hamming distance spectrum of the underlying binary code.

BICM (Chapter 5) is an interesting hybrid: it uses a binary code, a bit interleaver, and a QAM mapper β€” formally SC+M β€” but the Caire-Taricco-Biglieri analysis shows that with Gray labeling BICM comes surprisingly close to CM capacity despite the nominally suboptimal separation. Understanding why BICM works, and when it does not, is the subject of Chapters 5-8.

,

Theorem: SC+M with Independent Labeling Is Strictly Suboptimal

Fix a finite 2D constellation X\mathcal{X} with minimum Euclidean distance dE,min⁑d_{\rm E,\min} and M=2mM = 2^m points. Consider a binary outer code C\mathcal{C} of rate Rc=k/nR_c = k/n and minimum Hamming distance dHd_H, concatenated with a fixed bit-to-symbol mapping μ\mu that is independent of C\mathcal{C}. Then the effective minimum Euclidean distance of the SC+M system satisfies

dE,min⁑2(C,ΞΌ)β€…β€Šβ‰€β€…β€ŠdHβ‹…dE,min⁑2(X,ΞΌ)1-bit,d_{\rm E,\min}^2(\mathcal{C}, \mu) \;\le\; d_H \cdot d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}},

where dE,min⁑2(X,ΞΌ)1-bitd_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} is the minimum squared Euclidean distance between two constellation points whose labels differ in exactly one bit. For any mapping with dE,min⁑2(X,ΞΌ)1-bit<dE,min⁑2(X)d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} < d_{\rm E,\min}^2(\mathcal{X}) β€” which holds for every non-trivial mapping when M>2M > 2 β€” the bound is strictly loose: there exist binary codes and constellation point assignments that attain a strictly larger effective dE,min⁑2d_{\rm E, \min}^2.

The binary code can guarantee that any two codewords differ in at least dHd_H positions. In each differing position, the mapped symbol is displaced by at least dE,min⁑2(X,ΞΌ)1-bitd_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} in Euclidean distance β€” which is the distance between two symbols whose labels differ in that one bit position. The product dHβ‹…dE,min⁑2(X,ΞΌ)1-bitd_H \cdot d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} is therefore a lower bound on the total Euclidean displacement. For Gray labeling, the one-bit neighbor distance is dE,min⁑2(X)d_{\rm E,\min}^2(\mathcal{X}) itself β€” not much to save. But other labelings have dE,min⁑2(X,ΞΌ)1-bit≫dE,min⁑2(X)d_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} \gg d_{\rm E,\min}^2(\mathcal{X}), at which point the binary-code design can no longer chase the full Euclidean distance directly, and a matched "joint" construction dominates.

,

Example: Gray vs. Set-Partition Labeling on 16-QAM

Consider 16-QAM of energy EsE_s. Compute the one-bit-neighbor minimum Euclidean distance dE,min⁑2(X,μ)1-bitd_{\rm E,\min}^2(\mathcal{X}, \mu)_{1\text{-bit}} for (a) a Gray labeling and (b) a set-partition (Ungerboeck) labeling. Discuss the implication for the SC+M upper bound.

Separated Coding+Modulation vs. Coded Modulation

AspectSeparated Coding + Modulation (SC+M)Coded Modulation (CM)
Design variableBinary code C\mathcal{C}; mapper ΞΌ\mu chosen independentlySequence code on Xn\mathcal{X}^n; all components chosen jointly
Minimum-distance criterionHamming dHd_H (of C\mathcal{C})Euclidean dE,min⁑d_{\rm E, \min} (of the transmitted sequence)
Bandwidth expansionNon-trivial; code rate Rc<1R_c < 1 adds overhead unless QAM order is grownZero: constellation is grown by one bit, keeping Ξ·\eta fixed
Historical exampleReed-Solomon + QAM (1970s modems), convolutional + BPSK (Voyager)TCM (Ungerboeck 1982), MLC (Imai-Hirasawa 1977)
Modern exampleBICM: LDPC + QAM + Gray label (LTE/5G NR, Wi-Fi)Ungerboeck TCM in V.32/V.34 modems; BICM-ID with matched label
Gap to capacity at Ξ·=4\eta = 4About 1-2 dB (BICM with LDPC)About 0.5-1 dB (TCM at moderate state count)
DecoderBinary soft-decision decoder with bitwise metricsFull-state trellis / metric-space decoder (Viterbi on the symbol trellis)

Common Mistake: Gray labeling is a feature, not a coincidence

Mistake:

Concluding from the suboptimality theorem that any SC+M system loses a lot of capacity.

Correction:

Gray labeling has the special property that one-bit neighbors are Euclidean neighbors, so the binary-channel abstraction in BICM sees soft information that is close to the truth. With Gray labeling, BICM capacity is within a fraction of a dB of CM capacity at most spectral efficiencies. The SC+M bound we derived is loose for Gray-labeled BICM; it is tight for set-partition-labeled SC+M without iteration. The moral: labeling matters, and the Ungerboeck argument is that set-partition labeling requires a matched code to pay off.

⚠️Engineering Note

Why BICM, Not TCM, Dominates Modern Standards

Despite the theoretical elegance and provable optimality of Ungerboeck's TCM for the AWGN channel, modern standards (LTE, 5G NR, Wi-Fi 6/7, DVB-S2X) all use BICM β€” a formally "separated" scheme. Why?

  1. Fading. On a slow-fading channel, TCM's tight signal-space trellis loses diversity when one constellation dimension is deep in a fade. BICM with a long bit interleaver spreads errors and extracts time diversity that TCM cannot.
  2. HARQ. Binary LDPC codes support incremental redundancy and rate matching cleanly; TCM does not.
  3. Flexibility. BICM decouples the code and modulation, letting the same LDPC code family serve QPSK, 16-QAM, 64-QAM, and 256-QAM. TCM needs a different code per constellation order.
  4. Capacity gap. With Gray labeling, BICM capacity is within 0.3-0.5 dB of CM capacity at η≀6\eta \le 6, so the theoretical loss is small in the regime of interest.

The right reading of the suboptimality theorem, then, is not "TCM beats BICM always" but "the separation is a real loss that must be quantified and managed." Modern BICM design (choice of labeling, LDPC degree distribution, HARQ protocol) explicitly tracks this loss.

Practical Constraints
  • β€’

    5G NR LDPC + QAM + Gray label operates at 0.5-1 dB from CM capacity

  • β€’

    Trellis codes from V.32/V.34 modems achieved similar gaps circa 1990 on AWGN, but did not scale to fading

  • β€’

    BICM-ID (Chapter 8) can recover much of the BICM loss when signal-space labels are set-partition-like

πŸŽ“CommIT Contribution(1998)

BICM β€” The Caire-Taricco-Biglieri Framework

G. Caire, G. Taricco, E. Biglieri β€” IEEE Trans. Information Theory, vol. 44, no. 3, pp. 927-946

The foundational analysis of BICM, establishing that a powerful binary code concatenated with a bit interleaver and a higher-order modulator β€” a formally "separated" scheme β€” can achieve mutual information within a fraction of a dB of CM capacity on the AWGN channel, provided the labeling is Gray. The paper introduced the now-standard BICM capacity expression,

CBICM(X,ΞΌ,SNR)β€…β€Š=β€…β€Šmβˆ’βˆ‘i=1mE ⁣[log⁑2βˆ‘x∈Xp(Y∣x)βˆ‘x∈Xbip(Y∣x)],C_{\rm BICM}(\mathcal{X}, \mu, \text{SNR}) \;=\; m - \sum_{i=1}^{m} \mathbb{E}\!\left[\log_2 \frac{\sum_{x \in \mathcal{X}} p(Y|x)}{\sum_{x \in \mathcal{X}_{b_i}} p(Y|x)}\right],

and showed that with Gray labeling the gap CCMβˆ’CBICMC_{\rm CM} - C_{\rm BICM} is small for all reasonable Ξ·\eta. The paper justified the widespread choice of BICM for standards and launched a 15-year research program on BICM extensions: iterative decoding (Ch. 8), fading-channel analysis (Ch. 6), error exponents (Ch. 7), and matched-to-BICM lattice codes. It is one of the most cited papers in coded modulation and a foundational CommIT contribution.

bicmfoundationalawgncapacityView Paper β†’

Key Takeaway

The code must live in signal space. Designing a binary code for Hamming distance and then mapping to QAM is an unforced error: the Euclidean-distance design criterion dominates on AWGN, and an independently chosen labeling wastes most of it. Either (a) design the code jointly with the labeling (TCM / MLC / BICM-ID), or (b) use Gray labeling, which is accidentally matched well enough to independent binary coding to yield most of the capacity (BICM). The rest of this book traces both paths.

Coded modulation

A family of techniques that combine channel coding and modulation into a single design problem, with the code living in signal space and its distance criterion expressed in Euclidean rather than Hamming distance. Includes TCM, MLC, and BICM.

Related: BICM Capacity, Branch Metric (TCM), Gray labeling

Gray labeling

A bit-to-symbol mapping ΞΌ\mu in which any two constellation points whose labels differ in exactly one bit are Euclidean neighbors. Gray labeling minimizes the per-symbol bit-error rate under bitwise ML detection and is the labeling used in all modern BICM standards.

Related: BICM Capacity, Coded modulation, Set-partition (Ungerboeck) labeling

Set-partition (Ungerboeck) labeling

A hierarchical partitioning of a signal constellation into nested subsets of increasing minimum distance, with the label bits indexing the subsets. The bit that selects the coarsest partition has the largest intra-pair one-bit distance, making set-partition labeling the natural match for TCM β€” but a poor match for independent binary codes.

Related: Branch Metric (TCM), Coded modulation, Gray labeling

Historical Note: Ungerboeck 1982 β€” The Coded-Modulation Revolution

1982

Gottfried Ungerboeck's 1982 paper "Channel Coding with Multilevel/Phase Signals" is arguably the single most influential paper in digital communications history at the practical level. Published in IEEE Transactions on Information Theory, it showed that a rate-1/21/2 convolutional code applied to a 4-state trellis over an 8-PSK constellation with set-partition labeling achieves a 3 dB coding gain over uncoded QPSK at Ξ·=2\eta = 2 bits/2D β€” and with a 16-state trellis, 4.5 dB. This hit the desk of every modem designer within a year. The V.32 modem standard (1984, 9600 bps) used Ungerboeck's 8-state 8-PSK TCM; V.34 (1994, up to 33.6 kbps) used a more elaborate 64-state 4D TCM. Nearly every dial-up modem sold between 1985 and 2000 used TCM. The techniques were eventually supplanted by BICM (whose foundations Caire-Taricco-Biglieri established in 1998) β€” but Ungerboeck's insight that code and modulation must be co-designed remains the starting point of this book.

Quick Check

Consider a system with a binary outer code of minimum Hamming distance dH=6d_H = 6, concatenated with a Gray-labeled 16-QAM mapper. The minimum distance of 16-QAM is dE,min⁑2=4d_{\rm E, \min}^2 = 4 (in some normalization). According to the SC+M bound, what is the largest value of effective dE,min⁑2d_{\rm E, \min}^2 achievable by any choice of codewords in the concatenated scheme, holding the labeling fixed?

44 (the constellation's own minimum distance)

2424, i.e., dHβ‹…dE,min⁑2=6β‹…4d_H \cdot d_{\rm E, \min}^2 = 6 \cdot 4

6464, i.e., dH2β‹…dE,min⁑2d_H^2 \cdot d_{\rm E, \min}^2

Unbounded: any distance is achievable with a long enough code.

Why This Matters: The Coded-Modulation Roadmap From Here

Chapter 1 closes with a theme that threads the rest of Book CM.

  • Chapter 2 presents Ungerboeck's trellis-coded modulation: the original coded-modulation scheme, where the code is a convolutional code on a set-partition-labeled constellation.
  • Chapter 3 generalizes to multilevel coding and multistage decoding, introducing the idea of different code rates on different label bits.
  • Chapter 4 develops lattice-based coded modulation, where coset codes over lattice partitions recover both coding and shaping gains.
  • Chapters 5-9 build the BICM framework, showing that Gray labeling lets us treat the concatenation as a set of parallel binary channels and that BICM approaches CM capacity while retaining the engineering benefits of binary coding.
  • Chapters 10-14 extend the Euclidean-distance criterion to fading MIMO channels, producing the rank-and-determinant criterion and the Zheng-Tse diversity-multiplexing tradeoff.
  • Chapters 15-18 use lattices to build DMT-optimal codes and introduce compute-and-forward.
  • Chapters 19-22 treat modern extensions: probabilistic shaping, massive MIMO quantization, and pragmatic 5G/6G considerations.

At every step, the design criterion is some version of "maximize the distance (Euclidean, rank+determinant, lattice minimum distance) of the code in the channel's natural metric space." The Chapter 1 insight β€” that this must be the design criterion, not the Hamming distance of the binary labels β€” sets the stage for all of it.