MLC vs. BICM: A Capacity Comparison
Two Rate Decompositions for the Same Channel
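There is a second, seemingly innocent, way to turn a non-binary constellation into a sum of binary mutual informations. Instead of conditional binary capacities, take unconditional ones:
$$C_{\text{BICM}} = \sum_{i=0}^{m-1} I(B_i; Y),$$
with $B_0, \dots, B_{m-1}$ the label bits and $Y$ the channel output.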
This is the BICM capacity. It corresponds to a decoder that views each bit position as an independent binary channel and processes the bit streams in parallel, with no history conditioning. The construction is simpler than MLC/MSD because only one binary code is needed (the streams are demultiplexed from a single mother code after interleaving), and the decoder is a single binary decoder, not $m$ of them.
The question is: how much capacity do we lose by dropping the conditioning? The answer depends on the labelling $\mu$. For Ungerboeck partition-based labelling the loss is substantial. For Gray labelling it is remarkably small, typically only a small fraction of a bit over the entire practical SNR range. This is the observation that Caire, Taricco, and Biglieri turned into a full framework in 1998, and that subsequently drove BICM to dominance in wireless standards.
This section quantifies the MLC-vs-BICM gap and explains why, despite MLC's theoretical optimality, BICM is what every modern wireless modem uses.
Definition: BICM Capacity
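Fix a constellation $\mathcal{X}$ of size $M = 2^m$, a labelling $\mu : \{0,1\}^m \to \mathcal{X}$, and a channel $p(y \mid x)$ with uniform inputs. The BICM capacity is
$$C_{\text{BICM}} = \sum_{i=0}^{m-1} I(B_i; Y),$$
where $B_i$ is the $i$-th label bit when $X = \mu(B_0, \dots, B_{m-1})$ with i.i.d. uniform label bits.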
The BICM capacity depends on the labelling $\mu$ through the marginal distributions of each $B_i$ given $Y$. For the AWGN channel with uniform inputs, these marginals are the per-bit posteriors used by any BICM demapper. Different labellings give different BICM capacities; Gray labelling is the empirical champion.
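As a rough illustration of the per-bit posteriors mentioned above, the sketch below computes bit log-likelihood ratios for a Gray-labelled 4-PAM alphabet (one real dimension of 16-QAM) on a real AWGN channel. The function names, the label assignment 00, 01, 11, 10, and the unit-energy scaling are assumptions made for this example, not notation taken from the text.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def bit_llrs(y, points, labels, sigma):
    """Return log[P(B_i=0 | y) / P(B_i=1 | y)] for each bit position i.

    Assumes uniform inputs and Y = X + N(0, sigma^2), so each posterior is
    proportional to a sum of Gaussian likelihoods over the matching points.
    """
    llrs = []
    for i in range(len(labels[0])):
        metrics = {0: [], 1: []}
        for x, label in zip(points, labels):
            # Log-likelihood of y given x, up to a constant shared by all x.
            metrics[label[i]].append(-((y - x) ** 2) / (2 * sigma ** 2))
        llrs.append(log_sum_exp(metrics[0]) - log_sum_exp(metrics[1]))
    return llrs

# Hypothetical example alphabet: unit-energy 4-PAM with a Gray labelling.
A = 1 / math.sqrt(5)
PAM4 = [-3 * A, -A, A, 3 * A]
GRAY4 = [(0, 0), (0, 1), (1, 1), (1, 0)]

if __name__ == "__main__":
    for y in (-1.2, 0.0, 0.5):
        print(y, bit_llrs(y, PAM4, GRAY4, sigma=0.3))
```

Each LLR is computed from the marginal posterior of one bit only, which is exactly the information a BICM decoder receives; nothing about the other label bits is passed along.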
Theorem: The Capacity Ordering: CM $\ge$ BICM $=$ sum of independent levels
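For any constellation $\mathcal{X}$, labelling $\mu$, and symmetric channel with uniform inputs,
$$C_{\text{CM}} = \sum_{i=0}^{m-1} I(B_i; Y \mid B_0, \dots, B_{i-1}) \;\ge\; \sum_{i=0}^{m-1} I(B_i; Y) = C_{\text{BICM}},$$
where the inequality in the middle is nearly tight for Gray labelling (a small gap over practical SNRs), and holds with equality whenever the label bits are conditionally independent given $Y$ (rare; this happens only for pathological channels). Explicitly, the gap is
$$C_{\text{CM}} - C_{\text{BICM}} = \sum_{i=1}^{m-1} \bigl[ I(B_i; Y \mid B_0, \dots, B_{i-1}) - I(B_i; Y) \bigr] = \sum_{i=1}^{m-1} I(B_i; B_0, \dots, B_{i-1} \mid Y),$$
and every term in the sum is non-negative.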
Conditioning on variables that are independent of $B_i$ a priori cannot decrease its mutual information with the output: $I(B_i; Y \mid B_0, \dots, B_{i-1}) \ge I(B_i; Y)$. There is, however, a cleaner derivation via the chain rule, below.
Use the chain rule to decompose $I(X; Y)$ in two different ways.
Compare the conditional decomposition (CM) with the unconditional sum (BICM).
Each difference can be shown non-negative using the conditional mutual information / conditional entropy identities.
CM $\ge$ BICM via chain rule
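By the capacity rule (see The Capacity Rule for MLC),
$$C_{\text{CM}} = I(X; Y) = \sum_{i=0}^{m-1} I(B_i; Y \mid B_0, \dots, B_{i-1}).$$
By definition, $C_{\text{BICM}} = \sum_{i=0}^{m-1} I(B_i; Y)$. Their difference is
$$C_{\text{CM}} - C_{\text{BICM}} = \sum_{i=1}^{m-1} \bigl[ I(B_i; Y \mid B_0, \dots, B_{i-1}) - I(B_i; Y) \bigr].$$
(The $i = 0$ term cancels: with nothing to condition on, $I(B_0; Y \mid \emptyset) = I(B_0; Y)$.)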
Each difference is non-negative
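Define $B_{<i} = (B_0, \dots, B_{i-1})$. Then
$$I(B_i; Y \mid B_{<i}) - I(B_i; Y) = I(B_i; B_{<i} \mid Y) - I(B_i; B_{<i}) = I(B_i; B_{<i} \mid Y),$$
where the first equality expands $I(B_i; Y, B_{<i})$ by the chain rule in both orders, and the last equality uses that the label bits are independent a priori, so $I(B_i; B_{<i}) = 0$. Conditional mutual information is non-negative, proving the claim. Equality holds iff $B_i$ and $B_{<i}$ are conditionally independent given $Y$, which, for AWGN channels with reasonable labellings, fails.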
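The identity used in this step can be checked numerically on a toy example. The sketch below is an illustration under assumed parameters, not part of the proof: it builds a hypothetical four-point, two-bit Gray-labelled channel whose output is the transmitted point or an adjacent one, and evaluates both sides of the identity from the joint distribution of $(B_0, B_1, Y)$. The names `cond_mi`, `marginal`, and the crossover value `EPS` are made up for the example.

```python
import math

LABELS = [(0, 0), (0, 1), (1, 1), (1, 0)]   # Gray labels of 4 points on a line
EPS = 0.1                                   # assumed probability of slipping to a neighbour

def p_y_given_x(y, x):
    """Toy channel: output is x, or an adjacent point with probability EPS each."""
    if y == x:
        return 1 - EPS * sum(1 for n in (x - 1, x + 1) if 0 <= n < 4)
    return EPS if abs(y - x) == 1 else 0.0

# Joint pmf over (b0, b1, y) with i.i.d. uniform label bits.
JOINT = {(b0, b1, y): 0.25 * p_y_given_x(y, x)
         for x, (b0, b1) in enumerate(LABELS) for y in range(4)}

def marginal(joint, idx):
    """Marginal pmf over the coordinates listed in idx."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + p
    return out

def cond_mi(joint, a, b, c=()):
    """I(A; B | C) in bits; a, b, c are tuples of coordinate indices."""
    p_abc = marginal(joint, a + b + c)
    p_ac, p_bc, p_c = marginal(joint, a + c), marginal(joint, b + c), marginal(joint, c)
    total = 0.0
    for key, p in p_abc.items():
        if p > 0:
            ka, kb, kc = key[:len(a)], key[len(a):len(a) + len(b)], key[len(a) + len(b):]
            total += p * math.log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total

if __name__ == "__main__":
    B0, B1, Y = (0,), (1,), (2,)
    lhs = cond_mi(JOINT, B1, Y, B0) - cond_mi(JOINT, B1, Y)   # conditioning gain
    rhs = cond_mi(JOINT, B1, B0, Y)                           # I(B1; B0 | Y)
    print(lhs, rhs)
```

On this toy channel the two sides agree up to floating-point error, and the common value is strictly positive because observing the output couples the two label bits.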
Gray labelling: the gap is small
Under Gray labelling, neighbouring constellation points differ in exactly one label bit. This makes the marginal channel for each bit position behave almost like an independent binary channel at high SNR: a symbol error flips one bit, so conditioning on the other bits changes the posterior of $B_i$ only slightly. Empirically, the gap stays small for square QAM at all practical SNRs and is largest for 8-PSK; see the interactive plot below.
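The one-bit-per-neighbour property is easy to verify in code. The sketch below uses the binary-reflected Gray code, which is one standard construction (the text does not say which Gray labelling it has in mind), and builds a square-QAM labelling as the product of two Gray PAM labellings. All function names are illustrative.

```python
def gray_code(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)

def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")

def gray_pam_labels(bits):
    """Labels for the 2**bits points of a PAM alphabet, left to right."""
    return [gray_code(k) for k in range(2 ** bits)]

def gray_qam_labels(bits_per_dim):
    """Square-QAM labelling built as the product of two Gray PAM labellings.

    Returns a dict mapping grid position (row, col) to an integer label whose
    upper bits label the row and whose lower bits label the column.
    """
    pam = gray_pam_labels(bits_per_dim)
    return {(r, c): (pam[r] << bits_per_dim) | pam[c]
            for r in range(len(pam)) for c in range(len(pam))}

if __name__ == "__main__":
    labels = gray_qam_labels(2)  # 16-QAM grid
    for r in range(4):
        print(" ".join(format(labels[(r, c)], "04b") for c in range(4)))
```

Because each dimension is labelled separately, a nearest-neighbour error in either the row or the column changes exactly one of the four label bits.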
CM Capacity vs BICM Capacity under Gray Labelling
For 4-QAM, 16-QAM, 64-QAM, and 8-PSK, this plot compares the CM capacity (the upper envelope, achievable by MLC/MSD) with the BICM capacity under Gray labelling, together with the Shannon limit. For square QAM the two curves are nearly indistinguishable: the Gray-BICM gap is a few tenths of a dB at worst. For 8-PSK the gap is noticeable but still modest, and largest at medium SNR. This is the numerical evidence that Gray-BICM is "good enough" for practical wireless modulation.
Example: The CM-vs-BICM Gap for 16-QAM at 10 dB
Estimate the difference $C_{\text{CM}} - C_{\text{BICM}}$ for 16-QAM at $\mathrm{SNR} = 10$ dB (so $\mathrm{SNR} = 10$ in linear units) and compare it with the gap to Shannon capacity.
Numerical values at $\mathrm{SNR} = 10$ dB
Running the interactive plot (or the simulation underlying it) at , dB gives approximately bits/dim and bits/dim. The Shannon limit is bits/dim.
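The three quantities in this example can be estimated without the interactive plot. The sketch below is one possible way to do it, under stated assumptions: Gray-labelled 16-QAM is treated as two independent Gray-labelled 4-PAM dimensions, each with unit signal energy and noise variance $1/\mathrm{SNR}$, the Shannon limit per dimension is taken to be $\tfrac{1}{2}\log_2(1 + \mathrm{SNR})$, and the mutual informations are evaluated by plain numerical integration over the channel output. The function names and the grid size are illustrative choices, and this is not necessarily the simulation behind the plot.

```python
import math

def gauss(y, mean, sigma):
    """Gaussian density N(mean, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mi_bits(groups, sigma, n=4001):
    """I(U; Y) in bits for U uniform over `groups`.

    groups[k] lists the equiprobable PAM points sent when U = k, and
    Y = X + N(0, sigma^2). All groups must have the same size. The integral
    over y is approximated by a Riemann sum on a uniform grid.
    """
    points = [x for g in groups for x in g]
    lo, hi = min(points) - 8 * sigma, max(points) + 8 * sigma
    dy = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        y = lo + k * dy
        cond = [sum(gauss(y, x, sigma) for x in g) / len(g) for g in groups]
        p_y = sum(cond) / len(groups)
        for c in cond:
            if c > 0.0:
                total += (c / len(groups)) * math.log2(c / p_y) * dy
    return total

def capacities_16qam_gray(snr_db):
    """Return (Shannon, CM, BICM) in bits per real dimension."""
    snr = 10 ** (snr_db / 10)
    sigma = math.sqrt(1 / snr)              # assumed per-dimension noise std
    a = 1 / math.sqrt(5)
    pam = [-3 * a, -a, a, 3 * a]            # unit-energy 4-PAM
    labels = [(0, 0), (0, 1), (1, 1), (1, 0)]  # assumed Gray labelling
    c_cm = mi_bits([[x] for x in pam], sigma)  # I(X; Y)
    c_bicm = 0.0
    for i in range(2):                      # sum of I(B_i; Y)
        groups = [[x for x, lab in zip(pam, labels) if lab[i] == b] for b in (0, 1)]
        c_bicm += mi_bits(groups, sigma)
    c_shannon = 0.5 * math.log2(1 + snr)
    return c_shannon, c_cm, c_bicm

if __name__ == "__main__":
    shannon, cm, bicm = capacities_16qam_gray(10.0)
    print(f"Shannon {shannon:.3f}  CM {cm:.3f}  BICM {bicm:.3f}  bits/dim")
```

The same `mi_bits` routine handles both capacities: for CM every point is its own group, while for BICM the points are grouped by the value of a single label bit, which is precisely the marginalisation that drops the conditioning.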
Interpret the two gaps
The CM-to-Shannon gap is bit, or about dB of SNR at this rate. This is the modulation-capacity loss of 16-QAM: the price of restricting to a finite, uniformly distributed constellation.
The CM-to-BICM gap is bit, or about dB of SNR. This is the price of using Gray labelling with independent bit-level decoding instead of MLC/MSD.
The punchline
The modulation-capacity loss dominates the Gray-BICM suboptimality by roughly a factor of six in this example, and by even more at high SNR where the Gray-BICM gap shrinks further. For the designer, the message is clear: if you want to close the gap to Shannon, the first lever to pull is constellation shape (Ch. 4) or constellation size (MCS adaptation), not MLC vs BICM.
TCM vs MLC/MSD vs BICM: A Structural Comparison
| Property | TCM (Ch. 2) | MLC/MSD (Ch. 3) | BICM (Ch. 5) |
|---|---|---|---|
| Number of codes | 1 (trellis) | $m$ (binary) | 1 (binary, mother code) |
| Labelling | Ungerboeck partition | Ungerboeck partition | Gray (typically) |
| Encoder modularity | Monolithic trellis code | Modular, one per level | Single code + interleaver + mapper |
| Decoder | Single Viterbi on joint trellis | $m$ sequential binary decoders (MSD) | Single binary decoder + demapper |
| Capacity achieved | Close to CM at low | $C_{\text{CM}}$ (exact) | $C_{\text{BICM}}$ (slightly below $C_{\text{CM}}$ with Gray) |
| Rate allocation | One rate, per trellis | $m$ rates (capacity rule) | One rate, per modulation |
| Error propagation | None (jointly decoded) | Yes (next-stage sensitive to previous) | None (bits independent after demapping) |
| Historical dominance | V.32 / V.34 modems (1986–96) | Never widely deployed | DVB-S2 / LTE / 5G NR (2004–) |
Why BICM Won (Despite the Capacity Gap)
From a pure capacity perspective MLC/MSD strictly dominates BICM. So why is every modern wireless modem BICM? The reasons are practical, not theoretical:
- Code-table simplicity. BICM uses one code rate per modulation. MLC needs $m$ code rates per modulation: a separate LDPC or polar code optimised for each level's SNR. The standard's MODCOD table would grow by a factor of $m$.
- Rate adaptation. In adaptive modulation (as in 5G), the base station re-selects MCS every few ms. A single code handling every modulation by changing rate is operationally simpler than a bank of codes, one per level per modulation.
- Gray labelling is a near-optimal choice. As the plot above shows, under Gray labelling the BICM capacity is within a few tenths of a bit of CM capacity for QAM, i.e. within a few tenths of a dB of SNR penalty. This is much smaller than the gap to Shannon that constellation shape already contributes, so closing the MLC-vs-BICM gap buys little.
- Error propagation in MSD. Even a small residual BER at stage 0 can propagate and break the error-floor requirements of commercial modems, requiring an iterative outer loop that dissolves the complexity advantage of MSD.
The CommIT paper of Caire, Taricco, and Biglieri (1998) proved that Gray-BICM with a good binary code is essentially as good as CM; it is the theoretical foundation of this design decision. We treat that paper and the full BICM framework in Chapter 5.
Why This Matters: Forward to Chapter 5: The BICM Capacity Framework
The Gray-BICM bound derived here is the starting point for the foundational BICM capacity analysis of Caire, Taricco, and Biglieri (IEEE Trans. IT, 1998), the first CommIT contribution encountered in this book. That paper formalises BICM as an independent-parallel-channels model, proves the capacity formula is the right operational quantity, analyses its coding-gain behaviour at high SNR, and establishes the design rule: use a powerful binary code, a bit interleaver, and Gray labelling. That simple recipe is what ships in every contemporary wireless standard from 3G onwards.
In Chapter 5 we unpack the full BICM framework: channel model, capacity formula, demapper structure, and the suboptimality-by-independence argument that parallels (with the opposite sign) the MLC conditioning argument of this chapter. In Chapters 6–9 we extend it to error probability analysis, iterative decoding, and adaptive modulation in practical standards.
Common Mistake: "BICM is always strictly worse than MLC/MSD" (with a caveat)
Mistake:
Stating that BICM is strictly suboptimal relative to CM or MLC/MSD for every channel and constellation.
Correction:
Strictly, the inequality $C_{\text{CM}} \ge C_{\text{BICM}}$ holds with equality iff $B_0, \dots, B_{m-1}$ are conditionally independent given $Y$. This happens, for instance, on fully symmetric channels where the marginal bit posteriors factorise. Moreover, with Gray labelling the gap is numerically tiny (often a small fraction of a bit) across the practical SNR range. So "strictly suboptimal in theory" is correct; "strictly suboptimal in practice" is misleading, because in practice the two are essentially interchangeable.
Historical Note: The Industry's Choice: A Case Study in Capacity vs Simplicity
1977–2005
Imai and Hirasawa's 1977 MLC paper predates Ungerboeck's 1982 TCM by five years, yet MLC never enjoyed TCM's wide deployment. Two decades later, when Wachsmann, Fischer, and Huber (1999) carefully quantified the capacity rule and its ability to approach Shannon with capacity-approaching binary codes, one might have expected MLC to finally displace TCM.
Instead, the year before Wachsmann et al., Caire, Taricco, and Biglieri published their BICM analysis. The argument was structurally devastating: with Gray labelling and a good binary code, BICM gets almost all of MLC's rate at a fraction of the decoder complexity. When the DVB-S2 standardisation committee sat down in 2003 to pick a coded-modulation scheme for the next generation of satellite TV, they chose Gray-labelled BICM with a single rate-adjustable LDPC code. Every subsequent wireless standard (LTE, 5G NR, Wi-Fi 6) followed suit.
The moral is that a scheme's success is determined by (capacity gain) / (complexity) × (ease of adaptation), not by capacity alone. MLC remains the right answer when the constellation has no Gray labelling (non-standard APSK, lattice points in high dimensions), but for workhorse QAM, Gray-BICM won.
Quick Check
Which of the following orderings between $C_{\text{CM}}$ and $C_{\text{BICM}}$ is always true?
for all labellings
$C_{\text{CM}} \ge C_{\text{BICM}}$ for all labellings $\mu$, with equality possible
The ordering depends on the SNR
for every Gray labelling
$C_{\text{CM}}$ dominates for every labelling, because the difference equals a sum of conditional mutual informations $\sum_{i=1}^{m-1} I(B_i; B_0, \dots, B_{i-1} \mid Y) \ge 0$. Equality holds only when the label bits are conditionally independent given the output, which is a very restrictive condition.
BICM capacity
The achievable rate of bit-interleaved coded modulation, $C_{\text{BICM}} = \sum_{i=0}^{m-1} I(B_i; Y)$, where the label bits are treated as independent parallel binary sub-channels. Depends on the labelling and is maximised (in practice) by Gray labelling for QAM.
Related: The Capacity Rule for MLC, Gray Labelling, Multilevel Code (MLC) Encoder