MLC vs. BICM: A Capacity Comparison

Two Rate Decompositions for the Same Channel

There is a second, seemingly innocent, way to turn a non-binary constellation into a sum of binary mutual informations. Instead of conditional binary capacities, take unconditional ones:

$$C_{\rm BICM} \;\triangleq\; \sum_{i=0}^{L-1} I(Y; B_i).$$

This is the BICM capacity. It corresponds to a decoder that views each bit position as an independent binary channel and processes the $L$ bit streams in parallel, with no history conditioning. The construction is simpler than MLC/MSD because only one binary code is needed (the streams are demultiplexed from a single mother code after interleaving), and the decoder is a single binary decoder, not $L$ of them.

The question is: how much capacity do we lose by dropping the conditioning? The answer depends on the labelling $\mu$. For Ungerboeck partition-based labelling the loss is substantial. For Gray labelling it is remarkably small, typically well under $0.5$ bit over the entire practical SNR range. This is the observation that Caire, Taricco, and Biglieri turned into a full framework in 1998, and that subsequently drove BICM to dominance in wireless standards.

This section quantifies the MLC-vs-BICM gap and explains why, despite MLC's theoretical optimality, BICM is what every modern wireless modem uses.


Definition:

BICM Capacity

Fix a constellation $\mathcal{X}$ of size $M = 2^L$, a labelling $\mu : \{0,1\}^L \to \mathcal{X}$, and a channel $p(y \mid x)$ with uniform inputs. The BICM capacity is

$$C_{\rm BICM}(\mu) \;\triangleq\; \sum_{i=0}^{L-1} I(Y; B_i)$$

where $B_i$ is the $i$-th label bit when $X = \mu(B_0, \ldots, B_{L-1})$ with i.i.d. uniform label bits.

The BICM capacity depends on the labelling through the marginal distributions of each $B_i$ given $Y$. For the AWGN channel with uniform inputs, these marginals are the per-bit posteriors used by any BICM demapper. Different labellings give different BICM capacities; Gray labelling is the empirical champion.
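To make the per-bit posteriors concrete, here is a minimal sketch of an exact bit demapper for the complex AWGN channel with uniform inputs. The function name `bit_llrs`, its argument layout, and the sign convention (positive LLR favours bit 0) are assumptions made for this illustration, not notation from the text.

```python
import numpy as np

def bit_llrs(y, points, labels, noise_var):
    """Exact per-bit LLRs log P(B_i = 0 | y) / P(B_i = 1 | y) under uniform inputs.

    y         : complex received sample
    points    : (M,) complex constellation points
    labels    : (M, L) array of 0/1 label bits, row k labels points[k]
    noise_var : total complex noise variance N0 (assumed known)
    """
    # log p(y | x) up to an additive constant shared by all points
    metric = -np.abs(y - points) ** 2 / noise_var
    llrs = np.empty(labels.shape[1])
    for i in range(labels.shape[1]):
        # Marginalise over all points whose i-th label bit is 0 (resp. 1),
        # using log-sum-exp for numerical stability.
        log_p0 = np.logaddexp.reduce(metric[labels[:, i] == 0])
        log_p1 = np.logaddexp.reduce(metric[labels[:, i] == 1])
        llrs[i] = log_p0 - log_p1
    return llrs

# Usage: Gray-labelled 4-QAM, bit 0 sets the sign of the real part, bit 1 the imaginary part.
a = 1 / np.sqrt(2)
qpsk_points = np.array([a + 1j * a, a - 1j * a, -a - 1j * a, -a + 1j * a])
qpsk_labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
print(bit_llrs(0.5 + 0.2j, qpsk_points, qpsk_labels, noise_var=0.5))
```

Each LLR marginalises over the other label bits, which is exactly the "no history conditioning" step that separates $I(Y; B_i)$ from its conditional counterpart.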

Theorem: The Capacity Ordering: CM $\ge$ BICM $\ge$ sum of independent levels

For any constellation $\mathcal{X}$, labelling $\mu$, and symmetric channel with uniform inputs,

$$C_{\rm CM} \;\ge\; C_{\rm BICM}(\mu) \;\ge\; \sum_{i=0}^{L-1} \max\bigl(0, I(Y; B_i) - \Delta_i\bigr),$$

where the left inequality is nearly tight for Gray labelling (a gap below $0.5$ bit over practical SNRs) and holds with equality exactly when the label bits are conditionally independent given $Y$, which is the exception rather than the rule. Explicitly, the left gap is

$$C_{\rm CM} - C_{\rm BICM}(\mu) \;=\; \sum_{i=1}^{L-1} \bigl[I(Y; B_i \mid B_0, \ldots, B_{i-1}) - I(Y; B_i)\bigr],$$

and every term in the sum is non-negative.

Conditioning does not increase mutual information in general, but it does here because the label bits are independent a priori: knowing $B_0, \ldots, B_{i-1}$ tells the decoder nothing about $B_i$ by itself, yet it can sharpen what $Y$ reveals about $B_i$. Hence $I(Y; B_i \mid B_0, \ldots, B_{i-1}) \ge I(Y; B_i)$ for every level $i$, and the gap expression follows by applying the chain rule of mutual information to $C_{\rm CM} = I(Y; B_0, \ldots, B_{L-1})$.
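For readers who want the chain-rule step written out, the following derivation is one way to obtain both the gap formula and the equality condition. It assumes only what the definition states (i.i.d. uniform label bits and a one-to-one labelling $\mu$, so that $I(Y; X) = I(Y; B_0, \ldots, B_{L-1})$); the shorthand $B^{i-1} = (B_0, \ldots, B_{i-1})$ is introduced here for convenience.

```latex
\begin{align*}
C_{\rm CM}
  &= I(Y; B_0, \ldots, B_{L-1})
   = \sum_{i=0}^{L-1} I(Y; B_i \mid B^{i-1})
   && \text{(chain rule)} \\
C_{\rm CM} - C_{\rm BICM}(\mu)
  &= \sum_{i=1}^{L-1} \bigl[ I(Y; B_i \mid B^{i-1}) - I(Y; B_i) \bigr]
   && \text{(the $i = 0$ terms cancel)} \\
C_{\rm CM} - C_{\rm BICM}(\mu)
  &= \Bigl[ L - H(B_0, \ldots, B_{L-1} \mid Y) \Bigr]
     - \sum_{i=0}^{L-1} \bigl[ 1 - H(B_i \mid Y) \bigr] \\
  &= \sum_{i=0}^{L-1} H(B_i \mid Y) - H(B_0, \ldots, B_{L-1} \mid Y) \;\ge\; 0 .
\end{align*}
```

The last line is the subadditivity of conditional entropy, which holds with equality if and only if $B_0, \ldots, B_{L-1}$ are conditionally independent given $Y$.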


CM Capacity vs BICM Capacity under Gray Labelling

For 4-QAM, 16-QAM, 64-QAM, and 8-PSK, this plot compares the CM capacity (the upper envelope, achievable by MLC/MSD) with the BICM capacity under Gray labelling, together with the Shannon limit $\log_2(1 + \text{SNR})$. For square QAM the two curves are nearly indistinguishable: the Gray-BICM gap is a few tenths of a dB at worst. For 8-PSK the gap is noticeable but still modest (about $0.4$ bit at medium SNR). This is the numerical evidence that Gray-BICM is "good enough" for practical wireless modulation.


Example: The CM-vs-BICM Gap for 16-QAM at $\text{SNR} = 10$ dB

Estimate the difference $C_{\rm CM} - C_{\rm BICM, Gray}$ for 16-QAM at $\text{SNR} = 10$ dB (so $E_s/N_0 = 10$) and compare it with the gap to Shannon capacity.
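One way to carry out this estimate numerically is sketched below. It assumes the standard product Gray labelling, in which 16-QAM is two independent Gray-labelled 4-PAM constellations on the in-phase and quadrature axes, so each capacity is twice its 4-PAM value at the same per-dimension SNR. The function name `gray_pam4_capacities`, the grid size, and the integration range are choices made for this illustration.

```python
import numpy as np

def gray_pam4_capacities(snr_db, n_grid=20001):
    """Return (C_CM, C_BICM) in bits per real dimension for Gray-labelled 4-PAM on AWGN."""
    snr = 10.0 ** (snr_db / 10.0)
    points = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)  # unit average energy
    labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])       # Gray: neighbours differ in one bit
    sigma = np.sqrt(1.0 / snr)                                # noise std per real dimension
    y = np.linspace(points[0] - 10 * sigma, points[-1] + 10 * sigma, n_grid)
    dy = y[1] - y[0]
    # lik[k, :] = p(y | x = points[k]) evaluated on the grid
    lik = np.exp(-(y[None, :] - points[:, None]) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    p_y = lik.mean(axis=0)                                    # output density under uniform inputs

    def mutual_info(cond_pdfs):
        # Equiprobable hypotheses: I = mean_k of the integral of p_k log2(p_k / p_y)
        total = 0.0
        for p_k in cond_pdfs:
            mask = p_k > 0
            total += np.sum(p_k[mask] * np.log2(p_k[mask] / p_y[mask])) * dy
        return total / len(cond_pdfs)

    c_cm = mutual_info(lik)
    # For each bit position, average the likelihoods of the points sharing that bit value.
    c_bicm = sum(
        mutual_info([lik[labels[:, i] == b].mean(axis=0) for b in (0, 1)])
        for i in range(labels.shape[1])
    )
    return c_cm, c_bicm

snr_db = 10.0
c_cm_pam, c_bicm_pam = gray_pam4_capacities(snr_db)
c_cm, c_bicm = 2 * c_cm_pam, 2 * c_bicm_pam   # two independent real dimensions
shannon = np.log2(1 + 10 ** (snr_db / 10))
print(f"C_CM = {c_cm:.3f}, C_BICM(Gray) = {c_bicm:.3f}, Shannon = {shannon:.3f} bit/symbol")
print(f"CM-BICM gap = {c_cm - c_bicm:.3f} bit, CM-Shannon gap = {shannon - c_cm:.3f} bit")
```

The integrals are evaluated by a plain Riemann sum on a fine grid rather than by Monte Carlo, so the result is deterministic; a non-product labelling would need a two-dimensional version of the same computation.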

TCM vs MLC/MSD vs BICM: A Structural Comparison

| Property | TCM (Ch. 2) | MLC/MSD (Ch. 3) | BICM (Ch. 5) |
|---|---|---|---|
| Number of codes | 1 (trellis) | $L = \log_2 M$ (binary) | 1 (binary, mother code) |
| Labelling | Ungerboeck partition | Ungerboeck partition | Gray (typically) |
| Encoder modularity | Monolithic trellis code | Modular, one per level | Single code + interleaver + mapper |
| Decoder | Single Viterbi on joint trellis | $L$ sequential binary decoders (MSD) | Single binary decoder + demapper |
| Capacity achieved | Close to CM at low $L$ | $C_{\rm CM}$ (exact) | $C_{\rm BICM}$ (slightly below $C_{\rm CM}$ with Gray) |
| Rate allocation | One rate, per trellis | $L$ rates (capacity rule) | One rate, per modulation |
| Error propagation | None (jointly decoded) | Yes (next-stage sensitive to previous) | None (bits independent after demapping) |
| Historical dominance | V.32 / V.34 modems (1986–96) | Never widely deployed | DVB-S2 / LTE / 5G NR (2004–) |

Why BICM Won (Despite the Capacity Gap)

From a pure capacity perspective MLC/MSD strictly dominates BICM. So why is every modern wireless modem BICM? The reasons are practical, not theoretical:

  1. Code-table simplicity. BICM uses one code rate per modulation. MLC needs $L$ code rates per modulation: a separate LDPC or polar code optimised for each level's SNR. The standard's MODCOD table would grow by a factor of $L$.
  2. Rate adaptation. In adaptive modulation (as in 5G), the base station re-selects MCS every few ms. A single code handling every modulation by changing rate is operationally simpler than a bank of codes, one per level per modulation.
  3. Gray labelling is a near-optimal choice. As the plot above shows, under Gray labelling the BICM capacity is within a few tenths of a bit of CM capacity for QAM, i.e. within a $\lesssim 0.5$ dB SNR penalty. This is much smaller than the gap to Shannon that constellation shape already contributes, so closing the MLC-vs-BICM gap buys little.
  4. Error propagation in MSD. Even a small residual BER at stage 0 can propagate and break the $\le 10^{-9}$ error-floor requirements of commercial modems, requiring an iterative outer loop that dissolves the complexity advantage of MSD.

The CommIT paper of Caire, Taricco, and Biglieri (1998) proved that Gray-BICM with a good binary code is essentially as good as CM; it is the theoretical foundation of this design decision. We treat that paper and the full BICM framework in Chapter 5.


Why This Matters: Forward to Chapter 5: The BICM Capacity Framework

The Gray-BICM bound derived here is the starting point for the foundational BICM capacity analysis of Caire, Taricco, and Biglieri (IEEE Trans. IT, 1998), the first CommIT contribution encountered in this book. That paper formalises BICM as an independent-parallel-channels model, proves the capacity formula $C_{\rm BICM} = \sum_i I(Y; B_i)$ is the right operational quantity, analyses its coding-gain behaviour at high SNR, and establishes the design rule: use a powerful binary code, a bit interleaver, and Gray labelling. That simple recipe is what ships in every contemporary wireless standard from 3G onwards.

In Chapter 5 we unpack the full BICM framework: channel model, capacity formula, demapper structure, and the suboptimality-by-independence argument that parallels (with the opposite sign) the MLC conditioning argument of this chapter. In Chapters 6–9 we extend it to error probability analysis, iterative decoding, and adaptive modulation in practical standards.


Common Mistake: "BICM is always strictly worse than MLC/MSD" (with a caveat)

Mistake:

Stating that BICM is strictly suboptimal relative to CM or MLC/MSD for every channel and constellation.

Correction:

Strictly, the inequality $C_{\rm BICM} \le C_{\rm CM}$ holds with equality iff $B_0, \ldots, B_{L-1}$ are conditionally independent given $Y$, that is, when the joint posterior of the label bits factorises into the per-bit posteriors. This happens, for instance, for Gray-labelled 4-QAM on the AWGN channel, where the two label bits ride on independent in-phase and quadrature components. Moreover, with Gray labelling the gap is numerically tiny (often $< 0.1$ bit) across the practical SNR range. So "strictly suboptimal in theory" is correct; "strictly suboptimal in practice" is misleading, because in practice the two are essentially interchangeable.

Historical Note: The Industry's Choice: A Case Study in Capacity vs Simplicity

1977–2005

Imai and Hirasawa's 1977 MLC paper predates Ungerboeck's 1982 TCM by five years, yet MLC never enjoyed TCM's wide deployment. Two decades later, when Wachsmann, Fischer, and Huber (1999) carefully quantified the capacity rule and its ability to approach Shannon with capacity-approaching binary codes, one might have expected MLC to finally displace TCM.

Instead, the year before Wachsmann et al., Caire, Taricco, and Biglieri published their BICM analysis. The argument was structurally devastating: with Gray labelling and a good binary code, BICM gets almost all of MLC's rate at a fraction of the decoder complexity. When the DVB-S2 standardisation committee sat down in 2003 to pick a coded-modulation scheme for the next generation of satellite TV, they chose Gray-labelled BICM with a single rate-adjustable LDPC code. Every subsequent wireless standard (LTE, 5G NR, Wi-Fi 6) followed suit.

The moral is that a scheme's success is determined by (capacity gain) / (complexity) × (ease of adaptation), not by capacity alone. MLC remains the right answer when the constellation has no Gray labelling (non-standard APSK, lattice points in high dimensions), but for workhorse QAM, Gray-BICM won.


Quick Check

Which of the following orderings between $C_{\rm CM}$ and $C_{\rm BICM}(\mu)$ is always true?

$C_{\rm CM} < C_{\rm BICM}(\mu)$ for all labellings $\mu$

$C_{\rm CM} \ge C_{\rm BICM}(\mu)$ for all labellings $\mu$, with equality possible

The ordering depends on the SNR

$C_{\rm CM} = C_{\rm BICM}(\mu)$ for every Gray labelling

BICM capacity

The achievable rate of bit-interleaved coded modulation, $C_{\rm BICM}(\mu) = \sum_i I(Y; B_i)$, where the label bits are treated as independent parallel binary sub-channels. Depends on the labelling and is maximised (in practice) by Gray labelling for QAM.

Related: The Capacity Rule for MLC, Gray Labelling, Multilevel Code (MLC) Encoder