Ferkans — Interactive Telecom Tutor

Two Rate Decompositions for the Same Channel

There is a second, seemingly innocent, way to turn a non-binary constellation into a sum of binary mutual informations. Instead of conditional binary capacities, take unconditional ones:

$C_{\rm BICM} \;\triangleq\; \sum_{i=0}^{L-1} I(Y; B_i).$

This is the BICM capacity. It corresponds to a decoder that views each bit position as an independent binary channel and processes the $L$ bit streams in parallel — no history conditioning. The construction is simpler than MLC/MSD because only one binary code is needed (the streams are demultiplexed from a single mother code after interleaving), and the decoder is a single binary decoder, not $L$ of them.
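The transmit chain described above (one mother code, a bit interleaver, then demultiplexing into $L$-bit labels) can be sketched as follows. This is a minimal illustration and not any standard's implementation: the random bits stand in for the output of a real binary encoder, and the pseudo-random permutation, the Gray table, and the names `perm`, `gray_to_level`, and `map_pair` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4                                   # label bits per 16-QAM symbol

# Stand-in for the mother-code output (a real system would run an encoder here).
coded = rng.integers(0, 2, size=48)

# Hypothetical pseudo-random bit interleaver.
perm = rng.permutation(coded.size)
interleaved = coded[perm]

# Demultiplex: each row holds the L label bits of one symbol.
groups = interleaved.reshape(-1, L)

# Gray-labelled 4-PAM per real dimension (assumed labelling: 00, 01, 11, 10).
pam = np.array([-3.0, -1.0, 1.0, 3.0])
gray_to_level = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}

def map_pair(bits):
    """Map two label bits to one 4-PAM amplitude."""
    return pam[gray_to_level[(int(bits[0]), int(bits[1]))]]

# First two bits drive the in-phase axis, last two the quadrature axis.
symbols = np.array([map_pair(g[:2]) + 1j * map_pair(g[2:]) for g in groups])

# Receiver side: undo the interleaver (applied here to hard bits for illustration).
deinterleaved = np.empty_like(interleaved)
deinterleaved[perm] = interleaved

print(symbols[:3])
```

In a real receiver the de-interleaver would act on per-bit soft values from the demapper rather than on hard bits, and a single binary decoder would then process the whole stream.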

The question is: how much capacity do we lose by dropping the conditioning? The answer depends on the labelling $\mu$ . For Ungerboeck partition-based labelling the loss is substantial. For Gray labelling it is remarkably small — typically well under $0.5$ bit over the entire practical SNR range. This is the observation that Caire, Taricco, and Biglieri turned into a full framework in 1998, and that subsequently drove BICM to dominance in wireless standards.

This section quantifies the MLC-vs-BICM gap and explains why, despite MLC's theoretical optimality, BICM is what every modern wireless modem uses.

Definition:
BICM Capacity

Fix a constellation $\mathcal{X}$ of size $M = 2^L$ , a labelling $\mu : \{0,1\}^L \to \mathcal{X}$ , and a channel $p(y \mid x)$ with uniform inputs. The BICM capacity is

$C_{\rm BICM}(\mu) \;\triangleq\; \sum_{i=0}^{L-1} I(Y; B_i)$

where $B_i$ is the $i$-th label bit when $X = \mu(B_0, \ldots, B_{L-1})$ with i.i.d. uniform label bits.

The BICM capacity depends on the labelling through the marginal distributions of each $B_i$ given $Y$ . For the AWGN channel with uniform inputs, these marginals are the per-bit posteriors used by any BICM demapper. Different labellings give different BICM capacities; Gray labelling is the empirical champion.
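To make the per-bit posteriors concrete, here is a small sketch of an exact (not max-log) demapper for one real dimension. It assumes Gray-labelled 4-PAM on a real AWGN channel with equiprobable symbols; the function name `bit_posteriors` and the particular label table are illustrative choices, not taken from the text.

```python
import numpy as np

def bit_posteriors(y, points, labels, sigma2):
    """Return P(B_i = 1 | y) for every label bit i.

    Assumes equiprobable symbols and real Gaussian noise of variance sigma2.
    """
    lik = np.exp(-(y - points) ** 2 / (2.0 * sigma2))  # unnormalised p(y | x)
    post = lik / lik.sum()                             # symbol posteriors
    return labels.T @ post                             # add up symbols whose bit i is 1

points = np.array([-3.0, -1.0, 1.0, 3.0])              # 4-PAM amplitudes
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])    # assumed Gray labelling

p = bit_posteriors(0.0, points, labels, sigma2=1.0)
print(p)
```

At $y = 0$ the first bit (which separates the left and right halves of the constellation) is a coin flip by symmetry, while the second bit (which separates inner from outer points) is already fairly reliable. Averaging the information carried by such posteriors over $Y$ is what each term $I(Y; B_i)$ measures.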

Theorem: The Capacity Ordering: CM $\ge$ BICM $\ge$ sum of independent levels

For any constellation $\mathcal{X}$ , labelling $\mu$ , and symmetric channel with uniform inputs,

$C_{\rm CM} \;\ge\; C_{\rm BICM}(\mu) \;\ge\; \sum_{i=0}^{L-1} \max\bigl(0, I(Y; B_i) - \Delta_i\bigr),$

where the left inequality is nearly tight for Gray labelling (gap $< 0.5$ bit over practical SNRs) and holds with equality exactly when the label bits are conditionally independent given $Y$ (a restrictive condition). Explicitly, the left gap is

$C_{\rm CM} - C_{\rm BICM}(\mu) \;=\; \sum_{i=1}^{L-1} \bigl[I(Y; B_i \mid B_0, \ldots, B_{i-1}) - I(Y; B_i)\bigr],$

and every term in the sum is non-negative.

Conditioning can in general either increase or decrease mutual information. Here it can only increase it, because the label bits are independent a priori: $I(Y; B_i \mid B_0, \ldots, B_{i-1}) \ge I(Y; B_i)$. The cleanest derivation is via the chain rule, below.

Hint

Use the chain rule to decompose $I(Y; B_0, \ldots, B_{L-1})$ in two different ways.

Compare the conditional decomposition (CM) with the unconditional sum (BICM).

Each difference $I(Y; B_i \mid B_0, \ldots, B_{i-1}) - I(Y; B_i)$ can be shown non-negative using the conditional mutual information / conditional entropy identities.

Proof

CM $\ge$ BICM via chain rule

By the capacity rule theorem (the chain rule applied to $I(Y; B_0, \ldots, B_{L-1})$),

$C_{\rm CM} = \sum_{i=0}^{L-1} I(Y; B_i \mid B_0, \ldots, B_{i-1}).$

By definition, $C_{\rm BICM} = \sum_{i=0}^{L-1} I(Y; B_i)$ . Their difference is

$C_{\rm CM} - C_{\rm BICM} = \sum_{i=1}^{L-1} \bigl[ I(Y; B_i \mid B_0, \ldots, B_{i-1}) - I(Y; B_i) \bigr].$

(The $i = 0$ term cancels: $I(Y; B_0 \mid \emptyset) = I(Y; B_0)$ .)

Each difference is non-negative

Define $B_{<i} = (B_0, \ldots, B_{i-1})$ . Then

$I(Y; B_i \mid B_{<i}) - I(Y; B_i) \;=\; I(B_i; B_{<i} \mid Y) - I(B_i; B_{<i}) \;=\; I(B_i; B_{<i} \mid Y),$

where the last equality uses that the label bits are independent a priori, so $I(B_i; B_{<i}) = 0$. Conditional mutual information is non-negative, proving the claim. Equality holds iff $B_i$ and $B_{<i}$ are conditionally independent given $Y$, which fails on the AWGN channel for constellations such as 8-PSK or 16-QAM under any reasonable labelling.
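For readers who want the first equality spelled out, it follows from expanding the same quantity, $I(B_i; Y, B_{<i})$, with the chain rule in two orders. The derivation below uses only standard identities and the notation already introduced.

```latex
\begin{aligned}
I(B_i; Y, B_{<i}) &= I(B_i; B_{<i}) + I(B_i; Y \mid B_{<i}), \\
I(B_i; Y, B_{<i}) &= I(B_i; Y) + I(B_i; B_{<i} \mid Y). \\[4pt]
\text{Subtracting:}\quad
I(Y; B_i \mid B_{<i}) - I(Y; B_i) &= I(B_i; B_{<i} \mid Y) - I(B_i; B_{<i}).
\end{aligned}
```

The symmetry $I(B_i; Y \mid B_{<i}) = I(Y; B_i \mid B_{<i})$ is used to match the notation of the proof.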

Gray labelling: the gap is small

Under Gray labelling, nearest-neighbour constellation points differ by exactly one label bit. This makes the marginal channel for each bit position behave almost like an independent binary channel at high SNR: the most likely symbol error flips one bit, so conditioning on other bits changes the posterior of $B_i$ only slightly. Empirically, the gap $C_{\rm CM} - C_{\rm BICM, Gray}$ is below $0.1$ bit for square QAM at all practical SNRs and peaks at about $0.4$ bit for 8-PSK (see the interactive plot below). $\blacksquare$
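The one-bit-per-neighbour property is easy to check for a single real dimension. The sketch below uses the binary-reflected Gray code, which is one common Gray labelling (an assumption, since the text does not fix a particular one), and compares it with natural binary labelling.

```python
def gray(k):
    """Binary-reflected Gray code of the integer k."""
    return k ^ (k >> 1)

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

bits_per_dim = 2                                   # 4-PAM, i.e. one axis of 16-QAM
n_points = 2 ** bits_per_dim

labels = [gray(k) for k in range(n_points)]        # labels along the amplitude axis
gray_dist = [hamming(labels[k], labels[k + 1]) for k in range(n_points - 1)]
natural_dist = [hamming(k, k + 1) for k in range(n_points - 1)]

print([format(g, "02b") for g in labels])
print(gray_dist, natural_dist)
```

With natural binary labels the middle pair of points differs in two bits, so a single nearest-neighbour error there costs two bit errors; the Gray labels avoid this.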

CM Capacity vs BICM Capacity under Gray Labelling

For 4-QAM, 16-QAM, 64-QAM, and 8-PSK, this plot compares the CM capacity (the upper envelope, achievable by MLC/MSD) with the BICM capacity under Gray labelling, together with the Shannon limit $\log_2(1 + \text{SNR})$ . For square QAM the two curves are nearly indistinguishable — the Gray-BICM gap is a few tenths of a dB at worst. For 8-PSK the gap is noticeable but still modest (about $0.4$ bit at medium SNR). This is the numerical evidence that Gray-BICM is "good enough" for practical wireless modulation.

Plot parameter: QAM size $M$.

Example: The CM-vs-BICM Gap for 16-QAM at $\text{SNR} = 10$ dB

Estimate the difference $C_{\rm CM} - C_{\rm BICM, Gray}$ for 16-QAM at $\text{SNR} = 10$ dB (so $E_s/N_0 = 10$ in linear units) and compare it with the gap to Shannon capacity.

Solution

Numerical values at $\text{SNR} = 10$ dB

Running the interactive plot (or the simulation underlying it) at $M = 16$, $\text{SNR} = 10$ dB gives approximately $C_{\rm CM} \approx 3.22$ bits/symbol and $C_{\rm BICM, Gray} \approx 3.18$ bits/symbol. The Shannon limit is $\log_2(1 + 10) \approx 3.46$ bits/symbol (per complex channel use).
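Figures of this kind can be sanity-checked independently with a short numerical integration. The sketch below is not the simulation behind the plot. It assumes that Gray-labelled 16-QAM splits into two independent Gray-labelled 4-PAM channels (one per real dimension, each at the same SNR), and it evaluates the entropies on a dense grid. The function name `pam4_gray_rates` and the grid size are choices made for this example.

```python
import numpy as np

def pam4_gray_rates(snr_db, n_grid=20001):
    """Return (I(Y;X), [I(Y;B_0), I(Y;B_1)]) in bits for Gray 4-PAM on real AWGN."""
    snr = 10.0 ** (snr_db / 10.0)
    x = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit average energy
    labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # assumed Gray labelling
    sigma2 = 1.0 / snr                                    # noise variance per dimension
    sigma = np.sqrt(sigma2)

    y = np.linspace(x[0] - 10 * sigma, x[-1] + 10 * sigma, n_grid)
    dy = y[1] - y[0]
    # Likelihoods p(y | x), one row per constellation point.
    lik = np.exp(-(y[None, :] - x[:, None]) ** 2 / (2 * sigma2))
    lik /= np.sqrt(2 * np.pi * sigma2)

    def entropy(p):
        """Differential entropy in bits via a Riemann sum on the grid."""
        mask = p > 0
        return -np.sum(p[mask] * np.log2(p[mask])) * dy

    h_y = entropy(lik.mean(axis=0))                       # h(Y), uniform inputs
    h_noise = 0.5 * np.log2(2 * np.pi * np.e * sigma2)    # h(Y | X)
    i_sym = h_y - h_noise

    i_bits = []
    for i in range(labels.shape[1]):
        # h(Y | B_i): average over the two equiprobable bit values.
        h_cond = 0.5 * sum(
            entropy(lik[labels[:, i] == b].mean(axis=0)) for b in (0, 1)
        )
        i_bits.append(h_y - h_cond)
    return i_sym, i_bits

snr_db = 10.0
i_sym, i_bits = pam4_gray_rates(snr_db)
c_cm = 2 * i_sym                 # two real dimensions per 16-QAM symbol
c_bicm = 2 * sum(i_bits)
shannon = np.log2(1 + 10.0 ** (snr_db / 10.0))
print(f"C_CM = {c_cm:.3f}, C_BICM = {c_bicm:.3f}, Shannon = {shannon:.3f} bits/symbol")
```

The printed values should land close to the ones quoted above; small differences are expected from grid resolution and from how SNR is normalised.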

Interpret the two gaps

The CM-to-Shannon gap is $3.46 - 3.22 = 0.24$ bit, or about $0.8$ dB of SNR at this rate (the Shannon curve reaches $3.22$ bits at $10\log_{10}(2^{3.22} - 1) \approx 9.2$ dB). This is the modulation-capacity loss of 16-QAM: the price of restricting to a finite, uniformly distributed constellation.

The CM-to-BICM gap is $3.22 - 3.18 = 0.04$ bit, or about $0.2$ dB of SNR. This is the price of using Gray labelling with independent bit-level decoding instead of MLC/MSD.

The punchline

The modulation-capacity loss dominates the Gray-BICM suboptimality by roughly a factor of six in this example, and by even more at high SNR where the Gray-BICM gap shrinks further. For the designer, the message is clear: if you want to close the gap to Shannon, the first lever to pull is constellation shape (Ch. 4) or constellation size (MCS adaptation) — not MLC vs BICM.
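The arithmetic of this example can be reproduced in a few lines. The sketch takes the two quoted rates as given and converts rate gaps to SNR gaps by inverting $\log_2(1 + \text{SNR})$. Using the Shannon curve to convert the small CM-to-BICM rate gap into dB is only a rough proxy, since the true 16-QAM curve is flatter near saturation; the helper name `shannon_snr_db` is an assumption of this example.

```python
import math

snr_db = 10.0
c_cm, c_bicm = 3.22, 3.18        # rates quoted in the example, bits/symbol

c_shannon = math.log2(1.0 + 10.0 ** (snr_db / 10.0))

def shannon_snr_db(rate):
    """SNR in dB at which log2(1 + SNR) equals the given rate."""
    return 10.0 * math.log10(2.0 ** rate - 1.0)

gap_shannon_bits = c_shannon - c_cm          # modulation-capacity loss
gap_bicm_bits = c_cm - c_bicm                # Gray-BICM loss
ratio = gap_shannon_bits / gap_bicm_bits

# SNR the Shannon curve would need for the same rate, versus the 10 dB used.
gap_shannon_db = snr_db - shannon_snr_db(c_cm)
# Rough proxy only: Shannon-curve SNR spacing between the two quoted rates.
gap_bicm_db_proxy = shannon_snr_db(c_cm) - shannon_snr_db(c_bicm)

print(f"Shannon limit        : {c_shannon:.2f} bits/symbol")
print(f"CM-to-Shannon gap    : {gap_shannon_bits:.2f} bit, {gap_shannon_db:.2f} dB")
print(f"CM-to-BICM gap       : {gap_bicm_bits:.2f} bit, ~{gap_bicm_db_proxy:.2f} dB (proxy)")
print(f"ratio of rate gaps   : {ratio:.1f}")
```

Measured in bits, the ratio of the two gaps comes out at roughly six, which is the factor quoted in the punchline.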

TCM vs MLC/MSD vs BICM — A Structural Comparison

Property	TCM (Ch. 2)	MLC/MSD (Ch. 3)	BICM (Ch. 5)
Number of codes	1 (trellis)	$L = \log_2 M$ (binary)	1 (binary, mother code)
Labelling	Ungerboeck partition	Ungerboeck partition	Gray (typically)
Encoder modularity	Monolithic trellis code	Modular, one per level	Single code + interleaver + mapper
Decoder	Single Viterbi on joint trellis	$L$ sequential binary decoders (MSD)	Single binary decoder + demapper
Capacity achieved	Close to CM at low $L$	$C_{\rm CM}$ (exact)	$C_{\rm BICM}$ (slightly below $C_{\rm CM}$ with Gray)
Rate allocation	One rate, per trellis	$L$ rates (capacity rule)	One rate, per modulation
Error propagation	None (jointly decoded)	Yes (next-stage sensitive to previous)	None (bits independent after demapping)
Historical dominance	V.32 / V.34 modems (1986–96)	Never widely deployed	DVB-S2 / LTE / 5G NR (2004–)

Why BICM Won (Despite the Capacity Gap)

From a pure capacity perspective MLC/MSD strictly dominates BICM. So why is every modern wireless modem BICM? The reasons are practical, not theoretical:

Code-table simplicity. BICM uses one code rate per modulation. MLC needs $L$ code rates per modulation — a separate LDPC or polar code optimised for each level's SNR. The standard's MODCOD table would grow by a factor of $L$ .
Rate adaptation. In adaptive modulation (as in 5G), the base station re-selects MCS every few ms. A single code handling every modulation by changing rate is operationally simpler than a bank of codes, one per level per modulation.
Gray labelling is a near-optimal choice. As the plot above shows, under Gray labelling the BICM capacity is within about a tenth of a bit of CM capacity for QAM, i.e. within a $\lesssim 0.5$ dB SNR penalty. This is much smaller than the gap to Shannon that constellation shape already contributes, so closing the MLC-vs-BICM gap buys little.
Error propagation in MSD. Even a small residual BER at stage 0 can propagate and break the $\le 10^{-9}$ error-floor requirements of commercial modems, requiring an iterative outer loop that dissolves the complexity advantage of MSD.

The CommIT paper of Caire, Taricco, and Biglieri (1998) proved that Gray-BICM with a good binary code is essentially as good as CM — it is the theoretical foundation of this design decision. We treat that paper and the full BICM framework in Chapter 5.

Why This Matters: Forward to Chapter 5: The BICM Capacity Framework

The Gray-BICM bound derived here is the starting point for the foundational BICM capacity analysis of Caire, Taricco, and Biglieri (IEEE Trans. IT, 1998), the first CommIT contribution encountered in this book. That paper formalises BICM as an independent-parallel-channels model, proves the capacity formula $C_{\rm BICM} = \sum_i I(Y; B_i)$ is the right operational quantity, analyses its coding-gain behaviour at high SNR, and establishes the design rule: use a powerful binary code, a bit interleaver, and Gray labelling. That simple recipe is what ships in every contemporary wireless standard from 3G onwards.

In Chapter 5 we unpack the full BICM framework — channel model, capacity formula, demapper structure, and the suboptimality-by-independence argument that parallels (with the opposite sign) the MLC conditioning argument of this chapter. In Chapters 6–9 we extend it to error probability analysis, iterative decoding, and adaptive modulation in practical standards.

Common Mistake: "BICM is always strictly worse than MLC/MSD" — with a caveat

Mistake:

Stating that BICM is strictly suboptimal relative to CM or MLC/MSD for every channel and constellation.

Correction:

Strictly, the inequality $C_{\rm BICM} \le C_{\rm CM}$ holds with equality iff $B_0, \ldots, B_{L-1}$ are conditionally independent given $Y$. This happens, for instance, for Gray-labelled 4-QAM on the AWGN channel, where the two label bits ride on the independent in-phase and quadrature components and the bit posteriors factorise. Moreover, with Gray labelling the gap is numerically tiny (often $< 0.1$ bit) across the practical SNR range. So "suboptimal in theory" is correct as a weak inequality; "strictly suboptimal in practice" is misleading, because in practice the two are essentially interchangeable.

Historical Note: The Industry's Choice: A Case Study in Capacity vs Simplicity

1977–2005

Imai and Hirasawa's 1977 MLC paper predates Ungerboeck's 1982 TCM by five years, yet MLC never enjoyed TCM's wide deployment. Two decades later, when Wachsmann, Fischer, and Huber (1999) carefully quantified the capacity rule and its ability to approach Shannon with capacity-approaching binary codes, one might have expected MLC to finally displace TCM.

Instead, the year before Wachsmann et al., Caire, Taricco, and Biglieri published their BICM analysis. The argument was structurally devastating: with Gray labelling and a good binary code, BICM gets almost all of MLC's rate at a fraction of the decoder complexity. When the DVB-S2 standardisation committee sat down in 2003 to pick a coded-modulation scheme for the next generation of satellite TV, they chose Gray-labelled BICM with a single rate-adjustable LDPC code. Every subsequent wireless standard — LTE, 5G NR, Wi-Fi 6 — followed suit.

The moral is that a scheme's success is determined by (capacity gain) / (complexity) × (ease of adaptation), not by capacity alone. MLC remains the right answer when the constellation has no Gray labelling (non-standard APSK, lattice points in high dimensions) — but for workhorse QAM, Gray-BICM won.

Quick Check

Which of the following orderings between $C_{\rm CM}$ and $C_{\rm BICM}(\mu)$ is always true?

$C_{\rm CM} < C_{\rm BICM}(\mu)$ for all labellings $\mu$

$C_{\rm CM} \ge C_{\rm BICM}(\mu)$ for all labellings $\mu$ , with equality possible

The ordering depends on the SNR

$C_{\rm CM} = C_{\rm BICM}(\mu)$ for every Gray labelling

Correction:

$C_{\rm CM} \ge C_{\rm BICM}(\mu)$ for all labellings $\mu$, with equality possible

$C_{\rm CM}$ dominates $C_{\rm BICM}(\mu)$ for every labelling, because the difference equals a sum of conditional mutual informations $I(B_i; B_{<i} \mid Y) \ge 0$ . Equality holds only when the label bits are conditionally independent given the output, which is a very restrictive condition.

BICM capacity

The achievable rate of bit-interleaved coded modulation, $C_{\rm BICM}(\mu) = \sum_i I(Y; B_i)$ , where the label bits are treated as independent parallel binary sub-channels. Depends on the labelling and is maximised (in practice) by Gray labelling for QAM.

MLC vs. BICM: A Capacity Comparison