BICM Mutual Information (Capacity)
The Main Theorem of BICM
Everything built so far (the BICM encoder, the ideal-interleaver model, the per-bit LLR) converges on a single question: what rate can a BICM system reliably sustain? The answer, first derived by Caire, Taricco, and Biglieri, is the BICM capacity:

$$C_{\mathrm{BICM}} = \sum_{i=1}^{m} I(B_i; Y).$$
This is arguably the most important formula of the chapter. It says that the achievable rate of BICM is the sum of the unconditional per-bit mutual informations $I(B_i;Y)$, and that this sum depends on the labelling only through these marginals. Two labellings that produce the same set $\{I(B_i;Y)\}_{i=1}^{m}$ achieve the same BICM capacity: the geometry of the constellation reaches the binary decoder only through these scalar numbers.
We now derive this formula carefully, compare it to the CM capacity of Ch. 3, and show numerically how the gap scales with SNR and modulation order.
Definition: BICM Capacity
Fix a constellation $\mathcal{X}$ of size $M = 2^m$, a labelling $\mu : \{0,1\}^m \to \mathcal{X}$, a memoryless channel $p(y \mid x)$, and uniform inputs. The BICM capacity under labelling $\mu$ is
$$C_{\mathrm{BICM}}(\mu) = \sum_{i=1}^{m} I(B_i; Y),$$
where $B_i$ is the $i$-th label bit with $X = \mu(B_1,\dots,B_m)$ and the bits i.i.d.\ uniform. It is the supremum of rates achievable by any BICM scheme (binary code + ideal interleaver + mapper + per-bit demapper) with vanishing error probability as the blocklength grows.
Equivalently, since each binary sub-channel satisfies $I(B_i;Y) = H(B_i) - H(B_i \mid Y) = 1 - H(B_i \mid Y)$,
$$C_{\mathrm{BICM}} = m - \sum_{i=1}^{m} H(B_i \mid Y),$$
so BICM capacity equals the total label entropy minus the sum of per-bit residual uncertainties after observing $Y$.
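The per-bit terms in this definition can be estimated numerically. The sketch below is a minimal Monte Carlo illustration, not code from the text: it assumes Gray-labelled 16-QAM built as the product of two Gray 4-PAMs on a complex AWGN channel (a standard construction), and estimates each $I(B_i;Y)$ by averaging $\log_2\big(p(y \mid b_i)/p(y)\big)$ over simulated transmissions. The function name `bicm_capacity` is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gray-labelled 4-PAM: labels 00, 01, 11, 10 left to right, so adjacent
# levels differ in exactly one bit.
pam = np.array([-3.0, -1.0, 1.0, 3.0])
pam_bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])

# 16-QAM as the product of two Gray 4-PAMs (2 bits for I, 2 for Q),
# normalised to unit average symbol energy.
const = np.array([a + 1j * b for a in pam for b in pam])
bits = np.array([list(ba) + list(bb) for ba in pam_bits for bb in pam_bits])
const = const / np.sqrt(np.mean(np.abs(const) ** 2))

def bicm_capacity(snr_db, n=200_000):
    """Monte Carlo estimate of sum_i I(B_i; Y) on complex AWGN, uniform inputs."""
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(1 / (2 * snr))  # per-dimension noise std for Es = 1
    idx = rng.integers(0, 16, size=n)
    y = const[idx] + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    # Likelihoods p(y | x_k) up to a common Gaussian factor (it cancels below).
    lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / (2 * sigma ** 2))
    p_y = lik.mean(axis=1)  # p(y) under uniform symbols
    cap = 0.0
    for i in range(4):
        mask = bits[:, i] == 0
        p_y_b0 = lik[:, mask].mean(axis=1)   # p(y | B_i = 0)
        p_y_b1 = lik[:, ~mask].mean(axis=1)  # p(y | B_i = 1)
        p_y_b = np.where(bits[idx, i] == 0, p_y_b0, p_y_b1)
        cap += np.mean(np.log2(p_y_b / p_y))  # estimate of I(B_i; Y)
    return cap
```

At this sample size the estimate carries Monte Carlo noise on the order of a few thousandths of a bit; the same routine works at any SNR by changing `snr_db`.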
Theorem: The BICM Capacity Decomposition
Under the ideal-interleaver assumption, a BICM encoder with mapper $\mu$ and a capacity-approaching binary code operating on the BICM bit channel (Thm. Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels) can reliably communicate at any rate
$$R < C_{\mathrm{BICM}}(\mu) = \sum_{i=1}^{m} I(B_i; Y).$$
Conversely, no BICM scheme with the same mapper and per-bit demapping can exceed this rate with vanishing error probability.
Moreover, comparing with the CM capacity $C_{\mathrm{CM}} = I(X;Y)$, we have
$$C_{\mathrm{BICM}}(\mu) \le C_{\mathrm{CM}},$$
with equality iff the label bits are conditionally independent given $Y$, a non-generic condition that fails for every non-trivial QAM labelling.
The forward direction is achievability: with an ideal interleaver, the mixture bit channel is memoryless with capacity $\frac{1}{m}\sum_{i=1}^{m} I(B_i;Y)$ per coded bit, and each symbol carries $m$ coded bits, so the supported rate is $m$ times that, giving the stated sum. The converse follows from a mismatched-decoding argument: the BICM decoder uses the product metric $\prod_{i=1}^{m} p(y \mid b_i)$, not the true joint likelihood $p(y \mid x)$. The GMI under this product metric is exactly the sum of marginals. For the detailed GMI derivation see Ch. 7.
The gap formula is a direct application of the chain rule of mutual information, the same identity that drove the MLC capacity rule in Ch. 3. There, the chain rule was used to add conditional informations; here, the difference between the conditional sum and the unconditional sum is the BICM suboptimality.
For achievability, apply Shannon's coding theorem to the BICM mixture bit channel of Thm. Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels and convert coded-bit rate to information-bit rate.
For the gap formula, write the chain rule for $I(X;Y)$ and subtract the unconditional sum term by term.
Show that each difference equals a non-negative conditional mutual information of $B_i$ with $B_1,\dots,B_{i-1}$ given $Y$.
Achievability: mixture-channel capacity
By Theorem Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels, the binary decoder sees a memoryless channel whose transition law is the uniform mixture of the $m$ bit channels. Shannon's noisy-channel coding theorem states that this channel's capacity (with the bit index revealed to the decoder) is
$$C_{\mathrm{mix}} = \frac{1}{m}\sum_{i=1}^{m} I(B_i; Y),$$
where the equality follows by direct expansion of the mutual information, or by noting that choosing the position $i$ uniformly at random acts as auxiliary randomness.
The BICM system sends $mn$ coded bits per codeword and maps them to $n$ symbols. A code of rate $R_c$ delivers $R_c m n$ information bits. The spectral efficiency is therefore $R = R_c m$ bits per symbol. Applying the binary-channel coding theorem gives $R_c < \frac{1}{m}\sum_{i=1}^{m} I(B_i;Y)$, i.e.\ $R < \sum_{i=1}^{m} I(B_i;Y) = C_{\mathrm{BICM}}$.
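As a quick check on this bookkeeping, here is the arithmetic at one illustrative operating point (16-QAM with a rate-$3/4$ binary code; the numbers are chosen for the example, not taken from the text):

```latex
% 16-QAM: m = 4 label bits per symbol; binary code rate R_c = 3/4.
\begin{align*}
\text{coded bits per symbol}   &= m = 4, \\
\text{spectral efficiency}     &= R = R_c\, m = \tfrac{3}{4} \cdot 4
                                  = 3 \ \text{bits/symbol}, \\
\text{achievability condition} &:\quad
  R_c < \tfrac{1}{m} \textstyle\sum_{i=1}^{m} I(B_i; Y)
  \;\Longleftrightarrow\; R < C_{\mathrm{BICM}}.
\end{align*}
```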
Converse: mismatched-decoding upper bound
The BICM decoder's metric factorises over bits: $q(x,y) = \prod_{i=1}^{m} p(y \mid b_i(x))$. This is a mismatched likelihood with respect to the true joint symbol likelihood $p(y \mid x)$. The maximum reliable rate under any such mismatched decoder is upper-bounded by the generalised mutual information (GMI), a standard result in the mismatched-decoding literature. For the product-form bit metric used in BICM, the GMI coincides with $\sum_{i=1}^{m} I(B_i;Y)$ (Caire-Taricco-Biglieri Thm. 1; a clean derivation is in Martínez, Guillén i Fàbregas, and Caire 2008 and our Ch. 7). Hence no BICM scheme exceeds $C_{\mathrm{BICM}}$ with vanishing error.
The CM-BICM gap via the chain rule
By the chain rule for mutual information,
$$I(X;Y) = \sum_{i=1}^{m} I(B_i; Y \mid B_1,\dots,B_{i-1}),$$
while by definition $C_{\mathrm{BICM}} = \sum_{i=1}^{m} I(B_i; Y)$. Subtracting,
$$C_{\mathrm{CM}} - C_{\mathrm{BICM}} = \sum_{i=1}^{m} \big[ I(B_i; Y \mid B_1,\dots,B_{i-1}) - I(B_i; Y) \big].$$
(For $i = 1$ the conditional and unconditional terms coincide, so the first term vanishes.)
Each gap term is non-negative
Using the identity
$$I(B_i; Y \mid B_{<i}) - I(B_i; Y) = I(B_i; B_{<i} \mid Y) - I(B_i; B_{<i}), \qquad B_{<i} := (B_1,\dots,B_{i-1}),$$
and the fact that the label bits are a priori independent (so $I(B_i; B_{<i}) = 0$ for i.i.d.\ uniform priors), we obtain
$$I(B_i; Y \mid B_{<i}) - I(B_i; Y) = I(B_i; B_{<i} \mid Y) \ge 0.$$
Conditional mutual information is non-negative, and is zero iff $B_i$ and $B_{<i}$ are conditionally independent given $Y$. Hence $C_{\mathrm{BICM}} \le C_{\mathrm{CM}}$, with the equality condition stated in the theorem.
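The identity used in this step is the chain rule applied to $I(B_i;\, Y, B_{<i})$ in both orders; a compact derivation:

```latex
% Expand I(B_i; Y, B_{<i}) by the chain rule in both orders,
% with B_{<i} = (B_1, ..., B_{i-1}):
\begin{align*}
I(B_i;\, Y, B_{<i}) &= I(B_i; B_{<i}) + I(B_i; Y \mid B_{<i}) \\
                    &= I(B_i; Y) + I(B_i; B_{<i} \mid Y).
\end{align*}
% Equate the two expansions; prior independence gives I(B_i; B_{<i}) = 0:
\[
I(B_i; Y \mid B_{<i}) - I(B_i; Y) \;=\; I(B_i; B_{<i} \mid Y) \;\ge\; 0 .
\]
```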
CM vs BICM Capacity for QAM on AWGN
Four curves on the same axes: (i) Shannon capacity $\log_2(1+\mathrm{SNR})$, the ultimate upper bound; (ii) CM capacity $I(X;Y)$, the capacity of an optimal $M$-ary decoder on the uniform-input constellation; (iii) BICM capacity under Gray labelling, computed from the formula of Thm. The BICM Capacity Decomposition; (iv) BICM capacity under SP labelling, the same formula with the SP bit-channel capacities. For square QAM and all practical SNRs, the Gray curve is nearly on top of $C_{\mathrm{CM}}$, while the SP curve sits visibly below. Zoom into the low-SNR region to see the behaviour there; zoom out to see the saturation at $m$ bits/symbol.
BICM/CM Capacity Ratio Across Modulation Orders
How close does BICM get to CM? This plot shows the ratio $C_{\mathrm{BICM}}/C_{\mathrm{CM}}$ as a function of SNR for several modulation orders. Under Gray labelling the ratio stays close to one for QAM across the useful SNR range (above each modulation's "waterfall"); under SP labelling the ratio drops noticeably at moderate SNRs. Toggle the labelling dropdown to see the difference.
Example: 16-QAM BICM Capacity at 10 dB
For 16-QAM at $10$ dB on the AWGN channel, compute $C_{\mathrm{CM}}$, $C_{\mathrm{BICM}}^{\mathrm{Gray}}$, and the Shannon capacity $\log_2(1+\mathrm{SNR})$. Express the Gray-BICM gap in both bits and equivalent dB of SNR.
Numerical values
At $10$ dB ($\mathrm{SNR} = 10$ linearly),
- Shannon: $\log_2(1 + 10) \approx 3.46$ bits/symbol.
- 16-QAM CM capacity: $C_{\mathrm{CM}} = I(X;Y)$ bits/symbol, obtained by numerical integration of the AWGN likelihood over the 16-point constellation.
- Under Gray labelling: $C_{\mathrm{BICM}}^{\mathrm{Gray}}$ bits/symbol, the sum of the four per-bit capacities $I(B_i;Y)$.
Gray-BICM gap
The gap $C_{\mathrm{CM}} - C_{\mathrm{BICM}}^{\mathrm{Gray}}$ amounts to a few hundredths of a bit; at the operating point, shifting the rate curve by that amount costs only a small fraction of a dB of SNR. This is the "cost" of using BICM instead of MLC with the same 16-QAM constellation.
Modulation-capacity gap
The gap $\log_2(1+\mathrm{SNR}) - C_{\mathrm{CM}}$, measured in bits or equivalent dB of SNR, is the price of a uniform finite constellation versus a Gaussian input; it dominates the BICM penalty by roughly a factor of six. The designer's first-order lever is constellation shaping (Ch. 4), not BICM-vs-MLC.
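The three capacities in this example can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the chapter's own code: it uses the Gray 16-QAM construction (product of two Gray 4-PAMs, unit energy) and a single batch of Monte Carlo samples to estimate $C_{\mathrm{CM}}$ and $C_{\mathrm{BICM}}^{\mathrm{Gray}}$ at 10 dB alongside the closed-form Shannon capacity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gray-labelled 16-QAM with unit average energy (two Gray 4-PAMs).
pam = np.array([-3.0, -1.0, 1.0, 3.0])
pam_bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
const = np.array([a + 1j * b for a in pam for b in pam])
bits = np.array([list(ba) + list(bb) for ba in pam_bits for bb in pam_bits])
const = const / np.sqrt(np.mean(np.abs(const) ** 2))

snr_db, n = 10.0, 300_000
snr = 10 ** (snr_db / 10)
sigma = np.sqrt(1 / (2 * snr))  # per-dimension noise std for Es = 1
idx = rng.integers(0, 16, size=n)
y = const[idx] + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / (2 * sigma ** 2))
p_y = lik.mean(axis=1)  # p(y) under uniform symbols

c_shannon = np.log2(1 + snr)                           # Gaussian-input bound
c_cm = np.mean(np.log2(lik[np.arange(n), idx] / p_y))  # I(X;Y), uniform 16-QAM
c_bicm = 0.0
for i in range(4):
    mask = bits[:, i] == 0
    p_y_b = np.where(bits[idx, i] == 0,
                     lik[:, mask].mean(axis=1), lik[:, ~mask].mean(axis=1))
    c_bicm += np.mean(np.log2(p_y_b / p_y))  # accumulate I(B_i; Y)

print(f"Shannon {c_shannon:.3f}  CM {c_cm:.3f}  Gray-BICM {c_bicm:.3f} bits/symbol")
```

The estimates inherit Monte Carlo noise of a few thousandths of a bit, but the ordering Gray-BICM $\le$ CM $\le$ Shannon and the relative sizes of the two gaps are already visible at this sample size.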
The punchline
BICM gives up almost nothing in capacity relative to MLC, and gives up far less than what a fixed QAM constellation already concedes relative to Gaussian signalling. This is why one binary code and a Gray-labelled QAM is the modular design of choice: the theoretical cost is negligible; the engineering savings are enormous.
The Formula Is Tight Only For the Right Metric
The capacity formula $C_{\mathrm{BICM}} = \sum_i I(B_i;Y)$ is tight only under a specific choice of decoding metric: the demapper must compute exact per-bit marginal likelihoods (or log-likelihoods). If the demapper instead uses a generic metric (say, a max-log LLR with non-ideal scaling, or a quantised soft metric), then the achievable rate is upper-bounded by a generalised mutual information (GMI) that is strictly less than the BICM capacity and depends on the metric choice.
Three practical consequences:
- Max-log demappers lose a fraction of a dB relative to exact marginal demappers; this is the operationally relevant number in a 5G receiver, not the theoretical $C_{\mathrm{BICM}}$ itself.
- Bit-LLR quantisation (say, 6 bits per LLR) lowers the achievable GMI further. Chapter 7 quantifies the loss with GMI-based analysis.
- The formula $\sum_i I(B_i;Y)$ is both an upper bound (over all BICM decoders using the labelling $\mu$) and the rate achievable under the exact marginal metric. For this chapter we assume the exact metric unless noted otherwise.
Theorem: High-SNR Asymptotics of BICM Capacity
On the AWGN channel with constellation $\mathcal{X}$ of size $M = 2^m$ and any Gray labelling $\mu_{\mathrm{Gray}}$,
$$\lim_{\mathrm{SNR} \to \infty} \big[ C_{\mathrm{CM}} - C_{\mathrm{BICM}}(\mu_{\mathrm{Gray}}) \big] = 0.$$
That is, the Gray-BICM capacity converges to the CM capacity at high SNR. The convergence is exponential in SNR: the gap decays like $e^{-c\,\mathrm{SNR}}$, for a constellation-dependent constant $c > 0$, times polynomial factors.
For any non-Gray labelling the limiting gap is strictly positive; for SP the limit is bounded away from zero at large SNR.
At high SNR the symbol-error probability of the $M$-ary constellation decays as $Q\!\big(d_{\min}/(2\sigma)\big)$, and the dominant error events are nearest-neighbour confusions. Under Gray labelling, nearest-neighbour confusions flip one bit, so each bit position sees an almost-independent low-crossover-probability binary channel, and the product-metric decoding is near-optimal. Under SP labelling, nearest-neighbour confusions flip many bits at once, and the joint structure matters: a product metric drops a lot of information.
At high SNR, write $C_{\mathrm{CM}} - C_{\mathrm{BICM}} = \sum_{i=1}^{m} H(B_i \mid Y) - H(X \mid Y)$ and bound the terms using the dominant symbol-error event.
Under Gray labelling, a symbol error corresponds to exactly one bit error, so $\sum_i H(B_i \mid Y) \approx H(X \mid Y)$ at high SNR.
Under SP labelling, a symbol error can flip up to $m$ label bits at once; the joint entropy $H(X \mid Y)$ is strictly less than the sum of marginals $\sum_i H(B_i \mid Y)$.
High-SNR error event structure
At large SNR the posterior $p(x \mid y)$ concentrates on the ML-decoded symbol $\hat{x}$ and on its nearest neighbours. The probability of a symbol error is dominated by $N_{\min}\, Q\!\big(d_{\min}/(2\sigma)\big)$, where $N_{\min}$ is the average number of nearest neighbours and $d_{\min}$ is the minimum Euclidean distance.
Gray: one-bit flips dominate
Under Gray labelling, any nearest-neighbour pair differs in exactly one label bit. Therefore a symbol error at the ML decoder's margin corresponds to exactly one bit flip, and only one bit position has non-trivial residual uncertainty given $Y$: for that position $H(B_i \mid Y) \approx h_b(P_e)$, where $h_b$ is the binary entropy function and $P_e$ the flip probability. Summing over $i$ gives the same high-SNR decay as the joint entropy $H(X \mid Y)$, so the gap vanishes exponentially.
SP: multi-bit flips cost
Under SP labelling, a nearest-neighbour flip at the ML decoder changes the lowest-level label bit, but also (with non-negligible probability) flips higher-level bits through neighbouring-coset confusion at larger intra-coset distances. The marginal entropies $H(B_i \mid Y)$ for small $i$ do not concentrate as tightly as their joint, and the gap stays bounded away from zero. Detailed tables are in Caire-Taricco-Biglieri 1998, §V.B.
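The claimed decay of the Gray gap can be checked numerically. The sketch below is illustrative, assuming the same Gray 16-QAM construction as earlier in the chapter's figures (product of two Gray 4-PAMs, unit energy); the function name `cm_minus_bicm` is ours. It estimates $C_{\mathrm{CM}} - C_{\mathrm{BICM}}$ at a given SNR by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(2)

# Gray-labelled 16-QAM with unit average energy (two Gray 4-PAMs).
pam = np.array([-3.0, -1.0, 1.0, 3.0])
pam_bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
const = np.array([a + 1j * b for a in pam for b in pam])
bits = np.array([list(ba) + list(bb) for ba in pam_bits for bb in pam_bits])
const = const / np.sqrt(np.mean(np.abs(const) ** 2))

def cm_minus_bicm(snr_db, n=300_000):
    """Monte Carlo estimate of the gap I(X;Y) - sum_i I(B_i;Y) on AWGN."""
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(1 / (2 * snr))
    idx = rng.integers(0, 16, size=n)
    y = const[idx] + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / (2 * sigma ** 2))
    p_y = lik.mean(axis=1)
    c_cm = np.mean(np.log2(lik[np.arange(n), idx] / p_y))
    c_bicm = 0.0
    for i in range(4):
        p_y_b = np.where(bits[idx, i] == 0,
                         lik[:, bits[:, i] == 0].mean(axis=1),
                         lik[:, bits[:, i] == 1].mean(axis=1))
        c_bicm += np.mean(np.log2(p_y_b / p_y))
    return c_cm - c_bicm
```

Comparing `cm_minus_bicm(5.0)` with `cm_minus_bicm(15.0)` shows the Gray gap shrinking toward zero as SNR grows; both estimates carry Monte Carlo noise of a few thousandths of a bit.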
Common Mistake: BICM Is Not Optimal, Just Close Enough
Mistake:
Treating $C_{\mathrm{BICM}}$ as if it were the capacity of the underlying channel.
Correction:
$C_{\mathrm{BICM}}$ is the capacity of the channel plus the decoder structure (single binary code + per-bit demapping). A joint $M$-ary ML decoder achieves $C_{\mathrm{CM}} = I(X;Y)$. The BICM penalty is small (a few hundredths of a bit with Gray on square QAM) but strictly positive for all non-pathological cases. When we say "BICM is (nearly) optimal" we mean "nearly optimal among all schemes with the same modular-encoder constraint."
Quick Check
The capacity gap $C_{\mathrm{CM}} - C_{\mathrm{BICM}}$ equals $\sum_{i=1}^{m} I(B_i; B_1,\dots,B_{i-1} \mid Y)$, which is:
Zero whenever the bits are conditionally independent given $Y$
By the chain rule, the gap is $\sum_{i=1}^{m} \big[ I(B_i; Y \mid B_1,\dots,B_{i-1}) - I(B_i; Y) \big]$, and each difference equals $I(B_i; B_1,\dots,B_{i-1} \mid Y)$ using the symmetry of mutual information and the prior independence of the label bits. Each term is a non-negative conditional mutual information.
BICM Capacity
The maximum reliable rate of a BICM system with a given mapper $\mu$: $C_{\mathrm{BICM}}(\mu) = \sum_{i=1}^{m} I(B_i; Y)$. Upper-bounded by $C_{\mathrm{CM}} = I(X;Y)$; the gap is a sum of conditional mutual informations that is non-negative and vanishes at high SNR under Gray labelling.
Related: The $i$-th BICM Bit Channel, CM Capacity of a Uniform Input Constellation, Gray Labelling, Mismatched Maximum-Metric Decoding
Coded-Modulation (CM) Capacity
The capacity of a memoryless channel when the input is uniform on a fixed constellation $\mathcal{X}$: $C_{\mathrm{CM}} = I(X;Y)$. Achievable by a joint $M$-ary decoder (e.g., MLC/MSD); an upper bound on BICM capacity.
Related: BICM Capacity, CM / MLC / BICM: A Structural Side-by-Side, Shannon Capacity