BICM Mutual Information (Capacity)

The Main Theorem of BICM

Everything built so far (the BICM encoder, the ideal-interleaver model, the per-bit LLR) converges on a single question: what rate can a BICM system reliably sustain? The answer, first derived by Caire, Taricco, and Biglieri, is the BICM capacity:

$$C_{\rm BICM}(\mu) \;=\; \sum_{\ell=0}^{L-1} I(Y; B_\ell).$$

This is arguably the most important formula of the chapter. It says that the achievable rate of BICM is the sum of the unconditional per-bit mutual informations, and that this sum depends on the labelling only through these marginals. Two labellings that produce the same $\{I(Y; B_\ell)\}_\ell$ achieve the same BICM capacity: the geometry of the constellation reaches the binary decoder only through these $L$ scalar numbers.

We now derive this formula carefully, compare it to the CM capacity of Ch. 3, and show numerically how the gap scales with SNR and modulation order.

Definition:

BICM Capacity

Fix a constellation $\mathcal{X}$ of size $M = 2^L$, a labelling $\mu$, a memoryless channel $p(y \mid x)$, and uniform inputs. The BICM capacity under labelling $\mu$ is
$$C_{\rm BICM}(\mu) \;\triangleq\; \sum_{\ell=0}^{L-1} I(Y; B_\ell),$$
where $B_\ell$ is the $\ell$-th label bit with $X = \mu(B_0, \ldots, B_{L-1})$ and the bits i.i.d. uniform. It is the supremum of rates achievable by any BICM scheme (binary code + ideal interleaver + mapper $\mu$ + per-bit demapper) with vanishing error probability as the blocklength grows.

Equivalently, since the bits are uniform, each sub-channel satisfies $I(Y; B_\ell) = H(B_\ell) - H(B_\ell \mid Y) = 1 - H(B_\ell \mid Y)$, so
$$C_{\rm BICM}(\mu) \;=\; L \;-\; \sum_{\ell=0}^{L-1} H(B_\ell \mid Y):$$
the BICM capacity equals the total label entropy $L$ minus the sum of per-bit residual uncertainties after observing $Y$.
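To make the definition concrete, the following Monte Carlo sketch estimates both $C_{\rm CM}$ and $C_{\rm BICM}$ for 16-QAM on the AWGN channel. It is a minimal illustration under stated assumptions, not a reference implementation: the reflected-Gray labelling and the helper names `qam16_gray` and `bicm_and_cm_capacity` are choices made here for the example, not from the text.

```python
import numpy as np

def qam16_gray():
    """Unit-energy 16-QAM with a reflected-Gray labelling (assumed for this sketch).
    Returns (points, labels): the complex constellation and the L = 4 label bits
    of each point; adjacent amplitude levels differ in exactly one bit."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0])
    gray2 = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # 2-bit Gray code per axis
    points, labels = [], []
    for i, a in enumerate(levels):
        for q, b in enumerate(levels):
            points.append(a + 1j * b)
            labels.append(np.concatenate([gray2[i], gray2[q]]))
    points = np.array(points)
    points /= np.sqrt(np.mean(np.abs(points) ** 2))       # normalise to Es = 1
    return points, np.array(labels)

def bicm_and_cm_capacity(points, labels, snr_db, n=200_000, seed=0):
    """Monte Carlo estimates of C_CM = I(Y;X) and C_BICM = L - sum_l H(B_l|Y)
    for uniform inputs on AWGN, with SNR given as Es/N0 in dB."""
    rng = np.random.default_rng(seed)
    M, L = len(points), labels.shape[1]
    n0 = 10 ** (-snr_db / 10)                              # noise variance (Es = 1)
    x = points[rng.integers(M, size=n)]
    y = x + np.sqrt(n0 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    # posterior P(X = x' | y) under the uniform prior
    logp = -np.abs(y[:, None] - points[None, :]) ** 2 / n0
    logp -= logp.max(axis=1, keepdims=True)                # numerical stability
    post = np.exp(logp)
    post /= post.sum(axis=1, keepdims=True)
    # C_CM = log2 M - E[H(X | Y)]
    c_cm = np.log2(M) - (-np.sum(post * np.log2(post + 1e-300), axis=1)).mean()
    # C_BICM = L - sum_l E[H(B_l | Y)], with P(B_l = 1 | y) read off the posterior
    c_bicm = float(L)
    for ell in range(L):
        p1 = post[:, labels[:, ell] == 1].sum(axis=1)
        c_bicm -= (-(p1 * np.log2(p1 + 1e-300)
                     + (1 - p1) * np.log2(1 - p1 + 1e-300))).mean()
    return c_cm, c_bicm
```

Because the same posterior feeds both estimates, the difference of the two returned values is a Monte Carlo estimate of the gap discussed in the theorem below.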


Theorem: The BICM Capacity Decomposition

Under the ideal-interleaver assumption, a BICM encoder with mapper $\mu$ and a capacity-approaching binary code operating on the BICM bit channel (Thm. Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels) can reliably communicate at any rate
$$R \;<\; C_{\rm BICM}(\mu) \;=\; \sum_{\ell=0}^{L-1} I(Y; B_\ell).$$
Conversely, no BICM scheme with the same mapper and per-bit demapping can exceed this rate with vanishing error probability.

Moreover, comparing with the CM capacity $C_{\rm CM} = I(Y; X) = I(Y; B_0, \ldots, B_{L-1})$, we have
$$C_{\rm CM} - C_{\rm BICM}(\mu) \;=\; \sum_{\ell=0}^{L-1} \bigl[ I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell) \bigr] \;\ge\; 0,$$
with equality iff the label bits are conditionally independent given $Y$, a non-generic condition that fails for every non-trivial QAM labelling.

The forward direction is achievability: with an ideal interleaver, the mixture bit channel is memoryless and has capacity $\tfrac{1}{L} \sum_\ell I(Y; B_\ell)$ per use, and the coded-bit rate supported is $L$ times that, giving the stated sum. The converse follows from a mismatched-decoding argument: the BICM decoder uses the product metric $\prod_\ell p(y \mid B_\ell)$, not the true joint likelihood $p(y \mid X)$. The GMI under this product metric is exactly the sum of the marginals. For the detailed GMI derivation see Ch. 7.

The gap formula is a direct application of the chain rule of mutual information, the same identity that drove the MLC capacity rule in Ch. 3. There, the chain rule was used to add conditional informations; here, the difference between the conditional sum and the unconditional sum is the BICM suboptimality.
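In detail: the chain rule gives $I(Y; X) = \sum_{\ell=0}^{L-1} I(Y; B_\ell \mid B_{<\ell})$, and because the label bits are i.i.d. uniform, $I(B_\ell; B_{<\ell}) = 0$. Each term of the gap therefore simplifies to
$$I(Y; B_\ell \mid B_{<\ell}) - I(Y; B_\ell) \;=\; I(B_\ell; Y, B_{<\ell}) - I(B_\ell; Y) \;=\; I(B_\ell; B_{<\ell} \mid Y) \;\ge\; 0,$$
so $C_{\rm CM} - C_{\rm BICM}(\mu) = \sum_\ell I(B_\ell; B_{<\ell} \mid Y)$: the price of BICM is exactly the information the other label bits $B_{<\ell}$ carry about $B_\ell$ once $Y$ is known, and it vanishes precisely when the bits are conditionally independent given $Y$.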


CM vs BICM Capacity for QAM on AWGN

Four curves on the same axes: (i) the Shannon capacity $\log_2(1 + \mathrm{SNR})$, the ultimate upper bound; (ii) the CM capacity $C_{\rm CM}$, the capacity of an optimal $M$-ary decoder on the uniform-$\mathcal{X}$ constellation; (iii) the BICM capacity under Gray labelling, computed from the formula of the BICM Capacity Decomposition theorem; (iv) the BICM capacity under SP labelling, the same formula with the SP bit-channel capacities. For square QAM and all practical SNRs, $C_{\rm BICM,Gray}$ lies nearly on top of $C_{\rm CM}$, while $C_{\rm BICM,SP}$ sits visibly below. Zoom into the low-SNR region to see where the curves separate; zoom out to see the saturation at $L$ bits/symbol at high SNR.


BICM/CM Capacity Ratio Across Modulation Orders

How close does BICM get to CM? This plot shows the ratio $C_{\rm BICM}(\mu) / C_{\rm CM}$ as a function of SNR for $M = 4, 16, 64$. Under Gray labelling the ratio stays above $0.98$ for QAM across the useful SNR range (above each modulation's "waterfall"); under SP labelling the ratio drops to $0.85$ to $0.9$ at moderate SNRs. Toggle the labelling dropdown to see the difference.


Example: 16-QAM BICM Capacity at 10 dB

For 16-QAM at $\mathrm{SNR} = 10$ dB on the AWGN channel, compute $C_{\rm CM}$, $C_{\rm BICM,Gray}$, and the Shannon capacity $\log_2(1 + \mathrm{SNR})$. Express the Gray-BICM gap in both bits and equivalent dB of SNR.
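One way to carry out this computation is sketched below, reusing the hypothetical `qam16_gray` and `bicm_and_cm_capacity` helpers from the Monte Carlo sketch in the definition above; the printed numbers are estimates, not tabulated values.

```python
import numpy as np

points, labels = qam16_gray()                 # helpers from the earlier sketch
snr_db = 10.0

c_cm, c_bicm_gray = bicm_and_cm_capacity(points, labels, snr_db)
c_shannon = np.log2(1 + 10 ** (snr_db / 10))  # Gaussian-input Shannon capacity
gap_bits = c_cm - c_bicm_gray

# Equivalent dB gap: the extra SNR the Gray-BICM curve needs to reach the CM
# value at 10 dB, found by a coarse scan (adequate because the gap is small).
grid = np.arange(snr_db, snr_db + 2.0, 0.05)
bicm_curve = np.array([bicm_and_cm_capacity(points, labels, s, n=50_000)[1]
                       for s in grid])
gap_db = grid[int(np.argmax(bicm_curve >= c_cm))] - snr_db

print(f"Shannon {c_shannon:.3f}  CM {c_cm:.3f}  BICM-Gray {c_bicm_gray:.3f}  "
      f"gap {gap_bits:.4f} bit  (~{gap_db:.2f} dB)")
```

With Gray labelling the printed bit gap should be on the order of hundredths of a bit, consistent with the capacity plot above.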

The Formula Is Tight Only For the Right Metric

The capacity formula $C_{\rm BICM}(\mu) = \sum_\ell I(Y; B_\ell)$ is tight only under a specific choice of decoding metric: the demapper must compute exact per-bit marginal likelihoods $p_{W_\ell}(y \mid b)$ (or the corresponding log-likelihood ratios). If the demapper instead uses a generic metric, say a max-log LLR with non-ideal scaling or a quantised soft metric, then the achievable rate drops to a generalised mutual information (GMI) that is in general strictly less than the BICM capacity and depends on the metric choice.

Three practical consequences:

  1. Max-log demappers lose a fraction of a dB relative to exact marginal demappers; this is the operationally relevant number in a 5G receiver, not the theoretical $C_{\rm BICM}$ itself (a sketch of the two demappers follows this list).
  2. Bit-LLR quantisation (say, 6 bits per LLR) reduces the achievable GMI further. Chapter 7 quantifies the loss with a GMI-based analysis.
  3. The formula $\sum_\ell I(Y; B_\ell)$ is both an upper bound over all BICM decoders using the labelling $\mu$ and the rate achieved under the exact marginal metric. For this chapter we assume the exact metric unless noted otherwise.
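As a concrete illustration of consequence 1, here is a minimal sketch of an exact marginal demapper next to its max-log approximation, again assuming the hypothetical `qam16_gray` constellation helper from the earlier sketch:

```python
import numpy as np

def per_bit_llrs(y, points, labels, n0):
    """Per-bit LLRs for one received sample y: exact (log-sum-exp over each bit's
    symbol subsets) versus the max-log approximation (nearest point per subset).
    n0 is the noise variance, as in the earlier capacity sketch."""
    metric = -np.abs(y - points) ** 2 / n0            # log p(y | x) up to a constant
    llr_exact, llr_maxlog = [], []
    for ell in range(labels.shape[1]):
        m0 = metric[labels[:, ell] == 0]
        m1 = metric[labels[:, ell] == 1]
        llr_exact.append(np.logaddexp.reduce(m0) - np.logaddexp.reduce(m1))
        llr_maxlog.append(m0.max() - m1.max())        # cheaper, slightly lossy
    return np.array(llr_exact), np.array(llr_maxlog)

# Example: one noisy 16-QAM observation at Es/N0 = 10 dB (Es = 1)
points, labels = qam16_gray()
rng = np.random.default_rng(1)
n0 = 10 ** (-1.0)
y = points[5] + np.sqrt(n0 / 2) * (rng.standard_normal() + 1j * rng.standard_normal())
print(per_bit_llrs(y, points, labels, n0))
```

The two LLR vectors typically agree in sign but differ in magnitude; the GMI analysis of Ch. 7 turns this mismatch (and any further quantisation) into a rate penalty.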

Theorem: BICM Capacity β€” High-SNR Asymptotics

On the AWGN channel with constellation $\mathcal{X}$ of size $M = 2^L$ and any Gray labelling $\mu_G$,
$$\lim_{\mathrm{SNR} \to \infty} \bigl[ C_{\rm CM} - C_{\rm BICM}(\mu_G) \bigr] \;=\; 0.$$
That is, the Gray-BICM capacity converges to the CM capacity at high SNR. The convergence is exponentially fast in $\mathrm{SNR}$: the gap decays like $Q(\sqrt{\mathrm{SNR}} \, d_{\min}/\sqrt{2})$ times polynomial factors.

For any non-Gray labelling the limit gap is strictly positive; for SP the limit is bounded away from zero by at least $(L-1) - \log_2(L)$ bits for large $L$.

At high SNR the symbol-error probability of the $M$-ary constellation decays as $Q(\sqrt{\mathrm{SNR}} \, d_{\min}/\sqrt{2})$, and the dominant error events are nearest-neighbour confusions. Under Gray labelling, nearest-neighbour confusions flip one bit, so each bit position sees an almost-independent binary channel with low crossover probability, and product-metric decoding is near-optimal. Under SP labelling, nearest-neighbour confusions flip many bits at once and the joint structure matters: a product metric drops a lot of information.

Common Mistake: BICM Is Not Optimal, Just Close Enough

Mistake:

Treating $C_{\rm BICM}$ as if it were the capacity of the underlying channel.

Correction:

$C_{\rm BICM}(\mu)$ is the capacity of the channel plus the decoder structure (single binary code + per-bit demapping). A joint $M$-ary ML decoder achieves $C_{\rm CM} \ge C_{\rm BICM}$. The BICM penalty is small (a few hundredths of a bit with Gray labelling on square QAM) but strictly positive for all non-pathological cases. When we say "BICM is (nearly) optimal" we mean "nearly optimal among all schemes with the same modular-encoder constraint."

Quick Check

The capacity gap $C_{\rm CM} - C_{\rm BICM}(\mu)$ equals

$\sum_\ell I(B_\ell; B_{<\ell} \mid Y)$

$\sum_\ell H(B_\ell \mid Y)$

$I(X; Y) / L$

Zero whenever the bits are marginally independent

BICM Capacity

The maximum reliable rate of a BICM system with a given mapper $\mu$: $C_{\rm BICM}(\mu) = \sum_{\ell=0}^{L-1} I(Y; B_\ell)$. Upper-bounded by $C_{\rm CM} = I(Y; X)$; the gap is a sum of conditional mutual informations that is non-negative and vanishes at high SNR under Gray labelling.

Related: The $\ell$-th BICM Bit Channel, CM Capacity of a Uniform Input Constellation, Gray Labelling, Mismatched Maximum-Metric Decoding

Coded-Modulation (CM) Capacity

The capacity of a memoryless channel $p(y \mid x)$ when the input is uniform on a fixed constellation $\mathcal{X}$: $C_{\rm CM} = I(Y; X)$. Achievable by a joint $M$-ary decoder (e.g., MLC/MSD); an upper bound on the BICM capacity.

Related: BICM Capacity, CM / MLC / BICM: A Structural Side-by-Side, Shannon Capacity