BICM Mutual Information (Capacity)
The Main Theorem of BICM
Everything built so far (the BICM encoder, the ideal-interleaver model, the per-bit LLR) converges on a single question: what rate can a BICM system reliably sustain? The answer, first derived by Caire, Taricco, and Biglieri, is the BICM capacity:

$$C_{\mathrm{BICM}} = \sum_{i=1}^{m} I(B_i; Y).$$
This is arguably the most important formula of the chapter. It says that the achievable rate of BICM is the sum of the unconditional per-bit mutual informations $I(B_i;Y)$, and that this sum depends on the labelling only through these marginals. Two labellings that produce the same set $\{I(B_i;Y)\}_{i=1}^{m}$ achieve the same BICM capacity: the geometry of the constellation reaches the binary decoder only through these scalar numbers.
We now derive this formula carefully, compare it to the CM capacity of Ch. 3, and show numerically how the gap scales with SNR and modulation order.
Definition: BICM Capacity
Fix a constellation $\mathcal{X}$ of size $M = 2^m$, a labelling $\mu : \{0,1\}^m \to \mathcal{X}$, a memoryless channel $p(y \mid x)$, and uniform inputs. The BICM capacity under labelling $\mu$ is
$$C_{\mathrm{BICM}}(\mu) = \sum_{i=1}^{m} I(B_i; Y),$$
where $B_i$ is the $i$-th label bit with $X = \mu(B_1,\dots,B_m)$ and the bits i.i.d.\ uniform. It is the supremum of rates achievable by any BICM scheme (binary code + ideal interleaver + mapper + per-bit demapper) with vanishing error probability as the blocklength grows.
Equivalently, since each binary sub-channel satisfies $I(B_i;Y) = H(B_i) - H(B_i \mid Y) = 1 - H(B_i \mid Y)$,
$$C_{\mathrm{BICM}} = m - \sum_{i=1}^{m} H(B_i \mid Y),$$
so BICM capacity equals the total label entropy minus the sum of per-bit residual uncertainties after observing $Y$.
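The per-bit terms in this definition can be estimated numerically. The sketch below is a minimal Monte Carlo illustration, not code from the text: it assumes Gray-labelled 16-QAM built as the product of two Gray 4-PAMs on a complex AWGN channel (a standard construction), and estimates each $I(B_i;Y)$ by averaging $\log_2\big(p(y \mid b_i)/p(y)\big)$ over simulated transmissions. The function name `bicm_capacity` is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gray-labelled 4-PAM: labels 00, 01, 11, 10 left to right, so adjacent
# levels differ in exactly one bit.
pam = np.array([-3.0, -1.0, 1.0, 3.0])
pam_bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])

# 16-QAM as the product of two Gray 4-PAMs (2 bits for I, 2 for Q),
# normalised to unit average symbol energy.
const = np.array([a + 1j * b for a in pam for b in pam])
bits = np.array([list(ba) + list(bb) for ba in pam_bits for bb in pam_bits])
const = const / np.sqrt(np.mean(np.abs(const) ** 2))

def bicm_capacity(snr_db, n=200_000):
    """Monte Carlo estimate of sum_i I(B_i; Y) on complex AWGN, uniform inputs."""
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(1 / (2 * snr))  # per-dimension noise std for Es = 1
    idx = rng.integers(0, 16, size=n)
    y = const[idx] + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    # Likelihoods p(y | x_k) up to a common Gaussian factor (it cancels below).
    lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / (2 * sigma ** 2))
    p_y = lik.mean(axis=1)  # p(y) under uniform symbols
    cap = 0.0
    for i in range(4):
        mask = bits[:, i] == 0
        p_y_b0 = lik[:, mask].mean(axis=1)   # p(y | B_i = 0)
        p_y_b1 = lik[:, ~mask].mean(axis=1)  # p(y | B_i = 1)
        p_y_b = np.where(bits[idx, i] == 0, p_y_b0, p_y_b1)
        cap += np.mean(np.log2(p_y_b / p_y))  # estimate of I(B_i; Y)
    return cap
```

At this sample size the estimate carries Monte Carlo noise on the order of a few thousandths of a bit; the same routine works at any SNR by changing `snr_db`.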
Theorem: The BICM Capacity Decomposition
Under the ideal-interleaver assumption, a BICM encoder with mapper $\mu$ and a capacity-approaching binary code operating on the BICM bit channel (Thm. Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels) can reliably communicate at any rate
$$R < C_{\mathrm{BICM}}(\mu) = \sum_{i=1}^{m} I(B_i; Y).$$
Conversely, no BICM scheme with the same mapper and per-bit demapping can exceed this rate with vanishing error probability.
Moreover, comparing with the CM capacity $C_{\mathrm{CM}} = I(X;Y)$, we have
$$C_{\mathrm{BICM}}(\mu) \le C_{\mathrm{CM}},$$
with equality iff the label bits are conditionally independent given $Y$, a non-generic condition that fails for every non-trivial QAM labelling.
The forward direction is achievability: with an ideal interleaver, the mixture bit channel is memoryless with capacity $\frac{1}{m}\sum_{i=1}^{m} I(B_i;Y)$ per coded bit, and each symbol carries $m$ coded bits, so the supported rate is $m$ times that, giving the stated sum. The converse follows from a mismatched-decoding argument: the BICM decoder uses the product metric $\prod_{i=1}^{m} p(y \mid b_i)$, not the true joint likelihood $p(y \mid x)$. The GMI under this product metric is exactly the sum of marginals. For the detailed GMI derivation see Ch. 7.
The gap formula is a direct application of the chain rule of mutual information, the same identity that drove the MLC capacity rule in Ch. 3. There, the chain rule was used to add conditional informations; here, the difference between the conditional sum and the unconditional sum is the BICM suboptimality.
For achievability, apply Shannon's coding theorem to the BICM mixture bit channel of Thm. Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels and convert coded-bit rate to information-bit rate.
For the gap formula, write the chain rule for $I(X;Y)$ and subtract the unconditional sum term by term.
Show that each difference equals a non-negative conditional mutual information of $B_i$ with $B_1,\dots,B_{i-1}$ given $Y$.
Achievability: mixture-channel capacity
By Theorem Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels, the binary decoder sees a memoryless channel whose transition law is the uniform mixture of the $m$ bit channels. Shannon's noisy-channel coding theorem states that this channel's capacity (with the bit index revealed to the decoder) is
$$C_{\mathrm{mix}} = \frac{1}{m}\sum_{i=1}^{m} I(B_i; Y),$$
where the equality follows by direct expansion of the mutual information, or by noting that choosing the position $i$ uniformly at random acts as auxiliary randomness.
The BICM system sends $mn$ coded bits per codeword and maps them to $n$ symbols. A code of rate $R_c$ delivers $R_c m n$ information bits. The spectral efficiency is therefore $R = R_c m$ bits per symbol. Applying the binary-channel coding theorem gives $R_c < \frac{1}{m}\sum_{i=1}^{m} I(B_i;Y)$, i.e.\ $R < \sum_{i=1}^{m} I(B_i;Y) = C_{\mathrm{BICM}}$.
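As a quick check on this bookkeeping, here is the arithmetic at one illustrative operating point (16-QAM with a rate-$3/4$ binary code; the numbers are chosen for the example, not taken from the text):

```latex
% 16-QAM: m = 4 label bits per symbol; binary code rate R_c = 3/4.
\begin{align*}
\text{coded bits per symbol}   &= m = 4, \\
\text{spectral efficiency}     &= R = R_c\, m = \tfrac{3}{4} \cdot 4
                                  = 3 \ \text{bits/symbol}, \\
\text{achievability condition} &:\quad
  R_c < \tfrac{1}{m} \textstyle\sum_{i=1}^{m} I(B_i; Y)
  \;\Longleftrightarrow\; R < C_{\mathrm{BICM}}.
\end{align*}
```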
Converse: mismatched-decoding upper bound
The BICM decoder's metric factorises over bits: $q(x,y) = \prod_{i=1}^{m} p(y \mid b_i(x))$. This is a mismatched likelihood with respect to the true joint symbol likelihood $p(y \mid x)$. The maximum reliable rate under any such mismatched decoder is upper-bounded by the generalised mutual information (GMI), a standard result in the mismatched-decoding literature. For the product-form bit metric used in BICM, the GMI coincides with $\sum_{i=1}^{m} I(B_i;Y)$ (Caire-Taricco-Biglieri Thm. 1; a clean derivation is in Martínez, Guillén i Fàbregas, and Caire 2008 and our Ch. 7). Hence no BICM scheme exceeds $C_{\mathrm{BICM}}$ with vanishing error.
The CM-BICM gap via the chain rule
By the chain rule for mutual information,
$$I(X;Y) = \sum_{i=1}^{m} I(B_i; Y \mid B_1,\dots,B_{i-1}),$$
while by definition $C_{\mathrm{BICM}} = \sum_{i=1}^{m} I(B_i; Y)$. Subtracting,
$$C_{\mathrm{CM}} - C_{\mathrm{BICM}} = \sum_{i=1}^{m} \big[ I(B_i; Y \mid B_1,\dots,B_{i-1}) - I(B_i; Y) \big].$$
(For $i = 1$ the conditional and unconditional terms coincide, so the first term vanishes.)
Each gap term is non-negative
Using the identity
$$I(B_i; Y \mid B_{<i}) - I(B_i; Y) = I(B_i; B_{<i} \mid Y) - I(B_i; B_{<i}), \qquad B_{<i} := (B_1,\dots,B_{i-1}),$$
and the fact that the label bits are a priori independent (so $I(B_i; B_{<i}) = 0$ for i.i.d.\ uniform priors), we obtain
$$I(B_i; Y \mid B_{<i}) - I(B_i; Y) = I(B_i; B_{<i} \mid Y) \ge 0.$$
Conditional mutual information is non-negative, and is zero iff $B_i$ and $B_{<i}$ are conditionally independent given $Y$. Hence $C_{\mathrm{BICM}} \le C_{\mathrm{CM}}$, with the equality condition stated in the theorem.
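The identity used in this step is the chain rule applied to $I(B_i;\, Y, B_{<i})$ in both orders; a compact derivation:

```latex
% Expand I(B_i; Y, B_{<i}) by the chain rule in both orders,
% with B_{<i} = (B_1, ..., B_{i-1}):
\begin{align*}
I(B_i;\, Y, B_{<i}) &= I(B_i; B_{<i}) + I(B_i; Y \mid B_{<i}) \\
                    &= I(B_i; Y) + I(B_i; B_{<i} \mid Y).
\end{align*}
% Equate the two expansions; prior independence gives I(B_i; B_{<i}) = 0:
\[
I(B_i; Y \mid B_{<i}) - I(B_i; Y) \;=\; I(B_i; B_{<i} \mid Y) \;\ge\; 0 .
\]
```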
CM vs BICM Capacity for QAM on AWGN
Four curves on the same axes: (i) Shannon capacity $\log_2(1+\mathrm{SNR})$, the ultimate upper bound; (ii) CM capacity $I(X;Y)$, the capacity of an optimal $M$-ary decoder on the uniform-input constellation; (iii) BICM capacity under Gray labelling, computed from the formula of Thm. The BICM Capacity Decomposition; (iv) BICM capacity under SP labelling, the same formula with the SP bit-channel capacities. For square QAM and all practical SNRs, the Gray curve is nearly on top of $C_{\mathrm{CM}}$, while the SP curve sits visibly below. Zoom into the low-SNR region to see the behaviour there; zoom out to see the saturation at $m$ bits/symbol.
BICM/CM Capacity Ratio Across Modulation Orders
How close does BICM get to CM? This plot shows the ratio $C_{\mathrm{BICM}}/C_{\mathrm{CM}}$ as a function of SNR for several modulation orders. Under Gray labelling the ratio stays close to one for QAM across the useful SNR range (above each modulation's "waterfall"); under SP labelling the ratio drops noticeably at moderate SNRs. Toggle the labelling dropdown to see the difference.
Example: 16-QAM BICM Capacity at 10 dB
For 16-QAM at $10$ dB on the AWGN channel, compute $C_{\mathrm{CM}}$, $C_{\mathrm{BICM}}^{\mathrm{Gray}}$, and the Shannon capacity $\log_2(1+\mathrm{SNR})$. Express the Gray-BICM gap in both bits and equivalent dB of SNR.
Numerical values
At $10$ dB ($\mathrm{SNR} = 10$ linearly),
- Shannon: $\log_2(1 + 10) \approx 3.46$ bits/symbol.
- 16-QAM CM capacity: $C_{\mathrm{CM}} = I(X;Y)$ bits/symbol, obtained by numerical integration of the AWGN likelihood over the 16-point constellation.
- Under Gray labelling: $C_{\mathrm{BICM}}^{\mathrm{Gray}}$ bits/symbol, the sum of the four per-bit capacities $I(B_i;Y)$.
Gray-BICM gap
The gap $C_{\mathrm{CM}} - C_{\mathrm{BICM}}^{\mathrm{Gray}}$ amounts to a few hundredths of a bit; at the operating point, shifting the rate curve by that amount costs only a small fraction of a dB of SNR. This is the "cost" of using BICM instead of MLC with the same 16-QAM constellation.
Modulation-capacity gap
The gap $\log_2(1+\mathrm{SNR}) - C_{\mathrm{CM}}$, measured in bits or equivalent dB of SNR, is the price of a uniform finite constellation versus a Gaussian input; it dominates the BICM penalty by roughly a factor of six. The designer's first-order lever is constellation shaping (Ch. 4), not BICM-vs-MLC.
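The three capacities in this example can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the chapter's own code: it uses the Gray 16-QAM construction (product of two Gray 4-PAMs, unit energy) and a single batch of Monte Carlo samples to estimate $C_{\mathrm{CM}}$ and $C_{\mathrm{BICM}}^{\mathrm{Gray}}$ at 10 dB alongside the closed-form Shannon capacity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Gray-labelled 16-QAM with unit average energy (two Gray 4-PAMs).
pam = np.array([-3.0, -1.0, 1.0, 3.0])
pam_bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
const = np.array([a + 1j * b for a in pam for b in pam])
bits = np.array([list(ba) + list(bb) for ba in pam_bits for bb in pam_bits])
const = const / np.sqrt(np.mean(np.abs(const) ** 2))

snr_db, n = 10.0, 300_000
snr = 10 ** (snr_db / 10)
sigma = np.sqrt(1 / (2 * snr))  # per-dimension noise std for Es = 1
idx = rng.integers(0, 16, size=n)
y = const[idx] + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / (2 * sigma ** 2))
p_y = lik.mean(axis=1)  # p(y) under uniform symbols

c_shannon = np.log2(1 + snr)                           # Gaussian-input bound
c_cm = np.mean(np.log2(lik[np.arange(n), idx] / p_y))  # I(X;Y), uniform 16-QAM
c_bicm = 0.0
for i in range(4):
    mask = bits[:, i] == 0
    p_y_b = np.where(bits[idx, i] == 0,
                     lik[:, mask].mean(axis=1), lik[:, ~mask].mean(axis=1))
    c_bicm += np.mean(np.log2(p_y_b / p_y))  # accumulate I(B_i; Y)

print(f"Shannon {c_shannon:.3f}  CM {c_cm:.3f}  Gray-BICM {c_bicm:.3f} bits/symbol")
```

The estimates inherit Monte Carlo noise of a few thousandths of a bit, but the ordering Gray-BICM $\le$ CM $\le$ Shannon and the relative sizes of the two gaps are already visible at this sample size.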
The punchline
BICM gives up almost nothing in capacity relative to MLC, and gives up far less than what a fixed QAM constellation already concedes relative to Gaussian signalling. This is why one binary code and a Gray-labelled QAM is the modular design of choice: the theoretical cost is negligible; the engineering savings are enormous.
The Formula Is Tight Only For the Right Metric
The capacity formula $C_{\mathrm{BICM}} = \sum_i I(B_i;Y)$ is tight only under a specific choice of decoding metric: the demapper must compute exact per-bit marginal likelihoods (or log-likelihoods). If the demapper instead uses a generic metric (say, a max-log LLR with non-ideal scaling, or a quantised soft metric), then the achievable rate is upper-bounded by a generalised mutual information (GMI) that is strictly less than the BICM capacity and depends on the metric choice.
Three practical consequences:
- Max-log demappers lose a fraction of a dB relative to exact marginal demappers; this is the operationally relevant number in a 5G receiver, not the theoretical $C_{\mathrm{BICM}}$ itself.
- Bit-LLR quantisation (say, 6 bits per LLR) lowers the achievable GMI further. Chapter 7 quantifies the loss with GMI-based analysis.
- The formula $\sum_i I(B_i;Y)$ is both an upper bound (over all BICM decoders using the labelling $\mu$) and the rate achievable under the exact marginal metric. For this chapter we assume the exact metric unless noted otherwise.
Theorem: High-SNR Asymptotics of BICM Capacity
On the AWGN channel with constellation $\mathcal{X}$ of size $M = 2^m$ and any Gray labelling $\mu_{\mathrm{Gray}}$,
$$\lim_{\mathrm{SNR} \to \infty} \big[ C_{\mathrm{CM}} - C_{\mathrm{BICM}}(\mu_{\mathrm{Gray}}) \big] = 0.$$
That is, the Gray-BICM capacity converges to the CM capacity at high SNR. The convergence is exponential in SNR: the gap decays like $e^{-c\,\mathrm{SNR}}$, for a constellation-dependent constant $c > 0$, times polynomial factors.
For any non-Gray labelling the limiting gap is strictly positive; for SP the limit is bounded away from zero at large SNR.
At high SNR the symbol-error probability of the $M$-ary constellation decays as $Q\!\big(d_{\min}/(2\sigma)\big)$, and the dominant error events are nearest-neighbour confusions. Under Gray labelling, nearest-neighbour confusions flip one bit, so each bit position sees an almost-independent low-crossover-probability binary channel, and the product-metric decoding is near-optimal. Under SP labelling, nearest-neighbour confusions flip many bits at once, and the joint structure matters: a product metric drops a lot of information.
At high SNR, write $C_{\mathrm{CM}} - C_{\mathrm{BICM}} = \sum_{i=1}^{m} H(B_i \mid Y) - H(X \mid Y)$ and bound the terms using the dominant symbol-error event.
Under Gray labelling, a symbol error corresponds to exactly one bit error, so $\sum_i H(B_i \mid Y) \approx H(X \mid Y)$ at high SNR.
Under SP labelling, a symbol error can flip up to $m$ label bits at once; the joint entropy $H(X \mid Y)$ is strictly less than the sum of marginals $\sum_i H(B_i \mid Y)$.
High-SNR error event structure
At large SNR the posterior $p(x \mid y)$ concentrates on the ML-decoded symbol $\hat{x}$ and on its nearest neighbours. The probability of a symbol error is dominated by $N_{\min}\, Q\!\big(d_{\min}/(2\sigma)\big)$, where $N_{\min}$ is the average number of nearest neighbours and $d_{\min}$ is the minimum Euclidean distance.
Gray: one-bit flips dominate
Under Gray labelling, any nearest-neighbour pair differs in exactly one label bit. Therefore a symbol error at the ML decoder's margin corresponds to exactly one bit flip, and only one bit position has non-trivial residual uncertainty given $Y$: for that position $H(B_i \mid Y) \approx h_b(P_e)$, where $h_b$ is the binary entropy function and $P_e$ the flip probability. Summing over $i$ gives the same high-SNR decay as the joint entropy $H(X \mid Y)$, so the gap vanishes exponentially.
SP: multi-bit flips cost
Under SP labelling, a nearest-neighbour flip at the ML decoder changes the lowest-level label bit, but also (with non-negligible probability) flips higher-level bits through neighbouring-coset confusion at larger intra-coset distances. The marginal entropies $H(B_i \mid Y)$ for small $i$ do not concentrate as tightly as their joint, and the gap stays bounded away from zero. Detailed tables are in Caire-Taricco-Biglieri 1998, §V.B.
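The claimed decay of the Gray gap can be checked numerically. The sketch below is illustrative, assuming the same Gray 16-QAM construction as earlier in the chapter's figures (product of two Gray 4-PAMs, unit energy); the function name `cm_minus_bicm` is ours. It estimates $C_{\mathrm{CM}} - C_{\mathrm{BICM}}$ at a given SNR by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(2)

# Gray-labelled 16-QAM with unit average energy (two Gray 4-PAMs).
pam = np.array([-3.0, -1.0, 1.0, 3.0])
pam_bits = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
const = np.array([a + 1j * b for a in pam for b in pam])
bits = np.array([list(ba) + list(bb) for ba in pam_bits for bb in pam_bits])
const = const / np.sqrt(np.mean(np.abs(const) ** 2))

def cm_minus_bicm(snr_db, n=300_000):
    """Monte Carlo estimate of the gap I(X;Y) - sum_i I(B_i;Y) on AWGN."""
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(1 / (2 * snr))
    idx = rng.integers(0, 16, size=n)
    y = const[idx] + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    lik = np.exp(-np.abs(y[:, None] - const[None, :]) ** 2 / (2 * sigma ** 2))
    p_y = lik.mean(axis=1)
    c_cm = np.mean(np.log2(lik[np.arange(n), idx] / p_y))
    c_bicm = 0.0
    for i in range(4):
        p_y_b = np.where(bits[idx, i] == 0,
                         lik[:, bits[:, i] == 0].mean(axis=1),
                         lik[:, bits[:, i] == 1].mean(axis=1))
        c_bicm += np.mean(np.log2(p_y_b / p_y))
    return c_cm - c_bicm
```

Comparing `cm_minus_bicm(5.0)` with `cm_minus_bicm(15.0)` shows the Gray gap shrinking toward zero as SNR grows; both estimates carry Monte Carlo noise of a few thousandths of a bit.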
Common Mistake: BICM Is Not Optimal, Just Close Enough
Mistake:
Treating $C_{\mathrm{BICM}}$ as if it were the capacity of the underlying channel.
Correction:
$C_{\mathrm{BICM}}$ is the capacity of the channel plus the decoder structure (single binary code + per-bit demapping). A joint $M$-ary ML decoder achieves $C_{\mathrm{CM}} = I(X;Y)$. The BICM penalty is small (a few hundredths of a bit with Gray on square QAM) but strictly positive for all non-pathological cases. When we say "BICM is (nearly) optimal" we mean "nearly optimal among all schemes with the same modular-encoder constraint."
Quick Check
The capacity gap $C_{\mathrm{CM}} - C_{\mathrm{BICM}}$ equals $\sum_{i=1}^{m} I(B_i; B_1,\dots,B_{i-1} \mid Y)$, which is:
Zero whenever the bits are conditionally independent given $Y$
By the chain rule, the gap is $\sum_{i=1}^{m} \big[ I(B_i; Y \mid B_1,\dots,B_{i-1}) - I(B_i; Y) \big]$, and each difference equals $I(B_i; B_1,\dots,B_{i-1} \mid Y)$ using the symmetry of mutual information and the prior independence of the label bits. Each term is a non-negative conditional mutual information.
BICM Capacity
The maximum reliable rate of a BICM system with a given mapper $\mu$: $C_{\mathrm{BICM}}(\mu) = \sum_{i=1}^{m} I(B_i; Y)$. Upper-bounded by $C_{\mathrm{CM}} = I(X;Y)$; the gap is a sum of conditional mutual informations that is non-negative and vanishes at high SNR under Gray labelling.
Related: The $i$-th BICM Bit Channel, CM Capacity of a Uniform Input Constellation, Gray Labelling, Mismatched Maximum-Metric Decoding
Coded-Modulation (CM) Capacity
The capacity of a memoryless channel when the input is uniform on a fixed constellation $\mathcal{X}$: $C_{\mathrm{CM}} = I(X;Y)$. Achievable by a joint $M$-ary decoder (e.g., MLC/MSD); an upper bound on BICM capacity.
Related: BICM Capacity, CM / MLC / BICM: A Structural Side-by-Side, Shannon Capacity