Ferkans — Interactive Telecom Tutor

Why BICM: Modularity Wins

The point of BICM is to make coded modulation modular. With a single off-the-shelf binary code and a bit interleaver, we can drive any QAM (or PSK, or APSK) constellation. The code does not need to know what the constellation is; the constellation does not need to know what the code is. They are connected by a permutation of indices. The cost is a small gap to CM capacity; the gain is immense engineering simplicity.

To appreciate the stakes, remember what MLC (Ch. 3) asks of the designer: a separate binary code per partition level, each designed at a different rate matched to that level's conditional capacity. For 64-QAM that is six codes, and if the standard must also support 16-QAM and 256-QAM, you are at eighteen codes — every one a complex LDPC optimisation. BICM replaces this zoo with a single binary code whose rate is adjusted by puncturing or shortening. The modulation order and the code become independent design dimensions.

Ungerboeck's TCM (Ch. 2) and Imai's MLC (Ch. 3) tightly couple code to modulation. BICM is the decoupling. This section introduces the paradigm and block diagram; subsequent sections quantify the capacity cost.

Definition:
BICM Encoder

A bit-interleaved coded modulation (BICM) encoder is a cascade of three stages:

A binary encoder $\mathcal{E}: \{0,1\}^K \to \{0,1\}^N$ of rate $R_c = K/N$ — any capacity-approaching binary linear code (LDPC, polar, turbo, convolutional) works.
A bit interleaver $\pi$ , a fixed permutation of the $N$ coded bits.
A memoryless mapper with constellation $\mathcal{X}$ and labelling $\mu : \{0,1\}^L \to \mathcal{X}$ , where $L = \log_2 M$ . The mapper groups the interleaved bits into blocks of $L$ and maps each block to one constellation point.

The overall spectral efficiency is $\eta = R_c \cdot L$ bits per 2D symbol. The interleaver is assumed ideal: it randomises the order of coded bits enough that, at the demapper, the $L$ bit positions inside each QAM symbol appear as if drawn independently from the output of $L$ separate binary codes. This independence assumption is the central modelling device of BICM.

In the original Caire-Taricco-Biglieri analysis the interleaver is modelled as a uniformly random permutation; finite-length interleavers are addressed in Ch. 6 on error-probability analysis. The important modelling property is that no two bits landing in the same QAM symbol came from adjacent positions in the code trellis or factor graph.

,

Definition:
BICM Demapper + Decoder

The BICM receiver matches the encoder in reverse:

A bit-level demapper produces, from each received symbol $y$ , a vector of $L$ soft bit metrics — typically log-likelihood ratios $\lambda_\ell(y) = \log \frac{P(B_\ell = 0 \mid y)}{P(B_\ell = 1 \mid y)}$ for $\ell = 0, \ldots, L-1$ .
A de-interleaver $\pi^{-1}$ reorders these LLRs back into the coded-bit order.
A binary decoder (matched to the code used in the encoder — e.g., a belief-propagation LDPC decoder) processes the LLR stream and outputs the information bits.

The crucial structural point is that the decoder sees a stream of per-bit LLRs, not symbols. From the decoder's viewpoint, the channel is a single binary-input soft-output channel. The constellation and labelling are hidden inside the LLR computation.

The demapper can be implemented by exact marginalisation $P(B_\ell = b \mid y) = \sum_{x \in \mathcal{X}_\ell^{(b)}} P(Y = y \mid x) P(x) / P(y)$ , or by the max-log approximation $\lambda_\ell(y) \approx \min_{x \in \mathcal{X}_\ell^{(1)}} \|y - x\|^2 - \min_{x \in \mathcal{X}_\ell^{(0)}} \|y - x\|^2$ , which costs only $M$ distance evaluations per symbol and is what every 5G NR receiver actually runs.

BICM Encoder and Decoder — Top: BICM encoder. Information bits $\mathbf{u}$ are encoded by a single binary code, interleaved by $\pi$ , and mapped to constellation symbols via labelling $\mu$ . Bottom: BICM decoder. The demapper produces $L$ LLRs per received symbol, de-interleaves them, and feeds a single binary decoder. The elegance of BICM is that the decoder sees a scalar binary channel; the constellation is hidden inside the LLR computation.

BICM Transmitter and Receiver: The Paradigm

Animated walk-through of the BICM encoder-channel-decoder chain. Bits flow into a binary LDPC encoder, pass through the interleaver, get grouped into

L

-tuples, and drive a 16-QAM mapper. At the receiver, the demapper produces per-bit LLRs, de-interleaves, and feeds the LDPC decoder. The video highlights how the constellation and code remain modular — swapping 16-QAM for 64-QAM does not change the LDPC decoder at all.

The modularity principle of BICM: one binary code drives any constellation through the interleaver + labelling combination.

Definition:
Gray Labelling

A labelling $\mu : \{0,1\}^L \to \mathcal{X}$ is Gray if every pair of nearest-neighbour constellation points differ in exactly one label bit. Equivalently, for each pair $(x_1, x_2) \in \mathcal{X}^2$ with minimum Euclidean distance, $d_H(\mu^{-1}(x_1), \mu^{-1}(x_2)) = 1$ .

The canonical 16-QAM Gray labelling decomposes the four label bits into two independent Gray codes, one for the in-phase 4-PAM component and one for the quadrature 4-PAM component. Explicitly, the 4-PAM Gray code maps $00 \to -3, 01 \to -1, 11 \to +1, 10 \to +3$ . The four bits of 16-QAM are then $(b_0^I, b_1^I, b_0^Q, b_1^Q)$ , each pair independently Gray-coded.

Gray labellings exist for every square and rectangular QAM ( $2^k \times 2^k$ and $2^k \times 2^{k+1}$ ), for every $2^k$ -PSK, and for several but not all cross constellations. For non-square shapes like 32-APSK (as in DVB-S2), a quasi-Gray labelling is used that minimises the average number of bit flips between nearest neighbours.

,

Definition:
Set-Partition (Ungerboeck) Labelling

A labelling $\mu : \{0,1\}^L \to \mathcal{X}$ is set-partition (SP) (or Ungerboeck) if the bits $b_0, b_1, \ldots, b_{L-1}$ index the levels of Ungerboeck's set-partition tree from coarsest to finest:

Setting $b_0$ picks one of the two top-level cosets — the two cosets are maximally separated, with intra-coset minimum distance $\Delta_0 = 2 d_{\min}(\mathcal{X})$ .
Given $b_0$ , setting $b_1$ picks one of the two sub-cosets with intra-coset distance $\Delta_1 = 2\Delta_0$ .
And so on, each bit doubling the intra-coset distance at its level.

SP labelling places information in a strictly hierarchical way: $b_0$ carries the coarsest information, $b_{L-1}$ the finest. This structure is exactly what MLC (Ch. 3) exploits when it assigns progressively higher-rate codes to progressively cleaner bit levels.

SP is optimal for MLC because the conditional bit capacities $I(Y; B_\ell \mid B_{<\ell})$ grow with $\ell$ : the top bit is the hardest to decode, the bottom bit the easiest. For BICM (unconditional decoding) the ordering is reversed and irrelevant — what matters is the unconditional marginal capacity $I(Y; B_\ell)$ , which Gray labelling maximises.

,

🎓CommIT Contribution(1998)

Bit-Interleaved Coded Modulation

G. Caire, G. Taricco, E. Biglieri — IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 927–946

This 1998 paper by Caire, Taricco, and Biglieri is the foundational BICM paper — the reference that every subsequent work on coded modulation for wireless channels cites. Its three technical contributions are each decisive for the field:

(i) BICM as parallel independent binary channels. The paper formalises the now-standard model in which an ideal interleaver decomposes the $M$ -ary channel into $L$ parallel independent binary channels, one per label position. The BICM capacity is then the sum of the individual bit-channel capacities:

$C_{\rm BICM}(\mu) \;=\; \sum_{\ell = 0}^{L-1} I(Y; B_\ell).$

This decomposition, along with a mismatched-decoding converse that the paper proves separately (Thm. 1), turns the analysis of BICM into the analysis of $L$ scalar binary-input channels — a massive simplification over the joint $M$ -ary analysis.

(ii) Gray labelling is (near-)optimal on AWGN. Theorem 3 and the numerical tables of §V establish that, for the AWGN channel with square QAM and Gray labelling, the BICM capacity is within a few tenths of a bit of the coded-modulation capacity $C_{\rm CM}$ over the whole practical SNR range. This makes BICM essentially capacity-achieving with a single binary code and answers the design question left open by Zehavi's 1992 8-PSK precursor.

(iii) PEP and diversity analysis on fading channels. Section IV derives the pairwise error probability (PEP) of a BICM system over a fading channel and identifies the BICM diversity order as the product of the code's free distance and the minimum number of distinct bit positions between codewords at that distance. The result: BICM automatically exploits the fading diversity inherent in the interleaved code, without requiring a fading-specific code design. We develop this analysis in Chapter 6.

Why it revolutionised wireless. Before 1998, coded modulation for wireless meant either TCM (monolithic, per-channel designs) or MLC (multiple codes per modulation). Caire-Taricco-Biglieri showed that a single binary code, a bit interleaver, and a Gray labelling achieve essentially the same capacity with a fraction of the design burden. Every modern wireless standard — 5G NR (LDPC + QPSK/16/64/256/1024-QAM), Wi-Fi 6 and 7 (LDPC + up to 4096-QAM), LTE (turbo + QAM), and DVB-S2/S2X (LDPC + QPSK/8-PSK/APSK) — uses BICM as its core MCS paradigm. This is arguably the single most influential coded-modulation result of the last three decades, and it is Caire's foundational contribution to the field.

bicmcoded-modulationmutual-informationcapacitygray-labelingView Paper →

Key Takeaway

BICM decouples coding from modulation. A single binary code drives any constellation via a bit interleaver and a labelling. The cost is a small capacity gap; the benefit is modular, rate-adaptable design. This is why every modern wireless standard is BICM, and why Chapter 5 is the pivotal chapter of this book.

Historical Note: From Zehavi's 8-PSK to the CTB Framework

1992–1998

The practical idea of putting an interleaver between a binary code and a non-binary mapper predates the capacity theory. Ephraim Zehavi's 1992 paper, "8-PSK trellis codes for a Rayleigh channel," observed that on a fading channel — where TCM's Ungerboeck labelling loses its diversity advantage — one could instead bit-interleave a binary convolutional code and drive an 8-PSK mapper. Zehavi's motivation was diversity: the interleaver breaks up consecutive symbol errors into single-bit errors, which a strong binary code handles cleanly.

Zehavi's construction worked, but the capacity-theoretic explanation was missing. Why does this work? How close to CM capacity does it come? Does the labelling matter, and if so, which is best?

Six years later, Caire, Taricco, and Biglieri provided the answers in a single 20-page paper. They formalised BICM as a parallel-bit-channel model, derived the capacity formula, proved Gray labelling is near-optimal on AWGN, and analysed the diversity order on fading channels. The paper is a rare example of theory that explained a practical design decision and guided all subsequent ones. It shaped DVB-S2 in 2003, LTE in 2008, and every 5G NR and Wi-Fi profile since.

,

Why This Matters: Every Modern Wireless Standard Is BICM

A short tour of the modulation-and-coding section of every recent wireless standard:

5G NR (3GPP TS 38.212): a single LDPC base graph (two variants for long/short codewords) feeds every QPSK / 16 / 64 / 256 / 1024-QAM. Rate matching (puncturing + shortening) provides the rate adaptation. Gray labelling on all QAM constellations.
Wi-Fi 6 / 6E (IEEE 802.11ax) and Wi-Fi 7 (802.11be): LDPC driving BPSK through 1024-QAM (WiFi 6) or 4096-QAM (WiFi 7). Gray labelling.
DVB-S2 / S2X (ETSI EN 302 307): LDPC + BCH outer code, driving QPSK / 8-PSK / 16-APSK / 32-APSK / up to 256-APSK in S2X. Quasi-Gray labelling on APSK.
LTE: turbo code with rate matching, driving QPSK/16/64/256-QAM. Still BICM; the code family changed for 5G but the paradigm did not.

In every case the structure is the same: one binary code + bit interleaver + mapper. Chapter 9 of this book examines the standards-level details; this chapter establishes why that structure is near-optimal in capacity.

, ,

Quick Check

Which of the following is not a component of the standard BICM encoder?

A single binary forward-error-correction code

A bit-level interleaver $\pi$

$L = \log_2 M$ separate binary codes, one per label level

A constellation-and-labelling mapper $\mu : \{0,1\}^L \to \mathcal{X}$

Correction:

L = \log_2 M

separate binary codes, one per label level

This is the MLC (multilevel coding) encoder from Chapter 3 — one code per bit level. BICM uses a single binary code; its modularity is precisely that the same code drives any constellation. Using $L$ codes is characteristic of MLC and is what BICM was designed to avoid.

Bit-Interleaved Coded Modulation (BICM)

A coded-modulation scheme consisting of a single binary code, a bit interleaver, and a higher-order modulation mapper. The receiver computes per-bit log-likelihood ratios from each received symbol, de-interleaves them, and feeds them to a single binary decoder. Introduced by Zehavi (1992) and analysed information-theoretically by Caire-Taricco-Biglieri (1998). The dominant coded-modulation paradigm in every modern wireless standard.

Gray Labelling

A labelling of a constellation in which every pair of nearest-neighbour points differs in exactly one label bit. For QAM it decomposes into two independent PAM Gray codes. Gray labelling empirically maximises the BICM capacity on AWGN and is used in every modern wireless standard.

Set-Partition (Ungerboeck) Labelling

A labelling in which the label bits index the levels of Ungerboeck's set-partition tree, from coarsest (top bit) to finest (bottom bit). Optimal for MLC/MSD; strictly suboptimal for BICM on AWGN but useful for BICM-ID (Chapter 8) where iterative feedback makes the SP structure exploitable.

The BICM Paradigm