Independent Parallel Bit Channels

The Central Modelling Device of BICM

The capacity analysis of BICM rests on a single simplifying assumption: that the bit interleaver is so large and well chosen that any two bits belonging to the same constellation symbol come from effectively independent positions in the coded stream. Under this assumption, the decoder sees not an $M$-ary channel but $L$ separate binary channels operating in parallel. Each binary channel is characterised by a single scalar per-bit metric, and their capacities add.

This section makes that picture precise. We define the per-bit channel carefully, prove that ideal interleaving forces it to be memoryless, and derive the soft LLR metric that the demapper uses. The derivation looks innocent, but it contains a subtle twist that every student of BICM must see: the bit channels are marginally independent from the decoder's viewpoint, but they are not jointly independent, since they all depend on the same underlying $y$. The BICM capacity we will derive in §3 is the correct capacity for a decoder that processes the bits as if they were marginally independent. This is a mismatched decoding rule, and the capacity is the corresponding generalised mutual information.

Definition:

The $\ell$-th BICM Bit Channel

Fix a constellation $\mathcal{X}$ of size $M = 2^L$, a labelling $\mu$, a memoryless channel law $p(y \mid x)$, and uniform inputs on $\mathcal{X}$. For each bit position $\ell \in \{0, \ldots, L-1\}$, the $\ell$-th BICM bit channel $W_\ell$ is the scalar binary-input channel with:

  • Input alphabet $\{0,1\}$, with uniform prior $P(B_\ell = 0) = P(B_\ell = 1) = \tfrac12$.
  • Output $Y$ (taking values in the original channel's output alphabet).
  • Transition law $$p_{W_\ell}(y \mid b) = \frac{1}{M/2} \sum_{x \in \mathcal{X}_\ell^{(b)}} p(y \mid x),$$ where $\mathcal{X}_\ell^{(b)} = \{x \in \mathcal{X} : \mu^{-1}(x)_\ell = b\}$ is the subset of constellation points whose $\ell$-th label bit is $b$.

The soft bit metric or log-likelihood ratio is $$\lambda_\ell(y) = \log \frac{p_{W_\ell}(y \mid 0)}{p_{W_\ell}(y \mid 1)} = \log \frac{\sum_{x \in \mathcal{X}_\ell^{(0)}} p(y \mid x)}{\sum_{x \in \mathcal{X}_\ell^{(1)}} p(y \mid x)}.$$

This is the quantity the demapper computes and passes (through the de-interleaver) to the binary decoder.

The definition averages over the $M/2$ symbols in each subset: the $L-1$ bits at positions $\ell' \ne \ell$ are marginalised out uniformly. This is what makes the bit channel scalar; it is also what costs capacity relative to MLC, which keeps those bits as conditioning variables.
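The subset sums can be made concrete in a few lines. The sketch below (assuming, for illustration only, a real 4-PAM constellation with a Gray labelling) computes the exact LLR by summing Gaussian likelihoods over the two label subsets $\mathcal{X}_\ell^{(0)}$ and $\mathcal{X}_\ell^{(1)}$:

```python
import numpy as np

# Illustrative 4-PAM constellation with Gray labelling (an assumption of this
# sketch): label (b0, b1) -> point; adjacent points differ in exactly one bit.
CONST = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}

def exact_llr(y, sigma2, ell):
    """lambda_ell(y) = log sum_{X_ell^(0)} p(y|x) - log sum_{X_ell^(1)} p(y|x)
    for real AWGN with noise variance sigma2 and uniform symbols."""
    logp = {0: [], 1: []}
    for label, x in CONST.items():
        # Gaussian log-likelihood up to a constant that cancels in the ratio.
        logp[label[ell]].append(-(y - x) ** 2 / (2 * sigma2))
    # Numerically stable log-sum-exp over each subset.
    return np.logaddexp.reduce(logp[0]) - np.logaddexp.reduce(logp[1])

# y near -3 (label 00): both bits favour 0, so both LLRs come out positive.
print(exact_llr(-2.9, 0.5, 0), exact_llr(-2.9, 0.5, 1))
```

Note that the uniform $1/(M/2)$ factor in the transition law cancels in the LLR, which is why the code never needs it.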

Theorem: Ideal Interleaver $\Rightarrow$ Memoryless Parallel Bit Channels

Consider the BICM encoder of Definition (BICM Encoder) with an ideal interleaver, i.e., a random permutation selected uniformly from the symmetric group on $N$ elements, revealed to both transmitter and receiver. Then, from the binary decoder's viewpoint, the per-bit LLR stream $\{\lambda_n\}_{n=1}^N$ is produced by a memoryless binary-input channel $W_{\rm BICM}$ whose transition law is the mixture $$p_{W_{\rm BICM}}(y \mid b) = \frac{1}{L} \sum_{\ell=0}^{L-1} p_{W_\ell}(y \mid b).$$ Equivalently, the BICM bit channel is the time-averaged random parallel channel over the $L$ bit positions, with each position selected with probability $1/L$.

The interleaver shuffles coded bits so thoroughly that the $n$-th coded bit is, from the decoder's perspective, equally likely to have been mapped to any of the $L$ label positions inside any constellation symbol, with all other bits of that symbol drawn independently from the interleaver's output. This converts the structured joint symbol channel into the mixture channel above, which is memoryless by construction.
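The uniform-position claim is easy to check empirically. The toy simulation below (block length and trial count are arbitrary choices of this sketch) draws many independent uniform permutations and records which label position a fixed coded bit lands in:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, trials = 32, 4, 50_000   # short block of N coded bits, L bits per symbol

# Under a uniformly random permutation, coded bit 0 is equally likely to land
# at any stream index, hence at any of the L label positions (index mod L).
pos = np.array([rng.permutation(N)[0] % L for _ in range(trials)])
freq = np.bincount(pos, minlength=L) / trials
print(freq)   # each entry close to 1/L = 0.25
```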


Max-Log Per-Bit LLR Computation at the BICM Demapper

Complexity: $O(M \cdot L)$ arithmetic operations per received symbol
Input: received symbol $y \in \mathbb{C}$; constellation $\mathcal{X} = \{x_1, \ldots, x_M\}$; labelling $\mu$; noise variance $\sigma^2$
Output: vector of LLRs $(\lambda_0, \lambda_1, \ldots, \lambda_{L-1})$
1. Compute $d_i \leftarrow \|y - x_i\|^2$ for $i = 1, \ldots, M$
2. for $\ell = 0, \ldots, L-1$ do
3. $\quad d_\ell^{(0)} \leftarrow \min_{i : \mu^{-1}(x_i)_\ell = 0} d_i$
4. $\quad d_\ell^{(1)} \leftarrow \min_{i : \mu^{-1}(x_i)_\ell = 1} d_i$
5. $\quad \lambda_\ell \leftarrow \bigl(d_\ell^{(1)} - d_\ell^{(0)}\bigr) / \sigma^2$
6. end for
7. return $(\lambda_0, \ldots, \lambda_{L-1})$

This is the max-log approximation: it replaces $\log \sum_i \exp(-d_i / \sigma^2)$ by $-\min_i d_i / \sigma^2$. For moderate to high SNR the approximation is tight to within a fraction of a dB; for 5G NR and Wi-Fi receivers this is the standard implementation.
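The routine translates directly into code. The sketch below instantiates it for a 16-QAM built as two Gray 4-PAMs; the particular bit-to-I/Q assignment (bits 0 and 2 on I, bits 1 and 3 on Q) is an illustrative assumption, not necessarily the 3GPP mapping:

```python
import numpy as np
from itertools import product

# Gray 16-QAM: bits (b0, b2) select the I coordinate, (b1, b3) the Q coordinate.
GRAY4 = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}
LABELS = list(product((0, 1), repeat=4))
POINTS = np.array([GRAY4[(b[0], b[2])] + 1j * GRAY4[(b[1], b[3])]
                   for b in LABELS])

def maxlog_llrs(y, sigma2):
    """Steps 1-7 of the routine: lambda_ell = (d_ell^(1) - d_ell^(0)) / sigma2."""
    d = np.abs(y - POINTS) ** 2                      # step 1: all M distances
    llrs = []
    for ell in range(4):                             # step 2: loop over levels
        d0 = min(d[i] for i, lab in enumerate(LABELS) if lab[ell] == 0)
        d1 = min(d[i] for i, lab in enumerate(LABELS) if lab[ell] == 1)
        llrs.append((d1 - d0) / sigma2)              # step 5
    return llrs

# y exactly on the point labelled (0,1,0,0): signs follow the transmitted bits.
print(maxlog_llrs(-3 + 3j, 0.5))   # -> [32.0, -32.0, 8.0, 8.0]
```

The magnitudes also make sense: the two sign bits (decided by which half-plane $y$ falls in) get LLR magnitude 32, while the inner/outer bits, whose competing subset is only distance 2 away, get magnitude 8.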

Per-Level Bit-Channel Capacities $C_\ell$ vs. SNR

Each of the $L = \log_2 M$ BICM bit channels has its own capacity $C_\ell = I(Y; B_\ell)$, which depends on the labelling. Under Gray labelling, all $L$ curves look qualitatively similar: each bit sees an "average" binary-input AWGN-like channel. Under SP labelling, the curves split: the top-level bit $b_0$ has near-Shannon capacity (because it lives on well-separated cosets), while the bottom-level bit $b_{L-1}$ has near-zero capacity at low SNR (its cosets overlap heavily). This split is exactly why SP is optimal for MLC (one matches a high-rate code to $b_0$ and a low-rate code to $b_{L-1}$) and suboptimal for BICM (the sum of unconditional capacities is smaller).


Example: The Four Bit Channels of 16-QAM at 10 dB

For 16-QAM at $\mathrm{SNR} = 10$ dB under Gray labelling, compute the four bit-channel capacities $C_0, C_1, C_2, C_3$ and compare to the SP labelling. Explain the pattern.
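One way to attack this exercise numerically (a Monte Carlo sketch; the bit-to-coordinate assignment and sample count are assumptions of the sketch) uses the identity $C_\ell = 1 - \mathbb{E}[\log_2(1 + e^{-\tilde\lambda_\ell})]$, where $\tilde\lambda_\ell$ is the exact LLR with its sign flipped when the transmitted bit is 1:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
# Gray 16-QAM (assumed mapping): (b0, b2) on I, (b1, b3) on Q.
GRAY4 = {(0, 0): -3.0, (0, 1): -1.0, (1, 1): +1.0, (1, 0): +3.0}
LABELS = np.array(list(product((0, 1), repeat=4)))
POINTS = np.array([GRAY4[(b[0], b[2])] + 1j * GRAY4[(b[1], b[3])]
                   for b in LABELS])

Es = np.mean(np.abs(POINTS) ** 2)          # average symbol energy (= 10 here)
N0 = Es / 10 ** (10.0 / 10)                # complex-noise variance at 10 dB SNR

n = 200_000
idx = rng.integers(0, 16, n)               # uniform symbols
y = POINTS[idx] + np.sqrt(N0 / 2) * (rng.standard_normal(n)
                                     + 1j * rng.standard_normal(n))
ll = -np.abs(y[:, None] - POINTS[None, :]) ** 2 / N0   # log p(y|x_i) + const

caps = []
for ell in range(4):
    m0 = LABELS[:, ell] == 0
    lam = (np.logaddexp.reduce(ll[:, m0], axis=1)
           - np.logaddexp.reduce(ll[:, ~m0], axis=1))    # exact LLR
    lam_tx = np.where(LABELS[idx, ell] == 0, lam, -lam)  # sign-adjusted
    caps.append(1 - np.mean(np.logaddexp(0, -lam_tx)) / np.log(2))
print(caps, sum(caps))
```

With Gray labelling the four estimates come out as two nearly equal pairs (I and Q are symmetric); swapping in an SP labelling splits the levels apart, as the surrounding text predicts.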

BICM as Mismatched Decoding

Strictly speaking, the BICM decoder treats the $L$ label bits of each symbol as if they were outputs of independent binary channels, whereas their soft metrics are all computed from the same received value $y$. This makes the BICM decoding metric a mismatched metric: it is not the true likelihood of the codeword given $y$. The capacity of a channel under a mismatched decoder is the generalised mutual information (GMI), and the BICM capacity formula $\sum_\ell I(Y; B_\ell)$ is in fact the GMI under the product-form decoding metric. Chapter 7 revisits this mismatch perspective rigorously following Martínez–Guillén i Fàbregas–Caire (2008). For now, remember: the capacity we are about to derive is operationally correct for the BICM decoder as implemented, not for an optimal $M$-ary ML decoder, which would of course achieve $C_{\rm CM}$.

Common Mistake: "Independent Bit Channels" β€” Marginal, Not Joint

Mistake:

Assuming that the $L$ bit positions $B_0, B_1, \ldots, B_{L-1}$ of a constellation symbol are jointly independent given $Y$.

Correction:

They are marginally independent in the sense that the BICM capacity formula treats each $I(Y; B_\ell)$ separately, but they all ride on the same symbol and are observed through the same $y$, so conditional on $Y$ they have a joint distribution that generally does not factorise. The quantity $I(Y; B_0, B_1, \ldots, B_{L-1})$ (the CM capacity) is therefore greater than or equal to $\sum_\ell I(Y; B_\ell)$ (the BICM capacity), with equality only when the joint conditional really does factorise, which is a knife-edge condition.

The BICM decoder acts as if the bits were jointly independent; that mismatch is precisely what costs it the gap to CM capacity.
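The gap is easy to exhibit numerically. The Monte Carlo sketch below (assumed parameters: real-AWGN 4-PAM at about 6 dB SNR with a natural, SP-style labelling, chosen because it makes the gap pronounced) estimates both $I(Y;X)$ and $\sum_\ell I(Y;B_\ell)$ from the same samples:

```python
import numpy as np

rng = np.random.default_rng(2)
PAM = np.array([-3.0, -1.0, 1.0, 3.0])
LAB = np.array([(0, 0), (0, 1), (1, 0), (1, 1)])   # natural (SP-style) labels
sigma2 = np.mean(PAM ** 2) / 10 ** (6.0 / 10)      # real noise variance, 6 dB

n = 300_000
idx = rng.integers(0, 4, n)
y = PAM[idx] + np.sqrt(sigma2) * rng.standard_normal(n)
ll = -(y[:, None] - PAM[None, :]) ** 2 / (2 * sigma2)   # log p(y|x) + const

# CM capacity: I(Y;X) = log2 M - E[log2(sum_x p(y|x) / p(y|x_sent))]
cm = 2 - np.mean(np.logaddexp.reduce(ll, axis=1)
                 - ll[np.arange(n), idx]) / np.log(2)

# BICM capacity: sum over levels of I(Y;B_ell), computed from exact LLRs
bicm = 0.0
for ell in range(2):
    m0 = LAB[:, ell] == 0
    lam = (np.logaddexp.reduce(ll[:, m0], axis=1)
           - np.logaddexp.reduce(ll[:, ~m0], axis=1))
    lam_tx = np.where(LAB[idx, ell] == 0, lam, -lam)
    bicm += 1 - np.mean(np.logaddexp(0, -lam_tx)) / np.log(2)
print(cm, bicm)   # cm exceeds bicm: the joint channel beats the marginal sum
```

Rerunning with the Gray labelling from the earlier sketches makes the two numbers nearly coincide, which is exactly the near-knife-edge behaviour the text describes.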

Quick Check

Under the ideal-interleaver assumption, the BICM bit channel seen by the binary decoder is

  • an $M$-ary channel whose output is $y$ and whose input is the $L$-tuple $(B_0, \ldots, B_{L-1})$
  • a uniform mixture of the $L$ per-position binary channels $W_\ell$, each selected with probability $1/L$
  • a deterministic channel (no noise), because interleaving removes correlations
  • a BSC with crossover probability equal to the symbol error rate of the constellation

Bit Channel (BICM)

The scalar binary-input channel $W_\ell$ seen by bit position $\ell$ after marginalising out the other $L-1$ label bits and the interleaver. Its transition law is the average of the full-symbol likelihood over each of the two subsets $\mathcal{X}_\ell^{(0)}$ and $\mathcal{X}_\ell^{(1)}$. The sum of its capacities across $\ell$ is the BICM capacity.

Related: The $\ell$-th BICM Bit Channel, BICM Capacity, Max-Log Per-Bit LLR Computation at the BICM Demapper

Log-Likelihood Ratio (LLR)

For a binary-input channel, $\lambda = \log \frac{P(B=0 \mid y)}{P(B=1 \mid y)}$. The sufficient soft statistic for optimal binary decoding; modern BICM receivers compute one LLR per bit position per received symbol and feed them to a soft-input binary decoder.

Related: Bit Channel (BICM), Demapper with A Priori Information