Parallel Binary Channels: The Wachsmann-Fischer-Huber View
Two Ways to Look at the Same Channel
There are two operational pictures of what the BICM decoder sees.
Picture A — the time-averaged mixture (Caire-Taricco-Biglieri 1998). At the output of the ideal interleaver, every coded bit is equally likely to have been mapped to any of the $m$ label positions inside a constellation symbol. The decoder's effective binary channel is therefore the uniform mixture of the per-position bit channels. This is the view of Chapter 5, and it produces the capacity formula by reading the mutual information off the mixture channel.
Picture B — parallel independent binary channels (Wachsmann-Fischer-Huber 1999). Equivalently — and this is the viewpoint we adopt throughout this chapter — we imagine the BICM system as $m$ separate binary channels running in parallel, one per label position. Each parallel channel is a binary-input channel $W_i$ with law $p_i(y \mid b)$; each has its own capacity $C_i = I(B_i; Y)$; the code, after de-interleaving, sees the parallel channels selected uniformly in time. The sum $C_{\mathrm{BICM}} = \sum_{i=1}^{m} C_i$ is then a direct information-theoretic consequence of the parallel-channel decomposition: the capacity of $m$ equally-used parallel channels in a time-sharing sense is the average of their capacities, and the per-symbol information rate is $m$ times that.
The two pictures are mathematically equivalent — they represent the same conditional joint law decomposed in two different ways. Picture A is cleaner for defining the BICM capacity as a mutual information on a single scalar channel. Picture B is cleaner for the error-exponent / cutoff-rate / PEP analyses that we do in §4 and §5 — there the natural object is the product-metric Gallager function $E_0(\rho)$, which is defined through the parallel-channel decomposition.
This section makes Picture B precise, proves the sum-capacity formula directly from parallel channels, and sets up the notation we use in §2 to expose the mismatched-decoding structure hiding inside BICM.
Definition: BICM as Parallel Binary Channels
Fix a constellation $\mathcal{X}$ of size $|\mathcal{X}| = 2^m$, a labelling $\mu : \{0,1\}^m \to \mathcal{X}$, and a memoryless channel law $p(y \mid x)$ with uniform inputs. The BICM parallel-channel model represents the BICM system as the time-shared use of $m$ independent binary-input channels $W_1, \dots, W_m$, where the $i$-th channel $W_i$ has:
- Input: $b \in \{0,1\}$ with uniform prior.
- Output: $y$ in the original output alphabet of $p(y \mid x)$.
- Transition law: $p_i(y \mid b) = \frac{1}{2^{m-1}} \sum_{x \in \mathcal{X}_b^i} p(y \mid x)$, with $\mathcal{X}_b^i = \{ x \in \mathcal{X} : \mu^{-1}(x)_i = b \}$ the sub-constellation whose label carries bit $b$ in position $i$.
A BICM encoder that emits $n$ code bits through a mapper to $n/m$ symbols is, under the ideal-interleaver assumption, equivalent in law to a system that splits the code bits round-robin into $m$ streams of $n/m$ bits each, sends the $i$-th stream through the binary channel $W_i$, and reassembles the outputs at the receiver.
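The construction of the marginal bit channels can be made concrete numerically. The following is a minimal sketch, assuming an illustrative toy setup (unit-energy 4-PAM with Gray labelling, real AWGN, output discretized onto a grid so each transition law is a finite array); all parameter values are ours, not the chapter's.

```python
import numpy as np

# Illustrative toy setup: unit-energy 4-PAM (m = 2), Gray labelling, real AWGN.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)  # E[x^2] = 1
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray label of each symbol
sigma = 0.5                                          # noise standard deviation
y = np.linspace(-4.0, 4.0, 4001)                     # discretized output grid
dy = y[1] - y[0]

# p(y|x): one row per constellation symbol
p_y_given_x = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
              / np.sqrt(2 * np.pi * sigma**2)

def bit_channel(i):
    """Marginal bit channel W_i: p_i(y|b) averages p(y|x) over the 2^(m-1)
    symbols whose label carries bit b in position i."""
    return np.stack([p_y_given_x[labels[:, i] == b].mean(axis=0) for b in (0, 1)])

W = [bit_channel(i) for i in range(2)]
for p in W:  # each conditional law integrates to 1 (up to grid truncation)
    assert np.allclose(p.sum(axis=1) * dy, 1.0, atol=1e-3)
```

The averaging in `bit_channel` is exactly the transition law of the Definition: the other $m-1$ label bits are uniform and unknown, so they are summed out with weight $2^{-(m-1)}$.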
Wachsmann, Fischer, and Huber (1999) introduced this viewpoint in the context of MLC, where the parallel channels are the conditional bit channels at level $i$, conditioned on the lower-level bits $B_1, \dots, B_{i-1}$. For BICM, the parallel channels are the marginal bit channels $W_i$ — conditioning has been removed by the average over the other bits. This conditioning loss is exactly what Gray labelling manages to almost completely hide in the sum $\sum_{i=1}^{m} I(B_i; Y)$, and what set-partition labelling does not.
Theorem: BICM Capacity from Parallel Binary Channels
The information rate (in bits per channel use) achievable over the BICM parallel-channel model of Definition (BICM as Parallel Binary Channels) with a product-form random code whose $i$-th component is capacity-achieving for $W_i$ is $C_{\mathrm{BICM}} = \sum_{i=1}^{m} C_i = \sum_{i=1}^{m} I(B_i; Y)$. This coincides with the BICM capacity formula derived in Chapter 5 (Thm. (The BICM Capacity Decomposition)) through the mixture viewpoint.
$m$ parallel channels carry, in parallel, the sum of their individual capacities. This is a one-line consequence of the independence of the channels: the code splits naturally into $m$ sub-codes, each operating on its own binary channel, and they do not interfere because the bit positions are marginally independent at the channel output (different symbols at different times).
1. Split the code into $m$ parallel streams; the $i$-th stream sees channel $W_i$ and is decoded independently.
2. Shannon's theorem on each $W_i$ gives the achievable rate $C_i$ per stream; sum over streams.
3. Identify the result with the Chapter 5 formula by noting that the mixture-channel mutual information is $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ per code bit, and the per-symbol information rate is $m$ times the per-bit rate.
Parallel-channel coding theorem
For $m$ independent parallel discrete memoryless channels $W_1, \dots, W_m$ with transition laws $p_i(y_i \mid b_i)$ and uniform inputs, the sum capacity achievable with independent codebooks on each sub-channel is $C = \sum_{i=1}^{m} C_i$. This is a standard result: the parallel channel is a product channel, its joint mutual information factorises as $I(B_1^m; Y_1^m) = \sum_{i=1}^{m} I(B_i; Y_i)$ because the output of sub-channel $i$ depends only on its input $b_i$, and the single-letterisation of Shannon's theorem gives the sum.
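The factorisation can be checked on the smallest possible example. A hedged sketch: two independent BSCs with illustrative crossover probabilities, where the mutual information of the product channel equals $C_1 + C_2$ exactly.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p1, p2 = 0.11, 0.2                             # illustrative crossover probs
C1, C2 = 1 - h2(p1), 1 - h2(p2)                # BSC capacities

# Joint law of ((X1,X2),(Y1,Y2)) with uniform, independent inputs.
W1 = np.array([[1 - p1, p1], [p1, 1 - p1]])    # W1[x1, y1]
W2 = np.array([[1 - p2, p2], [p2, 1 - p2]])    # W2[x2, y2]
P = np.einsum('ac,bd->abcd', W1, W2) / 4.0     # P[x1, x2, y1, y2]

Py = P.sum(axis=(0, 1))                        # output marginal
Px = P.sum(axis=(2, 3))                        # input marginal (uniform)
I_joint = np.sum(P * np.log2(P / (Px[:, :, None, None] * Py[None, None, :, :])))

assert np.isclose(I_joint, C1 + C2)            # product channel MI = C1 + C2
```

The assertion holds to machine precision: nothing about the two sub-channels couples them, so the joint mutual information is additive.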
Apply to BICM
By Definition (BICM as Parallel Binary Channels), BICM — under ideal interleaving and uniform inputs — is equivalent to the parallel channel $(W_1, \dots, W_m)$. Applying the parallel-channel coding theorem gives the achievable sum rate $\sum_{i=1}^{m} C_i$ per vector use, i.e., per constellation symbol.
Match the Chapter 5 formula
In Chapter 5 the BICM capacity was defined through the mixture channel seen by the binary decoder after ideal interleaving — one output per code bit — with transition law $\bar{p}(y \mid b) = \frac{1}{m} \sum_{i=1}^{m} p_i(y \mid b)$. The mutual information of this mixture equals $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ (the average of the per-position MIs, because the mixing is uniform and the bits are marginally independent across positions). The per-code-bit rate is thus $\frac{1}{m} \sum_i C_i$, and the per-symbol rate is $m$ times that — exactly $C_{\mathrm{BICM}} = \sum_{i=1}^{m} C_i$.
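A hedged numerical check of this average-of-MIs identity, on the same illustrative 4-PAM Gray setup used earlier (the mixture channel is modelled with the label position revealed to the decoder, as the interleaver structure implies):

```python
import numpy as np

# Illustrative 4-PAM Gray setup over real AWGN, output discretized on a grid.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
sigma, m = 0.5, 2
y = np.linspace(-4.0, 4.0, 4001)
dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
      / np.sqrt(2 * np.pi * sigma**2)

def mi_bits(p):
    """MI of a binary-input channel p[b, :] with uniform prior."""
    q = 0.5 * (p[0] + p[1])
    return 0.5 * np.sum(p * np.log2(p / q)) * dy

W = [np.stack([pyx[labels[:, i] == b].mean(axis=0) for b in (0, 1)])
     for i in range(m)]
I = [mi_bits(p) for p in W]                  # per-position MIs I(B_i; Y)

# Mixture channel with output (position s, y): p(s, y | b) = (1/m) p_s(y | b).
p_mix = np.stack(W) / m                      # shape (m, 2, len(y))
q_mix = 0.5 * (p_mix[:, 0] + p_mix[:, 1])    # output marginal over (s, y)
I_mix = 0.5 * np.sum(p_mix * np.log2(p_mix / q_mix[:, None, :])) * dy

assert np.isclose(I_mix, np.mean(I))         # mixture MI = average of MIs
assert np.isclose(m * I_mix, sum(I))         # per-symbol rate = the sum
```

Multiplying the per-code-bit mixture rate by the $m$ code bits per symbol recovers the parallel-channel sum, as the proof step states.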
Pictures A and B therefore give the same numerical rate; they differ only in which channel you call "the BICM channel".
Example: QPSK with Gray Labelling: Two Identical Parallel BI-AWGN Channels
Consider QPSK with Gray labelling ($M = 4$, $m = 2$) on an AWGN channel with SNR $\gamma$ (in dB). Show that the two BICM bit channels are both equivalent to a BI-AWGN channel with per-bit SNR $\gamma$, compute their common capacity $C_1 = C_2 = C_{\mathrm{BI\text{-}AWGN}}(\gamma)$, and verify that $C_{\mathrm{BICM}} = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma) = C_{\mathrm{CM}}$.
Gray QPSK as two parallel BPSK channels
Under Gray labelling, the first label bit controls only the real (in-phase) part of the transmitted symbol and the second bit controls only the imaginary (quadrature) part (up to a scale). Because the noise on the in-phase and quadrature components is independent, the mapping is channel-symmetric and the two bit channels decouple exactly into two BI-AWGN channels with equal per-bit SNR $\gamma$ equal to the symbol SNR.
Compute the BI-AWGN capacity
The BI-AWGN capacity at linear-scale SNR $\gamma$ is $C_{\mathrm{BI\text{-}AWGN}}(\gamma) = 1 - \mathbb{E}_{Z \sim \mathcal{N}(0,1)}\!\left[\log_2\!\left(1 + e^{-2\gamma - 2\sqrt{\gamma}\,Z}\right)\right]$. Numerically, converting the given SNR from dB to linear scale, the integral evaluates to the common value $C_1 = C_2 = C_{\mathrm{BI\text{-}AWGN}}(\gamma)$ bits/use.
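The expectation has no closed form, but it is a one-dimensional Gaussian integral that Gauss-Hermite quadrature evaluates cheaply. A hedged sketch, for the normalisation $y = x + n$, $x \in \{\pm 1\}$, $n \sim \mathcal{N}(0, 1/\gamma)$ (so $\gamma$ is the linear-scale SNR; the node count is illustrative):

```python
import numpy as np

def bi_awgn_capacity(gamma, nodes=80):
    """BI-AWGN capacity in bits/use at linear SNR gamma, by Gauss-Hermite."""
    t, w = np.polynomial.hermite.hermgauss(nodes)
    z = np.sqrt(2.0) * t                       # abscissae for Z ~ N(0, 1)
    f = np.log2(1.0 + np.exp(-2.0 * gamma - 2.0 * np.sqrt(gamma) * z))
    return 1.0 - (w @ f) / np.sqrt(np.pi)      # 1 - E[f(Z)]

# Limits behave as expected: ~0 bits at very low SNR, ~1 bit at very high SNR.
assert bi_awgn_capacity(1e-4) < 0.01
assert bi_awgn_capacity(100.0) > 0.999
assert bi_awgn_capacity(2.0) > bi_awgn_capacity(1.0)   # monotone in SNR
```

The substitution $z = \sqrt{2}\,t$ converts the Hermite weight $e^{-t^2}$ into the standard normal density, which is why the result is divided by $\sqrt{\pi}$.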
Sum over parallel channels
Therefore $C_{\mathrm{BICM}} = C_1 + C_2 = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma)$ bits/symbol. Since Gray QPSK is exactly the same as two independent BPSK transmissions, it is also the case that $C_{\mathrm{CM}} = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma)$ bits/symbol — i.e. Gray QPSK has zero BICM-to-CM capacity gap, for all SNRs.
This is the only constellation-and-labelling pair in practical use that is BICM-optimal exactly. For larger constellations the labelling induces a strictly positive (but small under Gray) gap — the subject of §2.
Sanity check via parallel-channel theorem
Thm. (BICM Capacity from Parallel Binary Channels) predicts $C_{\mathrm{BICM}} = C_1 + C_2 = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma)$. Direct evaluation of the mixture-channel formula gives the same value in bits/symbol. The parallel-channel decomposition is tight for Gray-QPSK because the two bit channels are independent both marginally and jointly given $Y$ — the chain-rule residual $I(B_2; Y \mid B_1) - I(B_2; Y)$ vanishes.
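The three quantities of this example can be checked against each other on a two-dimensional grid. A hedged numerical sketch (all parameters — noise level, grid extent, resolution — are illustrative):

```python
import numpy as np

# Unit-energy Gray-labelled QPSK in complex AWGN; SNR = 1/N0.
a = 1.0 / np.sqrt(2.0)
X = np.array([a + 1j * a, -a + 1j * a, -a - 1j * a, a - 1j * a])
labels = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])  # Gray: b1 = I sign, b2 = Q sign
N0 = 0.5                                             # complex noise variance

u = np.linspace(-3.5, 3.5, 281)                      # per-dimension output grid
du = u[1] - u[0]
Yr, Yi = np.meshgrid(u, u, indexing='ij')
Y = Yr + 1j * Yi

pyx = np.exp(-np.abs(Y[None] - X[:, None, None]) ** 2 / N0) / (np.pi * N0)
q = pyx.mean(axis=0)                                 # output density
dA = du * du

C_cm = 0.25 * np.sum(pyx * np.log2(pyx / q)) * dA    # I(X; Y)

def bit_mi(i):
    """Marginal bit-channel MI I(B_i; Y) on the 2-D grid."""
    p = np.stack([pyx[labels[:, i] == b].mean(axis=0) for b in (0, 1)])
    return 0.5 * np.sum(p * np.log2(p / q)) * dA

C1, C2 = bit_mi(0), bit_mi(1)
assert np.isclose(C1, C2, atol=1e-6)                 # identical bit channels
assert np.isclose(C_cm, C1 + C2, atol=1e-3)          # zero BICM-to-CM gap
```

Each bit controls one quadrature, so each bit channel sees per-bit SNR equal to the symbol SNR $1/N_0$, and the CM and BICM capacities coincide exactly as the theorem predicts.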
Cutoff Rate vs SNR for CM and BICM (Gray)
The cutoff rate is the Gallager function $E_0(\rho)$ evaluated at $\rho = 1$ — a tighter rate threshold than capacity for sequential and list decoders. This plot compares $R_0^{\mathrm{cm}}$ (full $M$-ary ML decoder) with $R_0^{\mathrm{bicm}}$ (product bit metric) for Gray-labelled $2^m$-QAM, side-by-side with the respective capacities $C_{\mathrm{CM}}$ and $C_{\mathrm{BICM}}$. The gap $C_{\mathrm{CM}} - R_0^{\mathrm{cm}}$ is the practical "margin for implementation" of any non-ML decoder; the gap $R_0^{\mathrm{cm}} - R_0^{\mathrm{bicm}}$ is the labelling-induced cost of BICM as discussed in Chapter 5. The cutoff rate, being a tighter proxy for decoder work per bit, is the object of §5 of this chapter.
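A hedged sketch of the parallel-channel side of such a plot, on the illustrative 4-PAM Gray setup: for a binary-input channel $R_0 = 1 - \log_2(1 + \beta)$ with Bhattacharyya parameter $\beta = \int \sqrt{p(y \mid 0)\,p(y \mid 1)}\,dy$, and for independent parallel channels with independent codebooks the cutoff rates add, giving a parallel-channel proxy for the BICM cutoff rate per symbol.

```python
import numpy as np

# Illustrative 4-PAM Gray setup over real AWGN, output discretized on a grid.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # Gray
sigma = 0.5
y = np.linspace(-4.0, 4.0, 4001)
dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
      / np.sqrt(2 * np.pi * sigma**2)

R0 = []
for i in range(2):
    p = np.stack([pyx[labels[:, i] == b].mean(axis=0) for b in (0, 1)])
    beta = np.sum(np.sqrt(p[0] * p[1])) * dy          # Bhattacharyya parameter
    R0.append(1.0 - np.log2(1.0 + beta))              # per-position cutoff rate

R0_bicm = sum(R0)                                     # bits per symbol
assert all(0.0 < r < 1.0 for r in R0)
assert 0.0 < R0_bicm < 2.0
```

Sweeping `sigma` over a range of SNRs and plotting `R0_bicm` against the corresponding capacities reproduces the qualitative shape of the figure.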
Picture A vs Picture B: Two Views of the BICM Channel
| Aspect | Picture A — Mixture (Ch. 5) | Picture B — Parallel (this Chapter) |
|---|---|---|
| Effective channel | One binary channel with law $\bar{p}(y \mid b) = \frac{1}{m} \sum_i p_i(y \mid b)$ | $m$ parallel binary channels $W_i$, each with law $p_i(y \mid b)$ |
| MI formula | $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ bits per code bit | $\sum_{i=1}^{m} I(B_i; Y)$ bits per symbol (= $m$ code bits) |
| Capacity in bits/symbol | $m \cdot \frac{1}{m} \sum_i I(B_i; Y) = C_{\mathrm{BICM}}$ | $\sum_{i=1}^{m} C_i = C_{\mathrm{BICM}}$ |
| Natural for | Defining BICM capacity as MI on a single scalar channel (Ch. 5) | Error-exponent and cutoff-rate analyses (this chapter, §4–5) |
| Relation to MLC | Marginalised bit metric (loses chain-rule terms) | MLC parallel channels are conditional, $I(B_i; Y \mid B_1, \dots, B_{i-1})$; BICM parallel channels are marginal, $I(B_i; Y)$ |
| Decoding implication | Single binary decoder on the mixture LLR stream | A product-metric decoder — same decoder, different bookkeeping |
Where the Labelling Enters
Both pictures compute the same number: $C_{\mathrm{BICM}} = \sum_{i=1}^{m} I(B_i; Y)$. What depends on the labelling is how close this sum is to the CM capacity $C_{\mathrm{CM}} = I(X; Y)$. The chain rule (Ch. 5, Thm. on the CM-BICM capacity ordering) gives $I(X; Y) = \sum_{i=1}^{m} I(B_i; Y \mid B_1, \dots, B_{i-1})$, each term $I(B_i; Y \mid B_1, \dots, B_{i-1}) - I(B_i; Y)$ being the "conditional coupling" that BICM throws away by marginalising. For Gray on QAM at high SNR every term is exponentially small; for SP on QAM they remain substantial and dominate the gap. This chapter's task is to translate the chain-rule arithmetic into error-probability arithmetic via Gallager's random-coding exponent. The key insight is that the chain-rule residual reappears, but in the form of a mismatched-decoding penalty — which is the topic of §2.
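The chain-rule arithmetic can be illustrated numerically. A hedged sketch on the toy 4-PAM setup (parameters illustrative): the conditional sum recovers $C_{\mathrm{CM}}$ exactly for any labelling, while the marginal (BICM) sum falls short by the conditional-coupling residuals, with a smaller shortfall under Gray than under the natural/set-partition-style labelling.

```python
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit-energy 4-PAM
sigma = 0.5
y = np.linspace(-4.0, 4.0, 4001)
dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
      / np.sqrt(2 * np.pi * sigma**2)

def mi(p):
    """MI of a uniform-input channel with rows p[k] = p(y | input k)."""
    q = p.mean(axis=0)
    return np.sum(p * np.log2(p / q)) / p.shape[0] * dy

def rates(lab):
    C_cm = mi(pyx)                                     # I(X; Y)
    Ci = [mi(np.stack([pyx[lab[:, i] == b].mean(axis=0) for b in (0, 1)]))
          for i in range(2)]                           # marginal I(B_i; Y)
    # Chain rule: I(X;Y) = I(B1;Y) + I(B2;Y | B1), averaged over B1.
    I2_cond = 0.5 * sum(mi(pyx[lab[:, 0] == b1]) for b1 in (0, 1))
    return C_cm, Ci[0] + Ci[1], Ci[0] + I2_cond

gray    = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
natural = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # set-partition style

gaps = {}
for name, lab in (('gray', gray), ('natural', natural)):
    C_cm, C_bicm, C_chain = rates(lab)
    assert np.isclose(C_chain, C_cm, atol=1e-6)        # chain rule is exact
    assert C_bicm <= C_cm + 1e-9                       # marginalising only loses
    gaps[name] = C_cm - C_bicm

assert gaps['gray'] < gaps['natural']                  # Gray hides the residuals
```

The two labellings share the same constellation and SNR; only the residual terms $I(B_i; Y \mid B_1, \dots, B_{i-1}) - I(B_i; Y)$ differ, which is exactly the point of this subsection.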
Historical Note: Wachsmann, Fischer, Huber: Parallel Channels in MLC and BICM
Udo Wachsmann, Robert Fischer, and Johannes Huber's 1999 IEEE Trans. IT paper, "Multilevel codes: theoretical concepts and practical design rules" (vol. 45, no. 5, pp. 1361–1391), is the canonical exposition of the parallel-channel framework for multilevel coding (MLC). The three authors, working at Erlangen, gave the definitive analysis of MLC under multi-stage decoding (MSD) and parallel-independent decoding (PID). The parallel framework, in which each bit level is treated as its own channel with its own code, is due to them in its modern rigorous form.
BICM, as Caire-Taricco-Biglieri had shown the previous year (1998), is the PID-without-conditioning version of MLC: the bit levels are decoded in parallel but the demapper averages over the other bits rather than conditioning on them. In the Wachsmann–Fischer–Huber taxonomy BICM is therefore "MLC with maximal marginalisation" — and the per-position capacities $I(B_i; Y)$ of BICM are exactly the marginal versions of the MLC conditional capacities $I(B_i; Y \mid B_1, \dots, B_{i-1})$.
Putting BICM inside the parallel-channel picture — which Guillén-Martínez-Caire formalised in 2008 — is what allows the error-exponent machinery of Gallager to carry over almost unchanged: random coding, union bounds, Bhattacharyya factors, and cutoff rates all lift from the binary channels to BICM one level at a time, with the labelling entering only through the per-position channel laws. This is why §4 of this chapter is structurally so clean.
Common Mistake: Parallel Does Not Mean Independent Given $Y$
Mistake:
Reading "BICM is parallel independent binary channels" as the claim that the label bits of a single symbol are jointly independent conditional on the channel output $Y$ — and therefore concluding that the BICM decoder is as good as the joint-ML decoder.
Correction:
The label bits are marginally independent (and uniform) at the input, because they are the outputs of an ideal interleaver applied to a uniform code. At the output $y$, the posteriors generally do not factorise into a product: $P(b_1, \dots, b_m \mid y) \neq \prod_{i=1}^{m} P(b_i \mid y)$, because the bits share the dependence on the same $y$ through the joint symbol $x = \mu(b_1, \dots, b_m)$. The BICM decoder pretends they do factorise — it multiplies per-bit posteriors. That pretence is the product-metric mismatch of §2.
So the parallel-channel model is an operational picture of how the encoder and the decoder talk to each other; it is not a claim about the conditional joint distribution of the bits given $Y$. Confusing the two collapses the BICM-to-CM gap and misses the entire mismatched-decoding analysis of this chapter.
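The non-factorisation is easy to exhibit numerically. A hedged illustration on the toy 4-PAM Gray setup (all parameters illustrative): at a fixed output $y_0$, the joint bit posterior differs from the product of its marginals — the product being exactly what the BICM metric silently substitutes.

```python
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # Gray
sigma, y0 = 0.5, 0.5                                   # noise std and test output

w = np.exp(-(y0 - X) ** 2 / (2 * sigma**2))
w /= w.sum()                                           # P(x | y0), uniform prior

joint = np.zeros((2, 2))                               # P(b1, b2 | y0)
for wx, (b1, b2) in zip(w, labels):
    joint[b1, b2] += wx
marg1, marg2 = joint.sum(axis=1), joint.sum(axis=0)    # P(b1 | y0), P(b2 | y0)
product = np.outer(marg1, marg2)                       # what BICM pretends

assert np.max(np.abs(joint - product)) > 0.01          # posteriors don't factorise
```

For Gray QPSK the same experiment would show equality, because there the two bits live on independent quadratures; for any constellation whose bits share a dimension, the discrepancy above is the mismatch that §2 quantifies.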
Quick Check
In the parallel-channel picture of BICM, the per-symbol information rate achievable by a product-form random code is:
- $m$ times the capacity of the mixture channel
- The CM capacity $I(X; Y)$
- The maximum of the individual bit capacities, $\max_i C_i$
The mixture channel has capacity $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ per code bit. Multiplying by $m$ code bits per symbol gives $\sum_{i=1}^{m} I(B_i; Y)$ bits per symbol — exactly the parallel-channel sum capacity. This is the content of Thm. (BICM Capacity from Parallel Binary Channels).
Parallel Channel (BICM)
The information-theoretic model in which a BICM system is represented as $m$ independent binary-input channels $W_1, \dots, W_m$ operating in parallel, with $W_i$ being the marginal bit channel of label position $i$. Due to Wachsmann-Fischer-Huber (1999). Equivalent in rate to the mixture-channel viewpoint of Chapter 5.
Related: BICM as Parallel Binary Channels, Bit Channel (BICM), Wachsmann, Fischer, Huber: Parallel Channels in MLC and BICM, Mixture Channel (BICM)
Mixture Channel (BICM)
The effective binary-input channel seen by the BICM binary decoder after ideal interleaving, whose transition law $\bar{p}(y \mid b) = \frac{1}{m} \sum_{i=1}^{m} p_i(y \mid b)$ is the uniform mixture of the per-position bit channels. Introduced in Ch. 5; equivalent in rate to the parallel-channel model of §1 of this chapter.
Related: Bit Channel (BICM), Parallel Channel (BICM), BICM Capacity