Parallel Binary Channels: The Wachsmann-Fischer-Huber View

Two Ways to Look at the Same Channel

There are two operational pictures of what the BICM decoder sees.

Picture A — the time-averaged mixture (Caire-Taricco-Biglieri 1998). At the output of the ideal interleaver, every coded bit is equally likely to have been mapped to any of the $L$ label positions inside a constellation symbol. The decoder's effective binary channel is therefore the uniform mixture $\tfrac{1}{L}\sum_\ell p_{W_\ell}(y\mid b)$ of the $L$ per-position bit channels. This is the view of Chapter 5, and it produces the capacity formula $C_{\rm BICM}(\mu) = \sum_\ell I(Y; B_\ell)$ by reading the mutual information on the mixture channel.

Picture B — $L$ parallel independent binary channels (Wachsmann-Fischer-Huber 1999). Equivalently — and this is the viewpoint we adopt throughout this chapter — we imagine the BICM system as $L$ separate binary channels running in parallel, one per label position. Each parallel channel is $W_\ell$; each has its own capacity $C_\ell = I(Y; B_\ell)$; the code, after de-interleaving, sees the $L$ parallel channels selected uniformly in time. The sum $\sum_\ell C_\ell$ is then a direct information-theoretic consequence of the parallel-channel decomposition: the capacity of $L$ equally-used parallel channels in a time-sharing sense is the average of their capacities, and the per-symbol information rate is $L$ times that average.

The two pictures are mathematically equivalent — they represent the same conditional joint law $p(y\mid b_0, \ldots, b_{L-1})$ decomposed in two different ways. Picture A is cleaner for defining the BICM capacity as a mutual information on a single scalar channel. Picture B is cleaner for the error-exponent / cutoff-rate / PEP analyses that we do in §4 and §5 — there the natural object is the product-metric Gallager function
$$E_0^{\mathrm{BICM}}(\rho) \;=\; -\log \sum_y \left( \frac{1}{L}\sum_\ell \sum_{b} \frac{1}{2}\, p_{W_\ell}(y\mid b)^{1/(1+\rho)} \right)^{1+\rho},$$
which is defined through the parallel-channel decomposition.

This section makes Picture B precise, proves the sum-capacity formula directly from parallel channels, and sets up the notation we use in §2 to expose the mismatched-decoding structure hiding inside BICM.


Definition: BICM as $L$ Parallel Binary Channels

Fix a constellation $\mathcal{X}$ of size $M = 2^L$, a labelling $\mu: \{0,1\}^L \to \mathcal{X}$, and a memoryless channel law $p(y\mid x)$ with uniform inputs. The BICM parallel-channel model represents the BICM system as the time-shared use of $L$ independent binary-input channels $\{W_\ell\}_{\ell=0}^{L-1}$, where the $\ell$-th channel has:

  • Input: $B_\ell \in \{0,1\}$ with uniform prior.
  • Output: $Y$ in the original output alphabet of $p(y\mid x)$.
  • Transition law: $$p_{W_\ell}(y\mid b) \;=\; \frac{2}{M} \sum_{x \in \mathcal{X}_\ell^{(b)}} p(y\mid x), \qquad \mathcal{X}_\ell^{(b)} = \{x: \mu^{-1}(x)_\ell = b\}.$$

A BICM encoder that emits $N$ code bits through a mapper to $N/L$ symbols is, under the ideal-interleaver assumption, equivalent in law to a system that splits the $N$ code bits round-robin into $L$ streams of $N/L$ bits each, sends the $\ell$-th stream through the binary channel $W_\ell$, and reassembles the outputs at the receiver.
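To make the transition law concrete, here is a minimal numerical sketch that builds the per-position channels $p_{W_\ell}(y\mid b)$ on a quantized output grid and checks that each is a proper density. The constellation (4-PAM), the Gray labelling, the noise level, and the grid are illustrative assumptions, not fixed by the definition.

```python
import numpy as np

# Build p_{W_l}(y|b) = (2/M) * sum_{x in X_l^(b)} p(y|x) on a quantized
# AWGN output grid. Setup (4-PAM, Gray labelling, sigma) is illustrative.
M, L = 4, 2
gray = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # mu^{-1}(x) for each symbol x
amps = np.array([-3., -1., 1., 3.]) / np.sqrt(5.0)  # unit-energy 4-PAM
sigma = 0.5                                         # noise std (assumed)

y = np.linspace(-5, 5, 4001)                        # quantized output alphabet
dy = y[1] - y[0]
# p(y|x): one Gaussian density per constellation point (rows: y, cols: x)
p_y_x = np.exp(-(y[:, None] - amps)**2 / (2 * sigma**2)) \
        / np.sqrt(2 * np.pi * sigma**2)

pW = np.zeros((L, 2, y.size))
for l in range(L):
    for b in (0, 1):
        idx = gray[:, l] == b                       # the subset X_l^(b)
        pW[l, b] = (2.0 / M) * p_y_x[:, idx].sum(axis=1)

# Each p_{W_l}(.|b) integrates to 1: a proper binary-input channel law
print(pW.sum(axis=2) * dy)                          # ~1.0 for every (l, b)
```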

Wachsmann, Fischer, and Huber (1999) introduced this viewpoint in the context of MLC, where the parallel channels are the conditional bit channels $W_\ell^{\mathrm{MLC}}$ indexed by $B_{<\ell}$. For BICM, the parallel channels are the marginal bit channels $W_\ell$ — the conditioning has been removed by averaging over the other $L-1$ bits. This is exactly the extra entropy that Gray labelling manages to almost completely hide in the sum $\sum_\ell C_\ell$, and that set-partition labelling does not.


Theorem: BICM Capacity from Parallel Binary Channels

The information rate (in bits per channel use) achievable over the BICM parallel-channel model of the Definition above (BICM as $L$ Parallel Binary Channels) with a product-form random code whose $\ell$-th component is capacity-achieving for $W_\ell$ is
$$C_{\rm BICM}(\mu) \;=\; \sum_{\ell = 0}^{L-1} I(Y; B_\ell) \;=\; \sum_{\ell = 0}^{L-1} C_\ell.$$
This coincides with the BICM capacity formula derived in Chapter 5 (Thm. The BICM Capacity Decomposition) through the mixture viewpoint.

The $L$ parallel channels together carry the sum of their individual capacities. This is a one-line consequence of the independence of the channels: the code splits naturally into $L$ sub-codes, each operating on its own binary channel, and they do not interfere because, after de-interleaving, different bit positions are carried by different symbols at different times.
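As a numerical sanity check, the sketch below evaluates $\sum_\ell C_\ell$ and, on the same grid, the CM capacity $I(Y;X)$ that Chapter 5's ordering places above it. The 4-PAM/Gray/AWGN setup and all parameters are illustrative assumptions.

```python
import numpy as np

# Evaluate C_BICM = sum_l I(Y; B_l) and C_CM = I(Y; X) by grid integration.
M, L = 4, 2
gray = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
amps = np.array([-3., -1., 1., 3.]) / np.sqrt(5.0)
sigma = 0.5
y = np.linspace(-5, 5, 4001); dy = y[1] - y[0]
p_y_x = np.exp(-(y[:, None] - amps)**2 / (2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)
p_y = p_y_x.mean(axis=1)                  # output density under uniform X

def mi_binary(p0, p1):
    """I(Y;B) = sum_b (1/2) * integral of p_b log2(p_b / pbar)."""
    pbar = 0.5 * (p0 + p1)
    acc = np.zeros_like(p0)
    for p in (p0, p1):
        ratio = np.divide(p, pbar, out=np.ones_like(p), where=p > 0)
        acc += 0.5 * p * np.log2(ratio)   # 0 log 0 handled as 0
    return acc.sum() * dy

C_sum = 0.0
for l in range(L):
    p0 = (2/M) * p_y_x[:, gray[:, l] == 0].sum(axis=1)
    p1 = (2/M) * p_y_x[:, gray[:, l] == 1].sum(axis=1)
    C_sum += mi_binary(p0, p1)

# CM capacity I(Y;X) = (1/M) sum_x D(p(.|x) || p_y), for comparison
ratio = np.divide(p_y_x, p_y[:, None], out=np.ones_like(p_y_x), where=p_y_x > 0)
C_cm = (p_y_x * np.log2(ratio)).sum() * dy / M
print(f"C_BICM = {C_sum:.4f} bits/symbol, C_CM = {C_cm:.4f} bits/symbol")
```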


Example: QPSK with Gray Labelling: Two Identical Parallel BI-AWGN Channels

Consider QPSK with Gray labelling ($M = 4$, $L = 2$) on an AWGN channel with $\mathrm{SNR} = E_s/N_0 = 4$ dB. Show that the two BICM bit channels $W_0, W_1$ are both equivalent to a BI-AWGN channel with per-bit SNR $\mathrm{SNR}_b = \mathrm{SNR}$, compute their common capacity $C_\ell$, and verify that $C_{\rm BICM} = 2C_\ell$.
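A computational sketch of the example follows; the Gauss-Hermite order, Monte Carlo size, and seed are arbitrary choices. It exploits the fact that Gray-labelled QPSK decouples into independent BPSK channels on the I and Q components, each with amplitude $\sqrt{E_s/2}$ in per-dimension noise variance $N_0/2$, so the per-bit SNR is $(E_s/2)/(N_0/2) = E_s/N_0$.

```python
import numpy as np

snr_db = 4.0
snr = 10 ** (snr_db / 10)            # Es/N0, linear
Es, N0 = 1.0, 1.0 / snr
a = np.sqrt(Es / 2)                  # per-dimension BPSK amplitude of Gray QPSK
sig2 = N0 / 2                        # per-dimension noise variance

# C_l for BI-AWGN(a, sig2) via Gauss-Hermite quadrature:
# C = 1 - E[log2(1 + exp(-2a(a+n)/sig2))], n ~ N(0, sig2)
t, w = np.polynomial.hermite.hermgauss(64)
n = np.sqrt(2 * sig2) * t
C_bit = 1 - (w / np.sqrt(np.pi)) @ np.log2(1 + np.exp(-2 * a * (a + n) / sig2))
print(f"per-bit SNR = {a**2 / sig2:.3f} = Es/N0 = {snr:.3f}")
print(f"C_l = {C_bit:.4f} bits  ->  C_BICM = 2*C_l = {2 * C_bit:.4f} bits/symbol")

# Cross-check by Monte Carlo on the I component (I and Q decouple for Gray QPSK)
rng = np.random.default_rng(0)
b = rng.integers(0, 2, 200_000)
yI = a * (1 - 2 * b) + rng.normal(0, np.sqrt(sig2), b.size)
llr = 2 * a * yI / sig2              # LLR for hypothesis b = 0
mi = 1 - np.mean(np.log2(1 + np.exp(-(1 - 2 * b) * llr)))
print(f"Monte Carlo I(Y;B_0) = {mi:.4f} bits")
```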


Cutoff Rate $R_0$ vs SNR for CM and BICM (Gray)

The cutoff rate $R_0 = E_0(1)$ is the Gallager function evaluated at $\rho = 1$ — a tighter rate threshold than capacity for sequential and list decoders. This plot compares $R_0^{\mathrm{CM}}$ (full $M$-ary ML decoder) with $R_0^{\mathrm{BICM}}$ (product bit metric) for $M$-QAM with Gray labelling, side by side with the respective capacities $C_{\rm CM}$ and $C_{\rm BICM}$. The $(C, R_0)$ gap is the practical "margin for implementation" of any non-ML decoder; the $(C_{\rm CM} - C_{\rm BICM})$ gap is the labelling-induced cost of BICM as discussed in Chapter 5. The cutoff rate, being a tighter proxy for decoder work per bit, is the object of §5 of this chapter.
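The sketch below computes the two $\rho = 1$ points for a one-dimensional stand-in (4-PAM with Gray labelling rather than $M$-QAM). The SNR, the grid, and the multiplication of the per-bit BICM formula by $L$ to obtain bits per symbol are our assumptions, not taken from the plot.

```python
import numpy as np

# R0 = E0(1) for the M-ary CM channel and for the product-metric BICM
# formula of Section 1, on a quantized AWGN grid (illustrative setup).
M, L = 4, 2
gray = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
amps = np.array([-3., -1., 1., 3.]) / np.sqrt(5.0)
sigma = 0.5
y = np.linspace(-5, 5, 4001); dy = y[1] - y[0]
p_y_x = np.exp(-(y[:, None] - amps)**2 / (2*sigma**2)) / np.sqrt(2*np.pi*sigma**2)

# R0_CM = -log2 integral [ (1/M) sum_x sqrt(p(y|x)) ]^2 dy  (bits/symbol)
inner_cm = np.sqrt(p_y_x).mean(axis=1)**2
R0_cm = -np.log2(inner_cm.sum() * dy)

# R0_BICM from E0^BICM(rho) at rho = 1: inner term is
# (1/L) sum_l sum_b (1/2) sqrt(p_{W_l}(y|b)), raised to the power 2.
pW = np.array([[(2/M) * p_y_x[:, gray[:, l] == b].sum(axis=1)
                for b in (0, 1)] for l in range(L)])   # shape (L, 2, len(y))
inner_bicm = (0.5 * np.sqrt(pW).sum(axis=1)).mean(axis=0)**2
R0_bicm = -L * np.log2(inner_bicm.sum() * dy)          # x L: per-symbol (assumed)
print(f"R0_CM = {R0_cm:.4f}, R0_BICM = {R0_bicm:.4f} bits/symbol")
```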


Picture A vs Picture B: Two Views of the BICM Channel

| Aspect | Picture A — Mixture (Ch. 5) | Picture B — Parallel (this chapter) |
| --- | --- | --- |
| Effective channel | One binary channel with law $\tfrac{1}{L}\sum_\ell p_{W_\ell}(y\mid b)$ | $L$ parallel binary channels $W_0, W_1, \ldots, W_{L-1}$, each with law $p_{W_\ell}$ |
| MI formula | $I(\bar Y; B) = \tfrac{1}{L}\sum_\ell I(Y; B_\ell)$ bits per code bit | $C_{\rm sum} = \sum_\ell C_\ell$ bits per symbol ($= L$ code bits) |
| Capacity in bits/symbol | $L \cdot I(\bar Y; B) = \sum_\ell C_\ell$ | $\sum_\ell C_\ell$ |
| Natural for | Defining BICM capacity as MI on a single scalar channel (Ch. 5) | Error-exponent and cutoff-rate analyses (this chapter, §4–5) |
| Relation to MLC | Marginalised bit metric (loses chain-rule terms) | MLC parallel channels are conditional $W_\ell^{\mathrm{MLC}}$; BICM parallel channels are marginal $W_\ell$ |
| Decoding implication | Single binary decoder on the mixture LLR stream | A product-metric decoder $\prod_\ell p_{W_\ell}(y\mid b_\ell)$ — same decoder, different bookkeeping |

Where the Labelling Enters

Both pictures compute the same number: $\sum_\ell I(Y; B_\ell)$. What depends on the labelling is how close this sum is to the CM capacity $I(Y; X) = I(Y; B_0, \ldots, B_{L-1})$. The chain rule (Ch. 5 Thm. [?ch05:thm-cm-bicm-ordering]) gives
$$C_{\rm CM} - C_{\rm BICM}(\mu) \;=\; \sum_{\ell=1}^{L-1} I(B_\ell; B_{<\ell} \mid Y),$$
each term being the "conditional coupling" that BICM throws away by marginalising. For Gray labelling on QAM at high SNR every term is exponentially small; for set-partition labelling on QAM they are $O(1)$ and dominate the gap. This chapter's task is to translate the chain-rule arithmetic into error-probability arithmetic via Gallager's random-coding exponent. The key insight is that the chain-rule residual reappears, but in the form of a mismatched-decoding penalty — which is the topic of §2.
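The following sketch evaluates the gap $C_{\rm CM} - C_{\rm BICM}(\mu)$ for two labellings of 4-PAM at two SNR points; the constellation, the "SP" stand-in (natural binary order), and the SNR values are illustrative assumptions.

```python
import numpy as np

# Chain-rule gap C_CM - C_BICM(mu) for Gray vs natural ("SP") 4-PAM labelling.
M, L = 4, 2
amps = np.array([-3., -1., 1., 3.]) / np.sqrt(5.0)       # unit-energy 4-PAM
labellings = {
    "Gray": np.array([[0, 0], [0, 1], [1, 1], [1, 0]]),
    "SP":   np.array([[0, 0], [0, 1], [1, 0], [1, 1]]),  # natural binary order
}
y = np.linspace(-6, 6, 8001); dy = y[1] - y[0]

def rel_entropy(p, q):
    """Integral of p*log2(p/q) over the grid, with 0 log 0 := 0."""
    ratio = np.divide(p, q, out=np.ones_like(p), where=p > 0)
    return (p * np.log2(ratio)).sum() * dy

for snr_db in (4.0, 12.0):
    sigma2 = 10 ** (-snr_db / 10)                        # Es = 1, sigma^2 = 1/SNR
    p_y_x = np.exp(-(y[:, None] - amps)**2 / (2*sigma2)) / np.sqrt(2*np.pi*sigma2)
    p_y = p_y_x.mean(axis=1)
    C_cm = sum(rel_entropy(p_y_x[:, i], p_y) for i in range(M)) / M
    for name, lab in labellings.items():
        # I(Y;B_l) = sum_b (1/2) D(p_{W_l}(.|b) || p_y), since pbar_l = p_y
        C_bicm = sum(0.5 * rel_entropy((2/M) * p_y_x[:, lab[:, l] == b].sum(axis=1), p_y)
                     for l in range(L) for b in (0, 1))
        print(f"{snr_db:4.1f} dB  {name:4s}: C_CM - C_BICM = {C_cm - C_bicm:.4f} bits")
```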

Historical Note: Wachsmann, Fischer, Huber: Parallel Channels in MLC and BICM


Udo Wachsmann, Robert Fischer, and Johannes Huber's 1999 IEEE Trans. IT paper, "Multilevel codes: theoretical concepts and practical design rules" (vol. 45, no. 5, pp. 1361–1391), is the canonical exposition of the parallel-channel framework for multilevel coding (MLC). The three authors, working at Erlangen, gave the definitive analysis of MLC under multi-stage decoding (MSD) and parallel-independent decoding (PID). The parallel framework, in which each bit level is treated as its own channel with its own code, is due to them in its modern rigorous form.

BICM, as Caire-Taricco-Biglieri had shown the previous year (1998), is the PID-without-conditioning version of MLC: the bit levels are decoded in parallel, but the demapper averages over the other bits rather than conditioning on them. In the Wachsmann–Fischer–Huber taxonomy BICM is therefore "MLC with maximal marginalisation" — and the per-position capacities $C_\ell$ of BICM are exactly the marginal versions of the MLC conditional capacities $C_\ell^{\mathrm{MLC}} = I(Y; B_\ell \mid B_{<\ell})$.

Putting BICM inside the parallel-channel picture — which Guillén i Fàbregas, Martinez, and Caire formalised in 2008 — is what allows the error-exponent machinery of Gallager to carry over almost unchanged: random coding, union bounds, Bhattacharyya factors, and cutoff rates all lift from the binary channels $W_\ell$ to BICM one level at a time, with the labelling entering only through the per-position channel laws. This is why §4 of this chapter is structurally so clean.

Common Mistake: Parallel $\ne$ Independent Given $Y$

Mistake:

Reading "BICM is LL parallel independent binary channels" as the claim that the LL label bits B0,,BL1B_0, \ldots, B_{L-1} of a single symbol are jointly independent conditional on YY — and therefore concluding that the BICM decoder is as good as the joint-ML decoder.

Correction:

The label bits are marginally independent (and uniform) at the input, because they are the outputs of an ideal interleaver applied to a uniform code. At the output $Y$, the posteriors $P(B_\ell = b \mid Y)$ generally do not factorise into a product: they share the dependence on the same $y$ through the joint $P(B_0, \ldots, B_{L-1} \mid Y)$. The BICM decoder pretends they do factorise — it multiplies per-bit posteriors. That pretence is the product-metric mismatch of §2.

So the parallel-channel model is an operational picture of how the encoder and the decoder talk to each other; it is not a claim about the conditional joint distribution of bits given $Y$. Confusing the two collapses the BICM-to-CM gap and misses the entire mismatched-decoding analysis of this chapter.
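A short numerical check makes the point concrete; the constellation (4-PAM), the Gray labelling, the noise level, and the received value $y$ are arbitrary illustrative choices.

```python
import numpy as np

# The bit posteriors given y do NOT factor: P(B0,B1|y) != P(B0|y)*P(B1|y).
gray = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # mu^{-1}(x) per symbol
amps = np.array([-3., -1., 1., 3.]) / np.sqrt(5.0) # unit-energy 4-PAM
sigma = 0.5
y_obs = 0.6                                        # an arbitrary received value

lik = np.exp(-(y_obs - amps)**2 / (2 * sigma**2))  # p(y|x), uniform prior
post_x = lik / lik.sum()                           # P(X = x | y)

# Joint bit posterior vs product of marginals
P_joint_00 = post_x[(gray[:, 0] == 0) & (gray[:, 1] == 0)].sum()
P_b0_0 = post_x[gray[:, 0] == 0].sum()
P_b1_0 = post_x[gray[:, 1] == 0].sum()
print(f"P(B0=0,B1=0|y)      = {P_joint_00:.4f}")
print(f"P(B0=0|y)P(B1=0|y)  = {P_b0_0 * P_b1_0:.4f}  <- the product metric's pretence")
```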

Quick Check

In the parallel-channel picture of BICM, the per-symbol information rate achievable by a product-form random code is

$L$ times the capacity of the mixture channel $\bar W = \tfrac{1}{L}\sum_\ell W_\ell$

The CM capacity $I(Y; X)$

The maximum of the individual bit capacities, $\max_\ell C_\ell$

$\log_2 M - \sum_\ell H(B_\ell \mid Y)$

Parallel Channel (BICM)

The information-theoretic model in which a BICM system is represented as $L$ independent binary-input channels $W_0, \ldots, W_{L-1}$ operating in parallel, with $W_\ell$ being the marginal bit channel of label position $\ell$. Due to Wachsmann-Fischer-Huber (1999). Equivalent in rate to the mixture-channel viewpoint of Chapter 5.

Related: BICM as $L$ Parallel Binary Channels, Bit Channel (BICM), Wachsmann, Fischer, Huber: Parallel Channels in MLC and BICM, Mixture Channel (BICM)

Mixture Channel (BICM)

The effective binary-input channel seen by the BICM binary decoder after ideal interleaving, whose transition law is the uniform mixture $\bar W(y\mid b) = \tfrac{1}{L}\sum_\ell p_{W_\ell}(y\mid b)$ of the $L$ per-position bit channels. Introduced in Ch. 5; equivalent in rate to the parallel-channel model of §1 of this chapter.

Related: Bit Channel (BICM), Parallel Channel (BICM), BICM Capacity