Parallel Binary Channels: The Wachsmann-Fischer-Huber View
Two Ways to Look at the Same Channel
There are two operational pictures of what the BICM decoder sees.
Picture A — the time-averaged mixture (Caire-Taricco-Biglieri 1998). At the output of the ideal interleaver, every coded bit is equally likely to have been mapped to any of the $m$ label positions inside a constellation symbol. The decoder's effective binary channel is therefore the uniform mixture of the per-position bit channels. This is the view of Chapter 5, and it produces the capacity formula by reading the mutual information off the mixture channel.
Picture B — parallel independent binary channels (Wachsmann-Fischer-Huber 1999). Equivalently — and this is the viewpoint we adopt throughout this chapter — we imagine the BICM system as $m$ separate binary channels running in parallel, one per label position. Each parallel channel is a binary-input channel $W_i$ with law $p_i(y \mid b)$; each has its own capacity $C_i = I(B_i; Y)$; the code, after de-interleaving, sees the parallel channels selected uniformly in time. The sum $C_{\mathrm{BICM}} = \sum_{i=1}^{m} C_i$ is then a direct information-theoretic consequence of the parallel-channel decomposition: the capacity of $m$ equally-used parallel channels in a time-sharing sense is the average of their capacities, and the per-symbol information rate is $m$ times that.
The two pictures are mathematically equivalent — they represent the same conditional joint law decomposed in two different ways. Picture A is cleaner for defining the BICM capacity as a mutual information on a single scalar channel. Picture B is cleaner for the error-exponent / cutoff-rate / PEP analyses that we do in §4 and §5 — there the natural object is the product-metric Gallager function $E_0(\rho)$, which is defined through the parallel-channel decomposition.
This section makes Picture B precise, proves the sum-capacity formula directly from parallel channels, and sets up the notation we use in §2 to expose the mismatched-decoding structure hiding inside BICM.
Definition: BICM as Parallel Binary Channels
Fix a constellation $\mathcal{X}$ of size $|\mathcal{X}| = 2^m$, a labelling $\mu : \{0,1\}^m \to \mathcal{X}$, and a memoryless channel law $p(y \mid x)$ with uniform inputs. The BICM parallel-channel model represents the BICM system as the time-shared use of $m$ independent binary-input channels $W_1, \dots, W_m$, where the $i$-th channel $W_i$ has:
- Input: $b \in \{0,1\}$ with uniform prior.
- Output: $y$ in the original output alphabet of $p(y \mid x)$.
- Transition law: $p_i(y \mid b) = \frac{1}{2^{m-1}} \sum_{x \in \mathcal{X}_b^i} p(y \mid x)$, with $\mathcal{X}_b^i = \{ x \in \mathcal{X} : \mu^{-1}(x)_i = b \}$ the sub-constellation whose label carries bit $b$ in position $i$.
A BICM encoder that emits $n$ code bits through a mapper to $n/m$ symbols is, under the ideal-interleaver assumption, equivalent in law to a system that splits the code bits round-robin into $m$ streams of $n/m$ bits each, sends the $i$-th stream through the binary channel $W_i$, and reassembles the outputs at the receiver.
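The construction of the marginal bit channels can be made concrete numerically. The following is a minimal sketch, assuming an illustrative toy setup (unit-energy 4-PAM with Gray labelling, real AWGN, output discretized onto a grid so each transition law is a finite array); all parameter values are ours, not the chapter's.

```python
import numpy as np

# Illustrative toy setup: unit-energy 4-PAM (m = 2), Gray labelling, real AWGN.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)  # E[x^2] = 1
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])  # Gray label of each symbol
sigma = 0.5                                          # noise standard deviation
y = np.linspace(-4.0, 4.0, 4001)                     # discretized output grid
dy = y[1] - y[0]

# p(y|x): one row per constellation symbol
p_y_given_x = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
              / np.sqrt(2 * np.pi * sigma**2)

def bit_channel(i):
    """Marginal bit channel W_i: p_i(y|b) averages p(y|x) over the 2^(m-1)
    symbols whose label carries bit b in position i."""
    return np.stack([p_y_given_x[labels[:, i] == b].mean(axis=0) for b in (0, 1)])

W = [bit_channel(i) for i in range(2)]
for p in W:  # each conditional law integrates to 1 (up to grid truncation)
    assert np.allclose(p.sum(axis=1) * dy, 1.0, atol=1e-3)
```

The averaging in `bit_channel` is exactly the transition law of the Definition: the other $m-1$ label bits are uniform and unknown, so they are summed out with weight $2^{-(m-1)}$.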
Wachsmann, Fischer, and Huber (1999) introduced this viewpoint in the context of MLC, where the parallel channels are the conditional bit channels at level $i$, conditioned on the lower-level bits $B_1, \dots, B_{i-1}$. For BICM, the parallel channels are the marginal bit channels $W_i$ — conditioning has been removed by the average over the other bits. This conditioning loss is exactly what Gray labelling manages to almost completely hide in the sum $\sum_{i=1}^{m} I(B_i; Y)$, and what set-partition labelling does not.
Theorem: BICM Capacity from Parallel Binary Channels
The information rate (in bits per channel use) achievable over the BICM parallel-channel model of Definition (BICM as Parallel Binary Channels) with a product-form random code whose $i$-th component is capacity-achieving for $W_i$ is $C_{\mathrm{BICM}} = \sum_{i=1}^{m} C_i = \sum_{i=1}^{m} I(B_i; Y)$. This coincides with the BICM capacity formula derived in Chapter 5 (Thm. (The BICM Capacity Decomposition)) through the mixture viewpoint.
$m$ parallel channels carry, in parallel, the sum of their individual capacities. This is a one-line consequence of the independence of the channels: the code splits naturally into $m$ sub-codes, each operating on its own binary channel, and they do not interfere because the bit positions are marginally independent at the channel output (different symbols at different times).
1. Split the code into $m$ parallel streams; the $i$-th stream sees channel $W_i$ and is decoded independently.
2. Shannon's theorem on each $W_i$ gives the achievable rate $C_i$ per stream; sum over streams.
3. Identify the result with the Chapter 5 formula by noting that the mixture-channel mutual information is $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ per code bit, and the per-symbol information rate is $m$ times the per-bit rate.
Parallel-channel coding theorem
For $m$ independent parallel discrete memoryless channels $W_1, \dots, W_m$ with transition laws $p_i(y_i \mid b_i)$ and uniform inputs, the sum capacity achievable with independent codebooks on each sub-channel is $C = \sum_{i=1}^{m} C_i$. This is a standard result: the parallel channel is a product channel, its joint mutual information factorises as $I(B_1^m; Y_1^m) = \sum_{i=1}^{m} I(B_i; Y_i)$ because the output of sub-channel $i$ depends only on its input $b_i$, and the single-letterisation of Shannon's theorem gives the sum.
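The factorisation can be checked on the smallest possible example. A hedged sketch: two independent BSCs with illustrative crossover probabilities, where the mutual information of the product channel equals $C_1 + C_2$ exactly.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p1, p2 = 0.11, 0.2                             # illustrative crossover probs
C1, C2 = 1 - h2(p1), 1 - h2(p2)                # BSC capacities

# Joint law of ((X1,X2),(Y1,Y2)) with uniform, independent inputs.
W1 = np.array([[1 - p1, p1], [p1, 1 - p1]])    # W1[x1, y1]
W2 = np.array([[1 - p2, p2], [p2, 1 - p2]])    # W2[x2, y2]
P = np.einsum('ac,bd->abcd', W1, W2) / 4.0     # P[x1, x2, y1, y2]

Py = P.sum(axis=(0, 1))                        # output marginal
Px = P.sum(axis=(2, 3))                        # input marginal (uniform)
I_joint = np.sum(P * np.log2(P / (Px[:, :, None, None] * Py[None, None, :, :])))

assert np.isclose(I_joint, C1 + C2)            # product channel MI = C1 + C2
```

The assertion holds to machine precision: nothing about the two sub-channels couples them, so the joint mutual information is additive.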
Apply to BICM
By Definition (BICM as Parallel Binary Channels), BICM — under ideal interleaving and uniform inputs — is equivalent to the parallel channel $(W_1, \dots, W_m)$. Applying the parallel-channel coding theorem gives the achievable sum rate $\sum_{i=1}^{m} C_i$ per vector use, i.e., per constellation symbol.
Match the Chapter 5 formula
In Chapter 5 the BICM capacity was defined through the mixture channel seen by the binary decoder after ideal interleaving — one output per code bit — with transition law $\bar{p}(y \mid b) = \frac{1}{m} \sum_{i=1}^{m} p_i(y \mid b)$. The mutual information of this mixture equals $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ (the average of the per-position MIs, because the mixing is uniform and the bits are marginally independent across positions). The per-code-bit rate is thus $\frac{1}{m} \sum_i C_i$, and the per-symbol rate is $m$ times that — exactly $C_{\mathrm{BICM}} = \sum_{i=1}^{m} C_i$.
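A hedged numerical check of this average-of-MIs identity, on the same illustrative 4-PAM Gray setup used earlier (the mixture channel is modelled with the label position revealed to the decoder, as the interleaver structure implies):

```python
import numpy as np

# Illustrative 4-PAM Gray setup over real AWGN, output discretized on a grid.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
sigma, m = 0.5, 2
y = np.linspace(-4.0, 4.0, 4001)
dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
      / np.sqrt(2 * np.pi * sigma**2)

def mi_bits(p):
    """MI of a binary-input channel p[b, :] with uniform prior."""
    q = 0.5 * (p[0] + p[1])
    return 0.5 * np.sum(p * np.log2(p / q)) * dy

W = [np.stack([pyx[labels[:, i] == b].mean(axis=0) for b in (0, 1)])
     for i in range(m)]
I = [mi_bits(p) for p in W]                  # per-position MIs I(B_i; Y)

# Mixture channel with output (position s, y): p(s, y | b) = (1/m) p_s(y | b).
p_mix = np.stack(W) / m                      # shape (m, 2, len(y))
q_mix = 0.5 * (p_mix[:, 0] + p_mix[:, 1])    # output marginal over (s, y)
I_mix = 0.5 * np.sum(p_mix * np.log2(p_mix / q_mix[:, None, :])) * dy

assert np.isclose(I_mix, np.mean(I))         # mixture MI = average of MIs
assert np.isclose(m * I_mix, sum(I))         # per-symbol rate = the sum
```

Multiplying the per-code-bit mixture rate by the $m$ code bits per symbol recovers the parallel-channel sum, as the proof step states.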
Pictures A and B therefore give the same numerical rate; they differ only in which channel you call "the BICM channel".
Example: QPSK with Gray Labelling: Two Identical Parallel BI-AWGN Channels
Consider QPSK with Gray labelling ($M = 4$, $m = 2$) on an AWGN channel with SNR $\gamma$ (in dB). Show that the two BICM bit channels are both equivalent to a BI-AWGN channel with per-bit SNR $\gamma$, compute their common capacity $C_1 = C_2 = C_{\mathrm{BI\text{-}AWGN}}(\gamma)$, and verify that $C_{\mathrm{BICM}} = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma) = C_{\mathrm{CM}}$.
Gray QPSK as two parallel BPSK channels
Under Gray labelling, the first label bit controls only the real (in-phase) part of the transmitted symbol and the second bit controls only the imaginary (quadrature) part (up to a scale). Because the noise on the in-phase and quadrature components is independent, the mapping is channel-symmetric and the two bit channels decouple exactly into two BI-AWGN channels with equal per-bit SNR $\gamma$ equal to the symbol SNR.
Compute the BI-AWGN capacity
The BI-AWGN capacity at linear-scale SNR $\gamma$ is $C_{\mathrm{BI\text{-}AWGN}}(\gamma) = 1 - \mathbb{E}_{Z \sim \mathcal{N}(0,1)}\!\left[\log_2\!\left(1 + e^{-2\gamma - 2\sqrt{\gamma}\,Z}\right)\right]$. Numerically, converting the given SNR from dB to linear scale, the integral evaluates to the common value $C_1 = C_2 = C_{\mathrm{BI\text{-}AWGN}}(\gamma)$ bits/use.
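The expectation has no closed form, but it is a one-dimensional Gaussian integral that Gauss-Hermite quadrature evaluates cheaply. A hedged sketch, for the normalisation $y = x + n$, $x \in \{\pm 1\}$, $n \sim \mathcal{N}(0, 1/\gamma)$ (so $\gamma$ is the linear-scale SNR; the node count is illustrative):

```python
import numpy as np

def bi_awgn_capacity(gamma, nodes=80):
    """BI-AWGN capacity in bits/use at linear SNR gamma, by Gauss-Hermite."""
    t, w = np.polynomial.hermite.hermgauss(nodes)
    z = np.sqrt(2.0) * t                       # abscissae for Z ~ N(0, 1)
    f = np.log2(1.0 + np.exp(-2.0 * gamma - 2.0 * np.sqrt(gamma) * z))
    return 1.0 - (w @ f) / np.sqrt(np.pi)      # 1 - E[f(Z)]

# Limits behave as expected: ~0 bits at very low SNR, ~1 bit at very high SNR.
assert bi_awgn_capacity(1e-4) < 0.01
assert bi_awgn_capacity(100.0) > 0.999
assert bi_awgn_capacity(2.0) > bi_awgn_capacity(1.0)   # monotone in SNR
```

The substitution $z = \sqrt{2}\,t$ converts the Hermite weight $e^{-t^2}$ into the standard normal density, which is why the result is divided by $\sqrt{\pi}$.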
Sum over parallel channels
Therefore $C_{\mathrm{BICM}} = C_1 + C_2 = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma)$ bits/symbol. Since Gray QPSK is exactly the same as two independent BPSK transmissions, it is also the case that $C_{\mathrm{CM}} = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma)$ bits/symbol — i.e. Gray QPSK has zero BICM-to-CM capacity gap, for all SNRs.
This is the only constellation-and-labelling pair in practical use that is BICM-optimal exactly. For larger constellations the labelling induces a strictly positive (but small under Gray) gap — the subject of §2.
Sanity check via parallel-channel theorem
Thm. (BICM Capacity from Parallel Binary Channels) predicts $C_{\mathrm{BICM}} = C_1 + C_2 = 2\,C_{\mathrm{BI\text{-}AWGN}}(\gamma)$. Direct evaluation of the mixture-channel formula gives the same value in bits/symbol. The parallel-channel decomposition is tight for Gray-QPSK because the two bit channels are independent both marginally and jointly given $Y$ — the chain-rule residual $I(B_2; Y \mid B_1) - I(B_2; Y)$ vanishes.
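The three quantities of this example can be checked against each other on a two-dimensional grid. A hedged numerical sketch (all parameters — noise level, grid extent, resolution — are illustrative):

```python
import numpy as np

# Unit-energy Gray-labelled QPSK in complex AWGN; SNR = 1/N0.
a = 1.0 / np.sqrt(2.0)
X = np.array([a + 1j * a, -a + 1j * a, -a - 1j * a, a - 1j * a])
labels = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])  # Gray: b1 = I sign, b2 = Q sign
N0 = 0.5                                             # complex noise variance

u = np.linspace(-3.5, 3.5, 281)                      # per-dimension output grid
du = u[1] - u[0]
Yr, Yi = np.meshgrid(u, u, indexing='ij')
Y = Yr + 1j * Yi

pyx = np.exp(-np.abs(Y[None] - X[:, None, None]) ** 2 / N0) / (np.pi * N0)
q = pyx.mean(axis=0)                                 # output density
dA = du * du

C_cm = 0.25 * np.sum(pyx * np.log2(pyx / q)) * dA    # I(X; Y)

def bit_mi(i):
    """Marginal bit-channel MI I(B_i; Y) on the 2-D grid."""
    p = np.stack([pyx[labels[:, i] == b].mean(axis=0) for b in (0, 1)])
    return 0.5 * np.sum(p * np.log2(p / q)) * dA

C1, C2 = bit_mi(0), bit_mi(1)
assert np.isclose(C1, C2, atol=1e-6)                 # identical bit channels
assert np.isclose(C_cm, C1 + C2, atol=1e-3)          # zero BICM-to-CM gap
```

Each bit controls one quadrature, so each bit channel sees per-bit SNR equal to the symbol SNR $1/N_0$, and the CM and BICM capacities coincide exactly as the theorem predicts.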
Cutoff Rate vs SNR for CM and BICM (Gray)
The cutoff rate is the Gallager function $E_0(\rho)$ evaluated at $\rho = 1$ — a tighter rate threshold than capacity for sequential and list decoders. This plot compares $R_0^{\mathrm{cm}}$ (full $M$-ary ML decoder) with $R_0^{\mathrm{bicm}}$ (product bit metric) for Gray-labelled $2^m$-QAM, side-by-side with the respective capacities $C_{\mathrm{CM}}$ and $C_{\mathrm{BICM}}$. The gap $C_{\mathrm{CM}} - R_0^{\mathrm{cm}}$ is the practical "margin for implementation" of any non-ML decoder; the gap $R_0^{\mathrm{cm}} - R_0^{\mathrm{bicm}}$ is the labelling-induced cost of BICM as discussed in Chapter 5. The cutoff rate, being a tighter proxy for decoder work per bit, is the object of §5 of this chapter.
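A hedged sketch of the parallel-channel side of such a plot, on the illustrative 4-PAM Gray setup: for a binary-input channel $R_0 = 1 - \log_2(1 + \beta)$ with Bhattacharyya parameter $\beta = \int \sqrt{p(y \mid 0)\,p(y \mid 1)}\,dy$, and for independent parallel channels with independent codebooks the cutoff rates add, giving a parallel-channel proxy for the BICM cutoff rate per symbol.

```python
import numpy as np

# Illustrative 4-PAM Gray setup over real AWGN, output discretized on a grid.
X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # Gray
sigma = 0.5
y = np.linspace(-4.0, 4.0, 4001)
dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
      / np.sqrt(2 * np.pi * sigma**2)

R0 = []
for i in range(2):
    p = np.stack([pyx[labels[:, i] == b].mean(axis=0) for b in (0, 1)])
    beta = np.sum(np.sqrt(p[0] * p[1])) * dy          # Bhattacharyya parameter
    R0.append(1.0 - np.log2(1.0 + beta))              # per-position cutoff rate

R0_bicm = sum(R0)                                     # bits per symbol
assert all(0.0 < r < 1.0 for r in R0)
assert 0.0 < R0_bicm < 2.0
```

Sweeping `sigma` over a range of SNRs and plotting `R0_bicm` against the corresponding capacities reproduces the qualitative shape of the figure.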
Picture A vs Picture B: Two Views of the BICM Channel
| Aspect | Picture A — Mixture (Ch. 5) | Picture B — Parallel (this Chapter) |
|---|---|---|
| Effective channel | One binary channel with law $\bar{p}(y \mid b) = \frac{1}{m} \sum_i p_i(y \mid b)$ | $m$ parallel binary channels $W_i$, each with law $p_i(y \mid b)$ |
| MI formula | $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ bits per code bit | $\sum_{i=1}^{m} I(B_i; Y)$ bits per symbol (= $m$ code bits) |
| Capacity in bits/symbol | $m \cdot \frac{1}{m} \sum_i I(B_i; Y) = C_{\mathrm{BICM}}$ | $\sum_{i=1}^{m} C_i = C_{\mathrm{BICM}}$ |
| Natural for | Defining BICM capacity as MI on a single scalar channel (Ch. 5) | Error-exponent and cutoff-rate analyses (this chapter, §4–5) |
| Relation to MLC | Marginalised bit metric (loses chain-rule terms) | MLC parallel channels are conditional, $I(B_i; Y \mid B_1, \dots, B_{i-1})$; BICM parallel channels are marginal, $I(B_i; Y)$ |
| Decoding implication | Single binary decoder on the mixture LLR stream | A product-metric decoder — same decoder, different bookkeeping |
Where the Labelling Enters
Both pictures compute the same number: $C_{\mathrm{BICM}} = \sum_{i=1}^{m} I(B_i; Y)$. What depends on the labelling is how close this sum is to the CM capacity $C_{\mathrm{CM}} = I(X; Y)$. The chain rule (Ch. 5, Thm. on the CM-BICM capacity ordering) gives $I(X; Y) = \sum_{i=1}^{m} I(B_i; Y \mid B_1, \dots, B_{i-1})$, each term $I(B_i; Y \mid B_1, \dots, B_{i-1}) - I(B_i; Y)$ being the "conditional coupling" that BICM throws away by marginalising. For Gray on QAM at high SNR every term is exponentially small; for SP on QAM they remain substantial and dominate the gap. This chapter's task is to translate the chain-rule arithmetic into error-probability arithmetic via Gallager's random-coding exponent. The key insight is that the chain-rule residual reappears, but in the form of a mismatched-decoding penalty — which is the topic of §2.
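The chain-rule arithmetic can be illustrated numerically. A hedged sketch on the toy 4-PAM setup (parameters illustrative): the conditional sum recovers $C_{\mathrm{CM}}$ exactly for any labelling, while the marginal (BICM) sum falls short by the conditional-coupling residuals, with a smaller shortfall under Gray than under the natural/set-partition-style labelling.

```python
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit-energy 4-PAM
sigma = 0.5
y = np.linspace(-4.0, 4.0, 4001)
dy = y[1] - y[0]
pyx = np.exp(-(y[None, :] - X[:, None]) ** 2 / (2 * sigma**2)) \
      / np.sqrt(2 * np.pi * sigma**2)

def mi(p):
    """MI of a uniform-input channel with rows p[k] = p(y | input k)."""
    q = p.mean(axis=0)
    return np.sum(p * np.log2(p / q)) / p.shape[0] * dy

def rates(lab):
    C_cm = mi(pyx)                                     # I(X; Y)
    Ci = [mi(np.stack([pyx[lab[:, i] == b].mean(axis=0) for b in (0, 1)]))
          for i in range(2)]                           # marginal I(B_i; Y)
    # Chain rule: I(X;Y) = I(B1;Y) + I(B2;Y | B1), averaged over B1.
    I2_cond = 0.5 * sum(mi(pyx[lab[:, 0] == b1]) for b1 in (0, 1))
    return C_cm, Ci[0] + Ci[1], Ci[0] + I2_cond

gray    = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
natural = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # set-partition style

gaps = {}
for name, lab in (('gray', gray), ('natural', natural)):
    C_cm, C_bicm, C_chain = rates(lab)
    assert np.isclose(C_chain, C_cm, atol=1e-6)        # chain rule is exact
    assert C_bicm <= C_cm + 1e-9                       # marginalising only loses
    gaps[name] = C_cm - C_bicm

assert gaps['gray'] < gaps['natural']                  # Gray hides the residuals
```

The two labellings share the same constellation and SNR; only the residual terms $I(B_i; Y \mid B_1, \dots, B_{i-1}) - I(B_i; Y)$ differ, which is exactly the point of this subsection.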
Historical Note: Wachsmann, Fischer, Huber: Parallel Channels in MLC and BICM
Udo Wachsmann, Robert Fischer, and Johannes Huber's 1999 IEEE Trans. IT paper, "Multilevel codes: theoretical concepts and practical design rules" (vol. 45, no. 5, pp. 1361–1391), is the canonical exposition of the parallel-channel framework for multilevel coding (MLC). The three authors, working at Erlangen, gave the definitive analysis of MLC under multi-stage decoding (MSD) and parallel-independent decoding (PID). The parallel framework, in which each bit level is treated as its own channel with its own code, is due to them in its modern rigorous form.
BICM, as Caire-Taricco-Biglieri had shown the previous year (1998), is the PID-without-conditioning version of MLC: the bit levels are decoded in parallel but the demapper averages over the other bits rather than conditioning on them. In the Wachsmann–Fischer–Huber taxonomy BICM is therefore "MLC with maximal marginalisation" — and the per-position capacities $I(B_i; Y)$ of BICM are exactly the marginal versions of the MLC conditional capacities $I(B_i; Y \mid B_1, \dots, B_{i-1})$.
Putting BICM inside the parallel-channel picture — which Guillén-Martínez-Caire formalised in 2008 — is what allows the error-exponent machinery of Gallager to carry over almost unchanged: random coding, union bounds, Bhattacharyya factors, and cutoff rates all lift from the binary channels to BICM one level at a time, with the labelling entering only through the per-position channel laws. This is why §4 of this chapter is structurally so clean.
Common Mistake: Parallel Does Not Mean Independent Given $Y$
Mistake:
Reading "BICM is parallel independent binary channels" as the claim that the label bits of a single symbol are jointly independent conditional on the channel output $Y$ — and therefore concluding that the BICM decoder is as good as the joint-ML decoder.
Correction:
The label bits are marginally independent (and uniform) at the input, because they are the outputs of an ideal interleaver applied to a uniform code. At the output $y$, the posteriors generally do not factorise into a product: $P(b_1, \dots, b_m \mid y) \neq \prod_{i=1}^{m} P(b_i \mid y)$, because the bits share the dependence on the same $y$ through the joint symbol $x = \mu(b_1, \dots, b_m)$. The BICM decoder pretends they do factorise — it multiplies per-bit posteriors. That pretence is the product-metric mismatch of §2.
So the parallel-channel model is an operational picture of how the encoder and the decoder talk to each other; it is not a claim about the conditional joint distribution of the bits given $Y$. Confusing the two collapses the BICM-to-CM gap and misses the entire mismatched-decoding analysis of this chapter.
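The non-factorisation is easy to exhibit numerically. A hedged illustration on the toy 4-PAM Gray setup (all parameters illustrative): at a fixed output $y_0$, the joint bit posterior differs from the product of its marginals — the product being exactly what the BICM metric silently substitutes.

```python
import numpy as np

X = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(5.0)   # unit-energy 4-PAM
labels = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # Gray
sigma, y0 = 0.5, 0.5                                   # noise std and test output

w = np.exp(-(y0 - X) ** 2 / (2 * sigma**2))
w /= w.sum()                                           # P(x | y0), uniform prior

joint = np.zeros((2, 2))                               # P(b1, b2 | y0)
for wx, (b1, b2) in zip(w, labels):
    joint[b1, b2] += wx
marg1, marg2 = joint.sum(axis=1), joint.sum(axis=0)    # P(b1 | y0), P(b2 | y0)
product = np.outer(marg1, marg2)                       # what BICM pretends

assert np.max(np.abs(joint - product)) > 0.01          # posteriors don't factorise
```

For Gray QPSK the same experiment would show equality, because there the two bits live on independent quadratures; for any constellation whose bits share a dimension, the discrepancy above is the mismatch that §2 quantifies.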
Quick Check
In the parallel-channel picture of BICM, the per-symbol information rate achievable by a product-form random code is:
- $m$ times the capacity of the mixture channel
- The CM capacity $I(X; Y)$
- The maximum of the individual bit capacities, $\max_i C_i$
The mixture channel has capacity $\frac{1}{m} \sum_{i=1}^{m} I(B_i; Y)$ per code bit. Multiplying by $m$ code bits per symbol gives $\sum_{i=1}^{m} I(B_i; Y)$ bits per symbol — exactly the parallel-channel sum capacity. This is the content of Thm. (BICM Capacity from Parallel Binary Channels).
Parallel Channel (BICM)
The information-theoretic model in which a BICM system is represented as $m$ independent binary-input channels $W_1, \dots, W_m$ operating in parallel, with $W_i$ being the marginal bit channel of label position $i$. Due to Wachsmann-Fischer-Huber (1999). Equivalent in rate to the mixture-channel viewpoint of Chapter 5.
Related: BICM as Parallel Binary Channels, Bit Channel (BICM), Wachsmann, Fischer, Huber: Parallel Channels in MLC and BICM, Mixture Channel (BICM)
Mixture Channel (BICM)
The effective binary-input channel seen by the BICM binary decoder after ideal interleaving, whose transition law $\bar{p}(y \mid b) = \frac{1}{m} \sum_{i=1}^{m} p_i(y \mid b)$ is the uniform mixture of the per-position bit channels. Introduced in Ch. 5; equivalent in rate to the parallel-channel model of §1 of this chapter.
Related: Bit Channel (BICM), Parallel Channel (BICM), BICM Capacity