Multistage Decoding and Error Propagation

Decoding Level by Level

The capacity rule of s02 is an achievability statement: there exist codes at rates $R_i = C_i$ that, together with the right receiver, approach the CM capacity. What is the right receiver?

An optimal joint decoder would weigh all $2^{nR_0} \times 2^{nR_1} \times \cdots \times 2^{nR_{L-1}}$ possible label-bit sequences against the received signal: exponentially many in the total rate. That is out of reach in practice. Multistage decoding (MSD) instead exploits the conditioning structure of the capacity rule directly: decode the levels one at a time, passing the decoded bits as side information to the next stage. The receiver's work at stage $i$ is a single-level binary decode of complexity $O(2^{nR_i})$ (already exponential, but managed by the binary-decoder pipeline), and the total complexity is the sum, not the product.

The price of this simplification is conceptually subtle: if a stage makes a decoding error, the next stage receives a wrong history and operates on a shifted coset. This is error propagation, and understanding it is what this section is about.


Multistage Decoding (MSD) for Multilevel Codes

Complexity: $O\bigl(\sum_{i=0}^{L-1} n \cdot 2^{L-i} + D_i(n, R_i)\bigr)$, where $D_i(n, R_i)$ is the decoding cost of the binary code $\mathcal{C}_i$. The first term sums per-symbol LLR computations over the shrinking coset sizes $2^{L-i}$; the second term is the per-stage binary decode.
Input: Received signal $\mathbf{y} = (y^{(1)}, \ldots, y^{(n)})$, $n$ channel uses, $L$ binary codes $\mathcal{C}_0, \ldots, \mathcal{C}_{L-1}$ of rates $R_0, \ldots, R_{L-1}$, partition-based labelling $\mu$.
Output: Estimates $\hat{b}_0, \hat{b}_1, \ldots, \hat{b}_{L-1}$ of the binary codewords at all levels.
1. Stage 0 (top level, coarsest partition).
2. \quad For each $t = 1, \ldots, n$, compute the bit-level log-likelihood ratio
3. \quad\quad $\Lambda_0^{(t)} = \log \dfrac{\sum_{x \in \mathcal{A}_1^{(0)}} p(y^{(t)} \mid x)}{\sum_{x \in \mathcal{A}_1^{(1)}} p(y^{(t)} \mid x)}$
4. \quad Feed $\{\Lambda_0^{(t)}\}$ to the decoder of $\mathcal{C}_0$; obtain $\hat{b}_0 = (\hat b_0^{(1)}, \ldots, \hat b_0^{(n)})$.
5. For $i = 1, 2, \ldots, L - 1$ do:
6. \quad For each $t = 1, \ldots, n$, compute the conditional LLR
7. \quad\quad $\Lambda_i^{(t)} = \log \dfrac{\sum_{x \in \mathcal{A}_{i+1}^{(\hat b_0^{(t)}, \ldots, \hat b_{i-1}^{(t)}, 0)}} p(y^{(t)} \mid x)}{\sum_{x \in \mathcal{A}_{i+1}^{(\hat b_0^{(t)}, \ldots, \hat b_{i-1}^{(t)}, 1)}} p(y^{(t)} \mid x)}$
8. \quad Feed $\{\Lambda_i^{(t)}\}$ to the decoder of $\mathcal{C}_i$; obtain $\hat{b}_i = (\hat b_i^{(1)}, \ldots, \hat b_i^{(n)})$.
9. End for.
10. Return $(\hat{b}_0, \hat{b}_1, \ldots, \hat{b}_{L-1})$.

At stage $i$ the LLR computation involves a sum over the $2^{L-i}$ points inside the current coset, a natural consequence of marginalising over the still-unknown levels $i+1, \ldots, L-1$. The MLC/MSD combination therefore has linear complexity in $L$, compared with joint decoding's exponential complexity in the total rate.
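A minimal, self-contained sketch of the algorithm above, assuming 8-PSK ($L = 3$) with natural set-partition labelling over AWGN. The per-level "decoders" are hard decisions on the LLRs, standing in for real binary decoders of the $\mathcal{C}_i$; the constellation and these stand-ins are illustrative choices, not fixed by the text.

```python
import numpy as np

# Minimal MSD sketch: 8-PSK (L = 3 levels), natural set-partition
# labelling, AWGN. Hard decisions stand in for the binary decoders.

L = 3
M = 2 ** L
points = np.exp(2j * np.pi * np.arange(M) / M)  # unit-energy 8-PSK

def label_bit(m, i):
    """Label bit of level i for point index m (bit 0 = coarsest partition)."""
    return (m >> i) & 1

def stage_llrs(y, history, i, sigma2):
    """Conditional LLRs for level i given decoded bits of levels 0..i-1.

    Marginalises p(y | x) over the 2^(L-i) points of the current coset,
    split by the value of label bit i (steps 2-3 and 6-7 of the algorithm).
    """
    llr = np.empty(len(y))
    for t, yt in enumerate(y):
        num = den = 1e-300  # guard against underflowing likelihood sums
        for m in range(M):
            # keep only points consistent with the decoded history at symbol t
            if any(label_bit(m, j) != history[j][t] for j in range(i)):
                continue
            lik = np.exp(-abs(yt - points[m]) ** 2 / sigma2)
            if label_bit(m, i) == 0:
                num += lik
            else:
                den += lik
        llr[t] = np.log(num) - np.log(den)
    return llr

def msd_decode(y, sigma2):
    """Decode the L levels in order, passing estimates down as history."""
    history = []
    for i in range(L):
        llr = stage_llrs(y, history, i, sigma2)
        history.append((llr < 0).astype(int))  # stand-in for decoder of C_i
    return history

# Usage: random symbols through AWGN, then per-level error rates.
rng = np.random.default_rng(0)
n, snr_db = 5000, 12.0
sigma2 = 10 ** (-snr_db / 10)   # noise variance for unit symbol energy
m_tx = rng.integers(0, M, size=n)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
b_hat = msd_decode(points[m_tx] + noise, sigma2)
for i in range(L):
    print(f"level {i}: BER = {np.mean(b_hat[i] != ((m_tx >> i) & 1)):.4f}")
```

Running it shows the expected set-partitioning behaviour: level 0 (smallest intra-coset distance after marginalisation) is the least reliable, and later levels benefit from the shrinking cosets.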


Effective BER at MSD Stages with Error Propagation

The plot shows the effective bit-error rate at each MSD stage as a function of SNR, assuming the previous stage completes with a given residual BER. For small prior-stage BER (say $10^{-4}$) the penalty is negligible. As the prior BER grows, the later-stage curve shifts rightward: with some probability the wrong coset is assumed and the effective squared distance is halved, costing up to $3$ dB. This is the quantitative statement of error propagation.

[Interactive figure: effective BER per MSD stage vs SNR; adjustable prior-stage residual BER, shown here at $10^{-3}$.]
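A hedged sketch of how such curves can be regenerated, under a simple two-term mixture model: with probability $1 - p$ the stage sees the genie squared distance $d^2$, and with probability $p$ the halved distance $d^2/2$. The Q-function pairwise-error model below is an illustrative assumption, not the text's exact BER expression.

```python
from math import erfc, sqrt

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2))

def effective_ber(snr_db, p_prev, d2=2.0):
    """Two-term mixture model of a stage's BER under error propagation:
    genie distance d2 w.p. 1 - p_prev, halved distance w.p. p_prev."""
    snr = 10 ** (snr_db / 10)
    ber_genie = qfunc(sqrt(d2 * snr / 2))
    ber_wrong = qfunc(sqrt((d2 / 2) * snr / 2))
    return (1 - p_prev) * ber_genie + p_prev * ber_wrong

for snr_db in (6, 8, 10, 12, 14):
    print(f"{snr_db:>2} dB: {effective_ber(snr_db, p_prev=1e-3):.2e}")
```

At high SNR the mixture is dominated by the $p \cdot Q(\cdot)$ term, which is exactly the rightward curve shift described above.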

Theorem: MSD Achieves the CM Capacity

Let $\{C_i\}_{i=0}^{L-1}$ be the binary sub-channel capacities defined in s02. For every $\epsilon > 0$ there exist binary codes of lengths $n_i$ and rates $R_i = C_i - \epsilon$ such that the MSD algorithm above recovers the transmitted label bits with aggregate error probability going to zero as $\min_i n_i \to \infty$. The total rate $\sum_i R_i \to \sum_i C_i = C_{\rm CM}$. Consequently, MSD achieves the full CM capacity, and does so with per-stage complexity determined by binary decoding alone.

At stage 0 the decoder operates on a binary channel of capacity $C_0$ at rate $C_0 - \epsilon$, so its error probability vanishes. At stage $i > 0$ the decoder operates on a channel whose capacity is $C_i$ conditional on correct history. Because history errors vanish by the previous stages' reliability, the stage-$i$ error probability vanishes as well. A union bound over $L$ stages then gives a vanishing total error probability.
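One way to make the union-bound step explicit (this decomposition is implied rather than spelled out above): the first stage to fail necessarily had a correct history, so

$$\Pr(\text{any stage errs}) \;\le\; \sum_{i=0}^{L-1} \Pr\bigl(\hat b_i \ne b_i \,\big|\, \hat b_j = b_j \text{ for all } j < i\bigr),$$

and each summand is a genie-aided stage error probability, vanishing because stage $i$ runs at rate $C_i - \epsilon$ on a channel of conditional capacity $C_i$.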


Theorem: Error-Propagation Bound at Stage $i$

Let $p_{<i} = \Pr(\text{at least one of } \hat b_0, \ldots, \hat b_{i-1} \text{ is in error on a given symbol})$ and let $P_{e,i}^{\rm genie}$ be the stage-$i$ bit-error probability when the history is delivered by a genie. The actual stage-$i$ bit-error probability in MSD satisfies

$$P_{e,i} \;\le\; (1 - p_{<i}) \, P_{e,i}^{\rm genie} \;+\; p_{<i} \, P_{e,i}^{\rm wrong\text{-}coset} \;\le\; P_{e,i}^{\rm genie} + p_{<i},$$

where $P_{e,i}^{\rm wrong\text{-}coset} \le 1$ is the worst-case bit-error probability when the stage-$i$ decoder operates on the wrong coset.

On a given symbol, either the history is correct (probability $1 - p_{<i}$, genie-like performance) or it is wrong (probability $p_{<i}$, at most worst-case error). The bound says the extra error due to propagation is at most the probability of a history error.

Example: Quantifying the Propagation Penalty at Level 1

Suppose an MSD receiver for 8-PSK operates with stage-0 residual BER $p_{<1} = 10^{-3}$. The genie stage-1 BER at some SNR is $P_{e,1}^{\rm genie} = 10^{-4}$ (hypothetical). Under a pessimistic model, a history error at stage 0 causes stage 1 to operate with effective squared distance halved (from $d_1^2 = 2$ to $d_1^2 / 2 = 1$). What is the total stage-1 BER, and by how many dB is stage 1 degraded compared with the genie case?
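One way to work the numbers, assuming the Q-function pairwise-error model from the figure sketch above. The inverse-Q step recovers an operating point consistent with the stated genie BER; these modelling choices are assumptions, not given in the example.

```python
from statistics import NormalDist
from math import sqrt, log10

norm = NormalDist()
qfunc = lambda x: 1 - norm.cdf(x)
qinv = lambda p: norm.inv_cdf(1 - p)

p_prev, ber_genie = 1e-3, 1e-4

# Q-argument consistent with the genie BER at squared distance d1^2 = 2.
a = qinv(ber_genie)
# Halving the squared distance scales the Q-argument by 1/sqrt(2).
ber_wrong = qfunc(a / sqrt(2))

total = (1 - p_prev) * ber_genie + p_prev * ber_wrong
print(f"wrong-coset BER ~ {ber_wrong:.2e}")
print(f"total stage-1 BER ~ {total:.2e}  (bound: {ber_genie + p_prev:.1e})")
print(f"worst-case penalty: {10 * log10(2):.2f} dB when the coset is wrong")
```

Under these assumptions the total stage-1 BER is about $1.04 \times 10^{-4}$, only a few percent above the genie value (the generic bound $P_{e,1}^{\rm genie} + p_{<1} \approx 1.1 \times 10^{-3}$ is much looser), while the worst-case penalty, realised only when the coset is actually wrong, is the $10 \log_{10} 2 \approx 3$ dB from the halved squared distance.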

MSD as Outer-Inner Code Concatenation

The MSD structure is best understood as a concatenation where the constellation plays the role of the inner code and each level's binary code plays the role of an outer code.

At stage 0 the inner "code" is the partition $\mathcal{A}_0 = \mathcal{X}$ viewed as a binary-input channel ($b_0$ selects one of the two sub-constellations $\mathcal{A}_1^{(0)}$ and $\mathcal{A}_1^{(1)}$); the outer code $\mathcal{C}_0$ protects $b_0$. At stage 1, given $\hat b_0$, the inner code is the binary partition of $\mathcal{A}_1^{(\hat b_0)}$ into two halves; $\mathcal{C}_1$ protects $b_1$. And so on.
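To make the "binary-input channel" reading concrete, here is a short Monte Carlo estimate of the stage-0 inner-channel capacity $C_0 = I(b_0; Y)$, assuming the same illustrative 8-PSK/AWGN setup as the decoder sketch above (equiprobable points; the SNR value is arbitrary):

```python
import numpy as np

# Monte Carlo estimate of the stage-0 inner-channel capacity I(b0; Y)
# for 8-PSK over AWGN: the binary-input channel seen by the outer code C0.
rng = np.random.default_rng(1)
M, n, snr_db = 8, 200_000, 10.0
sigma2 = 10 ** (-snr_db / 10)
pts = np.exp(2j * np.pi * np.arange(M) / M)

m = rng.integers(0, M, size=n)
y = pts[m] + np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

lik = np.exp(-np.abs(y[:, None] - pts[None, :]) ** 2 / sigma2)  # p(y|x) up to a constant
even = lik[:, 0::2].mean(axis=1)  # p(y | b0 = 0): average over the even-index coset
odd = lik[:, 1::2].mean(axis=1)   # p(y | b0 = 1): average over the odd-index coset
p_cond = np.where((m & 1) == 0, even, odd)
print("C0 ~", np.mean(np.log2(p_cond / (0.5 * (even + odd)))), "bits per symbol")
```

The estimate is exactly the mutual information that the capacity rule assigns to level 0, computed without any reference to a particular outer code: the concatenation view separates the inner channel from whatever $\mathcal{C}_0$ is layered on top.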

This is the same pattern as Reed–Solomon over Galois fields combined with a binary inner code — except here the "Galois field" structure is the Euclidean-distance structure of the constellation. The point is that the concatenation turns the non-binary CM problem into a sequence of binary coding problems, one per level — and that is why we can re-use the entire binary coding toolchain.


Common Mistake: MSD is not joint ML, but achieves the same capacity

Mistake:

Believing that because MSD is a sequential decoder it cannot achieve the CM capacity — surely a joint ML decoder over all levels simultaneously would do better.

Correction:

Joint ML over all levels minimises block error probability at a given finite code length. But from an information-theoretic standpoint both receivers achieve the same capacity $C_{\rm CM} = \sum_i C_i$. The chain rule of mutual information is blind to how the decoder operates; it only describes what rates are achievable, given some decoder that approaches the binary-channel thresholds at each stage. MSD does exactly that. The finite-length gap between MSD and joint ML is usually small and can be closed by iterative feedback between stages (turbo-MSD).

Why This Matters: From MSD to Iterative Bit-Level Decoders

Modern receivers for coded modulation — whether MLC-style or BICM-style — rarely run pure MSD any more. Instead they iterate between the detector (LLR computation) and the outer code, passing extrinsic information back and forth. This is the bit-interleaved iterative decoding (BIID) pattern that powers every LDPC/turbo-code-based wireless modem from 3G onwards.

The MSD analysis of this section is the information-theoretic skeleton of that iterative receiver: the capacity rule still dictates the asymptotic rates, but at finite length the iterations recover the joint-ML performance (or most of it) at modest complexity. We revisit this pattern in the BICM chapters (Chs 5–9) and in the MIMO detection chapters of the MIMO book.


Multistage decoding (MSD)

A sequential decoder for multilevel codes that decodes one binary code per level in order, using the decoded bits from previous stages as side information for the next. Achieves the CM capacity when paired with the capacity-rule rate allocation.

Related: Multilevel Code (MLC) Encoder, The Capacity Rule for MLC, Error propagation (in MSD)

Error propagation (in MSD)

The phenomenon whereby decoding errors at an early MSD stage are passed to later stages as incorrect history, causing them to operate on the wrong coset. Quantified by the bound $P_{e,i} \le P_{e,i}^{\rm genie} + p_{<i}$.

Related: MSD Achieves the CM Capacity, Genie Aided

Quick Check

How does the computational complexity of MSD scale with the number of levels $L = \log_2 M$, compared with a joint ML decoder over all levels?

Both scale the same way, linearly in $L$

MSD scales linearly in $L$; joint ML scales exponentially in $\sum_i R_i$

MSD is exponential in $L$; joint ML is linear

Both are exponential, but MSD has a smaller exponent