Achievability and Converse Bounds

Beyond the Normal Approximation: Exact Bounds

The normal approximation is an asymptotic expansion: it becomes accurate as $n$ grows. But for very short blocklengths ($n \sim 50$–$200$), the $O(\log n / n)$ remainder can be significant. We need non-asymptotic bounds that are tight for any $n$.

Polyanskiy, Poor, and Verdú provided two such bounds: the random coding union (RCU) bound for achievability and the meta-converse for the converse. Together, these bounds sandwich $R^*(n, \epsilon)$ to within a fraction of a bit for most channels of practical interest.

The key insight is to replace typicality-based arguments (which are inherently asymptotic) with hypothesis testing arguments (which work for any nn).

RCU Bound and Meta-Converse: Tight Sandwich

The RCU bound (achievability, blue) and meta-converse (converse, red) sandwich the true maximum coding rate $R^*(n, \epsilon)$. The gap between them is remarkably small even at moderate blocklengths.

Definition: Random Coding Union (RCU) Bound

For a DMC $p_{Y|X}$ with input distribution $p_X$, the average error probability of a random code with $M$ codewords drawn i.i.d. from $p_X^n$ satisfies:

$$\epsilon \le \mathbb{E}\left[\min\left(1,\ (M-1)\,\Pr\left[\iota(X'; Y) \ge \iota(X; Y) \,\middle|\, X, Y\right]\right)\right]$$

where $(X, Y) \sim p_X \cdot p_{Y|X}$ is the transmitted codeword and received signal, and $X' \sim p_X$ is an independent codeword. Equivalently, at blocklength $n$:

$$\epsilon_{\text{RCU}}(n, M) = \mathbb{E}\left[\min\left(1,\ (M-1)\,\Pr\left[\iota(\bar{X}^n; Y^n) \ge \iota(X^n; Y^n) \,\middle|\, X^n, Y^n\right]\right)\right]$$

where $\bar{X}^n$ is drawn independently from the same codebook distribution.

The RCU bound implies the existence of an $(n, M, \epsilon_{\text{RCU}})$-code.

The RCU bound is a refinement of the classical random coding bound. Instead of applying the plain union bound over all $M-1$ incorrect codewords (which is loose when $(M-1)$ times the pairwise error probability exceeds one), it clips the union bound at one, keeping the exact pairwise probability that an incorrect codeword has higher information density than the correct one. This is tight because the dominant error event is typically confusion with a single incorrect codeword, not several at once.

Theorem: RCU Achievability Bound

For any DMC $p_{Y|X}$ and any input distribution $p_X$, there exists an $(n, M, \epsilon)$-code with:

$$\epsilon \le \mathbb{E}\left[\min\left(1,\ (M-1) \sum_{\bar{x}^n} p_{X^n}(\bar{x}^n)\,\mathbf{1}\left[\iota(X^n; Y^n) \le \iota(\bar{x}^n; Y^n)\right]\right)\right].$$

For the AWGN channel with SNR $P$ (unit noise variance) and i.i.d. $\mathcal{N}(0, P)$ inputs, applying a Gaussian approximation to the inner probability gives:

$$\epsilon_{\text{RCU}} \approx \mathbb{E}\left[\min\left(1,\ (M-1)\, Q\left(\frac{\iota(X^n; Y^n) - nC}{\sqrt{nV}} + \sqrt{\frac{n}{V}}\,(C - R)\right)\right)\right]$$

where $R = (\log M)/n$ and the expectation is over the random information density $\iota(X^n; Y^n)$.

The RCU bound says: "draw a random code, compute the probability that any wrong codeword looks better than the right one, and average over the channel randomness." This is tight because it captures the exact pairwise confusion probability, not an upper bound on it.
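Below is a minimal Monte Carlo sketch of the AWGN expression above, working in nats with unit noise variance. The function name `rcu_awgn` and the parameter choices (n = 128, 5 dB SNR, k information bits) are illustrative assumptions, not values from the text.

```python
import numpy as np
from math import log, sqrt
from scipy.special import erfc

rng = np.random.default_rng(0)

def rcu_awgn(n, P, k, blocks=20_000):
    """Monte Carlo evaluation of the Q-function form of the RCU bound
    for the AWGN channel with i.i.d. N(0, P) inputs and unit-variance
    noise. Works in nats; k is the number of information bits."""
    C = 0.5 * log(1 + P)                     # capacity, nats/use
    V = P * (P + 2) / (2 * (1 + P) ** 2)     # dispersion, nats^2
    M = 2.0 ** k
    R = k * log(2) / n                       # rate, nats/use
    x = rng.normal(0.0, sqrt(P), size=(blocks, n))
    z = rng.normal(0.0, 1.0, size=(blocks, n))
    y = x + z
    # information density i(x^n; y^n) = n*C + sum_i [y_i^2/(2(1+P)) - z_i^2/2]
    i = n * C + (y**2 / (2 * (1 + P)) - z**2 / 2).sum(axis=1)
    arg = (i - n * C) / sqrt(n * V) + sqrt(n / V) * (C - R)
    q = 0.5 * erfc(arg / sqrt(2))            # Gaussian tail Q(arg)
    return float(np.minimum(1.0, (M - 1.0) * q).mean())

print(rcu_awgn(n=128, P=10**0.5, k=100))     # 5 dB SNR, rate 100/128
```

Because the inner probability has already been replaced by a Gaussian tail, only the outer expectation needs sampling, which keeps the estimate stable even when the bound is far below $10^{-3}$.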

Definition: The $\beta$ Function (Hypothesis Testing)

For two probability distributions $P$ and $Q$ on the same alphabet, the minimum type-II error at significance level $\epsilon$ is:

$$\beta_{1-\epsilon}(P, Q) = \min_{T:\, \mathbb{E}_P[T] \ge 1 - \epsilon} \mathbb{E}_Q[T]$$

where the minimum is over all randomized tests $T: \mathcal{Y} \to [0, 1]$. By the Neyman-Pearson lemma, the optimal test is the likelihood ratio test:

$$T^*(y) = \begin{cases} 1 & \text{if } \frac{dP}{dQ}(y) > \tau \\ \gamma & \text{if } \frac{dP}{dQ}(y) = \tau \\ 0 & \text{if } \frac{dP}{dQ}(y) < \tau \end{cases}$$

where $\tau$ and $\gamma$ are chosen so that $\mathbb{E}_P[T^*] = 1 - \epsilon$.
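For finite alphabets, this definition can be evaluated exactly by sorting likelihood ratios, exactly as the Neyman-Pearson test above prescribes. A minimal sketch, assuming `P` and `Q` are numpy arrays with $Q$ strictly positive and $0 < \epsilon < 1$; `beta` is our own helper name, not a standard API:

```python
import numpy as np

def beta(P, Q, eps):
    """Exact beta_{1-eps}(P, Q) on a finite alphabet: accept symbols in
    decreasing likelihood-ratio order until the acceptance probability
    under P reaches 1 - eps, randomizing on the boundary symbol."""
    order = np.argsort(-(P / Q))                 # descending dP/dQ
    Ps, Qs = P[order], Q[order]
    cP = np.cumsum(Ps)
    j = int(np.searchsorted(cP, 1 - eps))        # first symbol reaching the target
    gamma = (1 - eps - (cP[j] - Ps[j])) / Ps[j]  # partial acceptance (the gamma above)
    return float(np.sum(Qs[:j]) + gamma * Qs[j])

# Two coins: P = Bernoulli(1/2), Q = Bernoulli(1/10), eps = 0.25
print(beta(np.array([0.5, 0.5]), np.array([0.9, 0.1]), 0.25))  # 0.55
```

On product distributions the alphabet grows exponentially, but symmetry often collapses the computation to a handful of likelihood-ratio levels, as the BSC sketch further below exploits.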

Theorem: The Meta-Converse

For any DMC $p_{Y|X}$, any $(n, M, \epsilon)$-code with maximal error probability $\epsilon$, and any auxiliary output distribution $Q_{Y^n}$:

$$M \le \max_{1 \le m \le M} \frac{1}{\beta_{1-\epsilon}\left(p_{Y^n|X^n}(\cdot|\mathbf{x}_m),\ Q_{Y^n}\right)}$$

where the maximum is over the codewords $\mathbf{x}_m$ of the code. Since the bound holds for every $Q_{Y^n}$, it can be tightened by taking the infimum over output distributions.

For the average error probability formulation:

$$M \le \inf_{Q_{Y^n}} \frac{1}{\frac{1}{M}\sum_{m=1}^{M} \beta_{1-\epsilon_m}\left(p_{Y^n|X^n}(\cdot|\mathbf{x}_m),\ Q_{Y^n}\right)}$$

where $\epsilon_m$ is the conditional error probability for message $m$ and $\frac{1}{M}\sum_m \epsilon_m \le \epsilon$.

The meta-converse connects coding to hypothesis testing. The idea is: if a code can reliably distinguish $M$ codewords, then each codeword-output distribution $p_{Y^n|X^n}(\cdot|\mathbf{x}_m)$ must be "far" from the background distribution $Q_{Y^n}$. The $\beta$ function measures this distance. The larger $M$ is, the harder it is for all codewords to be far from $Q_{Y^n}$, which gives the converse.

The point is that this bound works for any $n$, not just asymptotically. It replaces Fano's inequality with a hypothesis-testing argument that is tight to second order.
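As a concrete instance, here is a sketch of the meta-converse for the BSC, assuming the standard (though not necessarily optimal) auxiliary choice $Q_{Y^n}$ = uniform on $\{0,1\}^n$. The name `converse_bsc` is ours, and `beta()` is the helper from the $\beta$-function sketch above:

```python
import numpy as np
from math import comb

def converse_bsc(n, p, eps):
    """Meta-converse upper bound on log2 M for the BSC(p) at maximal
    error eps, with Q = uniform on {0,1}^n. The likelihood ratio depends
    on y^n only through its Hamming distance d from the codeword, so the
    Neyman-Pearson test reduces to n + 1 distance levels."""
    d = np.arange(n + 1)
    binom = np.array([float(comb(n, k)) for k in range(n + 1)])
    P = binom * p**d * (1 - p)**(n - d)   # channel: Pr[distance = d]
    Q = binom / 2.0**n                    # uniform output: Q[distance = d]
    return -np.log2(beta(P, Q, eps))      # beta() from the earlier sketch

print(converse_bsc(128, 0.11, 1e-3))      # upper bound on log2 M, in bits
```

Dividing the printed value by $n$ gives an upper bound on the achievable rate, directly comparable to the RCU computation in the worked example below.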

RCU Achievability vs Meta-Converse

Compare the RCU bound (achievability), the meta-converse (converse), and the normal approximation. For many channels, the gap between RCU and meta-converse is remarkably small, even at short blocklengths.


Common Mistake: Confusing Dispersion with Error Exponent

Mistake:

Believing that the error exponent $E(R) = -\lim_{n \to \infty} \frac{1}{n}\log P_e$ and the channel dispersion $V$ capture the same information about finite-blocklength performance.

Correction:

The error exponent describes how $P_e$ decays at a fixed rate below capacity as $n \to \infty$. The dispersion describes the rate penalty at a fixed $P_e$ as $n$ varies. They answer different questions:

  • Error exponent: "At rate $R < C$, how fast does $P_e$ go to zero?"
  • Normal approximation: "At error probability $\epsilon$, what rate can we achieve at blocklength $n$?"

For system design, the normal approximation is usually more useful because the designer specifies $(n, \epsilon)$ and wants to know the achievable rate, not the other way around.
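To make the contrast concrete, the sketch below evaluates both answers for the BSC: Gallager's random-coding exponent bound $P_e \le 2^{-nE_r(R)}$ at a fixed rate, versus the error probability the second-order normal approximation predicts at the same $(n, R)$. The function names are ours, and the third-order $\log n$ term is omitted for simplicity.

```python
import numpy as np
from math import log2, sqrt
from statistics import NormalDist

def gallager_Er(R, p):
    """Random-coding exponent E_r(R) = max_rho [E_0(rho) - rho*R] for
    the BSC(p) with uniform inputs, in bits, via a grid over rho."""
    rho = np.linspace(0.0, 1.0, 1001)
    a = 1.0 / (1.0 + rho)
    E0 = rho - (1.0 + rho) * np.log2(p**a + (1.0 - p)**a)
    return float(np.max(E0 - rho * R))

def normal_approx_eps(R, p, n):
    """Error probability predicted by the second-order normal
    approximation: eps = Q((C - R) * sqrt(n / V))."""
    h = lambda q: -q * log2(q) - (1 - q) * log2(1 - q)
    C = 1.0 - h(p)
    V = p * (1 - p) * log2((1 - p) / p) ** 2
    return 1.0 - NormalDist().cdf((C - R) * sqrt(n / V))

p, R, n = 0.11, 0.3, 128
print(2.0 ** (-n * gallager_Er(R, p)))    # exponent bound on P_e
print(normal_approx_eps(R, p, n))         # dispersion-based estimate
```

At short blocklengths the exponent bound is typically much weaker, because the sub-exponential prefactor it discards is exactly what the dispersion term tracks.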

Common Mistake: Trusting the Normal Approximation at Very Short Blocklengths

Mistake:

Using $R^*(n, \epsilon) \approx C - \sqrt{V/n}\,Q^{-1}(\epsilon)$ for $n < 50$, where the CLT approximation of the information density may be inaccurate.

Correction:

For $n < 50$–$100$, use the exact RCU and meta-converse bounds, which do not rely on the CLT. The normal approximation has an $O(\log n/n)$ error term that can be 0.1–0.3 bits/use at $n = 50$, which is significant relative to the $\sqrt{V/n}$ term. The Berry-Esseen refinement helps: $R^*(n, \epsilon) = C - \sqrt{V/n}\,Q^{-1}(\epsilon) + \frac{\log n}{2n} + O(1/n)$.
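A quick way to apply the refinement is to code it directly. A minimal sketch for the BSC, with our own function name, using only the standard library:

```python
from math import log2, sqrt
from statistics import NormalDist

def normal_approx_bsc(n, p, eps):
    """Third-order normal approximation to R*(n, eps) for the BSC(p),
    in bits/use: C - sqrt(V/n) * Q^{-1}(eps) + log2(n) / (2n)."""
    h = lambda q: -q * log2(q) - (1 - q) * log2(1 - q)
    C = 1.0 - h(p)                                # capacity, bits
    V = p * (1 - p) * log2((1 - p) / p) ** 2      # dispersion, bits^2
    return C - sqrt(V / n) * NormalDist().inv_cdf(1 - eps) + log2(n) / (2 * n)

for n in (50, 128, 512, 2048):
    print(n, round(normal_approx_bsc(n, 0.11, 1e-3), 4))
```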

Example: Computing the RCU Bound for BSC

For the BSC with crossover probability $p = 0.11$ and blocklength $n = 128$, compute the RCU bound on the maximum code size $M$ at error probability $\epsilon = 10^{-3}$.
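A sketch of this computation, assuming codewords drawn uniformly from $\{0,1\}^n$ so that the inner pairwise probability reduces to a Hamming-ball volume; `rcu_bsc` and the search loop are our illustrative scaffolding:

```python
import math

def rcu_bsc(n, p, M):
    """Exact RCU bound on the average error probability for the BSC(p)
    with M codewords drawn uniformly from {0,1}^n. Given that the channel
    flips d positions, an independent codeword has information density at
    least as large iff it lies within Hamming distance d of y^n."""
    eps, ball = 0.0, 0
    for d in range(n + 1):
        ball += math.comb(n, d)                  # |{x : d(x, y^n) <= d}|
        p_d = math.comb(n, d) * p**d * (1 - p)**(n - d)
        eps += p_d * min(1.0, (M - 1) * ball / 2**n)
    return eps

# Largest k (information bits) with eps_RCU(n, 2^k) <= 1e-3:
n, p, target = 128, 0.11, 1e-3
k = max(k for k in range(1, n) if rcu_bsc(n, p, 2**k) <= target)
print(k, k / n, rcu_bsc(n, p, 2**k))             # bits, rate, bound achieved
```

The resulting rate $k/n$ can be compared against the meta-converse sketch above and the normal approximation; the three should agree to within a few hundredths of a bit per use at this blocklength.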

šŸ”§ Engineering Note

How Close Do Practical Codes Get to the PPV Bounds?

Modern channel codes approach the finite-blocklength bounds remarkably well. For the AWGN channel at $n = 128$ and $\epsilon = 10^{-3}$:

  • Polar codes (SCL decoding, list size 32): within 0.5 dB of the RCU bound
  • LDPC codes (5G NR base graph): within 0.7 dB of the RCU bound at $n = 1024$, but degrade faster at shorter blocklengths
  • Turbo codes: within 0.3 dB at $n = 1000$, but with higher decoding complexity
  • Tail-biting convolutional codes (used in LTE control channels): within 1 dB at $n = 128$

The PPV bounds serve as the ultimate benchmark: if a code is within 0.5 dB of the meta-converse, there is very little room for improvement at that blocklength.

Practical Constraints

  • 5G NR uses polar codes for control channels (n = 32-1024) and LDPC for data (n = 256-8448)
  • Decoding latency scales with list size in SCL: L=8 is practical, L=32 is costly
  • At n < 64, algebraic codes (Reed-Muller, BCH) can outperform capacity-approaching codes

Key Takeaway

RCU + meta-converse = tight sandwich. The random coding union bound (achievability) and the meta-converse (converse) together characterize $R^*(n, \epsilon)$ to within a fraction of a bit for most channels. They replace the asymptotic achievability-converse pair (random coding + Fano) with non-asymptotic hypothesis-testing arguments that work at any blocklength. For practical system design, the normal approximation $R^* \approx C - \sqrt{V/n}\,Q^{-1}(\epsilon)$ is accurate for $n \gtrsim 100$.

Random coding union bound

A non-asymptotic achievability bound that upper-bounds the error probability of a random code by the probability that any incorrect codeword has higher information density than the correct one.

Related: Meta-converse

Meta-converse

A non-asymptotic converse bound that lower-bounds the minimum error probability using hypothesis testing between the channel output distribution and a reference distribution. Also called the minimax converse.

Related: Random coding union bound