Achievability and Converse Bounds

Beyond the Normal Approximation: Exact Bounds

The normal approximation is an asymptotic expansion: it becomes accurate as $n$ grows. But for very short blocklengths ($n \sim 50$–$200$), the $O(\log n / n)$ remainder can be significant. We need non-asymptotic bounds that are tight for any $n$.

Polyanskiy, Poor, and Verdú provided two such bounds: the random coding union (RCU) bound for achievability and the meta-converse for the converse. Together, these bounds sandwich $R^*(n, \epsilon)$ to within a fraction of a bit for most channels of practical interest.

The key insight is to replace typicality-based arguments (which are inherently asymptotic) with hypothesis testing arguments (which work for any nn).

RCU Bound and Meta-Converse: Tight Sandwich

The RCU bound (achievability, blue) and meta-converse (converse, red) sandwich the true maximum coding rate $R^*(n, \epsilon)$. The gap between them is remarkably small even at moderate blocklengths.

Definition: Random Coding Union (RCU) Bound

For a DMC $p_{Y|X}$ with input distribution $p_X$, the average error probability of a random code with $M$ codewords drawn i.i.d. from $p_X^n$ satisfies:

$$\epsilon \le \mathbb{E}\left[\min\left(1,\ (M-1)\,\Pr\left[\iota(X'; Y) \ge \iota(X; Y) \,\middle|\, X, Y\right]\right)\right]$$

where $(X, Y) \sim p_X \cdot p_{Y|X}$ is the transmitted codeword and received signal, and $X' \sim p_X$ is an independent codeword. Equivalently, at blocklength $n$:

$$\epsilon_{\text{RCU}}(n, M) = \mathbb{E}\left[\min\left(1,\ (M-1)\,\Pr\left[\iota(\bar{X}^n; Y^n) \ge \iota(X^n; Y^n) \,\middle|\, X^n, Y^n\right]\right)\right]$$

where $\bar{X}^n$ is drawn independently from the same codebook distribution.

The RCU bound implies the existence of an $(n, M, \epsilon_{\text{RCU}})$-code.

The RCU bound is a refinement of the classical random coding bound. Instead of applying the plain union bound over all $M-1$ incorrect codewords (which is loose when $(M-1)$ times the pairwise error probability exceeds one), it clips the union bound at one, keeping the exact pairwise probability that an incorrect codeword has higher information density than the correct one. This is tight because the dominant error event is typically confusion with a single incorrect codeword, not several at once.

Theorem: RCU Achievability Bound

For any DMC $p_{Y|X}$ and any input distribution $p_X$, there exists an $(n, M, \epsilon)$-code with:

$$\epsilon \le \mathbb{E}\left[\min\left(1,\ (M-1) \sum_{\bar{x}^n} p_{X^n}(\bar{x}^n)\,\mathbf{1}\left[\iota(X^n; Y^n) \le \iota(\bar{x}^n; Y^n)\right]\right)\right].$$

For the AWGN channel with SNR $P$ (unit noise variance) and i.i.d. $\mathcal{N}(0, P)$ inputs, applying a Gaussian approximation to the inner probability gives:

$$\epsilon_{\text{RCU}} \approx \mathbb{E}\left[\min\left(1,\ (M-1)\, Q\left(\frac{\iota(X^n; Y^n) - nC}{\sqrt{nV}} + \sqrt{\frac{n}{V}}\,(C - R)\right)\right)\right]$$

where $R = (\log M)/n$ and the expectation is over the random information density $\iota(X^n; Y^n)$.

The RCU bound says: "draw a random code, compute the probability that any wrong codeword looks better than the right one, and average over the channel randomness." This is tight because it captures the exact pairwise confusion probability, not an upper bound on it.
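Below is a minimal Monte Carlo sketch of the AWGN expression above, working in nats with unit noise variance. The function name `rcu_awgn` and the parameter choices (n = 128, 5 dB SNR, k information bits) are illustrative assumptions, not values from the text.

```python
import numpy as np
from math import log, sqrt
from scipy.special import erfc

rng = np.random.default_rng(0)

def rcu_awgn(n, P, k, blocks=20_000):
    """Monte Carlo evaluation of the Q-function form of the RCU bound
    for the AWGN channel with i.i.d. N(0, P) inputs and unit-variance
    noise. Works in nats; k is the number of information bits."""
    C = 0.5 * log(1 + P)                     # capacity, nats/use
    V = P * (P + 2) / (2 * (1 + P) ** 2)     # dispersion, nats^2
    M = 2.0 ** k
    R = k * log(2) / n                       # rate, nats/use
    x = rng.normal(0.0, sqrt(P), size=(blocks, n))
    z = rng.normal(0.0, 1.0, size=(blocks, n))
    y = x + z
    # information density i(x^n; y^n) = n*C + sum_i [y_i^2/(2(1+P)) - z_i^2/2]
    i = n * C + (y**2 / (2 * (1 + P)) - z**2 / 2).sum(axis=1)
    arg = (i - n * C) / sqrt(n * V) + sqrt(n / V) * (C - R)
    q = 0.5 * erfc(arg / sqrt(2))            # Gaussian tail Q(arg)
    return float(np.minimum(1.0, (M - 1.0) * q).mean())

print(rcu_awgn(n=128, P=10**0.5, k=100))     # 5 dB SNR, rate 100/128
```

Because the inner probability has already been replaced by a Gaussian tail, only the outer expectation needs sampling, which keeps the estimate stable even when the bound is far below $10^{-3}$.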

Definition: The $\beta$ Function (Hypothesis Testing)

For two probability distributions $P$ and $Q$ on the same alphabet, the minimum type-II error at significance level $\epsilon$ is:

$$\beta_{1-\epsilon}(P, Q) = \min_{T:\, \mathbb{E}_P[T] \ge 1 - \epsilon} \mathbb{E}_Q[T]$$

where the minimum is over all randomized tests $T: \mathcal{Y} \to [0, 1]$. By the Neyman-Pearson lemma, the optimal test is the likelihood ratio test:

$$T^*(y) = \begin{cases} 1 & \text{if } \frac{dP}{dQ}(y) > \tau \\ \gamma & \text{if } \frac{dP}{dQ}(y) = \tau \\ 0 & \text{if } \frac{dP}{dQ}(y) < \tau \end{cases}$$

where $\tau$ and $\gamma$ are chosen so that $\mathbb{E}_P[T^*] = 1 - \epsilon$.
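For finite alphabets, this definition can be evaluated exactly by sorting likelihood ratios, exactly as the Neyman-Pearson test above prescribes. A minimal sketch, assuming `P` and `Q` are numpy arrays with $Q$ strictly positive and $0 < \epsilon < 1$; `beta` is our own helper name, not a standard API:

```python
import numpy as np

def beta(P, Q, eps):
    """Exact beta_{1-eps}(P, Q) on a finite alphabet: accept symbols in
    decreasing likelihood-ratio order until the acceptance probability
    under P reaches 1 - eps, randomizing on the boundary symbol."""
    order = np.argsort(-(P / Q))                 # descending dP/dQ
    Ps, Qs = P[order], Q[order]
    cP = np.cumsum(Ps)
    j = int(np.searchsorted(cP, 1 - eps))        # first symbol reaching the target
    gamma = (1 - eps - (cP[j] - Ps[j])) / Ps[j]  # partial acceptance (the gamma above)
    return float(np.sum(Qs[:j]) + gamma * Qs[j])

# Two coins: P = Bernoulli(1/2), Q = Bernoulli(1/10), eps = 0.25
print(beta(np.array([0.5, 0.5]), np.array([0.9, 0.1]), 0.25))  # 0.55
```

On product distributions the alphabet grows exponentially, but symmetry often collapses the computation to a handful of likelihood-ratio levels, as the BSC sketch further below exploits.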

Theorem: The Meta-Converse

For any DMC $p_{Y|X}$, any $(n, M, \epsilon)$-code with maximal error probability $\epsilon$, and any auxiliary output distribution $Q_{Y^n}$:

$$M \le \max_{1 \le m \le M} \frac{1}{\beta_{1-\epsilon}\left(p_{Y^n|X^n}(\cdot|\mathbf{x}_m),\ Q_{Y^n}\right)}$$

where the maximum is over the codewords $\mathbf{x}_m$ of the code. Since the bound holds for every $Q_{Y^n}$, it can be tightened by taking the infimum over output distributions.

For the average error probability formulation:

$$M \le \inf_{Q_{Y^n}} \frac{1}{\frac{1}{M}\sum_{m=1}^{M} \beta_{1-\epsilon_m}\left(p_{Y^n|X^n}(\cdot|\mathbf{x}_m),\ Q_{Y^n}\right)}$$

where $\epsilon_m$ is the conditional error probability for message $m$ and $\frac{1}{M}\sum_m \epsilon_m \le \epsilon$.

The meta-converse connects coding to hypothesis testing. The idea is: if a code can reliably distinguish $M$ codewords, then each codeword-output distribution $p_{Y^n|X^n}(\cdot|\mathbf{x}_m)$ must be "far" from the background distribution $Q_{Y^n}$. The $\beta$ function measures this distance. The larger $M$ is, the harder it is for all codewords to be far from $Q_{Y^n}$, which gives the converse.

The point is that this bound works for any $n$, not just asymptotically. It replaces Fano's inequality with a hypothesis-testing argument that is tight to second order.
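As a concrete instance, here is a sketch of the meta-converse for the BSC, assuming the standard (though not necessarily optimal) auxiliary choice $Q_{Y^n}$ = uniform on $\{0,1\}^n$. The name `converse_bsc` is ours, and `beta()` is the helper from the $\beta$-function sketch above:

```python
import numpy as np
from math import comb

def converse_bsc(n, p, eps):
    """Meta-converse upper bound on log2 M for the BSC(p) at maximal
    error eps, with Q = uniform on {0,1}^n. The likelihood ratio depends
    on y^n only through its Hamming distance d from the codeword, so the
    Neyman-Pearson test reduces to n + 1 distance levels."""
    d = np.arange(n + 1)
    binom = np.array([float(comb(n, k)) for k in range(n + 1)])
    P = binom * p**d * (1 - p)**(n - d)   # channel: Pr[distance = d]
    Q = binom / 2.0**n                    # uniform output: Q[distance = d]
    return -np.log2(beta(P, Q, eps))      # beta() from the earlier sketch

print(converse_bsc(128, 0.11, 1e-3))      # upper bound on log2 M, in bits
```

Dividing the printed value by $n$ gives an upper bound on the achievable rate, directly comparable to the RCU computation in the worked example below.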

RCU Achievability vs Meta-Converse

Compare the RCU bound (achievability), the meta-converse (converse), and the normal approximation. For many channels, the gap between RCU and meta-converse is remarkably small, even at short blocklengths.


Common Mistake: Confusing Dispersion with Error Exponent

Mistake:

Believing that the error exponent $E(R) = -\lim_{n \to \infty} \frac{1}{n}\log P_e$ and the channel dispersion $V$ capture the same information about finite-blocklength performance.

Correction:

The error exponent describes how $P_e$ decays at a fixed rate below capacity as $n \to \infty$. The dispersion describes the rate penalty at a fixed $P_e$ as $n$ varies. They answer different questions:

  • Error exponent: "At rate $R < C$, how fast does $P_e$ go to zero?"
  • Normal approximation: "At error probability $\epsilon$, what rate can we achieve at blocklength $n$?"

For system design, the normal approximation is usually more useful because the designer specifies $(n, \epsilon)$ and wants to know the achievable rate, not the other way around.
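To make the contrast concrete, the sketch below evaluates both answers for the BSC: Gallager's random-coding exponent bound $P_e \le 2^{-nE_r(R)}$ at a fixed rate, versus the error probability the second-order normal approximation predicts at the same $(n, R)$. The function names are ours, and the third-order $\log n$ term is omitted for simplicity.

```python
import numpy as np
from math import log2, sqrt
from statistics import NormalDist

def gallager_Er(R, p):
    """Random-coding exponent E_r(R) = max_rho [E_0(rho) - rho*R] for
    the BSC(p) with uniform inputs, in bits, via a grid over rho."""
    rho = np.linspace(0.0, 1.0, 1001)
    a = 1.0 / (1.0 + rho)
    E0 = rho - (1.0 + rho) * np.log2(p**a + (1.0 - p)**a)
    return float(np.max(E0 - rho * R))

def normal_approx_eps(R, p, n):
    """Error probability predicted by the second-order normal
    approximation: eps = Q((C - R) * sqrt(n / V))."""
    h = lambda q: -q * log2(q) - (1 - q) * log2(1 - q)
    C = 1.0 - h(p)
    V = p * (1 - p) * log2((1 - p) / p) ** 2
    return 1.0 - NormalDist().cdf((C - R) * sqrt(n / V))

p, R, n = 0.11, 0.3, 128
print(2.0 ** (-n * gallager_Er(R, p)))    # exponent bound on P_e
print(normal_approx_eps(R, p, n))         # dispersion-based estimate
```

At short blocklengths the exponent bound is typically much weaker, because the sub-exponential prefactor it discards is exactly what the dispersion term tracks.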

Common Mistake: Trusting the Normal Approximation at Very Short Blocklengths

Mistake:

Using $R^*(n, \epsilon) \approx C - \sqrt{V/n}\,Q^{-1}(\epsilon)$ for $n < 50$, where the CLT approximation of the information density may be inaccurate.

Correction:

For $n < 50$–$100$, use the exact RCU and meta-converse bounds, which do not rely on the CLT. The normal approximation has an $O(\log n/n)$ error term that can be 0.1–0.3 bits/use at $n = 50$, which is significant relative to the $\sqrt{V/n}$ term. The Berry-Esseen refinement helps: $R^*(n, \epsilon) = C - \sqrt{V/n}\,Q^{-1}(\epsilon) + \frac{\log n}{2n} + O(1/n)$.
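A quick way to apply the refinement is to code it directly. A minimal sketch for the BSC, with our own function name, using only the standard library:

```python
from math import log2, sqrt
from statistics import NormalDist

def normal_approx_bsc(n, p, eps):
    """Third-order normal approximation to R*(n, eps) for the BSC(p),
    in bits/use: C - sqrt(V/n) * Q^{-1}(eps) + log2(n) / (2n)."""
    h = lambda q: -q * log2(q) - (1 - q) * log2(1 - q)
    C = 1.0 - h(p)                                # capacity, bits
    V = p * (1 - p) * log2((1 - p) / p) ** 2      # dispersion, bits^2
    return C - sqrt(V / n) * NormalDist().inv_cdf(1 - eps) + log2(n) / (2 * n)

for n in (50, 128, 512, 2048):
    print(n, round(normal_approx_bsc(n, 0.11, 1e-3), 4))
```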

Example: Computing the RCU Bound for BSC

For the BSC with crossover probability $p = 0.11$ and blocklength $n = 128$, compute the RCU bound on the maximum code size $M$ at error probability $\epsilon = 10^{-3}$.
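A sketch of this computation, assuming codewords drawn uniformly from $\{0,1\}^n$ so that the inner pairwise probability reduces to a Hamming-ball volume; `rcu_bsc` and the search loop are our illustrative scaffolding:

```python
import math

def rcu_bsc(n, p, M):
    """Exact RCU bound on the average error probability for the BSC(p)
    with M codewords drawn uniformly from {0,1}^n. Given that the channel
    flips d positions, an independent codeword has information density at
    least as large iff it lies within Hamming distance d of y^n."""
    eps, ball = 0.0, 0
    for d in range(n + 1):
        ball += math.comb(n, d)                  # |{x : d(x, y^n) <= d}|
        p_d = math.comb(n, d) * p**d * (1 - p)**(n - d)
        eps += p_d * min(1.0, (M - 1) * ball / 2**n)
    return eps

# Largest k (information bits) with eps_RCU(n, 2^k) <= 1e-3:
n, p, target = 128, 0.11, 1e-3
k = max(k for k in range(1, n) if rcu_bsc(n, p, 2**k) <= target)
print(k, k / n, rcu_bsc(n, p, 2**k))             # bits, rate, bound achieved
```

The resulting rate $k/n$ can be compared against the meta-converse sketch above and the normal approximation; the three should agree to within a few hundredths of a bit per use at this blocklength.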

šŸ”§ Engineering Note

How Close Do Practical Codes Get to the PPV Bounds?

Modern channel codes approach the finite-blocklength bounds remarkably well. For the AWGN channel at $n = 128$ and $\epsilon = 10^{-3}$:

  • Polar codes (SCL decoding, list size 32): within 0.5 dB of the RCU bound
  • LDPC codes (5G NR base graph): within 0.7 dB of the RCU bound at $n = 1024$, but degrade faster at shorter blocklengths
  • Turbo codes: within 0.3 dB at $n = 1000$, but with higher decoding complexity
  • Tail-biting convolutional codes (used in LTE control channels): within 1 dB at $n = 128$

The PPV bounds serve as the ultimate benchmark: if a code is within 0.5 dB of the meta-converse, there is very little room for improvement at that blocklength.

Practical Constraints

  • 5G NR uses polar codes for control channels (n = 32-1024) and LDPC for data (n = 256-8448)
  • Decoding latency scales with list size in SCL: L=8 is practical, L=32 is costly
  • At n < 64, algebraic codes (Reed-Muller, BCH) can outperform capacity-approaching codes

Key Takeaway

RCU + meta-converse = tight sandwich. The random coding union bound (achievability) and the meta-converse (converse) together characterize $R^*(n, \epsilon)$ to within a fraction of a bit for most channels. They replace the asymptotic achievability-converse pair (random coding + Fano) with non-asymptotic hypothesis-testing arguments that work at any blocklength. For practical system design, the normal approximation $R^* \approx C - \sqrt{V/n}\,Q^{-1}(\epsilon)$ is accurate for $n \gtrsim 100$.

Random coding union bound

A non-asymptotic achievability bound that upper-bounds the error probability of a random code by the probability that any incorrect codeword has higher information density than the correct one.

Related: Meta-converse

Meta-converse

A non-asymptotic converse bound that lower-bounds the minimum error probability using hypothesis testing between the channel output distribution and a reference distribution. Also called the minimax converse.

Related: Random coding union bound