The Finite Blocklength Regime

Beyond the Shannon Limit: Why Blocklength Matters

Shannon's channel coding theorem guarantees that rates arbitrarily close to capacity $C$ are achievable with vanishing error probability --- provided the blocklength $n \to \infty$. In classical broadband communications with large packets ($n \gtrsim 10^4$), this asymptotic result is an excellent approximation.

However, URLLC services in 5G NR demand end-to-end latencies of 1 ms or less, which restricts the blocklength to $n \approx 100$--$500$ channel uses. At such short blocklengths, the Shannon limit is overly optimistic: there is a significant penalty for operating at finite $n$. This section develops the precise characterisation of that penalty through the normal approximation of Polyanskiy, Poor, and Verdú (2010).

Definition: Information Density

Consider a memoryless channel with transition kernel $P_{Y|X}$. For input $X \sim P_X$ and induced output $Y \sim P_Y$, the information density is the random variable

$$i(X; Y) = \log \frac{P_{Y|X}(Y | X)}{P_Y(Y)},$$

where the logarithm is base 2 (so the units are bits). Its expectation is the mutual information:

$$I(X; Y) = \mathbb{E}[i(X; Y)].$$

The information density captures the per-sample information content, which fluctuates from one channel use to the next.
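To see these fluctuations concretely, the following minimal sketch samples the information density of a binary symmetric channel (BSC) with uniform input; the crossover probability $p = 0.11$ and the variable names are illustrative choices, not from the text above. The sample mean converges to the mutual information, and the sample variance to the channel dispersion defined next.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.11                       # illustrative BSC crossover probability
num_samples = 1_000_000

# For a BSC with uniform input, P_Y is uniform, so the information density
# takes only two values: 1 + log2(p) on a bit flip, 1 + log2(1 - p) otherwise.
flips = rng.random(num_samples) < p
i_density = np.where(flips, 1 + np.log2(p), 1 + np.log2(1 - p))

capacity = 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)   # C = 1 - H2(p)
dispersion = p * (1 - p) * np.log2((1 - p) / p) ** 2       # V for the BSC

print(f"sample mean {i_density.mean():.4f} vs C = {capacity:.4f} bits")
print(f"sample var  {i_density.var():.4f} vs V = {dispersion:.4f} bits^2")
```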

Definition: Channel Dispersion

The channel dispersion of a memoryless channel, evaluated at the capacity-achieving input distribution $P_X^*$, is the variance of the information density:

$$V = \mathrm{Var}\bigl[i(X; Y)\bigr] \Big|_{P_X = P_X^*}.$$

For the real AWGN channel with SNR $\gamma$:

$$V = \frac{\gamma(2 + \gamma)}{2(1 + \gamma)^2} \cdot (\log_2 e)^2 \quad \text{(bits}^2\text{/channel use)}.$$

For the complex AWGN channel:

$$V = \gamma \cdot \frac{2 + \gamma}{(1 + \gamma)^2} \cdot (\log_2 e)^2.$$

Channel dispersion is the information-theoretic analogue of variance in the central limit theorem. A high-dispersion channel pays a larger finite-blocklength penalty because the accumulated information is "noisier" across nn uses.
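As a quick numerical check, here is a short sketch tabulating both dispersion formulas across SNR (the function names are ours). Note how the complex-channel dispersion saturates near $(\log_2 e)^2 \approx 2.08$ as $\gamma$ grows, a point the Quick Check below returns to.

```python
import numpy as np

LOG2E = np.log2(np.e)

def dispersion_real_awgn(gamma: float) -> float:
    """Dispersion V (bits^2/channel use) of the real AWGN channel at SNR gamma."""
    return gamma * (2 + gamma) / (2 * (1 + gamma) ** 2) * LOG2E ** 2

def dispersion_complex_awgn(gamma: float) -> float:
    """Dispersion V (bits^2/channel use) of the complex AWGN channel at SNR gamma."""
    return gamma * (2 + gamma) / (1 + gamma) ** 2 * LOG2E ** 2

for snr_db in (0, 10, 20, 30):
    g = 10 ** (snr_db / 10)    # dB -> linear SNR
    print(f"{snr_db:3d} dB: V_real = {dispersion_real_awgn(g):.3f},  "
          f"V_complex = {dispersion_complex_awgn(g):.3f}")
```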

Theorem: Normal Approximation (Polyanskiy-Poor-Verdú)

For a memoryless channel with capacity $C$ and dispersion $V > 0$, the maximal rate achievable at blocklength $n$ with error probability $\varepsilon$ satisfies

$$R^*(n, \varepsilon) = C - \sqrt{\frac{V}{n}}\, Q^{-1}(\varepsilon) + O\!\left(\frac{\log n}{n}\right),$$

where $Q^{-1}$ is the inverse of the Gaussian tail function $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\, dt$.

Interpreting the Dispersion Penalty

The normal approximation reveals a clean structure:

  • Capacity $C$ is the asymptotic rate, approached only as $n \to \infty$.
  • Dispersion penalty: the term $\sqrt{V/n}\, Q^{-1}(\varepsilon)$ decreases as $O(1/\sqrt{n})$ --- convergence to capacity is slow. Halving the gap requires quadrupling the blocklength.
  • Reliability cost: since $Q^{-1}(\varepsilon)$ increases as $\varepsilon \to 0$, demanding higher reliability (smaller $\varepsilon$) further reduces the achievable rate.

For URLLC with $\varepsilon = 10^{-5}$ and $n = 200$, the rate loss compared to Shannon capacity can exceed 30%.

Example: Rate Loss at Short Blocklength over AWGN

Consider a complex AWGN channel at SNR $\gamma = 10$ dB. Compute the maximum achievable rate $R^*(n, \varepsilon)$ using the normal approximation for: (a) $n = 100$, $\varepsilon = 10^{-3}$; (b) $n = 100$, $\varepsilon = 10^{-5}$; (c) $n = 1000$, $\varepsilon = 10^{-5}$. Express results as a fraction of the Shannon capacity $C$.
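A sketch of the computation, dropping the $O(\log n / n)$ correction (so the numbers are approximate); `scipy`'s `norm.isf` supplies $Q^{-1}$, and the helper name is ours:

```python
import numpy as np
from scipy.stats import norm

LOG2E = np.log2(np.e)

def normal_approx_rate(n: int, eps: float, gamma: float) -> tuple[float, float]:
    """R*(n, eps) for the complex AWGN channel via the normal approximation,
    with the O(log n / n) term dropped. Returns (R_star, capacity)."""
    C = np.log2(1 + gamma)
    V = gamma * (2 + gamma) / (1 + gamma) ** 2 * LOG2E ** 2
    q_inv = norm.isf(eps)      # inverse Gaussian Q-function
    return C - np.sqrt(V / n) * q_inv, C

gamma = 10 ** (10 / 10)        # 10 dB in linear scale
for n, eps in [(100, 1e-3), (100, 1e-5), (1000, 1e-5)]:
    r_star, cap = normal_approx_rate(n, eps, gamma)
    print(f"n={n:4d}, eps={eps:.0e}: R* = {r_star:.3f} bits/use "
          f"= {r_star / cap:.1%} of C = {cap:.3f}")
```

Under these assumptions the three cases come out near 87%, 82%, and 94% of $C = \log_2(11) \approx 3.46$ bits/use: tightening $\varepsilon$ from $10^{-3}$ to $10^{-5}$ costs rate at fixed $n$, while a tenfold increase in $n$ recovers most of the gap.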

Finite Blocklength: Rate Penalty as Error Probability Varies

[Interactive plot: the achievable rate $R^*(n, \varepsilon)$ converging to Shannon capacity as blocklength $n$ grows, with curves for target error probabilities from $10^{-1}$ to $10^{-7}$. The dispersion penalty $\sqrt{V/n}\,Q^{-1}(\varepsilon)$ opens a significant rate gap at short blocklength, especially at stringent reliability targets ($\varepsilon = 10^{-5}$--$10^{-7}$).]

Rate vs Blocklength under Normal Approximation

[Interactive plot: $R^*(n, \varepsilon)$ versus blocklength $n$ for different target error probabilities $\varepsilon$, alongside the Shannon capacity (dashed horizontal line). Observe that (i) convergence to $C$ is slow ($O(1/\sqrt{n})$), (ii) stricter reliability requirements (smaller $\varepsilon$) shift the curve downward, and (iii) at short blocklengths ($n < 200$) the rate penalty can exceed 20--40% of capacity.]

Theorem: Finite Blocklength under Quasi-Static Fading

For a quasi-static fading channel, in which the fading coefficient $H$ is random but stays constant over the $n$ channel uses of a codeword, the maximum achievable rate satisfies

$$R^*(n, \varepsilon) = C_\varepsilon + O\!\left(\frac{\log n}{n}\right),$$

where $C_\varepsilon$ is the outage capacity: the largest $R$ such that $\Pr\big[\log_2(1 + \gamma |H|^2) < R\big] \leq \varepsilon$. The channel dispersion is zero: the dominant failure mode is an unfavourable fading draw rather than noise accumulated over the block, so convergence to $C_\varepsilon$ is much faster than the $1/\sqrt{n}$ behaviour of the AWGN channel.
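To put a number on the outage capacity, here is a sketch assuming Rayleigh fading ($|H|^2 \sim \mathrm{Exp}(1)$, an illustrative model); the closed form follows directly from the exponential CDF.

```python
import numpy as np

def outage_capacity_rayleigh(gamma: float, eps: float) -> float:
    """Outage capacity C_eps (bits/use) at average SNR gamma, Rayleigh fading.

    With |H|^2 ~ Exp(1), Pr[log2(1 + gamma*|H|^2) < R] <= eps solves to
    C_eps = log2(1 - gamma * ln(1 - eps)).
    """
    return np.log2(1 - gamma * np.log1p(-eps))

gamma = 10 ** (10 / 10)        # 10 dB average SNR
for eps in (1e-1, 1e-3, 1e-5):
    print(f"eps = {eps:.0e}: C_eps = {outage_capacity_rayleigh(gamma, eps):.5f} bits/use")
```

At $\varepsilon = 10^{-5}$ this gives roughly $10^{-4}$ bits/use: without diversity, the fading outage, not the dispersion penalty, is the binding constraint for URLLC.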

Quick Check

The channel dispersion $V$ of a complex AWGN channel at high SNR ($\gamma \gg 1$) behaves approximately as $V \approx (\log_2 e)^2$. What does this imply about the finite blocklength penalty at high SNR?

  • The penalty $\sqrt{V/n}\, Q^{-1}(\varepsilon)$ becomes independent of SNR, so increasing SNR only shifts the capacity but does not reduce the rate gap
  • The penalty vanishes at high SNR because the channel becomes deterministic
  • The penalty grows without bound because $V$ increases linearly with SNR
  • The penalty depends on $V$ only through the ratio $V/C^2$, which vanishes

Achievability and Converse Bounds

The normal approximation is sandwiched between two non-asymptotic bounds:

  1. Random Coding Union (RCU) bound (achievability): for any codebook size $M$ and the maximum-likelihood decoder (evaluated for the BSC in the sketch below),

    $$\varepsilon \leq \mathbb{E}\!\left[\min\!\left(1,\; (M-1)\, \Pr\!\big[i(\bar{X}^n; Y^n) \geq i(X^n; Y^n) \,\big|\, X^n, Y^n\big]\right)\right],$$

    where $\bar{X}^n$ is an independent codeword drawn from the same input distribution.

  2. Meta-converse bound (converse): any $(n, M, \varepsilon)$ code must satisfy, for every auxiliary output distribution $\tilde{P}_{Y^n}$, the binary hypothesis testing constraint

    $$\beta_{1-\varepsilon}\big(P_{X^n Y^n},\, P_{X^n} \times \tilde{P}_{Y^n}\big) \leq \frac{1}{M},$$

    where $\beta_{1-\varepsilon}$ is the minimum type-II error probability of a test whose type-I success probability is at least $1-\varepsilon$. Because $\beta_{1-\varepsilon}$ grows as $\varepsilon$ shrinks, this implicitly lower-bounds the error probability of any code with $M$ codewords.

For the AWGN channel, these bounds are remarkably tight: the gap between achievability and converse is less than 0.5 dB even at $n = 100$.
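The RCU bound is rarely available in closed form, but for the BSC the inner probability reduces to a binomial tail, so the bound can be evaluated exactly. A sketch with illustrative parameters ($n = 100$, rate $1/2$, crossover $p = 0.05$); the function name is ours:

```python
import numpy as np
from scipy.stats import binom

def rcu_bound_bsc(n: int, M: int, p: float) -> float:
    """RCU achievability bound for the BSC(p) with i.i.d. uniform codewords.

    For p < 1/2, i(x_bar; y) >= i(x; y) iff the independent codeword x_bar lies
    within Hamming distance d = d_H(x, y) of y; since d_H(X_bar, y) ~ Bin(n, 1/2),
    the conditional probability in the RCU bound is a binomial CDF.
    """
    d = np.arange(n + 1)
    pairwise = binom.cdf(d, n, 0.5)                   # Pr[d_H(X_bar, y) <= d]
    inner = np.minimum(1.0, (M - 1) * pairwise)       # min(1, (M-1) * Pr[...])
    return float(np.sum(binom.pmf(d, n, p) * inner))  # average over d ~ Bin(n, p)

# Rate 1/2 means M = 2^(n/2) codewords; C = 1 - H2(0.05) ~ 0.71, so rate < C.
eps_rcu = rcu_bound_bsc(n=100, M=2**50, p=0.05)
print(f"RCU bound: eps <= {eps_rcu:.2e}")
```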

Quick Check

To halve the gap between $R^*(n, \varepsilon)$ and the Shannon capacity $C$ (at a fixed $\varepsilon$), how must the blocklength $n$ change?

  • Double it ($2n$)
  • Quadruple it ($4n$)
  • Square it ($n^2$)
  • It depends on $\varepsilon$ and cannot be determined

Historical Note: From Shannon to Polyanskiy: 62 Years to the Finite Blocklength Answer

1948--2010

Shannon's 1948 paper established channel capacity as the supremum of achievable rates with vanishing error probability as $n \to \infty$. But the question "how fast does the achievable rate approach $C$?" proved remarkably difficult.

Strassen (1962) showed that the convergence rate is $O(1/\sqrt{n})$ and identified the channel dispersion for the DMC, but the result attracted limited attention at the time. Hayashi (2009) independently developed second-order asymptotics using the information spectrum method.

The breakthrough came with Polyanskiy, Poor, and Verdú (2010), who provided tight achievability (random coding union) and converse (meta-converse) bounds that pinned down $R^*(n, \varepsilon)$ to within fractions of a dB for the AWGN channel. Their normal approximation $R^* \approx C - \sqrt{V/n}\,Q^{-1}(\varepsilon)$ became the standard engineering formula for short-packet design and directly enabled the analytical framework for 5G URLLC.

The finite blocklength paradigm fundamentally changed how wireless engineers think about latency: it showed that reliability at short blocklength has a precise, quantifiable cost in rate, governed by the channel dispersion.

Common Mistake: Using Shannon Capacity to Design Short-Packet Systems

Mistake:

"The AWGN channel at SNR=0\text{SNR} = 0 dB has capacity C=1C = 1 bit/channel use, so I can reliably transmit at rate R=0.9R = 0.9 bits/use with a code of blocklength n=100n = 100."

Correction:

At $n = 100$ and target error probability $\varepsilon = 10^{-5}$, the normal approximation gives:

$$R^*(100, 10^{-5}) \approx 1 - \sqrt{\frac{V}{100}} \cdot Q^{-1}(10^{-5})$$

For the complex AWGN channel at $\text{SNR} = 0$ dB ($\gamma = 1$, consistent with $C = \log_2(1+\gamma) = 1$), the dispersion is $V = \frac{3}{4}(\log_2 e)^2 \approx 1.56$ bits$^2$/channel use, and $Q^{-1}(10^{-5}) \approx 4.26$. Therefore:

$$R^* \approx 1 - \sqrt{0.0156} \cdot 4.26 \approx 1 - 0.53 = 0.47 \text{ bits/use.}$$

The actual achievable rate is roughly 0.47 bits/use, not 0.9. The missing 0.53 bits/use is the dispersion penalty, which is enormous at short blocklength. Designing at $R = 0.9$ would produce an error probability on the order of $10^{-1}$ (see the sketch below), not $10^{-5}$. The Shannon limit is only a useful design target when $n > 10{,}000$.
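The same approximation can be inverted to estimate the error probability the naive design would actually suffer; a sketch under the same assumptions (complex AWGN, $O(\log n / n)$ term dropped):

```python
import numpy as np
from scipy.stats import norm

LOG2E = np.log2(np.e)

gamma, n, R = 1.0, 100, 0.9    # SNR = 0 dB and the mistaken design point
C = np.log2(1 + gamma)                                     # = 1 bit/use
V = gamma * (2 + gamma) / (1 + gamma) ** 2 * LOG2E ** 2    # ~1.56 (complex AWGN)

# Inverting R = C - sqrt(V/n) * Qinv(eps) gives eps = Q((C - R) * sqrt(n/V));
# norm.sf is the Gaussian Q-function.
eps = norm.sf((C - R) * np.sqrt(n / V))
print(f"predicted error probability at R = {R}: {eps:.2f}")   # ~0.21, not 1e-5
```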

Channel Dispersion

The variance of the information density $i(X; Y) = \log \frac{P_{Y|X}(Y|X)}{P_Y(Y)}$ under the capacity-achieving input distribution. Denoted $V$, it governs the $O(1/\sqrt{n})$ penalty in the normal approximation. For the real AWGN channel at SNR $\gamma$: $V = \frac{\gamma(2 + \gamma)}{2(1+\gamma)^2} (\log_2 e)^2$.

Related: Normal Approximation (Polyanskiy-Poor-Verdú)

Normal Approximation (Polyanskiy-Poor-Verdú)

The second-order asymptotic expansion $R^*(n, \varepsilon) \approx C - \sqrt{V/n}\,Q^{-1}(\varepsilon) + O(\log n / n)$. Accurate to within 0.5 dB for $n \geq 100$ on the AWGN channel.

Related: Channel Dispersion

Blocklength

The number of channel uses $n$ allocated to encoding a single message. In OFDM-based systems, $n$ equals the number of resource elements assigned to the codeword. Shorter blocklength reduces latency but increases the rate penalty $\sqrt{V/n}\,Q^{-1}(\varepsilon)$.

Related: Information Density

Key Takeaway

The dispersion penalty is the price of low latency. The normal approximation $R^*(n, \varepsilon) \approx C - \sqrt{V/n}\,Q^{-1}(\varepsilon)$ shows that short-packet communication operates fundamentally below Shannon capacity. The gap scales as $1/\sqrt{n}$: halving the gap requires quadrupling the blocklength. For URLLC with $\varepsilon = 10^{-5}$ and $n = 100$, the rate loss can exceed 40% of capacity --- a fact that Shannon's asymptotic theory cannot predict.