Key Achievability Lemmas
The Reusable Building Blocks
Every achievability proof in information theory (channel coding, source coding, multiuser settings) can be built from a small number of modular lemmas. We now state the two most important: the packing lemma (for channel coding) and the covering lemma (for source coding). Once you understand these two lemmas, you can construct achievability proofs for channels and sources you have never seen before. This is the "proof pattern philosophy" that makes information theory so powerful: the same architecture (random codebook plus typicality argument) works across dozens of settings.
Theorem: The Packing Lemma
Let $U^n(m)$, $m \in \{1, 2, \dots, 2^{nR}\}$, be codewords drawn i.i.d. $\sim \prod_{i=1}^{n} p_U(u_i)$, independently of each other and, for $m \neq 1$, of the channel output. Let $Y^n$ be a sequence such that $(U^n(1), Y^n)$ is jointly typical (i.e., message $1$ was sent). Then the probability that some incorrect codeword $U^n(m)$, $m \neq 1$, is also jointly typical with $Y^n$ satisfies:

$$P\bigl(\exists\, m \neq 1 : (U^n(m), Y^n) \in T_\epsilon^{(n)}\bigr) \le 2^{nR} \cdot 2^{-n(I(U;Y) - \delta(\epsilon))} \to 0 \quad \text{as } n \to \infty, \text{ if } R < I(U;Y) - \delta(\epsilon).$$
The packing lemma says: if the code rate is below the mutual information, there is enough "room" in the jointly typical set to pack all $2^{nR}$ codewords without confusion. Each incorrect codeword has probability $\doteq 2^{-nI(U;Y)}$ of being jointly typical with $Y^n$. With $2^{nR}$ codewords, the union bound gives vanishing error probability when $R < I(U;Y)$.
Union bound
The probability that a specific incorrect codeword $U^n(m)$, $m \neq 1$, is jointly typical with $Y^n$ is $\doteq 2^{-nI(U;Y)}$ by the joint typicality lemma (JTL).
Count the terms
There are $2^{nR} - 1$ incorrect codewords. So:

$$P(\text{error}) \le 2^{nR} \cdot 2^{-n(I(U;Y) - \delta(\epsilon))} = 2^{-n(I(U;Y) - R - \delta(\epsilon))} \to 0$$

as $n \to \infty$, provided $R < I(U;Y) - \delta(\epsilon)$.
Packing lemma
If the code rate $R < I(U;Y)$, then with high probability no incorrect codeword is jointly typical with the channel output. This is the core achievability tool for channel coding.
Related: Covering lemma, Jointly typical sequences
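The packing argument can be checked numerically. The sketch below is a toy Monte Carlo experiment (all parameters assumed for illustration: $n = 20$, $R = 0.25$, BSC crossover $p = 0.05$) using minimum-Hamming-distance decoding in place of joint-typicality decoding; it shows that with a random codebook at a rate below capacity, an incorrect codeword is almost never decoded.

```python
import numpy as np

# Monte Carlo sketch of the packing lemma on a BSC (assumed toy parameters:
# n = 20, R = 0.25, crossover p = 0.05, so C = 1 - H(p) ~ 0.71 > R).
# Minimum-distance decoding stands in for joint-typicality decoding.
rng = np.random.default_rng(0)
n, R, p = 20, 0.25, 0.05
M = 2 ** int(n * R)                                 # 2^{nR} = 32 codewords

errors, trials = 0, 500
for _ in range(trials):
    codebook = rng.integers(0, 2, size=(M, n))      # i.i.d. Bernoulli(1/2) codewords
    noise = (rng.random(n) < p).astype(int)         # BSC bit flips
    y = codebook[0] ^ noise                         # message 1 is sent
    dists = (codebook ^ y).sum(axis=1)              # Hamming distances to y
    if dists.argmin() != 0:                         # an incorrect codeword won
        errors += 1
print(errors / trials)    # empirical error rate, near 0 since R < C
```

Raising $R$ toward and past capacity in this sketch makes the error rate climb, mirroring the union-bound threshold.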
Theorem: The Covering Lemma
Let $\hat{X}^n(m)$, $m \in \{1, \dots, 2^{nR}\}$, be drawn i.i.d. $\sim \prod_{i=1}^{n} p_{\hat{X}}(\hat{x}_i)$. Let $X^n$ be a typical source sequence. Then:

$$P\bigl(\nexists\, m : (X^n, \hat{X}^n(m)) \in T_\epsilon^{(n)}\bigr) \to 0 \quad \text{as } n \to \infty, \text{ if } R > I(X;\hat{X}) + \delta(\epsilon).$$
The covering lemma says: if the codebook is large enough (rate above mutual information), there will be a codeword that is jointly typical with the source sequence. This is the "covering" condition: we need enough codewords to "cover" all typical source sequences. Each codeword covers a fraction $\doteq 2^{-nI(X;\hat{X})}$ of typical source sequences, so we need $\doteq 2^{nI(X;\hat{X})}$ codewords.
Probability of failure
For each $m$, $P\bigl((X^n, \hat{X}^n(m)) \in T_\epsilon^{(n)}\bigr) \ge 2^{-n(I(X;\hat{X}) + \delta(\epsilon))}$.

By independence across $m$:

$$P(\text{no cover}) \le \Bigl(1 - 2^{-n(I(X;\hat{X}) + \delta(\epsilon))}\Bigr)^{2^{nR}} \le \exp\Bigl(-2^{n(R - I(X;\hat{X}) - \delta(\epsilon))}\Bigr) \to 0$$

when $R > I(X;\hat{X}) + \delta(\epsilon)$.
Covering lemma
If the codebook rate $R > I(X;\hat{X})$, then with high probability at least one codeword is jointly typical with any given typical source sequence. This is the core achievability tool for lossy source coding (rate-distortion).
Related: Packing lemma
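The covering condition can also be checked numerically. The sketch below (assumed toy parameters: Bernoulli(1/2) source, Hamming distortion $D = 0.25$, $n = 20$, $R = 0.4$, so $R > R(D) = 1 - H(0.25) \approx 0.19$) counts how often a random codebook contains a reproduction within distortion $D$ of a random source sequence.

```python
import numpy as np

# Monte Carlo sketch of the covering lemma for a Bernoulli(1/2) source
# under Hamming distortion (assumed toy parameters: n = 20, D = 0.25,
# R = 0.4 > R(D) = 1 - H(0.25) ~ 0.19).
rng = np.random.default_rng(1)
n, R, D = 20, 0.4, 0.25
M = 2 ** int(n * R)                                 # 2^{nR} = 256 codewords

covered, trials = 0, 500
for _ in range(trials):
    x = rng.integers(0, 2, size=n)                  # source sequence
    codebook = rng.integers(0, 2, size=(M, n))      # i.i.d. Bernoulli(1/2) reproductions
    dists = (codebook ^ x).sum(axis=1)              # Hamming distortions
    if dists.min() <= n * D:                        # some codeword covers x
        covered += 1
print(covered / trials)   # coverage fraction, near 1 since R > R(D)
```

Dropping $R$ below $R(D)$ in this sketch makes the coverage fraction collapse, mirroring the double-exponential bound in the proof.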
Packing vs Covering: A Duality
The packing and covering lemmas are in a sense dual to each other:
- Packing (channel coding): We need the rate to be below the mutual information to avoid confusion: $R < I(U;Y)$.
- Covering (source coding): We need the rate to be above the mutual information to ensure coverage: $R > I(X;\hat{X})$.
In both cases, the critical threshold is a mutual information. The channel coding theorem says $C = \max_{p_X} I(X;Y)$ is achievable. The rate-distortion theorem says $R(D) = \min_{p_{\hat{X}|X} : \mathbb{E}[d(X,\hat{X})] \le D} I(X;\hat{X})$ is necessary and sufficient for distortion $D$.
This "same proof, different direction" pattern extends to multiuser settings. The MAC uses packing with multiple senders. The broadcast channel uses covering with multiple receivers. Dirty-paper coding combines both.
Random Codes vs Practical Codes
The packing and covering lemmas prove existence of good codes via the random coding argument: a random codebook works with high probability. But random codes are impractical: encoding and decoding both require exponential complexity ($2^{nR}$ codewords, each of length $n$).
Practical codes that approach the Shannon limit include:
- LDPC codes (1960, rediscovered 1996): sparse graph structure enables iterative decoding in $O(n)$ per iteration
- Turbo codes (1993): parallel concatenation with interleaving
- Polar codes (2009): deterministic construction with $O(n \log n)$ encoding/decoding, provably capacity-achieving
The gap between random coding theory and practical code design is one of the most fruitful areas in communications engineering.
- Random codes require exponential storage and decoding complexity
- Practical codes sacrifice some rate for polynomial-time encoding/decoding
- The coding gap decreases with blocklength but never reaches zero at finite $n$
Why This Matters: From Lemmas to 5G NR
Modern wireless standards implement the ideas of this section directly. 5G NR uses LDPC codes for data channels (approaching the packing lemma guarantee) and polar codes for control channels. The code rates are chosen based on the estimated channel mutual information (obtained via channel state information feedback), and adaptive modulation and coding (AMC) selects the combination that operates closest to capacity.
The covering lemma appears in the form of vector quantization of CSI feedback: the codebook must "cover" the space of possible channels with enough precision to enable effective beamforming.
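The CSI-covering idea can be illustrated with a toy experiment. The sketch below is a hypothetical setup, not the 5G NR codebook: a 4-dimensional real channel direction is quantized to the best-matching of $2^B$ random unit-norm codewords, and the average beamforming gain $|\langle h, c\rangle|^2$ improves as the feedback budget $B$ grows.

```python
import numpy as np

# Toy illustration of codebook "covering" in CSI quantization (hypothetical
# setup: 4-dimensional real channel directions, random unit-norm codebooks).
rng = np.random.default_rng(2)
dim, trials = 4, 2000

def avg_gain(bits):
    codebook = rng.normal(size=(2 ** bits, dim))
    codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)  # unit codewords
    gains = []
    for _ in range(trials):
        h = rng.normal(size=dim)
        h /= np.linalg.norm(h)                        # channel direction
        gains.append(np.max(np.abs(codebook @ h)) ** 2)  # best-matching codeword
    return float(np.mean(gains))

g2, g6 = avg_gain(2), avg_gain(6)
print(g2, g6)   # more feedback bits -> denser cover of the channel sphere
```

The same exponential tradeoff as in the covering lemma appears: each extra feedback bit doubles the codebook and shrinks the worst-case quantization error.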
Common Mistake: Random Coding Proves Existence, Not Construction
Mistake:
Interpreting the random coding argument as an algorithm. For example, claiming that "we draw a random codebook and it works" provides a practical coding scheme.
Correction:
Random coding is an existential argument: it shows that good codes exist (because the average over random codebooks is good, so at least one specific codebook must be good). Finding that specific codebook, or constructing an equivalent one efficiently, is a separate engineering problem. Polar codes are the first known construction that provably achieves capacity with polynomial complexity.
Quick Check
In the packing lemma, what happens if the rate $R$ exceeds $I(U;Y)$?
The union bound no longer decays to zero β too many codewords cause confusion
The code still works but with higher latency
The covering lemma takes over
We need a larger alphabet
When $R > I(U;Y)$, the union bound gives $2^{nR} \cdot 2^{-nI(U;Y)} = 2^{n(R - I(U;Y))}$, which no longer decays to zero. This does not prove errors occur (the bound may be loose), but the converse theorem (via Fano's inequality) confirms that reliable communication is indeed impossible above capacity.
Finite-Blocklength Penalty
The packing and covering lemmas establish achievability in the limit $n \to \infty$. For finite blocklength $n$, the achievable rate is approximately:

$$R(n, \epsilon) \approx C - \sqrt{\frac{V}{n}}\, Q^{-1}(\epsilon)$$

where $V$ is the channel dispersion (variance of the information density), $\epsilon$ is the target error probability, and $Q^{-1}$ is the inverse Q-function. For the BSC at moderate blocklengths and small $\epsilon$, this rate loss of $\sqrt{V/n}\,Q^{-1}(\epsilon)$ bits/use below capacity is substantial. This "square-root penalty" is a fundamental feature of finite-blocklength communication (Chapter 26).
- Rate penalty scales as $1/\sqrt{n}$: doubling the blocklength shrinks the gap only by a factor of $\sqrt{2}$
- Channel dispersion depends on the channel: the BSC has $V = p(1-p)\log_2^2\frac{1-p}{p}$
- Low-latency systems (URLLC) operate at short blocklengths where this penalty dominates
Common Mistake: The AEP Requires i.i.d. (or at Least Stationarity and Ergodicity)
Mistake:
Applying the AEP to sequences that are not i.i.d. without verifying the required stationarity and ergodicity conditions. For example, assuming typicality arguments work for sources with long-range dependencies or non-stationary channels.
Correction:
The weak AEP as stated requires i.i.d. symbols. Extensions exist for stationary ergodic sources (Shannon-McMillan-Breiman theorem) but the convergence can be much slower. For non-stationary or non-ergodic settings, the notion of typicality must be generalized, and the simple typical-set size $\doteq 2^{nH(X)}$ no longer applies. In wireless channels, quasi-static fading (where the channel is constant for one codeword but varies across codewords) requires outage-capacity analysis instead of the ergodic typicality approach.
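For the i.i.d. case where the AEP does apply, the concentration is easy to see numerically. The sketch below (assumed source: i.i.d. Bernoulli(0.3)) checks that the normalized log-likelihood $-\frac{1}{n}\log_2 p(X^n)$ lands close to $H(X)$:

```python
import numpy as np
from math import log2

# AEP sanity sketch for an i.i.d. Bernoulli(q) source (assumed q = 0.3):
# -(1/n) log2 p(X^n) concentrates at the entropy H(q).
rng = np.random.default_rng(3)
q, n = 0.3, 100_000
H = -q * log2(q) - (1 - q) * log2(1 - q)            # entropy in bits/symbol
x = rng.random(n) < q                               # one source realization
k = int(x.sum())                                    # number of ones in X^n
sample_entropy = -(k * log2(q) + (n - k) * log2(1 - q)) / n
print(round(H, 4), round(sample_entropy, 4))        # should nearly agree
```

For a source with memory or a non-stationary source, the same quantity need not converge to a single constant, which is exactly why the i.i.d. (or stationary ergodic) hypothesis matters.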