The Channel Coding Theorem — Achievability
The Random Coding Argument
We now prove that any rate $R < C$ is achievable. The proof is one of the most elegant arguments in information theory: we show that a randomly generated codebook, with high probability, achieves vanishing error probability. No clever construction is needed: randomness itself provides enough structure.
The argument has three steps: (1) generate the codebook randomly, (2) decode by joint typicality, (3) show that the error probability vanishes for rates below $C$. The packing lemma from Chapter 3 does the heavy lifting in step 3.
Random Coding and the Packing Argument
Theorem: Channel Coding Theorem — Achievability
For any $R < C$ and any $\epsilon > 0$, there exists a $(2^{nR}, n)$-code with maximal error probability $\lambda^{(n)} < \epsilon$ for all sufficiently large $n$.
Think of the codebook as $2^{nR}$ points scattered in $\mathcal{X}^n$. When codeword $x^n(m)$ is transmitted, the output $y^n$ is "near" it in the sense of joint typicality. As long as no other codeword $x^n(m')$, $m' \neq m$, is also near $y^n$ (also jointly typical), the decoder succeeds. The packing lemma guarantees this for $R < I(X;Y)$: there is enough "room" in $\mathcal{Y}^n$ for $2^{nR}$ non-confusable codewords.
Step 1: Random codebook generation
Fix an input distribution $p(x)$ and $\epsilon > 0$.
Generate a random codebook: draw $2^{nR}$ codewords $X^n(1), \dots, X^n(2^{nR})$ independently, each with i.i.d. entries $X_i \sim p(x)$.
The codebook is revealed to both the encoder and the decoder before communication begins. The message $M$ is uniform on $\{1, 2, \dots, 2^{nR}\}$.
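To make Step 1 concrete, here is a minimal Python sketch of random codebook generation over a binary alphabet with a Bernoulli($p$) input distribution. The function name and parameters are illustrative, not from the text:

```python
import random

def generate_codebook(n, R, p, seed=0):
    """Draw 2^(nR) codewords of length n, entries i.i.d. Bernoulli(p).

    Illustrative only: the codebook has 2^(nR) entries, growing
    exponentially in n, which is exactly why random coding is a proof
    device rather than a practical construction.
    """
    rng = random.Random(seed)
    num_codewords = int(round(2 ** (n * R)))
    return [[1 if rng.random() < p else 0 for _ in range(n)]
            for _ in range(num_codewords)]

# Toy parameters: block length 8 at rate 0.5 gives 2^4 = 16 codewords.
codebook = generate_codebook(n=8, R=0.5, p=0.5)
```

Even at this toy size the exponential blow-up is visible: doubling $n$ squares the number of codewords.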
Step 2: Joint typicality decoding
The decoder, upon receiving $y^n$, looks for a unique message $\hat{m}$ such that $(x^n(\hat{m}), y^n) \in \mathcal{T}_\epsilon^{(n)}$. If no such message exists, or more than one does, the decoder declares an error.
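As a sketch of Step 2, the Python code below implements a weak-typicality test (empirical rates within $\epsilon$ of the corresponding entropies) and decodes by searching for a unique jointly typical codeword. All helper names, the toy channel, and the generous $\epsilon$ are illustrative assumptions, not from the text:

```python
import math

def empirical_rate(seq, pmf):
    """-(1/n) log2 p(seq) under an i.i.d. pmf; compared to the entropy."""
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

def jointly_typical(x, y, px, py, pxy, Hx, Hy, Hxy, eps):
    """Weak joint typicality: all three empirical rates within eps."""
    exy = -sum(math.log2(pxy[(a, b)]) for a, b in zip(x, y)) / len(x)
    return (abs(empirical_rate(x, px) - Hx) < eps
            and abs(empirical_rate(y, py) - Hy) < eps
            and abs(exy - Hxy) < eps)

def jt_decode(codebook, y, px, py, pxy, Hx, Hy, Hxy, eps):
    """Return the unique jointly typical message, or None (declared error)."""
    hits = [m for m, xm in enumerate(codebook)
            if jointly_typical(xm, y, px, py, pxy, Hx, Hy, Hxy, eps)]
    return hits[0] if len(hits) == 1 else None

# Toy BSC(0.1) with uniform input: p(x, y) = 0.45 if x == y else 0.05.
px = py = {0: 0.5, 1: 0.5}
pxy = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
Hx = Hy = 1.0
Hxy = -2 * (0.45 * math.log2(0.45) + 0.05 * math.log2(0.05))

codebook = [[0] * 10, [1] * 10]
y = [0] * 9 + [1]            # codeword 0 with one bit flipped (~10% noise)
m_hat = jt_decode(codebook, y, px, py, pxy, Hx, Hy, Hxy, eps=0.1)
```

With this tiny block length the received word sits almost exactly at the joint entropy when it carries the channel's typical 10% flip fraction, so only the transmitted codeword passes the test.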
Step 3: Error analysis (ensemble average)
By the symmetry of the random codebook (all codewords drawn from the same distribution), the ensemble average error probability satisfies:
$$\mathbb{E}_{\mathcal{C}}\big[\bar{P}_e^{(n)}\big] = \Pr(\hat{M} \neq M) = \Pr(\hat{M} \neq 1 \mid M = 1).$$
We condition on $M = 1$ (WLOG by symmetry) and define error events:
- $\mathcal{E}_1$: the true pair $(X^n(1), Y^n)$ is not jointly typical.
- $\mathcal{E}_2$: a wrong codeword $X^n(m)$, $m \neq 1$, is jointly typical with $Y^n$.
By the union bound: $\Pr(\hat{M} \neq 1 \mid M = 1) \leq \Pr(\mathcal{E}_1) + \Pr(\mathcal{E}_2)$.
Step 4: Bounding $\Pr(\mathcal{E}_1)$
Since $X^n(1)$ was sent, $(X^n(1), Y^n)$ are jointly distributed as $\prod_{i=1}^{n} p(x_i)\,p(y_i \mid x_i)$.
By the law of large numbers (AEP property of typical sequences):
$$\Pr(\mathcal{E}_1) = \Pr\big((X^n(1), Y^n) \notin \mathcal{T}_\epsilon^{(n)}\big) \to 0 \quad \text{as } n \to \infty.$$
Step 5: Bounding $\Pr(\mathcal{E}_2)$ via the Packing Lemma
For $m \neq 1$, the codeword $X^n(m)$ is independent of $Y^n$ (since $Y^n$ depends only on $X^n(1)$, and the codewords are drawn independently). So $(X^n(m), Y^n) \sim \prod_{i=1}^{n} p(x_i)\,p(y_i)$, the product distribution.
By the Packing Lemma (Chapter 3, Lemma 18):
$$\Pr\big((X^n(m), Y^n) \in \mathcal{T}_\epsilon^{(n)}\big) \leq 2^{-n(I(X;Y) - \delta(\epsilon))}.$$
By the union bound over the $2^{nR} - 1$ wrong codewords:
$$\Pr(\mathcal{E}_2) \leq \big(2^{nR} - 1\big)\, 2^{-n(I(X;Y) - \delta(\epsilon))} \leq 2^{-n(I(X;Y) - R - \delta(\epsilon))}.$$
This vanishes as $n \to \infty$ provided $R < I(X;Y) - \delta(\epsilon)$.
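To see how fast the Step 5 bound decays, the short Python computation below evaluates $2^{-n(I(X;Y) - R)}$ for a binary symmetric channel with crossover 0.1 and uniform input. This is an illustrative numeric example (the $\delta(\epsilon)$ slack is ignored, and the function names are mine):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def e2_bound(n, R, I):
    """Union bound 2^(nR) * 2^(-n I) <= 2^(-n (I - R)) on Pr(E2)."""
    return 2.0 ** (-n * (I - R))

# BSC(0.1) with uniform input: I(X;Y) = 1 - h2(0.1), about 0.531 bits.
I = 1 - h2(0.1)
for n in (100, 500, 1000):
    print(f"n={n}: Pr(E2) <= {e2_bound(n, R=0.4, I=I):.3e}")
```

At rate $R = 0.4 < I(X;Y)$ the exponent $I - R \approx 0.131$ is positive, so the bound shrinks exponentially in the block length.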
Step 6: From ensemble average to existence
We have shown $\mathbb{E}_{\mathcal{C}}\big[\bar{P}_e^{(n)}\big] \leq \epsilon$ for large $n$.
Since this is the average over random codebooks, there exists at least one deterministic codebook $\mathcal{C}^*$ with $\bar{P}_e^{(n)}(\mathcal{C}^*) \leq \epsilon$.
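The averaging step can be written in one line; since $\bar{P}_e$ is nonnegative, its ensemble mean bounds its minimum (a sketch in the notation above):

```latex
\mathbb{E}_{\mathcal{C}}\big[\bar{P}_e(\mathcal{C})\big]
  = \sum_{\mathcal{C}} \Pr(\mathcal{C})\,\bar{P}_e(\mathcal{C})
  \le \epsilon
  \quad\Longrightarrow\quad
  \min_{\mathcal{C}} \bar{P}_e(\mathcal{C}) \le \epsilon .
```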
Choose $p(x) = p^*(x)$ (the capacity-achieving distribution) to get $I(X;Y) = C$, and let $\epsilon \to 0$ (which makes $\delta(\epsilon) \to 0$).
Step 7: Expurgation (average to maximal error)
The code has $\bar{P}_e^{(n)} \leq \epsilon$ (average error). Sort the codewords by their individual error probabilities: $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_{2^{nR}}$.
The expurgated code keeps only the best half: the $2^{nR}/2$ codewords with the smallest $\lambda_m$.
Since the average error is at most $\epsilon$, the worst codeword in the best half satisfies $\lambda_m \leq 2\epsilon$: otherwise every codeword in the discarded half would also exceed $2\epsilon$, forcing the average above $\epsilon$.
The new rate is $R - \frac{1}{n}$, which is still arbitrarily close to $R$ (and hence below $C$) for large $n$. So the maximal error probability satisfies $\lambda^{(n)} \leq 2\epsilon$.
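The expurgation step is easy to check numerically. The sketch below uses a made-up list of per-message error probabilities (purely illustrative numbers, not derived from any real channel) and verifies that the worst kept codeword is within twice the overall average:

```python
# Hypothetical per-message error probabilities for a 16-codeword code.
lambdas = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.008, 0.010,
           0.020, 0.040, 0.060, 0.080, 0.090, 0.120, 0.150, 0.180]

avg_error = sum(lambdas) / len(lambdas)   # the "eps" of the averaging step

# Keep the best half. Markov-style argument: if every kept codeword had
# error > 2*avg_error, so would every discarded (worse) codeword, and
# the average over all codewords would exceed avg_error.
best_half = sorted(lambdas)[: len(lambdas) // 2]
max_kept = max(best_half)

# Dropping from 16 to 8 codewords lowers log2(M) by exactly 1, i.e. the
# rate drops from R to R - 1/n: negligible for large block length n.
```

Here the maximal error of the kept half (0.010) is far below $2\epsilon \approx 0.097$; the bound $2\epsilon$ is what the argument guarantees in the worst case.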
The Power of Random Coding
The random coding argument is sometimes called the "greatest trick in information theory." We never construct a specific good code — we show that a randomly drawn code works with high probability. This is profoundly useful for proving existence but does not directly yield practical codes.
In the 70 years since, the central challenge of coding theory has been to find explicit codes that approach capacity: turbo codes (1993), LDPC codes (rediscovered 1996), and polar codes (2009) each represent major steps toward this goal. See Book telecom, Ch. 12, and Chapter 11 of this book for these constructions.
Common Mistake: Random Coding is Not a Code Construction
Mistake:
Treating the random coding argument as a practical code design method. In practice, one cannot generate a random codebook of size $2^{nR}$ for reasonable block lengths: the number of codewords grows exponentially in $n$.
Correction:
Random coding is a proof technique that establishes the existence of good codes. Practical codes use structure (algebraic, graphical, or recursive) to enable efficient encoding and decoding. The gap between the random coding bound and practical code performance has been nearly closed by modern codes (LDPC, polar).
Historical Note: Shannon's 1948 Proof
Shannon's original 1948 paper introduced the random coding argument, though in a somewhat different form than presented here. The cleanest version of the proof, using joint typicality decoding, was developed later by Wolfowitz (1961) and refined by Csiszár and Körner (1981). The Orlitsky-Roche strong typicality framework, which we follow in this book, provides the most versatile version of the argument.
It is worth pausing to appreciate the audacity of Shannon's approach: he proved that reliable communication is possible by showing that a code chosen at random works, without ever exhibiting a specific code. This probabilistic method, borrowed from combinatorics (Erdős), was radical in the engineering context of 1948.
Quick Check
In the achievability proof, which step relies on the independence of the wrong codewords from the channel output?
Bounding $\Pr(\mathcal{E}_1)$: the true pair must be jointly typical
Bounding $\Pr(\mathcal{E}_2)$ via the packing lemma: wrong codewords are independent of $Y^n$
The expurgation step: removing bad codewords
Correct! For $m \neq 1$, $X^n(m)$ and $Y^n$ are independent (since the codewords are drawn independently). This makes $(X^n(m), Y^n) \sim \prod_i p(x_i)\,p(y_i)$ (the product distribution), which is exactly the condition needed for the packing lemma to bound the joint typicality probability.
Key Takeaway
The achievability proof shows that rates below $C$ are achievable via random coding and joint typicality decoding. The key ingredients are: (1) the AEP guarantees the true pair is jointly typical, (2) the packing lemma bounds the probability of confusion with wrong codewords, and (3) expurgation converts average error to maximal error. This proof pattern — random coding for achievability — will be reused throughout the book.