The Channel Coding Theorem – Converse
Why Rates Above Capacity Are Impossible
The achievability proof showed that rates below $C$ are achievable. Now we prove the converse: rates above $C$ are not achievable. Equivalently, if a code sequence has vanishing error probability, its rate must be at most $C$.
The proof is shorter than the achievability proof and uses only two tools: Fano's inequality (which converts small error probability into an entropy bound) and the chain rule for mutual information (which decomposes the block mutual information into single-letter terms). This is the "converse proof pattern" that we will reuse for every channel model.
Theorem: Channel Coding Theorem – Converse
If a sequence of $(2^{nR}, n)$-codes has $P_e^{(n)} \to 0$ as $n \to \infty$, then $R \le C$.
Equivalently: for any $R > C$, every sequence of $(2^{nR}, n)$-codes has $P_e^{(n)}$ bounded away from zero.
The channel can convey at most $C$ bits per use. Over $n$ uses, the total information conveyed is at most $nC$ bits. If the message has $nR$ bits, then for $R > C$ there are more message bits than the channel can handle, and errors must occur. Fano's inequality makes this argument precise.
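To put numbers on this intuition, here is a minimal sketch (the channel, blocklength, and rate are our illustrative choices, not from the text) comparing the bits a BSC can convey over $n$ uses against the bits a rate-$R$ message carries:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Illustrative numbers: BSC with crossover 0.11, n = 1000 uses, rate R = 0.6.
p, n, R = 0.11, 1000, 0.6
C = 1 - h2(p)  # BSC capacity

print(f"capacity C        = {C:.3f} bits/use")
print(f"channel conveys  <= {n * C:.0f} bits over n = {n} uses")
print(f"message carries   = {n * R:.0f} bits at rate R = {R}")
# ~600 message bits vs. ~500 deliverable bits: errors must occur.
```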
Setup and Fano's inequality
Let $W$ be uniform on $\{1, 2, \dots, 2^{nR}\}$. The encoder produces $X^n(W)$ and the decoder produces $\hat{W} = g(Y^n)$.
By Fano's inequality:
$$
H(W \mid Y^n) \le H(W \mid \hat{W}) \le 1 + P_e^{(n)}\, nR =: n\epsilon_n,
$$
where $\epsilon_n = \tfrac{1}{n} + P_e^{(n)} R \to 0$ as $n \to \infty$ (since $P_e^{(n)} \to 0$).
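As a quick numerical sanity check (our own construction, not from the text), the following sketch verifies the loosened Fano bound $H(W \mid \hat{W}) \le 1 + P_e \log_2 M$ on a randomly drawn joint pmf over $M$ messages:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8                                  # message alphabet size (illustrative)
P = rng.random((M, M))
P /= P.sum()                           # random joint pmf p(w, w_hat)

Pe = 1.0 - np.trace(P)                 # error probability P(W != W_hat)

# Conditional entropy H(W | W_hat) in bits.
p_hat = P.sum(axis=0)                  # marginal of W_hat
H_cond = -np.sum(P * np.log2(P / p_hat))

bound = 1.0 + Pe * np.log2(M)          # loosened Fano bound
print(f"H(W|W_hat) = {H_cond:.3f} <= {bound:.3f}")
assert H_cond <= bound
```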
Bound the rate via mutual information
$$
\begin{aligned}
nR &= H(W) \\
   &= H(W \mid Y^n) + I(W; Y^n) \\
   &\le n\epsilon_n + I(W; Y^n) && \text{(Fano)} \\
   &= n\epsilon_n + \sum_{i=1}^{n} I(W; Y_i \mid Y^{i-1}) && \text{(chain rule)}
\end{aligned}
$$
Single-letter bound via the memoryless property
For each term:
$$
I(W; Y_i \mid Y^{i-1}) \le I(W, Y^{i-1}, X_i; Y_i) = I(X_i; Y_i).
$$
The inequality holds because adding extra variables to the left of the semicolon can only increase mutual information. The equality uses two facts:
- $X_i$ is a function of $W$ (possibly also of $Y^{i-1}$ if feedback is available, but for now $X_i$ is determined by $W$ alone)
- $Y_i$ depends on $(W, X^i, Y^{i-1})$ only through $X_i$ (memoryless property): $p(y_i \mid w, x^i, y^{i-1}) = p(y_i \mid x_i)$
Therefore: $I(W; Y_i \mid Y^{i-1}) \le I(X_i; Y_i) \le \max_{p(x)} I(X; Y) = C$.
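The Markov collapse in the equality can be checked numerically. A toy sketch (our construction; it drops the $Y^{i-1}$ history for simplicity): with $W$ uniform, a random deterministic encoder $x = f(w)$, and a random channel $p(y \mid x)$, we have $W \to X \to Y$, so $I(W; Y) = I(X; Y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, L = 4, 3, 3                       # toy alphabet sizes |W|, |X|, |Y|
f = rng.integers(0, K, size=M)          # random deterministic encoder x = f(w)
Q = rng.random((K, L))
Q /= Q.sum(axis=1, keepdims=True)       # random channel matrix p(y|x)

def mi(P):
    """Mutual information (bits) between row and column variables of joint pmf P."""
    px = P.sum(axis=1, keepdims=True)
    py = P.sum(axis=0, keepdims=True)
    mask = P > 0
    return float(np.sum(P[mask] * np.log2(P[mask] / (px * py)[mask])))

Pwy = Q[f] / M                          # joint p(w, y): W uniform, Y ~ p(.|f(w))
Pxy = np.zeros((K, L))
for w in range(M):
    Pxy[f[w]] += Pwy[w]                 # induced joint p(x, y)

# W -> X -> Y is Markov and X = f(W), so the two mutual informations agree.
print(f"I(W;Y) = {mi(Pwy):.6f}  I(X;Y) = {mi(Pxy):.6f}")
```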
Conclude
Combining:
$$
nR \le n\epsilon_n + \sum_{i=1}^{n} I(X_i; Y_i) \le n\epsilon_n + nC.
$$
Dividing by $n$ and letting $n \to \infty$ (so $\epsilon_n \to 0$):
$$
R \le C.
$$
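Before the limit is taken, the same chain gives a finite-blocklength statement: rearranging $nR \le 1 + P_e^{(n)} nR + nC$ yields $R \le (C + 1/n)\,/\,(1 - P_e^{(n)})$. A minimal sketch with illustrative numbers:

```python
def weak_converse_rate_bound(C, n, Pe):
    """Rearranged finite-n bound: R <= (C + 1/n) / (1 - Pe)."""
    return (C + 1.0 / n) / (1.0 - Pe)

C = 0.5  # illustrative capacity (bits/use)
for n, Pe in [(100, 0.01), (1_000, 0.01), (10_000, 0.001)]:
    print(f"n = {n:6d}, Pe = {Pe}:  R <= {weak_converse_rate_bound(C, n, Pe):.4f}")
# As n grows and Pe shrinks, the bound tightens toward R <= C.
```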
The Converse Proof Pattern
The converse proof has a clean three-step structure that we will reuse for every channel model in the rest of the book:
- Fano: Convert $P_e^{(n)} \to 0$ into $H(W \mid Y^n) \le n\epsilon_n$ with $\epsilon_n \to 0$
- Chain rule: Decompose $I(W; Y^n) = \sum_{i=1}^{n} I(W; Y_i \mid Y^{i-1})$
- Single-letter bound: Use the memoryless property to bound each term by $C$
For more complex channels (MAC, broadcast, interference), the chain rule step becomes more involved, but the overall structure remains the same. The reader should internalize this pattern: Fano → chain rule → single-letter bound.
Example: A Weaker Converse via Counting
Give a simple (non-Fano) argument for why $R \le \log|\mathcal{Y}|$ for any achievable rate. Why is this weaker than the capacity bound $R \le C$?
Counting argument
The decoder observes $y^n \in \mathcal{Y}^n$ and must identify the correct message from $\{1, \dots, 2^{nR}\}$. The decoder function can map $y^n$ to at most $|\mathcal{Y}|^n$ distinct values. For reliable decoding we need $2^{nR} \le |\mathcal{Y}|^n$, giving $R \le \log|\mathcal{Y}|$.
Why this is weaker
This bound ignores the noise entirely! It says the rate cannot exceed the log of the output alphabet size, but it does not account for the channel's noise structure. For the BSC with crossover probability $p$ close to $1/2$, we have $\log|\mathcal{Y}| = 1$ but $C = 1 - H(p) \approx 0$. The capacity bound is much tighter because it accounts for the noise through $H(Y \mid X)$.
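A short sketch comparing the two bounds for the BSC (crossover values are illustrative):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.05, 0.20, 0.40, 0.49]:
    C = 1 - h2(p)
    print(f"p = {p:.2f}:  counting bound R <= 1.0,  capacity bound R <= {C:.4f}")
# Near p = 1/2 the counting bound is unchanged while C collapses to ~0.
```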
Common Mistake: Weak vs. Strong Converse
Mistake:
Assuming the converse proves that $P_e^{(n)} \to 1$ for $R > C$. The standard converse only shows $P_e^{(n)} \not\to 0$ (the error does not vanish), not that it approaches 1.
Correction:
The weak converse shows: if $R > C$, then $P_e^{(n)} \not\to 0$. The strong converse (proved separately for DMCs by Wolfowitz) shows: if $R > C$, then $P_e^{(n)} \to 1$. For DMCs both hold, but the strong converse requires additional arguments beyond Fano's inequality.
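The gap between the two statements can be made quantitative. Rearranging the weak-converse chain gives $P_e^{(n)} \ge 1 - C/R - 1/(nR)$, which saturates strictly below 1; the strong converse asserts the true error tends to 1. A sketch with illustrative numbers:

```python
def fano_error_lower_bound(C, R, n):
    """Weak-converse bound: Pe >= 1 - C/R - 1/(n*R), clipped at 0."""
    return max(0.0, 1.0 - C / R - 1.0 / (n * R))

C, R = 0.5, 0.6  # illustrative: rate 20% above capacity
for n in [10, 100, 1_000, 10_000]:
    print(f"n = {n:6d}:  Pe >= {fano_error_lower_bound(C, R, n):.4f}")
# The bound plateaus at 1 - C/R ~= 0.167, far below the strong-converse limit of 1.
```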
Quick Check
In the converse proof, Fano's inequality gives $H(W \mid Y^n) \le 1 + P_e^{(n)}\, nR$. What role does this play?
It shows that the encoder must use the capacity-achieving distribution
It converts small error probability into a bound on conditional entropy, which then bounds the rate
It provides the random coding argument for the achievability
Correct! Fano's inequality says: if $P_e^{(n)}$ is small, then $H(W \mid Y^n)$ is small, which means $I(W; Y^n)$ is close to $H(W) = nR$. This forces the channel to convey nearly $nR$ bits, which is only possible if $R \le C$ (up to the vanishing term $\epsilon_n$).
Key Takeaway
The converse shows that $R \le C$ is necessary for reliable communication. The proof uses Fano's inequality to convert $P_e^{(n)} \to 0$ into $H(W \mid Y^n) \le n\epsilon_n$, then the chain rule to decompose $I(W; Y^n)$ into single-letter terms, each bounded by $C$. This "Fano → chain rule → single-letter bound" pattern is the universal converse technique in information theory.