The Channel Coding Theorem (Converse)

Why Rates Above Capacity Are Impossible

The achievability proof showed that rates below $C$ are achievable. Now we prove the converse: rates above $C$ are not achievable. Equivalently, if any code sequence has vanishing error probability, its rate must be at most $C$.

The proof is shorter than the achievability proof and uses only two tools: Fano's inequality (which converts small error probability into an entropy bound) and the chain rule for mutual information (which decomposes the block mutual information into single-letter terms). This is the "converse proof pattern" that we will reuse for every channel model.
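For reference, here are the two tools in the form used below, with the message $M$ uniform on $\{1, \dots, 2^{nR}\}$ and estimated from the channel output $Y^n$:

$$H(M \mid Y^n) \le 1 + P_e^{(n)} \log 2^{nR} = 1 + P_e^{(n)}\, nR \qquad \text{(Fano)},$$

$$I(M; Y^n) = \sum_{i=1}^{n} I(M; Y_i \mid Y^{i-1}) \qquad \text{(chain rule)}.$$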

Theorem: Channel Coding Theorem (Converse)

If a sequence of $(R, n)$-codes has $P_e^{(n)} \to 0$ as $n \to \infty$, then $R \leq C = \max_{P_X} I(X; Y)$.

Equivalently: for any $R > C$, every sequence of $(R, n)$-codes has $\liminf_{n \to \infty} P_e^{(n)} > 0$.

The channel can convey at most $I(X_i; Y_i) \leq C$ bits per use. Over $n$ uses, the total information conveyed is at most $nC$ bits. If the message carries $nR$ bits, then for $R > C$ there are more message bits than the channel can handle, and errors must occur. Fano's inequality makes this argument precise.
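Here is the precise chain of inequalities, written out once (this assumes a DMC used without feedback, so that $M \to X^n \to Y^n$ forms a Markov chain):

$$
\begin{aligned}
nR &= H(M) && \text{($M$ uniform)} \\
&= H(M \mid Y^n) + I(M; Y^n) \\
&\le 1 + P_e^{(n)} nR + I(M; Y^n) && \text{(Fano)} \\
&\le 1 + P_e^{(n)} nR + I(X^n; Y^n) && \text{(data processing)} \\
&\le 1 + P_e^{(n)} nR + \sum_{i=1}^n I(X_i; Y_i) && \text{(memoryless, no feedback)} \\
&\le 1 + P_e^{(n)} nR + nC && \text{(definition of $C$)}.
\end{aligned}
$$

Dividing by $n$ gives $R \le \frac{1}{n} + P_e^{(n)} R + C$; letting $n \to \infty$ with $P_e^{(n)} \to 0$ yields $R \le C$.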


The Converse Proof Pattern

The converse proof has a clean three-step structure that we will reuse for every channel model in the rest of the book:

  1. Fano: Convert $P_e^{(n)} \to 0$ into $H(M \mid Y^n) \leq n\epsilon_n$ (with $\epsilon_n \to 0$)
  2. Chain rule: Decompose $I(M; Y^n) = \sum_{i=1}^n I(M; Y_i \mid Y^{i-1})$
  3. Single-letter bound: Use the memoryless property to bound each term by $C$

For more complex channels (MAC, broadcast, interference), the chain rule step becomes more involved, but the overall structure remains the same. The reader should internalize this pattern: Fano $\to$ chain rule $\to$ single-letter bound.
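To make the pattern concrete, here is a minimal numerical sketch in Python (with hypothetical parameter values) of the two bounds obtained by rearranging the chain $nR \le 1 + P_e^{(n)} nR + nC$, evaluated for a binary symmetric channel:

```python
import math

def h2(p):
    """Binary entropy H_2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Hypothetical parameters: a BSC with crossover probability 0.11
# (so C is close to 0.5), blocklength n = 1000, target error 1%.
p, n, pe = 0.11, 1000, 0.01

C = 1 - h2(p)  # BSC capacity in bits per channel use

# Rearranging the converse chain nR <= 1 + Pe*nR + nC gives the
# largest rate compatible with a given (n, Pe):
#     R <= (C + 1/n) / (1 - Pe)
r_max = (C + 1 / n) / (1 - pe)

# The same inequality rearranged for a code running at rate R > C
# lower-bounds the error probability:
#     Pe >= 1 - C/R - 1/(nR)
R = 0.7  # hypothetical rate above capacity
pe_min = 1 - C / R - 1 / (n * R)

print(f"capacity C        = {C:.4f} bits/use")
print(f"max rate at (n={n}, Pe={pe}): {r_max:.4f}")
print(f"min error at (R={R}, n={n}): {pe_min:.4f}")
```

With these numbers the bound pins $P_e^{(n)}$ above roughly $0.28$ at $R = 0.7 > C \approx 0.5$, no matter how the code is designed.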

Example: A Weaker Converse via Counting

Give a simple (non-Fano) argument for why $R \leq \log|\mathcal{Y}|$ for any achievable rate. Why is this weaker than the capacity bound $R \leq C$?
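One way to argue (a sketch, not the only possible argument): any message whose decoding region in $\mathcal{Y}^n$ is empty is always decoded incorrectly, so once the maximal error probability drops below 1, each of the $2^{nR}$ messages must own at least one output sequence. Since the decoding regions are disjoint subsets of $\mathcal{Y}^n$,

$$2^{nR} \le |\mathcal{Y}|^n \quad\Longrightarrow\quad R \le \log|\mathcal{Y}|.$$

This is weaker because it ignores the noise entirely: $C \le \log|\mathcal{Y}|$ always holds, and the inequality is strict for noisy channels (for a BSC with crossover $p \in (0,1)$, $C = 1 - H_2(p) < 1 = \log|\mathcal{Y}|$).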

Common Mistake: Weak vs. Strong Converse

Mistake:

Assuming the converse proves that $P_e^{(n)} \to 1$ for $R > C$. The standard converse only shows $\liminf P_e^{(n)} > 0$ (the error does not vanish), not that it approaches 1.

Correction:

The weak converse shows: if $R > C$, then $P_e^{(n)} \not\to 0$. The strong converse (proved separately for DMCs by Wolfowitz) shows: if $R > C$, then $P_e^{(n)} \to 1$. For DMCs, both hold, but the strong converse requires additional arguments beyond Fano's inequality.
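To see exactly what the weak converse gives, rearrange the Fano chain $nR \le 1 + P_e^{(n)} nR + nC$:

$$P_e^{(n)} \;\ge\; 1 - \frac{C}{R} - \frac{1}{nR},$$

so for $R > C$, $\liminf_{n \to \infty} P_e^{(n)} \ge 1 - C/R > 0$. Note that this floor, $1 - C/R$, is strictly less than 1; pushing the error all the way to 1 is precisely the extra content of the strong converse.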

Quick Check

In the converse proof, Fano's inequality gives $H(M \mid Y^n) \leq 1 + P_e^{(n)} nR$. What role does this play?

  1. It shows that the encoder must use the capacity-achieving distribution
  2. It converts small error probability into a bound on conditional entropy, which then bounds the rate (correct)
  3. It provides the random coding argument for achievability

Key Takeaway

The converse shows that $R \leq C$ is necessary for reliable communication. The proof uses Fano's inequality to convert $P_e^{(n)} \to 0$ into $\frac{1}{n} H(M \mid Y^n) \to 0$, then the chain rule to decompose $I(M; Y^n)$ into single-letter terms, each bounded by $C$. This "Fano $\to$ chain rule $\to$ single-letter bound" pattern is the universal converse technique in information theory.