The Channel Coding Theorem – Converse
Why Rates Above Capacity Are Impossible
The achievability proof showed that rates below $C$ are achievable. Now we prove the converse: rates above $C$ are not achievable. Equivalently, if a code sequence has vanishing error probability, its rate must be at most $C$.
The proof is shorter than the achievability proof and uses only two tools: Fano's inequality (which converts small error probability into an entropy bound) and the chain rule for mutual information (which decomposes the block mutual information into single-letter terms). This is the "converse proof pattern" that we will reuse for every channel model.
Theorem: Channel Coding Theorem – Converse
If a sequence of $(2^{nR}, n)$-codes has $P_e^{(n)} \to 0$ as $n \to \infty$, then $R \le C$.
Equivalently: for any $R > C$, every sequence of $(2^{nR}, n)$-codes has $P_e^{(n)}$ bounded away from zero.
The channel can convey at most $C$ bits per use. Over $n$ uses, the total information conveyed is at most $nC$ bits. If the message has $nR$ bits, then for $R > C$ there are more message bits than the channel can handle, and errors must occur. Fano's inequality makes this argument precise.
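To put numbers on this intuition, here is a minimal sketch (the channel, blocklength, and rate are our illustrative choices, not from the text) comparing the bits a BSC can convey over $n$ uses against the bits a rate-$R$ message carries:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Illustrative numbers: BSC with crossover 0.11, n = 1000 uses, rate R = 0.6.
p, n, R = 0.11, 1000, 0.6
C = 1 - h2(p)  # BSC capacity

print(f"capacity C        = {C:.3f} bits/use")
print(f"channel conveys  <= {n * C:.0f} bits over n = {n} uses")
print(f"message carries   = {n * R:.0f} bits at rate R = {R}")
# ~600 message bits vs. ~500 deliverable bits: errors must occur.
```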
Setup and Fano's inequality
Let $W$ be uniform on $\{1, 2, \dots, 2^{nR}\}$. The encoder produces $X^n(W)$ and the decoder produces $\hat{W} = g(Y^n)$.
By Fano's inequality:
$$
H(W \mid Y^n) \le H(W \mid \hat{W}) \le 1 + P_e^{(n)}\, nR =: n\epsilon_n,
$$
where $\epsilon_n = \tfrac{1}{n} + P_e^{(n)} R \to 0$ as $n \to \infty$ (since $P_e^{(n)} \to 0$).
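As a quick numerical sanity check (our own construction, not from the text), the following sketch verifies the loosened Fano bound $H(W \mid \hat{W}) \le 1 + P_e \log_2 M$ on a randomly drawn joint pmf over $M$ messages:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8                                  # message alphabet size (illustrative)
P = rng.random((M, M))
P /= P.sum()                           # random joint pmf p(w, w_hat)

Pe = 1.0 - np.trace(P)                 # error probability P(W != W_hat)

# Conditional entropy H(W | W_hat) in bits.
p_hat = P.sum(axis=0)                  # marginal of W_hat
H_cond = -np.sum(P * np.log2(P / p_hat))

bound = 1.0 + Pe * np.log2(M)          # loosened Fano bound
print(f"H(W|W_hat) = {H_cond:.3f} <= {bound:.3f}")
assert H_cond <= bound
```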
Bound the rate via mutual information
$$
\begin{aligned}
nR &= H(W) \\
   &= H(W \mid Y^n) + I(W; Y^n) \\
   &\le n\epsilon_n + I(W; Y^n) && \text{(Fano)} \\
   &= n\epsilon_n + \sum_{i=1}^{n} I(W; Y_i \mid Y^{i-1}) && \text{(chain rule)}
\end{aligned}
$$
Single-letter bound via the memoryless property
For each term:
$$
I(W; Y_i \mid Y^{i-1}) \le I(W, Y^{i-1}, X_i; Y_i) = I(X_i; Y_i).
$$
The inequality holds because adding extra variables to the left of the semicolon can only increase mutual information. The equality uses two facts:
- $X_i$ is a function of $W$ (possibly also of $Y^{i-1}$ if feedback is available, but for now $X_i$ is determined by $W$ alone)
- $Y_i$ depends on $(W, X^i, Y^{i-1})$ only through $X_i$ (memoryless property): $p(y_i \mid w, x^i, y^{i-1}) = p(y_i \mid x_i)$
Therefore: $I(W; Y_i \mid Y^{i-1}) \le I(X_i; Y_i) \le \max_{p(x)} I(X; Y) = C$.
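The Markov collapse in the equality can be checked numerically. A toy sketch (our construction; it drops the $Y^{i-1}$ history for simplicity): with $W$ uniform, a random deterministic encoder $x = f(w)$, and a random channel $p(y \mid x)$, we have $W \to X \to Y$, so $I(W; Y) = I(X; Y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, L = 4, 3, 3                       # toy alphabet sizes |W|, |X|, |Y|
f = rng.integers(0, K, size=M)          # random deterministic encoder x = f(w)
Q = rng.random((K, L))
Q /= Q.sum(axis=1, keepdims=True)       # random channel matrix p(y|x)

def mi(P):
    """Mutual information (bits) between row and column variables of joint pmf P."""
    px = P.sum(axis=1, keepdims=True)
    py = P.sum(axis=0, keepdims=True)
    mask = P > 0
    return float(np.sum(P[mask] * np.log2(P[mask] / (px * py)[mask])))

Pwy = Q[f] / M                          # joint p(w, y): W uniform, Y ~ p(.|f(w))
Pxy = np.zeros((K, L))
for w in range(M):
    Pxy[f[w]] += Pwy[w]                 # induced joint p(x, y)

# W -> X -> Y is Markov and X = f(W), so the two mutual informations agree.
print(f"I(W;Y) = {mi(Pwy):.6f}  I(X;Y) = {mi(Pxy):.6f}")
```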
Conclude
Combining:
$$
nR \le n\epsilon_n + \sum_{i=1}^{n} I(X_i; Y_i) \le n\epsilon_n + nC.
$$
Dividing by $n$ and letting $n \to \infty$ (so $\epsilon_n \to 0$):
$$
R \le C.
$$
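Before the limit is taken, the same chain gives a finite-blocklength statement: rearranging $nR \le 1 + P_e^{(n)} nR + nC$ yields $R \le (C + 1/n)\,/\,(1 - P_e^{(n)})$. A minimal sketch with illustrative numbers:

```python
def weak_converse_rate_bound(C, n, Pe):
    """Rearranged finite-n bound: R <= (C + 1/n) / (1 - Pe)."""
    return (C + 1.0 / n) / (1.0 - Pe)

C = 0.5  # illustrative capacity (bits/use)
for n, Pe in [(100, 0.01), (1_000, 0.01), (10_000, 0.001)]:
    print(f"n = {n:6d}, Pe = {Pe}:  R <= {weak_converse_rate_bound(C, n, Pe):.4f}")
# As n grows and Pe shrinks, the bound tightens toward R <= C.
```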
The Converse Proof Pattern
The converse proof has a clean three-step structure that we will reuse for every channel model in the rest of the book:
- Fano: Convert $P_e^{(n)} \to 0$ into $H(W \mid Y^n) \le n\epsilon_n$ with $\epsilon_n \to 0$
- Chain rule: Decompose $I(W; Y^n) = \sum_{i=1}^{n} I(W; Y_i \mid Y^{i-1})$
- Single-letter bound: Use the memoryless property to bound each term by $C$
For more complex channels (MAC, broadcast, interference), the chain rule step becomes more involved, but the overall structure remains the same. The reader should internalize this pattern: Fano → chain rule → single-letter bound.
Example: A Weaker Converse via Counting
Give a simple (non-Fano) argument for why $R \le \log|\mathcal{Y}|$ for any achievable rate. Why is this weaker than the capacity bound $R \le C$?
Counting argument
The decoder observes $y^n \in \mathcal{Y}^n$ and must identify the correct message from $\{1, \dots, 2^{nR}\}$. The decoder function can map $y^n$ to at most $|\mathcal{Y}|^n$ distinct values. For reliable decoding we need $2^{nR} \le |\mathcal{Y}|^n$, giving $R \le \log|\mathcal{Y}|$.
Why this is weaker
This bound ignores the noise entirely! It says the rate cannot exceed the log of the output alphabet size, but it does not account for the channel's noise structure. For the BSC with crossover probability $p$ close to $1/2$, we have $\log|\mathcal{Y}| = 1$ but $C = 1 - H(p) \approx 0$. The capacity bound is much tighter because it accounts for the noise through $H(Y \mid X)$.
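A short sketch comparing the two bounds for the BSC (crossover values are illustrative):

```python
import math

def h2(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.05, 0.20, 0.40, 0.49]:
    C = 1 - h2(p)
    print(f"p = {p:.2f}:  counting bound R <= 1.0,  capacity bound R <= {C:.4f}")
# Near p = 1/2 the counting bound is unchanged while C collapses to ~0.
```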
Common Mistake: Weak vs. Strong Converse
Mistake:
Assuming the converse proves that $P_e^{(n)} \to 1$ for $R > C$. The standard converse only shows $P_e^{(n)} \not\to 0$ (the error does not vanish), not that it approaches 1.
Correction:
The weak converse shows: if $R > C$, then $P_e^{(n)} \not\to 0$. The strong converse (proved separately for DMCs by Wolfowitz) shows: if $R > C$, then $P_e^{(n)} \to 1$. For DMCs both hold, but the strong converse requires additional arguments beyond Fano's inequality.
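The gap between the two statements can be made quantitative. Rearranging the weak-converse chain gives $P_e^{(n)} \ge 1 - C/R - 1/(nR)$, which saturates strictly below 1; the strong converse asserts the true error tends to 1. A sketch with illustrative numbers:

```python
def fano_error_lower_bound(C, R, n):
    """Weak-converse bound: Pe >= 1 - C/R - 1/(n*R), clipped at 0."""
    return max(0.0, 1.0 - C / R - 1.0 / (n * R))

C, R = 0.5, 0.6  # illustrative: rate 20% above capacity
for n in [10, 100, 1_000, 10_000]:
    print(f"n = {n:6d}:  Pe >= {fano_error_lower_bound(C, R, n):.4f}")
# The bound plateaus at 1 - C/R ~= 0.167, far below the strong-converse limit of 1.
```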
Quick Check
In the converse proof, Fano's inequality gives $H(W \mid Y^n) \le 1 + P_e^{(n)}\, nR$. What role does this play?
It shows that the encoder must use the capacity-achieving distribution
It converts small error probability into a bound on conditional entropy, which then bounds the rate
It provides the random coding argument for the achievability
Correct! Fano's inequality says: if $P_e^{(n)}$ is small, then $H(W \mid Y^n)$ is small, which means $I(W; Y^n)$ is close to $H(W) = nR$. This forces the channel to convey nearly $nR$ bits, which is only possible if $R \le C$ (up to the vanishing term $\epsilon_n$).
Key Takeaway
The converse shows that $R \le C$ is necessary for reliable communication. The proof uses Fano's inequality to convert $P_e^{(n)} \to 0$ into $H(W \mid Y^n) \le n\epsilon_n$, then the chain rule to decompose $I(W; Y^n)$ into single-letter terms, each bounded by $C$. This "Fano → chain rule → single-letter bound" pattern is the universal converse technique in information theory.