Channel Capacity

The Channel Coding Theorem

We now state the central result of this chapter and, arguably, the most important theorem in information theory. The channel coding theorem connects the operational definition of capacity (the supremum of achievable rates) to an information-theoretic formula involving mutual information. The proof occupies the next two sections: achievability (Section 3) and converse (Section 4).

Theorem: The Channel Coding Theorem (Shannon, 1948)

The capacity of the DMC $(\mathcal{X}, P_{Y|X}, \mathcal{Y})$ is:

$$C = \max_{P_X} I(X; Y)$$

Specifically:

  1. (Achievability) For any $R < C$, there exists a sequence of $(R, n)$-codes with $P_{e,\max} \to 0$ as $n \to \infty$.
  2. (Converse) If a sequence of $(R, n)$-codes has $P_e^{(n)} \to 0$, then $R \leq C$.

Why is $\max_{P_X} I(X; Y)$ the right formula? There are two ways to see this:

From the source coding side: The channel output $Y^n$ looks like a noisy version of the input $X^n$. The mutual information $I(X; Y)$ measures how many bits per symbol the channel can actually convey; the rest is lost to noise. Maximizing over $P_X$ finds the input distribution that makes the most of the channel.

From the coding side: We need our $2^{nR}$ codewords to be distinguishable at the decoder. The packing lemma tells us that about $2^{nI(X;Y)}$ codewords can coexist without confusion. So the maximum rate is $I(X;Y)$, and we choose $P_X$ to maximize this.
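
As a rough numerical illustration (the numbers are ours): over $n = 1000$ uses of a channel with $I(X;Y) = 0.5$ bits, the packing lemma permits roughly $2^{500}$ mutually distinguishable codewords, while a rate $R = 0.4$ code needs only $2^{400}$ of them. This exponential slack between $2^{nR}$ and $2^{nI(X;Y)}$ is what drives the error probability to zero.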


Properties of Channel Capacity

Several important properties follow from the formula $C = \max_{P_X} I(X; Y)$:

  1. $C \geq 0$: Since $I(X; Y) \geq 0$ for any $P_X$ (with equality when $X$ and $Y$ are independent, e.g., a completely noisy channel).

  2. $C \leq \log|\mathcal{X}|$: The input carries at most $\log|\mathcal{X}|$ bits, since $I(X; Y) \leq H(X) \leq \log|\mathcal{X}|$.

  3. $C \leq \log|\mathcal{Y}|$: The output can distinguish at most $|\mathcal{Y}|$ values, since $I(X; Y) \leq H(Y) \leq \log|\mathcal{Y}|$.

  4. The maximum exists: $I(X; Y)$ is a continuous function of $P_X$ over the compact probability simplex, so the maximum is attained.

  5. Every local maximum is global: $I(X; Y)$ is concave in $P_X$ (for fixed $P_{Y|X}$), so any local maximum of $I(X; Y)$ is the global maximum. Computing capacity is therefore a convex optimization problem that can be solved efficiently. (The maximizing $P_X^*$ need not be unique, but the maximum value $C$ always is.)
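
Properties 4 and 5 are easy to probe numerically. Below is a minimal NumPy sketch (the helper name `mutual_information` and the BSC test channel are our illustrative choices) that evaluates $I(X; Y)$ along the input simplex of a binary symmetric channel and shows the concave curve peaking at the uniform input:

```python
import numpy as np

def mutual_information(p_x, W):
    """I(X; Y) in bits for input distribution p_x and channel matrix W[x, y] = P(y | x)."""
    p_y = p_x @ W                           # output distribution induced through the channel
    ratio = np.where(W > 0, W / p_y, 1.0)   # mask zero-probability transitions to avoid log(0)
    return float((p_x[:, None] * W * np.log2(ratio)).sum())

# Binary symmetric channel with crossover probability 0.2
W = np.array([[0.8, 0.2],
              [0.2, 0.8]])

# Sweep the input simplex: p_x = (a, 1 - a)
a = np.linspace(0.01, 0.99, 99)
I = [mutual_information(np.array([t, 1.0 - t]), W) for t in a]

# Concave in a, peaking at the uniform input with value 1 - H2(0.2), about 0.278 bits
best = int(np.argmax(I))
print(f"max I = {I[best]:.4f} bits at a = {a[best]:.2f}")
```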

Definition: Capacity-Achieving Input Distribution

A capacity-achieving input distribution $P_X^*$ is a probability distribution over $\mathcal{X}$ that maximizes $I(X; Y)$:

$$P_X^* = \arg\max_{P_X} I(X; Y) = \arg\max_{P_X} \left[H(Y) - H(Y|X)\right]$$

Since $H(Y|X) = \sum_x P_X(x) H(Y|X=x)$ is linear in $P_X$ while $H(Y)$ is concave in $P_X$, the objective $I(X;Y)$ is concave. When every input has the same conditional entropy $H(Y|X=x)$ (as in symmetric channels), the linear term is constant and maximizing $I(X;Y)$ reduces to maximizing $H(Y)$ over the output distributions $P_Y$ reachable through the channel law.

For symmetric channels, $P_X^*$ is the uniform distribution. For general channels, $P_X^*$ can be found via the Blahut-Arimoto algorithm (Section 6) or the KKT conditions.
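
To make this concrete, here is a minimal NumPy sketch of the Blahut-Arimoto iteration (the function name, tolerance, and BSC test matrix are our illustrative choices; Section 6 develops the algorithm itself):

```python
import numpy as np

def blahut_arimoto(W, tol=1e-10, max_iter=10_000):
    """Capacity (bits) and a maximizing input for a DMC with W[x, y] = P(y | x).

    Assumes every output symbol is reachable (no all-zero column in W).
    """
    n = W.shape[0]
    p = np.full(n, 1.0 / n)                      # start from the uniform input
    for _ in range(max_iter):
        q = p[:, None] * W                       # joint distribution P(x, y)
        q /= q.sum(axis=0, keepdims=True)        # normalize columns: posterior q(x | y)
        log_q = np.log(np.where(W > 0, q, 1.0))  # entries with W = 0 contribute nothing
        r = np.exp((W * log_q).sum(axis=1))      # p(x) proportional to exp(sum_y W(y|x) log q(x|y))
        p_new = r / r.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    # evaluate I(X; Y) in bits at the final input distribution
    p_y = p @ W
    ratio = np.where(W > 0, W / p_y, 1.0)
    C = float((p[:, None] * W * np.log2(ratio)).sum())
    return C, p

# BSC(0.11): capacity should be 1 - H2(0.11), about 0.5 bits per use
W = np.array([[0.89, 0.11],
              [0.11, 0.89]])
C, p_star = blahut_arimoto(W)
print(f"C = {C:.4f} bits, capacity-achieving input = {p_star}")
```

For this symmetric channel the iteration converges to the uniform input, matching the closed-form answer; for asymmetric channels it finds $P_X^*$ without any hand derivation.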

Common Mistake: Capacity is Not Always $\log|\mathcal{X}|$

Mistake:

Assuming that the capacity of every DMC is $\log|\mathcal{X}|$ (the maximum input entropy), or that a uniform input distribution always achieves capacity.

Correction:

The capacity is $\max_{P_X} [H(Y) - H(Y|X)]$, not $\max_{P_X} H(X)$. The noise reduces the information conveyed per channel use. For example, the BSC($p$) has capacity $1 - \mathcal{H}_2(p) < 1 = \log 2$. The uniform input happens to be optimal for symmetric channels, but not in general.
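
As a concrete check (the numeric value is ours), take $p = 0.11$:

$$C = 1 - \mathcal{H}_2(0.11) = 1 + 0.11 \log_2 0.11 + 0.89 \log_2 0.89 \approx 1 - 0.4999 \approx 0.50,$$

so even this modestly noisy binary channel conveys only about half a bit per use, not the full $\log 2 = 1$ bit.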

Quick Check

Why is it important that $I(X; Y)$ is concave in $P_X$ for fixed $P_{Y|X}$?

  1. It means every local maximum is the global maximum
  2. It means the capacity is always positive
  3. It means the capacity-achieving distribution is always uniform

Answer: 1. Concavity rules out spurious local optima, which is exactly what makes capacity computable by convex optimization (see property 5 above).

Key Takeaway

The channel capacity $C = \max_{P_X} I(X;Y)$ is the single most important formula in information theory. It equates an operational quantity (the maximum reliable communication rate) with an information-theoretic quantity (the maximum mutual information). The concavity of $I(X;Y)$ in $P_X$ makes the capacity computable as a convex optimization problem.