Chapter Summary
Key Points
1. Entropy quantifies average uncertainty. For a discrete RV $X$ with PMF $p(x)$, the entropy $H(X) = -\sum_x p(x) \log p(x)$ is bounded by $0 \le H(X) \le \log|\mathcal{X}|$, with the upper bound achieved by the uniform distribution. Entropy equals the minimum average description length for an i.i.d. source (a numeric sketch follows this list).
2. The chain rule decomposes joint entropy. $H(X, Y) = H(X) + H(Y \mid X)$, and more generally $H(X_1, \dots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \dots, X_{i-1})$. This telescoping structure is the backbone of most converse proofs.
3. Mutual information measures the information one variable provides about another. $I(X; Y) = H(X) - H(X \mid Y)$ is symmetric, non-negative, and equals zero iff $X$ and $Y$ are independent. Channel capacity is $C = \max_{p(x)} I(X; Y)$.
4. Non-negativity of KL divergence is the mother inequality. Non-negativity of mutual information, the entropy upper bound $H(X) \le \log|\mathcal{X}|$, and the data processing inequality all follow from $D(p \,\|\, q) \ge 0$.
5. Concavity of $I(X; Y)$ in $p(x)$ makes capacity computable. The capacity optimization is a concave maximization: any local maximum is global, and algorithms like Blahut-Arimoto converge to it (a sketch of the iteration follows this list).
6. The data processing inequality says processing cannot create information. For a Markov chain $X \to Y \to Z$: $I(X; Z) \le I(X; Y)$. Equality holds iff $Z$ is a sufficient statistic for $X$.
7. Fano's inequality converts error probability into entropy bounds. For an estimate $\hat{X}$ of $X$ from $Y$ with error probability $P_e = \Pr(\hat{X} \neq X)$, $H(X \mid Y) \le H(P_e) + P_e \log(|\mathcal{X}| - 1)$. This is the key tool for proving that rates above capacity lead to unavoidable errors (a numeric check follows this list).
8. Maximum entropy distributions under constraints form exponential families. Uniform maximizes entropy without constraints; geometric under a mean constraint; discrete Gaussian under mean and variance constraints (a worked Lagrangian derivation for the geometric case follows this list). This Lagrangian technique reappears as waterfilling in Gaussian channels.
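A minimal numeric sketch of points 1–3, assuming NumPy; the joint PMF and all variable names here are illustrative, not taken from the chapter.

```python
import numpy as np

# Illustrative joint PMF p(x, y) on a 2x3 alphabet (made-up numbers).
p_xy = np.array([[0.20, 0.10, 0.10],
                 [0.05, 0.25, 0.30]])

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability cells."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = p_xy.sum(axis=1)   # marginal of X
p_y = p_xy.sum(axis=0)   # marginal of Y

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy)

# Chain rule (point 2): H(X, Y) = H(X) + H(Y | X)
H_Y_given_X = H_XY - H_X

# Mutual information (point 3): I(X; Y) = H(X) + H(Y) - H(X, Y) >= 0
I_XY = H_X + H_Y - H_XY

print(f"H(X) = {H_X:.4f} bits, bound log2|X| = {np.log2(p_x.size):.4f}")
print(f"H(X,Y) = {H_XY:.4f} = H(X) + H(Y|X) = {H_X + H_Y_given_X:.4f}")
print(f"I(X;Y) = {I_XY:.4f} bits (zero iff X and Y are independent)")
```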
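A sketch of the Blahut-Arimoto iteration mentioned in point 5, again assuming NumPy. The test channel (a binary symmetric channel with crossover 0.1) and the function name are illustrative; for the BSC the result can be checked against the closed form $C = 1 - H(\epsilon)$.

```python
import numpy as np

def blahut_arimoto(W, n_iter=200):
    """Blahut-Arimoto sketch: W[x, y] = p(y|x); returns (capacity in bits, input law)."""
    p = np.full(W.shape[0], 1.0 / W.shape[0])   # start from the uniform input
    for _ in range(n_iter):
        q = p @ W                                # induced output distribution q(y)
        # d[x] = D( W(.|x) || q ) in bits
        d = np.sum(np.where(W > 0, W * np.log2(W / q), 0.0), axis=1)
        p = p * np.exp2(d)                       # multiplicative update
        p /= p.sum()
    q = p @ W
    capacity = float(np.sum(np.where(W > 0, p[:, None] * W * np.log2(W / q), 0.0)))
    return capacity, p

# Binary symmetric channel with crossover probability 0.1 (illustrative).
eps = 0.1
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])
C, p_opt = blahut_arimoto(W)
closed_form = 1 + eps * np.log2(eps) + (1 - eps) * np.log2(1 - eps)
print(f"Blahut-Arimoto C = {C:.6f} bits, closed form 1 - H(eps) = {closed_form:.6f}")
```

Because $I(X; Y)$ is concave in $p(x)$ (point 5), this alternating update cannot get trapped in a spurious local maximum.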
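A quick numeric check of Fano's inequality from point 7, using a made-up joint PMF and a MAP estimator; all numbers are illustrative.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits of a PMF given as an array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Illustrative joint PMF p(x, y) on a 3x3 alphabet.
p_xy = np.array([[0.20, 0.05, 0.05],
                 [0.05, 0.25, 0.05],
                 [0.05, 0.05, 0.25]])

# Conditional entropy H(X | Y) = H(X, Y) - H(Y)
H_X_given_Y = entropy_bits(p_xy) - entropy_bits(p_xy.sum(axis=0))

# MAP estimator: for each y guess the most likely x; P_e = Pr(X_hat != X)
x_hat = p_xy.argmax(axis=0)
P_e = 1.0 - sum(p_xy[x_hat[y], y] for y in range(p_xy.shape[1]))

# Fano: H(X | Y) <= H(P_e) + P_e * log2(|X| - 1)
fano_rhs = entropy_bits([P_e, 1 - P_e]) + P_e * np.log2(p_xy.shape[0] - 1)
print(f"H(X|Y) = {H_X_given_Y:.4f} bits  <=  Fano bound {fano_rhs:.4f} bits")
```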
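A short worked version of the Lagrangian argument behind point 8, for distributions on the non-negative integers with fixed mean $\mu$ (a sketch of the standard calculation, using natural log for convenience):

```latex
\begin{aligned}
&\text{maximize } H(p) = -\sum_{k \ge 0} p_k \ln p_k
 \quad \text{subject to} \quad \sum_k p_k = 1, \quad \sum_k k\, p_k = \mu, \\
&\mathcal{L} = -\sum_k p_k \ln p_k
   + \lambda_0\Big(\sum_k p_k - 1\Big)
   + \lambda_1\Big(\sum_k k\, p_k - \mu\Big), \\
&\frac{\partial \mathcal{L}}{\partial p_k} = -\ln p_k - 1 + \lambda_0 + \lambda_1 k = 0
 \;\Longrightarrow\; p_k \propto e^{\lambda_1 k} \quad (\text{an exponential family}), \\
&\text{and matching the constraints gives the geometric law }
 p_k = \frac{1}{1+\mu}\left(\frac{\mu}{1+\mu}\right)^{k}.
\end{aligned}
```

Adding a variance constraint only changes the exponent from $\lambda_1 k$ to $\lambda_1 k + \lambda_2 k^2$, which is the discrete Gaussian case in point 8; the same multiplier machinery is what later reappears as waterfilling.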
Looking Ahead
Chapter 2 extends these information measures to continuous random variables, where "differential entropy" replaces entropy and the Gaussian distribution plays a starring role as the continuous maximum-entropy distribution under a variance constraint. The fundamental result, that Gaussian noise is the worst-case additive noise, leads directly to the capacity formula for the AWGN channel.