Chapter Summary
Key Points
1. Differential entropy extends entropy to continuous random variables. Unlike discrete entropy, it can be negative, depends on the choice of coordinates, and has no direct operational meaning. But mutual information is well-defined and coordinate-invariant.
2. The Gaussian maximizes differential entropy under a variance constraint: $h(X) \le \frac{1}{2}\log(2\pi e\sigma^2)$, with equality iff $X \sim \mathcal{N}(\mu, \sigma^2)$. This is the foundation of all Gaussian channel results (a numerical check follows the list).
3. For Gaussian vectors, $h(\mathbf{X}) = \frac{1}{2}\log\big((2\pi e)^n \det K\big)$. The covariance determinant measures the uncertainty volume. Hadamard's inequality, $\det K \le \prod_i K_{ii}$, follows as a corollary.
4. The entropy power inequality (EPI): $N(X+Y) \ge N(X) + N(Y)$ for independent $X$ and $Y$, where $N(X) = \frac{1}{2\pi e} e^{2h(X)/n}$ is the entropy power. Entropy powers are superadditive for independent summands (checked numerically after the list). This proves Gaussian noise is worst-case and underpins converse proofs for Gaussian channels and broadcast channels.
5. AWGN capacity is $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$, achieved by the Gaussian input $X \sim \mathcal{N}(0, P)$. The achievability uses the maximum entropy property; the converse uses the EPI.
6. Differential entropy equals quantization entropy minus quantization cost: $h(X) \approx H([X]_\Delta) - \log\frac{1}{\Delta}$ for small bin width $\Delta$. This explains why $h(X)$ can be negative and connects continuous information theory to practical quantization (illustrated after the list).
Looking Ahead
Chapter 3 introduces the concept of typicality, the key technical tool for proving coding theorems. The asymptotic equipartition property (AEP) shows that for long i.i.d. sequences, only about $2^{nH}$ sequences out of the $|\mathcal{X}|^n$ possible ones carry appreciable probability. This "concentration of measure" phenomenon is the engine behind both source coding and channel coding.