Chapter 8 Summary
Key Points
1. Source coding with a helper bridges standard compression and Slepian-Wolf coding: the helper sends a rate-limited description of the side information $Y$ to reduce the rate needed to describe the source $X$. Wyner's common information quantifies the shared structure.
2. Practical distributed video coding uses LDPC syndromes for Slepian-Wolf coding, shifting complexity from the encoder to the decoder. This makes it ideal for power-constrained sensors and multi-camera systems (a toy syndrome-coding sketch follows this list).
3. The information bottleneck is a rate-distortion problem where distortion measures relevance to a target variable $Y$: minimize $I(X;T) - \beta\, I(T;Y)$ over encoders $p(t|x)$. It provides the right framework for understanding representation learning.
4. The VAE loss is a variational upper bound on the rate-distortion function with log-loss distortion. The KL term upper-bounds the mutual information $I(X;Z)$ (the rate), and the reconstruction loss is the distortion.
5. The connections between information theory and machine learning are exact mathematical relationships, not merely analogies. Rate-distortion theory guides the design of learned compression, neural codecs, and semantic communication systems.
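To make key point 2 concrete, here is a minimal NumPy sketch of syndrome-based Slepian-Wolf coding. The 4x7 parity-check matrix `H`, the names `sw_encode`/`sw_decode`, and the exhaustive single-bit-flip decoder are all illustrative stand-ins: a practical system uses a large sparse LDPC matrix and belief-propagation decoding.

```python
import numpy as np

# Toy 4x7 parity-check matrix (hypothetical, for illustration only);
# a real system would use a large sparse LDPC matrix.
H = np.array([[1, 1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 1, 0, 0],
              [0, 0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 0, 1]], dtype=np.uint8)

def sw_encode(x):
    """Encoder: transmit only the syndrome s = H x (mod 2).

    4 syndrome bits replace the 7 source bits, so the rate drops from
    7 to 4 bits per block; the encoder never sees the side information.
    """
    return (H @ x) % 2

def sw_decode(s, y):
    """Decoder: find the word nearest the side information y (in Hamming
    distance) whose syndrome equals s. Exhaustive over zero- and one-bit
    flips; real decoders use iterative belief propagation instead.
    """
    if np.array_equal((H @ y) % 2, s):
        return y
    for i in range(len(y)):
        cand = y.copy()
        cand[i] ^= 1  # flip one bit of y and retest the syndrome
        if np.array_equal((H @ cand) % 2, s):
            return cand
    raise ValueError("side information too noisy for this toy code")

x = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)  # source (e.g. camera A)
y = np.array([1, 0, 1, 0, 0, 0, 1], dtype=np.uint8)  # side info (differs in 1 bit)
assert np.array_equal(sw_decode(sw_encode(x), y), x)
```

The complexity asymmetry is visible even in this toy: encoding is a single matrix-vector product, while all the search effort sits at the decoder.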
Looking Ahead
With Parts I and II complete, we have developed the full machinery of source coding: from entropy and typicality through lossless and lossy compression to distributed source coding and its connections to modern machine learning. Part III turns to the complementary problem: how much information can be reliably communicated over a noisy channel? Chapter 9 develops the channel coding theorem for discrete memoryless channels, establishing the other pillar of Shannon's theory.
Evidence lower bound (ELBO)
A variational lower bound on the log-marginal likelihood $\log p(x)$: $\mathrm{ELBO}(x) = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D_{\mathrm{KL}}\!\left(q(z|x)\,\|\,p(z)\right) \le \log p(x)$. Maximizing the ELBO is equivalent to minimizing the VAE loss, which is a rate-distortion objective.
Related: VAE as a Rate-Distortion Upper Bound, Rate-distortion theory and the information bottleneck, Kullback-Leibler Divergence (Relative Entropy)
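Written out, the negative ELBO is exactly a rate-distortion Lagrangian, which is the equivalence this entry asserts; the second identity, with $q(z) = \mathbb{E}_{p(x)}[q(z|x)]$ the aggregate posterior, is the standard decomposition behind the claim that the averaged KL term upper-bounds the rate $I(X;Z)$:

$$
-\mathrm{ELBO}(x)
  = \underbrace{\mathbb{E}_{q(z|x)}\!\left[-\log p(x|z)\right]}_{\text{distortion (log loss)}}
  + \underbrace{D_{\mathrm{KL}}\!\left(q(z|x)\,\|\,p(z)\right)}_{\text{rate term}},
\qquad
\mathbb{E}_{p(x)}\!\left[D_{\mathrm{KL}}\!\left(q(z|x)\,\|\,p(z)\right)\right]
  = I(X;Z) + D_{\mathrm{KL}}\!\left(q(z)\,\|\,p(z)\right) \ge I(X;Z).
$$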
β-VAE
A variant of the variational autoencoder that weights the KL term by $\beta$: $\mathcal{L}_\beta = \mathbb{E}_{q(z|x)}[-\log p(x|z)] + \beta\, D_{\mathrm{KL}}\!\left(q(z|x)\,\|\,p(z)\right)$. Varying $\beta$ sweeps the rate-distortion curve, with $\beta > 1$ encouraging disentangled representations (more compression) and $\beta < 1$ favoring reconstruction quality.
Related: VAE as a Rate-Distortion Upper Bound, Information bottleneck, Rate-distortion theory and the information bottleneck
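A minimal sketch of the loss itself, assuming a diagonal-Gaussian encoder $q(z|x)$, a standard normal prior, and a Bernoulli decoder; the function and argument names (`beta_vae_loss`, `mu`, `log_var`) are illustrative, not from any particular library:

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Per-example beta-VAE loss: distortion + beta * rate.

    rate: closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )
    distortion: Bernoulli log loss for pixels x, x_hat in [0, 1]
    """
    eps = 1e-7  # numerical guard for the logs
    distortion = -np.sum(x * np.log(x_hat + eps)
                         + (1 - x) * np.log(1 - x_hat + eps))
    rate = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return distortion + beta * rate
```

Training the same model at several values of `beta` and plotting the resulting (rate, distortion) pairs traces out the operational rate-distortion curve this entry describes.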
The Information Bottleneck Method
Tishby, Pereira, and Bialek introduced the information bottleneck as a principled method for extracting relevant information from data. By framing representation learning as a rate-distortion problem with relevance-based distortion, they connected lossy source coding to statistical learning in a way that has influenced both fields. The IB method was later applied to deep learning theory, clustering, and feature selection, and remains a cornerstone of the information-theoretic approach to machine learning.