Chapter 8 Summary

Key Points

1. Source coding with a helper bridges standard compression and Slepian-Wolf coding: the helper sends a rate-limited description of the side information $Y$ to reduce the rate needed for $X$. Wyner's common information quantifies the shared structure.

2. Practical distributed video coding uses LDPC syndromes for Slepian-Wolf coding, shifting complexity from the encoder to the decoder, which is ideal for power-constrained sensors and multi-camera systems (see the syndrome sketch after this list).

3. The information bottleneck is a rate-distortion problem where distortion measures relevance to a target variable: $\min_{p(t|x)} I(X;T) - \beta \cdot I(T;Y)$. It provides the right framework for understanding representation learning (see the numerical sketch after this list).

4. The VAE loss is a variational upper bound on the rate-distortion function with log-loss distortion. The KL term bounds $I(X;Z)$ (the rate) and the reconstruction loss is the distortion.

5. The connections between information theory and machine learning are exact mathematical relationships, not merely analogies. Rate-distortion theory guides the design of learned compression, neural codecs, and semantic communication systems.
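
The syndrome idea in point 2 can be shown in miniature. Here is a minimal sketch, assuming a tiny hand-written parity-check matrix in place of a real sparse LDPC code and brute-force search in place of belief propagation: the encoder sends only a 3-bit syndrome of a 6-bit source, and the decoder recovers the source from that syndrome together with its correlated side information.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy parity-check matrix: 3 syndrome bits for 6 source bits (rate 1/2).
# A real system would use a large sparse LDPC matrix here.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]], dtype=np.uint8)

y = rng.integers(0, 2, size=6, dtype=np.uint8)  # decoder's side information
e = np.zeros(6, dtype=np.uint8)
e[2] = 1                                        # sparse correlation noise
x = y ^ e                                       # source, correlated with y

s = H @ x % 2  # encoder transmits only these 3 syndrome bits, not all 6 of x

# Decoder: find the lowest-weight error pattern consistent with the received
# syndrome and its own side information y (brute force here; a real LDPC
# decoder would run belief propagation).
target = s ^ (H @ y % 2)  # equals H @ e (mod 2)
x_hat = None
for weight in range(7):
    for idx in itertools.combinations(range(6), weight):
        cand = np.zeros(6, dtype=np.uint8)
        cand[list(idx)] = 1
        if np.array_equal(H @ cand % 2, target):
            x_hat = y ^ cand
            break
    if x_hat is not None:
        break

assert np.array_equal(x_hat, x)
print("recovered all 6 bits of x from a 3-bit syndrome plus side information")
```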

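Point 3 can likewise be evaluated numerically. A minimal sketch, assuming a toy joint distribution p(x, y) and a hand-picked soft encoder p(t|x) rather than an optimized one: it computes I(X;T) and I(T;Y) directly and combines them into the IB objective, using the Markov chain T - X - Y to form p(t, y).

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information in bits for a joint distribution as a 2-D array."""
    p_row = p_joint.sum(axis=1, keepdims=True)
    p_col = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] *
                        np.log2(p_joint[mask] / (p_row @ p_col)[mask])))

# Toy joint p(x, y): 3 values of x, 2 values of y.
p_xy = np.array([[0.30, 0.05],
                 [0.05, 0.30],
                 [0.15, 0.15]])

# A hand-picked soft encoder p(t|x) with 2 values of t; the IB method
# would optimize this (e.g. by Blahut-Arimoto-style iterations).
p_t_given_x = np.array([[0.9, 0.1],
                        [0.1, 0.9],
                        [0.5, 0.5]])

p_x = p_xy.sum(axis=1)
p_xt = p_x[:, None] * p_t_given_x  # p(x, t) = p(x) p(t|x)
p_ty = p_t_given_x.T @ p_xy        # p(t, y) via the Markov chain T - X - Y

beta = 4.0
rate = mutual_information(p_xt)       # I(X;T): how much T remembers about X
relevance = mutual_information(p_ty)  # I(T;Y): how much T says about Y
print(f"I(X;T) = {rate:.3f} bits, I(T;Y) = {relevance:.3f} bits")
print(f"IB objective = {rate - beta * relevance:.3f}")
```
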
Looking Ahead

With Parts I and II complete, we have developed the full machinery of source coding: from entropy and typicality through lossless and lossy compression to distributed source coding and its connections to modern machine learning. Part III turns to the complementary problem: how much information can be reliably communicated over a noisy channel? Chapter 9 develops the channel coding theorem for discrete memoryless channels, establishing the other pillar of Shannon's theory.

Key Terms

Evidence lower bound (ELBO)

A variational lower bound on the log-marginal likelihood $\log p(x)$: $\text{ELBO} = \mathbb{E}_{q(z|x)}[\log p(x|z)] - D(q(z|x) \,\|\, p(z))$. Maximizing the ELBO is equivalent to minimizing the VAE loss, which is a rate-distortion objective.

Related: VAE as a Rate-Distortion Upper Bound, Rate-distortion theory and the information bottlen…, Kullback-Leibler Divergence (Relative Entropy)
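
A minimal PyTorch sketch of the single-sample ELBO, assuming the common diagonal-Gaussian encoder with a standard-normal prior and a Bernoulli decoder; the function name gaussian_elbo and the tensor shapes are illustrative, not taken from any particular library.

```python
import torch
import torch.nn.functional as F

def gaussian_elbo(x, recon_logits, mu, logvar):
    """Single-sample ELBO = E_q[log p(x|z)] - D(q(z|x) || p(z)).

    Assumes a Bernoulli decoder, so log p(x|z) is a negative binary
    cross-entropy, and q(z|x) = N(mu, diag(exp(logvar))), for which
    the KL to the standard-normal prior has the usual closed form.
    """
    log_px_given_z = -F.binary_cross_entropy_with_logits(
        recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return log_px_given_z - kl  # maximize this; -ELBO is the VAE loss

# Toy shapes: a batch of 4 binary "images" with 8 pixels and 2 latent dims.
x = torch.rand(4, 8).round()
recon_logits = torch.randn(4, 8)
mu, logvar = torch.randn(4, 2), torch.randn(4, 2)
print(gaussian_elbo(x, recon_logits, mu, logvar).item())
```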

β-VAE

A variant of the variational autoencoder that weights the KL term by $\beta$: $\mathcal{L} = \text{reconstruction loss} + \beta \cdot D(q(z|x) \,\|\, p(z))$. Varying $\beta$ sweeps the rate-distortion curve, with $\beta > 1$ encouraging disentangled representations (more compression) and $\beta < 1$ favoring reconstruction quality.

Related: VAE as a Rate-Distortion Upper Bound, Information bottleneck, Rate-distortion theory and the information bottlen…
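
Continuing the same toy setup, a sketch of the β-weighted objective: it differs from the negative ELBO only in the multiplier on the KL (rate) term, which is how sweeping β traces the rate-distortion curve. In practice each β corresponds to a separate training run.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, recon_logits, mu, logvar, beta):
    """Distortion + beta * rate; beta = 1 recovers the negative ELBO."""
    distortion = F.binary_cross_entropy_with_logits(
        recon_logits, x, reduction="sum")  # -E_q[log p(x|z)]
    rate = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL term
    return distortion + beta * rate

x = torch.rand(4, 8).round()
recon_logits = torch.randn(4, 8)
mu, logvar = torch.randn(4, 2), torch.randn(4, 2)
# Each beta defines a different training objective, hence a different trained
# model and a different (rate, distortion) operating point; here we only
# illustrate the reweighting on fixed tensors.
for beta in (0.25, 1.0, 4.0):
    print(beta, beta_vae_loss(x, recon_logits, mu, logvar, beta).item())
```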

🎓 CommIT Contribution (1999)

The Information Bottleneck Method

N. Tishby, F. C. Pereira, W. Bialek. Proc. 37th Allerton Conference on Communication, Control, and Computing.

Tishby, Pereira, and Bialek introduced the information bottleneck as a principled method for extracting relevant information from data. By framing representation learning as a rate-distortion problem with relevance-based distortion, they connected lossy source coding to statistical learning in a way that has influenced both fields. The IB method was later applied to deep learning theory, clustering, and feature selection, and remains a cornerstone of the information-theoretic approach to machine learning.

information-bottleneck · machine-learning · rate-distortion