Prerequisites & Notation
Before You Begin
This chapter connects classical information theory to modern machine learning. We assume familiarity with entropy, mutual information, rate-distortion theory, and the multiple access channel. The reader should also have basic exposure to statistical learning theory (empirical risk minimization, generalization error).
- Entropy and mutual information (Review ita/ch01)
Self-check: Can you compute the mutual information $I(X;Y)$ of a jointly Gaussian pair? (This and the other quantitative self-checks are worked out after this list.)
- Rate-distortion theory (Review ita/ch06)
Self-check: Can you state the rate-distortion function for a Gaussian source under MSE?
- Multiple access channel capacity (Review ita/ch14)
Self-check: Can you sketch the capacity region of a two-user Gaussian MAC?
- KL divergence and variational characterizations (Review ita/ch01)
Self-check: Can you state the Donsker-Varadhan representation of KL divergence?
- Typicality and the AEP (Review ita/ch03)
Self-check: Can you explain the role of typical sequences in random coding proofs?
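For reference, the closed-form expressions behind the quantitative self-checks are collected here. These are the standard results; the symbols $\rho$ (correlation coefficient of the jointly Gaussian pair), $\sigma^2$ (variance of the Gaussian source), and $P_1, P_2, N$ (per-user powers and noise power of the Gaussian MAC) are used only in this summary.

$$
I(X;Y) = -\tfrac{1}{2}\log\bigl(1-\rho^2\bigr),
\qquad
R(D) = \max\!\left\{\tfrac{1}{2}\log\frac{\sigma^2}{D},\; 0\right\},
$$

$$
R_1 \le \tfrac{1}{2}\log\!\Bigl(1+\tfrac{P_1}{N}\Bigr), \quad
R_2 \le \tfrac{1}{2}\log\!\Bigl(1+\tfrac{P_2}{N}\Bigr), \quad
R_1 + R_2 \le \tfrac{1}{2}\log\!\Bigl(1+\tfrac{P_1+P_2}{N}\Bigr),
$$

$$
D(P \parallel Q) = \sup_{T}\;\Bigl\{\mathbb{E}_P[T] - \log \mathbb{E}_Q\bigl[e^{T}\bigr]\Bigr\},
$$

where the supremum in the Donsker-Varadhan representation runs over bounded measurable functions $T$.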
Notation for This Chapter
In addition to the standard information-theoretic notation from earlier chapters, we introduce notation specific to machine learning and distributed computation.
| Symbol | Meaning | Introduced |
|---|---|---|
| $I(X;Y)$ | Mutual information | ch01 |
| $H(X)$ | Shannon entropy | ch01 |
| $D(P \parallel Q)$ | KL divergence | ch01 |
| $R(D)$ | Rate-distortion function | ch06 |
| $\beta$ | Lagrange multiplier (IB tradeoff parameter) | s01 |
| $T$ | Bottleneck (compressed) representation | s01 |
| $\mathcal{L}_{\mathrm{IB}}$ | Information bottleneck Lagrangian | s01 |
| $\mathrm{gen}(h, S)$ | Generalization error of hypothesis $h$ on data $S$ | s02 |
| $K$ | Number of distributed users/workers | s03 |
| $Z$ | Additive noise random variable | ch01 |
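To make the chapter-specific symbols concrete, the sketch below evaluates the information bottleneck Lagrangian for discrete distributions, taking the common minimization form $\mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y)$ with a stochastic encoder $p(t \mid x)$ and the Markov chain $T - X - Y$. This is a minimal illustration of the notation, not this chapter's reference implementation; the function names (`mutual_information`, `ib_lagrangian`), the sign convention, and the use of natural logarithms (nats) are assumptions made here for concreteness.

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information (in nats) of a joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p_xy > 0                          # avoid log(0) on zero-probability cells
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def ib_lagrangian(p_xy, p_t_given_x, beta):
    """Evaluate I(X;T) - beta * I(T;Y) for a stochastic encoder p(t|x).

    p_xy:        joint pmf of (X, Y), shape (|X|, |Y|)
    p_t_given_x: encoder, shape (|X|, |T|), each row sums to 1
    beta:        IB tradeoff parameter
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_t_given_x = np.asarray(p_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)                   # p(x)
    p_xt = p_x[:, None] * p_t_given_x        # joint p(x, t) = p(x) p(t|x)
    # Markov chain T - X - Y gives p(t, y) = sum_x p(t|x) p(x, y)
    p_ty = p_t_given_x.T @ p_xy
    return mutual_information(p_xt) - beta * mutual_information(p_ty)

# Tiny usage example: a noisy binary pair and a two-state bottleneck.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
encoder = np.array([[0.9, 0.1],
                    [0.1, 0.9]])
print(ib_lagrangian(p_xy, encoder, beta=1.0))
```

Divide the returned value by $\log 2$ to convert from nats to bits if the chapter's rates are measured in bits.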