Prerequisites & Notation

Before You Begin

This chapter connects classical information theory to modern machine learning. We assume familiarity with entropy, mutual information, rate-distortion theory, and the multiple access channel. The reader should also have basic exposure to statistical learning theory (empirical risk minimization, generalization error).

  • Entropy and mutual information (Review ita/ch01)

    Self-check: Can you compute I(X; Y) for a jointly Gaussian pair (X, Y)? (Key formulas for these self-checks are collected after the list.)

  • Rate-distortion theory (Review ita/ch06)

    Self-check: Can you state the rate-distortion function for a Gaussian source under MSE?

  • Multiple access channel capacity (Review ita/ch14)

    Self-check: Can you sketch the MAC capacity region for a two-user Gaussian MAC?

  • KL divergence and variational characterizations (Review ita/ch01)

    Self-check: Can you state the Donsker-Varadhan representation of KL divergence?

  • Typicality and the AEP (Review ita/ch03)

    Self-check: Can you explain the role of typical sequences in random coding proofs?
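
For readers who want to check their answers, the standard statements behind the first four self-checks are, in nats and following the usual conventions of the chapters cited above:

    Jointly Gaussian pair (X, Y) with correlation coefficient \rho:
        I(X; Y) = -\tfrac{1}{2}\log(1 - \rho^2)

    Gaussian source of variance \sigma^2 under squared-error (MSE) distortion:
        R(D) = \tfrac{1}{2}\log\frac{\sigma^2}{D} for 0 \le D \le \sigma^2, and R(D) = 0 otherwise

    Two-user Gaussian MAC with transmit powers P_1, P_2 and noise power N:
        R_1 \le \tfrac{1}{2}\log(1 + P_1/N), \quad R_2 \le \tfrac{1}{2}\log(1 + P_2/N), \quad R_1 + R_2 \le \tfrac{1}{2}\log(1 + (P_1 + P_2)/N)

    Donsker-Varadhan representation (supremum over bounded measurable f):
        D(P \| Q) = \sup_f \big( \mathbb{E}_P[f] - \log \mathbb{E}_Q[e^f] \big)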

Notation for This Chapter

In addition to the standard information-theoretic notation from earlier chapters, we introduce notation specific to machine learning and distributed computation.

Symbol                    Meaning                                          Introduced
I                         Mutual information                               ch01
H                         Shannon entropy                                  ch01
D                         KL divergence                                    ch01
R                         Rate-distortion function                         ch06
\beta                     Lagrange multiplier (IB tradeoff parameter)      s01
T                         Bottleneck (compressed) representation           s01
\mathcal{L}_{\text{IB}}   Information bottleneck Lagrangian                s01
\text{gen}(W, S)          Generalization error of hypothesis W on data S   s02
K                         Number of distributed users/workers              s03
Z                         Additive noise random variable                   ch01
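
As a quick preview of how this notation combines (the precise definitions appear in s01 and s02, and the chapter's own conventions take precedence over this sketch), the information bottleneck Lagrangian is typically written

    \mathcal{L}_{\text{IB}} = I(X; T) - \beta \, I(T; Y),

minimized over stochastic encoders p(t \mid x), so that larger \beta places more weight on predictive representations and smaller \beta on compression; and the generalization error is typically the gap

    \text{gen}(W, S) = L_\mu(W) - L_S(W),

where L_S(W) denotes the empirical risk of W on the sample S and L_\mu(W) the population risk (these risk symbols are illustrative; s02 fixes the chapter's exact definitions).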