Prerequisites & Notation
Before You Begin
This chapter connects classical information theory to modern machine learning. We assume familiarity with entropy, mutual information, rate-distortion theory, and the multiple access channel. The reader should also have basic exposure to statistical learning theory (empirical risk minimization, generalization error).
- Entropy and mutual information (Review ita/ch01)
Self-check: Can you compute the mutual information $I(X;Y)$ of a jointly Gaussian pair? (This and the other quantitative self-checks are worked out after this list.)
- Rate-distortion theory (Review ita/ch06)
Self-check: Can you state the rate-distortion function for a Gaussian source under MSE?
- Multiple access channel capacity (Review ita/ch14)
Self-check: Can you sketch the capacity region of a two-user Gaussian MAC?
- KL divergence and variational characterizations (Review ita/ch01)
Self-check: Can you state the Donsker-Varadhan representation of KL divergence?
- Typicality and the AEP (Review ita/ch03)
Self-check: Can you explain the role of typical sequences in random coding proofs?
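For reference, the closed-form expressions behind the quantitative self-checks are collected here. These are the standard results; the symbols $\rho$ (correlation coefficient of the jointly Gaussian pair), $\sigma^2$ (variance of the Gaussian source), and $P_1, P_2, N$ (per-user powers and noise power of the Gaussian MAC) are used only in this summary.

$$
I(X;Y) = -\tfrac{1}{2}\log\bigl(1-\rho^2\bigr),
\qquad
R(D) = \max\!\left\{\tfrac{1}{2}\log\frac{\sigma^2}{D},\; 0\right\},
$$

$$
R_1 \le \tfrac{1}{2}\log\!\Bigl(1+\tfrac{P_1}{N}\Bigr), \quad
R_2 \le \tfrac{1}{2}\log\!\Bigl(1+\tfrac{P_2}{N}\Bigr), \quad
R_1 + R_2 \le \tfrac{1}{2}\log\!\Bigl(1+\tfrac{P_1+P_2}{N}\Bigr),
$$

$$
D(P \parallel Q) = \sup_{T}\;\Bigl\{\mathbb{E}_P[T] - \log \mathbb{E}_Q\bigl[e^{T}\bigr]\Bigr\},
$$

where the supremum in the Donsker-Varadhan representation runs over bounded measurable functions $T$.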
Notation for This Chapter
In addition to the standard information-theoretic notation from earlier chapters, we introduce notation specific to machine learning and distributed computation.
| Symbol | Meaning | Introduced |
|---|---|---|
| $I(X;Y)$ | Mutual information | ch01 |
| $H(X)$ | Shannon entropy | ch01 |
| $D(P \parallel Q)$ | KL divergence | ch01 |
| $R(D)$ | Rate-distortion function | ch06 |
| $\beta$ | Lagrange multiplier (IB tradeoff parameter) | s01 |
| $T$ | Bottleneck (compressed) representation | s01 |
| $\mathcal{L}_{\mathrm{IB}}$ | Information bottleneck Lagrangian | s01 |
| $\mathrm{gen}(h, S)$ | Generalization error of hypothesis $h$ on data $S$ | s02 |
| $K$ | Number of distributed users/workers | s03 |
| $Z$ | Additive noise random variable | ch01 |
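To make the chapter-specific symbols concrete, the sketch below evaluates the information bottleneck Lagrangian for discrete distributions, taking the common minimization form $\mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y)$ with a stochastic encoder $p(t \mid x)$ and the Markov chain $T - X - Y$. This is a minimal illustration of the notation, not this chapter's reference implementation; the function names (`mutual_information`, `ib_lagrangian`), the sign convention, and the use of natural logarithms (nats) are assumptions made here for concreteness.

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information (in nats) of a joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of the column variable
    mask = p_xy > 0                          # avoid log(0) on zero-probability cells
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

def ib_lagrangian(p_xy, p_t_given_x, beta):
    """Evaluate I(X;T) - beta * I(T;Y) for a stochastic encoder p(t|x).

    p_xy:        joint pmf of (X, Y), shape (|X|, |Y|)
    p_t_given_x: encoder, shape (|X|, |T|), each row sums to 1
    beta:        IB tradeoff parameter
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_t_given_x = np.asarray(p_t_given_x, dtype=float)
    p_x = p_xy.sum(axis=1)                   # p(x)
    p_xt = p_x[:, None] * p_t_given_x        # joint p(x, t) = p(x) p(t|x)
    # Markov chain T - X - Y gives p(t, y) = sum_x p(t|x) p(x, y)
    p_ty = p_t_given_x.T @ p_xy
    return mutual_information(p_xt) - beta * mutual_information(p_ty)

# Tiny usage example: a noisy binary pair and a two-state bottleneck.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
encoder = np.array([[0.9, 0.1],
                    [0.1, 0.9]])
print(ib_lagrangian(p_xy, encoder, beta=1.0))
```

Divide the returned value by $\log 2$ to convert from nats to bits if the chapter's rates are measured in bits.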