References & Further Reading

References

  1. N. Tishby, F. C. Pereira, and W. Bialek, The Information Bottleneck Method, 1999

    The foundational paper introducing the IB framework. Derives the self-consistent equations and the connection to rate-distortion theory (the objective and one of the self-consistent equations are restated after this list).

  2. R. Shwartz-Ziv and N. Tishby, Opening the Black Box of Deep Neural Networks via Information, 2017

    Proposes the information plane hypothesis for deep learning: networks first fit the training data, then compress their internal representations. Sparked intense debate about the role of compression in deep learning.

  3. A. Xu and M. Raginsky, Information-Theoretic Analysis of Generalization Capability of Learning Algorithms, 2017

    Derives the mutual information bound on generalization error, connecting learning theory to information theory via the Donsker-Varadhan representation.

  4. Y. Bu, S. Zou, and V. V. Veeravalli, Tightening Mutual Information-Based Bounds on Generalization Error, 2020

    Introduces the individual-sample MI bound, which is tighter than the Xu-Raginsky bound because it captures per-sample memorization; both bounds are restated after this list.

  5. Y. Zhang, J. Duchi, M. I. Jordan, and M. J. Wainwright, Information-Theoretic Lower Bounds for Distributed Statistical Estimation with Communication Constraints, 2013

    Establishes the minimax lower bounds for distributed estimation under communication constraints, showing the phase transition between statistical and communication regimes.

  6. A. T. Suresh, F. X. Yu, S. Kumar, and H. B. McMahan, Distributed Mean Estimation with Limited Communication, 2017

    Designs practical quantization schemes for distributed mean estimation and proves near-optimal communication complexity (a toy stochastic quantizer is sketched after this list).

  7. B. Nazer and M. Gastpar, Computation over Multiple-Access Channels, 2007

    Foundational work on function computation over the MAC. Introduces the computation capacity and characterizes it for structured functions.

  8. H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. Agüera y Arcas, Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017

    Introduces the FedAvg algorithm and the federated learning framework (the aggregation step is sketched after this list). The paper that launched modern federated learning research.

  9. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, 2nd ed., 2006

    The standard reference for information theory. Chapters on rate-distortion theory and the data processing inequality are prerequisites for this chapter.

  10. A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011

    Comprehensive treatment of multi-user information theory including the MAC capacity region and distributed source coding, which are foundations for the AirComp analysis.
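
For item 1, a minimal restatement of what the self-consistent equations characterize (standard textbook form in our notation, not a quote from the paper): the representation T is chosen to compress X while preserving information about Y,

    \min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y), \qquad \beta \ge 0,

and the stationary points satisfy, among the other self-consistent conditions,

    p(t \mid x) = \frac{p(t)}{Z(x,\beta)} \exp\!\Bigl(-\beta\, D_{\mathrm{KL}}\bigl(p(y \mid x)\,\Vert\,p(y \mid t)\bigr)\Bigr).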
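
For items 3 and 4, the two bounds can be restated as follows, assuming the loss is \sigma-sub-Gaussian and S = (Z_1, \dots, Z_n) is the i.i.d. training sample (notation is ours):

    \bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr| \le \sqrt{\tfrac{2\sigma^2}{n}\, I(W;S)} \qquad \text{(Xu-Raginsky)}

    \bigl|\mathbb{E}[\mathrm{gen}(W,S)]\bigr| \le \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^2\, I(W;Z_i)} \qquad \text{(Bu-Zou-Veeravalli)}

The second bound is never larger than the first: for independent samples \sum_i I(W;Z_i) \le I(W;S), and the square root is concave.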
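
For item 6, a toy version of the kind of unbiased stochastic quantizer the paper analyzes; the fixed [-1, 1] clipping range, the uniform grid, and the function names are our simplifying assumptions, not the paper's scheme.

    import numpy as np

    def stochastic_quantize(x, num_levels, lo=-1.0, hi=1.0):
        """Unbiased stochastic rounding of x onto num_levels evenly spaced
        points in [lo, hi]; the expected output equals clip(x, lo, hi)."""
        x = np.clip(x, lo, hi)
        scaled = (x - lo) / (hi - lo) * (num_levels - 1)   # position on the grid
        floor = np.floor(scaled)
        up = np.random.random(x.shape) < (scaled - floor)  # round up w.p. the fractional part
        return lo + (floor + up) * (hi - lo) / (num_levels - 1)

    def distributed_mean(client_vectors, bits_per_coord=4):
        """Each client sends a quantized vector; the server averages them."""
        levels = 2 ** bits_per_coord
        return np.mean([stochastic_quantize(v, levels) for v in client_vectors], axis=0)

    # 100 clients holding 1000-dimensional vectors in [-1, 1].
    np.random.seed(0)
    clients = [np.random.uniform(-1, 1, size=1000) for _ in range(100)]
    error = distributed_mean(clients) - np.mean(clients, axis=0)
    print("mean-squared error:", np.mean(error ** 2))

Because each client's quantization error has zero mean, averaging across clients shrinks the mean-squared error roughly as one over the number of clients, which is the trade-off the paper quantifies against the per-coordinate bit budget.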
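
For item 8, a minimal sketch of the server-side FedAvg aggregation step; weights are treated as a single flat array, client sampling is omitted, and local_update is a placeholder for local training (several epochs of minibatch SGD in the paper).

    import numpy as np

    def fedavg_round(global_weights, client_datasets, local_update):
        """One FedAvg communication round: each client trains locally starting
        from the current global weights, and the server returns the average of
        the client weights, weighted by local dataset size."""
        updates, sizes = [], []
        for data in client_datasets:
            updates.append(local_update(np.copy(global_weights), data))
            sizes.append(len(data))
        total = sum(sizes)
        return sum((n / total) * w for w, n in zip(updates, sizes))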

Further Reading

The intersection of information theory and machine learning is a rapidly growing field. These resources provide deeper coverage of specific topics.

  • Information-theoretic generalization bounds

    T. Steinke and L. Zakynthinou, "Reasoning About Generalization via Conditional Mutual Information," COLT 2020. A comprehensive treatment of tighter MI-based bounds.

    Extends the Xu-Raginsky framework with conditional MI and supersample techniques, which later work has built on to obtain non-vacuous bounds for deep networks.

  • The information bottleneck and deep learning

    A. M. Saxe et al., "On the Information Bottleneck Theory of Deep Learning," ICLR 2018. The counter-argument to the Shwartz-Ziv compression hypothesis.

    Essential reading for understanding the debate about whether deep networks compress. Shows that the observed compression depends on the choice of activation function and mutual-information estimator.

  • Over-the-air computation

    G. Zhu et al., "Over-the-Air Computation for 6G: Turning Air into a Computer," IEEE Communications Surveys & Tutorials, 2024.

    Comprehensive survey covering AirComp theory, practical schemes, and applications to federated learning, consensus, and distributed inference.

  • Communication-efficient distributed optimization

    J. Konečný et al., "Randomized Distributed Mean Estimation: Accuracy vs. Communication," Frontiers in Applied Mathematics and Statistics, 2018.

    Practical algorithms for compressing the vectors exchanged in distributed mean estimation, including random sparsification and quantization, with precise accuracy-communication trade-offs; a minimal sparsification sketch follows below.
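
As a small illustration of the random-sparsification idea in the last item, the sketch below keeps each gradient coordinate with probability keep_prob and rescales by 1/keep_prob so the compressed vector stays unbiased; the naming and the plain NumPy setting are ours, not the paper's protocol.

    import numpy as np

    def random_sparsify(grad, keep_prob):
        """Keep each coordinate independently with probability keep_prob and
        rescale the survivors by 1/keep_prob, so the expected output equals grad."""
        mask = np.random.random(grad.shape) < keep_prob
        return np.where(mask, grad / keep_prob, 0.0)

    # Each worker transmits roughly keep_prob * d nonzero coordinates; averaging
    # the sparsified gradients across workers recovers the true mean gradient in
    # expectation, at the cost of extra variance.
    print(random_sparsify(np.random.randn(10), keep_prob=0.2))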