References & Further Reading
References
- T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience, 2nd ed., 2006
The standard graduate textbook on information theory. Chapters 2-4 cover the material of this chapter with exceptional clarity. The exercises are particularly valuable.
- C. E. Shannon, 'A Mathematical Theory of Communication,' Bell System Technical Journal, vol. 27, 1948
The founding paper of information theory. Shannon introduces entropy, mutual information, and channel capacity, and proves the source and channel coding theorems. Still remarkably readable after 75 years.
- R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968
Gallager's textbook emphasizes the engineering perspective and provides rigorous proofs with excellent geometric intuition. The treatment of error exponents (Chapter 5) goes beyond Cover and Thomas.
- I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, 2nd ed., 2011
The definitive reference for the method of types and error exponents. More advanced than Cover and Thomas, with a rigorous combinatorial approach. Essential for Chapters 3-4 of this book.
- S. Kullback and R. A. Leibler, 'On Information and Sufficiency,' Annals of Mathematical Statistics, vol. 22, 1951
The original paper introducing KL divergence. Establishes the connection between information theory and statistical sufficiency.
- R. E. Blahut, 'Computation of Channel Capacity and Rate-Distortion Functions,' IEEE Transactions on Information Theory, vol. 18, 1972
Introduces the Blahut-Arimoto algorithm for computing channel capacity via alternating optimization, exploiting the concavity established in this chapter.
- E. T. Jaynes, 'Information Theory and Statistical Mechanics,' Physical Review, vol. 106, 1957
Jaynes' seminal paper connecting maximum entropy to statistical mechanics. Establishes the maximum entropy principle as a general method for statistical inference.
- A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011
The modern reference for multiuser information theory. Chapter 2 provides an alternative treatment of the fundamentals with emphasis on the tools needed for network settings.
- R. W. Yeung, A First Course in Information Theory, Springer, 2002
Develops information inequalities systematically using the entropy function region framework. Particularly strong on non-Shannon inequalities and their applications.
- A. Rényi, 'On Measures of Entropy and Information,' Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961
Introduces the Rényi entropy family, generalizing Shannon entropy. Foundational for one-shot information theory and security analysis.
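The alternating optimization behind Blahut's algorithm is compact enough to sketch. Below is a minimal NumPy version (the function name and the binary-symmetric-channel example are ours, not from the paper; zero channel probabilities would need extra care that this sketch omits):

```python
import numpy as np

def blahut_arimoto(W, n_iter=500):
    """Capacity (in bits) of a discrete memoryless channel.

    W[x, y] = p(y | x); rows sum to 1. Assumes strictly positive W
    for simplicity.
    """
    r = np.full(W.shape[0], 1.0 / W.shape[0])     # input distribution, start uniform
    for _ in range(n_iter):
        q = r[:, None] * W                        # joint p(x, y), then posterior p(x | y)
        q /= q.sum(axis=0, keepdims=True)
        r = np.exp((W * np.log(q)).sum(axis=1))   # r(x) proportional to exp sum_y p(y|x) log q(x|y)
        r /= r.sum()
    q = r[:, None] * W
    q /= q.sum(axis=0, keepdims=True)
    capacity = (r[:, None] * W * np.log2(q / r[:, None])).sum()
    return capacity, r

# Binary symmetric channel with crossover 0.1: C = 1 - H(0.1), about 0.531 bits
W = np.array([[0.9, 0.1], [0.1, 0.9]])
cap, r = blahut_arimoto(W)
```

Each iteration performs the two coordinate-wise maximizations of the mutual information, and the concavity established in this chapter guarantees convergence to the capacity.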
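For readers who have not met the Rényi family, a quick numerical sketch shows how it generalizes Shannon entropy (the helper name and example distribution are ours):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy H_alpha(p) in bits, for alpha > 0, alpha != 1."""
    p = np.asarray(p, dtype=float)
    return np.log2((p ** alpha).sum()) / (1.0 - alpha)

p = [0.5, 0.25, 0.25]
# Shannon entropy is the alpha -> 1 limit: here H(p) = 1.5 bits
h_near_1 = renyi_entropy(p, 1.0001)
# alpha = 2 gives the collision entropy -log2 sum p^2;
# alpha -> infinity gives the min-entropy -log2 max(p), central in security analysis
h_2 = renyi_entropy(p, 2.0)
```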
Further Reading
Resources for deepening your understanding of discrete information measures.
Historical context of information theory
J. R. Pierce, An Introduction to Information Theory: Symbols, Signals and Noise, Dover, 1980
A non-technical introduction written by a Bell Labs colleague of Shannon. Provides the historical context and physical intuition that formal textbooks often lack.
Information-theoretic inequalities
R. W. Yeung, A First Course in Information Theory, Springer, 2002
Yeung develops information inequalities systematically using the entropy function region framework. Particularly useful for understanding when information inequalities are "Shannon-type" and when non-Shannon inequalities are needed.
Connections to statistical inference
Book FSI, Ch. 1 — Hypothesis Testing Fundamentals
KL divergence governs the error exponents in hypothesis testing (Stein's lemma). This connection between information theory and statistics is fundamental and bidirectional.
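To make the Stein's lemma connection concrete, here is a small sketch (the distributions and the function name are our illustrative choices): the best achievable type-II error decays like $2^{-n D(P\|Q)}$, so the KL divergence directly prices each additional observation.

```python
import numpy as np

def kl_bits(p, q):
    """D(P || Q) in bits for discrete distributions on a common alphabet."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

P = np.array([0.5, 0.5])   # null hypothesis: fair coin
Q = np.array([0.8, 0.2])   # alternative: biased coin
D = kl_bits(P, Q)          # about 0.322 bits per observation
# Stein's lemma: type-II error ~ 2^(-n D), so roughly 20 / D ~ 62 flips
# drive the type-II error below 2^-20
```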
Entropy in physics and computation
C. H. Bennett, 'The Thermodynamics of Computation — A Review,' Int. J. Theor. Physics, 1982
Explores the deep connection between Shannon entropy and thermodynamic entropy, including Landauer's principle that erasing one bit of information dissipates at least $k_B T \ln 2$ joules.
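Plugging numbers into Landauer's bound shows how small, yet nonzero, the cost is (room temperature of 300 K is our choice of example):

```python
import math

k_B = 1.380649e-23             # Boltzmann constant in J/K (exact in the 2019 SI)
T = 300.0                      # room temperature, in kelvin
E_min = k_B * T * math.log(2)  # Landauer bound per erased bit
# E_min is about 2.87e-21 J: tiny, but a hard floor for irreversible computation
```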