References & Further Reading

References

  1. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience, 2nd ed., 2006

    The standard graduate textbook on information theory. Chapters 2-4 cover the material of this chapter with exceptional clarity. The exercises are particularly valuable.

  2. C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, 1948

    The founding paper of information theory. Shannon introduces entropy, mutual information, and channel capacity, and proves the source and channel coding theorems. Still remarkably readable after 75 years.
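
    In modern notation (Shannon's own symbols differ), the paper's three central quantities are

    $$H(X) = -\sum_x p(x)\log p(x), \qquad I(X;Y) = H(X) - H(X \mid Y), \qquad C = \max_{p(x)} I(X;Y).$$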

  3. R. G. Gallager, Information Theory and Reliable Communication, Wiley, 1968

    Gallager's textbook emphasizes the engineering perspective and provides rigorous proofs with excellent geometric intuition. The treatment of error exponents (Chapter 5) goes beyond Cover and Thomas.

  4. I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, 2nd ed., 2011

    The definitive reference for the method of types and error exponents. More advanced than Cover and Thomas, with a rigorous combinatorial approach. Essential for Chapters 3-4 of this book.

  5. S. Kullback and R. A. Leibler, On Information and Sufficiency, Annals of Mathematical Statistics, 1951

    The original paper introducing KL divergence. Establishes the connection between information theory and statistical sufficiency.
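
    In modern notation, for distributions $P$ and $Q$ on the same discrete alphabet, the divergence the paper introduces is

    $$D(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)},$$

    with the conventions $0 \log 0 = 0$ and $D(P \,\|\, Q) = \infty$ whenever $P(x) > 0$ for some $x$ with $Q(x) = 0$.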

  6. R. E. Blahut, Computation of Channel Capacity and Rate-Distortion Functions, IEEE Transactions on Information Theory, 1972

    Introduces the Blahut-Arimoto algorithm, which computes channel capacity by alternating optimization, exploiting the concavity of mutual information in the input distribution established in this chapter.
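
    A minimal NumPy sketch of the alternating update, assuming the channel is supplied as a row-stochastic matrix W[x, y] = P(y|x); the function name, tolerance, and iteration cap here are illustrative choices, not taken from Blahut's paper:

      import numpy as np

      def blahut_arimoto(W, tol=1e-9, max_iter=10_000):
          """Capacity (in bits) of a discrete memoryless channel W[x, y] = P(y|x)."""
          p = np.full(W.shape[0], 1.0 / W.shape[0])   # start from the uniform input
          for _ in range(max_iter):
              q = p @ W                               # induced output distribution q(y)
              # D(W(.|x) || q) for each input x, with the convention 0 log 0 = 0
              with np.errstate(divide="ignore", invalid="ignore"):
                  d = np.where(W > 0, W * np.log(W / q), 0.0).sum(axis=1)
              c = np.exp(d)
              lower, upper = np.log(p @ c), d.max()   # bracket the capacity in nats
              p = p * c / (p @ c)                     # multiplicative reweighting of inputs
              if upper - lower < tol:
                  break
          return lower / np.log(2)                    # convert nats to bits

    For the binary symmetric channel with crossover probability 0.1, W = np.array([[0.9, 0.1], [0.1, 0.9]]), this returns approximately 0.531 bits, matching the closed form $1 - H_2(0.1)$.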

  7. E. T. Jaynes, Information Theory and Statistical Mechanics, Physical Review, 1957

    Jaynes' seminal paper connecting maximum entropy to statistical mechanics. Establishes the maximum entropy principle as a general method for statistical inference.
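
    As a reminder of the principle's content: among all distributions satisfying expectation constraints $\sum_x p(x) f_k(x) = F_k$, the entropy maximizer takes the Gibbs (exponential-family) form

    $$p^*(x) = \frac{1}{Z(\lambda)} \exp\Big(-\sum_k \lambda_k f_k(x)\Big), \qquad Z(\lambda) = \sum_x \exp\Big(-\sum_k \lambda_k f_k(x)\Big),$$

    with the multipliers $\lambda_k$ chosen so the constraints hold.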

  8. A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011

    The modern reference for multiuser information theory. Chapter 2 provides an alternative treatment of the fundamentals with emphasis on the tools needed for network settings.

  9. R. W. Yeung, A First Course in Information Theory, Springer, 2002

    Develops information inequalities systematically using the entropy function region framework. Particularly strong on non-Shannon inequalities and their applications.

  10. A. Rényi, On Measures of Entropy and Information, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961

    Introduces the Rényi entropy family, generalizing Shannon entropy. Foundational for one-shot information theory and security analysis.
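
    For a distribution $p$ and order $\alpha > 0$, $\alpha \neq 1$, the family is

    $$H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_x p(x)^\alpha,$$

    which recovers the Shannon entropy $H(X)$ in the limit $\alpha \to 1$.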

Further Reading

Resources for deepening your understanding of discrete information measures.

  • Historical context of information theory

    J. R. Pierce, An Introduction to Information Theory: Symbols, Signals and Noise, Dover, 1980

    A non-technical introduction written by a Bell Labs colleague of Shannon. Provides the historical context and physical intuition that formal textbooks often lack.

  • Information-theoretic inequalities

    R. W. Yeung, A First Course in Information Theory, Springer, 2002 (reference 9 above)

    Particularly useful for understanding when information inequalities are "Shannon-type" and when non-Shannon inequalities are needed.

  • Connections to statistical inference

    Book FSI, Ch. 1 — Hypothesis Testing Fundamentals

    KL divergence sets the optimal type-II error exponent in binary hypothesis testing (Stein's lemma). This connection between information theory and statistics is fundamental and runs in both directions.
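
    Concretely, when testing $H_0\colon X^n \sim P$ against $H_1\colon X^n \sim Q$ under any fixed type-I error constraint, the best achievable type-II error probability $\beta_n$ obeys

    $$\lim_{n \to \infty} \frac{1}{n} \log \beta_n = -D(P \,\|\, Q).$$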

  • Entropy in physics and computation

    C. H. Bennett, The Thermodynamics of Computation — A Review, International Journal of Theoretical Physics, 1982

    Explores the deep connection between Shannon entropy and thermodynamic entropy, including Landauer's principle that erasing one bit of information dissipates at least $k_B T \ln 2$ joules.
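
    For scale, at room temperature ($T \approx 300\,\mathrm{K}$):

    $$k_B T \ln 2 \approx (1.38 \times 10^{-23}\,\mathrm{J/K})(300\,\mathrm{K})(0.693) \approx 2.9 \times 10^{-21}\,\mathrm{J}.$$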