References & Further Reading

References

  1. S. M. Ross, A First Course in Probability, Pearson, 10th ed., 2019

    The primary reference for this chapter. Ross covers discrete random variables, expectation, variance, and the named discrete distributions with clear examples and a wealth of exercises. Chapters 4-5 map directly to our Sections 5.1-5.4.

  2. G. R. Grimmett and D. R. Stirzaker, Probability and Random Processes, Oxford University Press, 4th ed., 2020

    A more rigorous treatment than Ross, suitable for readers who want the measure-theoretic foundations. Excellent for the CDF properties and convergence theorems.

  3. W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, Wiley, 3rd ed., 1968

    A classic that develops the combinatorial and generating-function approach to discrete probability in extraordinary depth. The historical notes are a treasure.

  4. A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 4th ed., 2002

    The standard engineering-oriented probability textbook. Strong on moment generating functions and connections to signal processing.

  5. P. Billingsley, Probability and Measure, Wiley, 3rd ed., 1995

    The definitive graduate text on measure-theoretic probability. Consult for rigorous foundations of random variables, measurability, and integration.

  6. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience, 2nd ed., 2006

    The standard reference for Shannon entropy and information measures. Chapter 2 covers discrete entropy in depth, extending what we introduce in Section 5.5.

  7. C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, 1948

    The founding paper of information theory. Shannon introduces entropy and proves the source and channel coding theorems.

  8. G. Caire and S. Shamai (Shitz), On the Achievable Throughput of a Multiantenna Gaussian Broadcast Channel, IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1691-1706, 2003

    Landmark paper on MIMO broadcast channel capacity using dirty paper coding. Demonstrates how entropy-based arguments characterize multiuser capacity regions.

  9. N. L. Johnson, A. W. Kemp, and S. Kotz, Univariate Discrete Distributions, Wiley, 3rd ed., 2005

    The encyclopedia of discrete distributions. Comprehensive reference for properties, relationships, and applications of every named discrete distribution.

  10. R. B. Ash, Information Theory, Dover, 1990 (reprint of the 1965 Interscience edition)

    A concise introduction to information theory with a clear treatment of entropy properties. Useful as a complement to Cover and Thomas for readers seeking a shorter exposition.

Further Reading

Resources for deepening your understanding of discrete random variables and entropy.

  • Generating functions for discrete distributions

    W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, Wiley, 3rd ed., 1968, Chapters XI-XII

    Feller develops the probability generating function approach in extraordinary depth, providing elegant proofs of limit theorems and distribution relationships that complement the MGF approach used here; a small numerical sketch of the PGF idea appears after this list.

  • Information theory and entropy

    Book ITA, Chapter 1 — Information Measures for Discrete Random Variables

    Our Section 5.5 is a preview. The ITA book develops entropy, mutual information, KL divergence, and all the fundamental inequalities in full depth; a worked entropy computation appears after this list.

  • Discrete distributions in network modeling

    L. Kleinrock, Queueing Systems, Vol. I: Theory, Wiley, 1975

    Shows how the Poisson, geometric, and negative binomial distributions arise naturally in queueing and network traffic analysis.

  • Concentration inequalities beyond variance

    Book FSP, Chapter 10 — Concentration Inequalities and Large Deviations

    The variance tells us about typical fluctuations, but concentration inequalities go further: Markov and Chebyshev give polynomial tail bounds from low-order moments, while Chernoff and Hoeffding give exponentially decaying ones. A numerical comparison of the two regimes appears after this list.