References & Further Reading
References
- S. M. Ross, A First Course in Probability, Pearson, 10th ed., 2019
The primary reference for this chapter. Ross covers discrete random variables, expectation, variance, and all named distributions with clear examples and a wealth of exercises. Chapters 4-5 map directly to our Sections 5.1-5.4.
- G. R. Grimmett and D. R. Stirzaker, Probability and Random Processes, Oxford University Press, 4th ed., 2020
A more rigorous treatment than Ross, suitable for readers who want the measure-theoretic foundations. Excellent for the CDF properties and convergence theorems.
- W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, Wiley, 3rd ed., 1968
A classic that develops the combinatorial and generating-function approach to discrete probability in extraordinary depth. The historical notes are a treasure.
- A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 4th ed., 2002
The standard engineering-oriented probability textbook. Strong on moment generating functions and connections to signal processing.
- P. Billingsley, Probability and Measure, Wiley, 3rd ed., 1995
The definitive graduate text on measure-theoretic probability. Consult for rigorous foundations of random variables, measurability, and integration.
- T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience, 2nd ed., 2006
The standard reference for Shannon entropy and information measures. Chapter 2 covers discrete entropy in depth, extending what we introduce in Section 5.5.
- C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, vol. 27, 1948
The founding paper of information theory. Shannon introduces entropy and proves the source and channel coding theorems.
- G. Caire and S. Shamai (Shitz), On the Achievable Throughput of a Multiantenna Gaussian Broadcast Channel, IEEE Transactions on Information Theory, vol. 49, 2003
Landmark paper on MIMO broadcast channel capacity using dirty paper coding. Demonstrates how entropy-based arguments characterize multiuser capacity regions.
- N. L. Johnson, A. W. Kemp, and S. Kotz, Univariate Discrete Distributions, Wiley, 3rd ed., 2005
The encyclopedia of discrete distributions. Comprehensive reference for properties, relationships, and applications of every named discrete distribution.
- R. B. Ash, Information Theory, Dover, 1965
A concise introduction to information theory with a clear treatment of entropy properties. Useful as a complement to Cover and Thomas for readers seeking a shorter exposition.
Further Reading
Resources for deepening your understanding of discrete random variables and entropy.
Generating functions for discrete distributions
W. Feller, An Introduction to Probability Theory and Its Applications, Vol. I, Wiley, 3rd ed., 1968, Chapters XI-XII
Feller develops the probability generating function approach in extraordinary depth, providing elegant proofs of limit theorems and distribution relationships that complement the MGF approach used here.
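To give a one-line taste of the method (a standard result stated in generic notation, not Feller's own development): for X ~ Poisson(λ), the probability generating function is

```latex
G_X(s) = \mathbb{E}\left[s^X\right]
       = \sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!}\, s^k
       = e^{\lambda(s-1)},
\qquad
G_X'(1) = \lambda = \mathbb{E}[X],
```

so moments fall out by differentiating at s = 1, and sums of independent variables correspond to products of their generating functions.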
Information theory and entropy
Book ITA, Chapter 1 — Information Measures for Discrete Random Variables
Our Section 5.5 is a preview. The ITA book develops entropy, mutual information, KL divergence, and all fundamental inequalities in full depth.
Discrete distributions in network modeling
L. Kleinrock, Queueing Systems, Vol. I: Theory, Wiley, 1975
Shows how the Poisson, geometric, and negative binomial distributions arise naturally in queueing and network traffic analysis.
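One classical route to the Poisson model in traffic analysis is the binomial limit: split a time interval into n small slots, each carrying an arrival with probability λ/n, and let n grow. The sketch below (our own illustration, not code from Kleinrock) compares the two PMFs numerically:

```python
import math

def binom_pmf(n, p, k):
    """Binomial(n, p): probability of exactly k successes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(lam, k):
    """Poisson(lam): probability of exactly k events."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# With n = 10_000 slots and per-slot arrival probability lam/n,
# Binomial(n, lam/n) is already very close to Poisson(lam).
lam = 3.0
for k in range(5):
    print(k, binom_pmf(10_000, lam / 10_000, k), poisson_pmf(lam, k))
```

The per-value discrepancy shrinks on the order of 1/n, which is why the Poisson distribution serves as the canonical model for aggregate arrivals from many independent, individually rare sources.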
Concentration inequalities beyond variance
Book FSP, Chapter 10 — Concentration Inequalities and Large Deviations
The variance tells us about typical fluctuations, but concentration inequalities (Markov, Chebyshev, Chernoff, Hoeffding) give exponential tail bounds far beyond what the variance alone provides.
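To see how large the gap is, consider S, the number of heads in n fair coin flips, and bound the tail P(S - n/2 >= t). Chebyshev uses only Var(S) = n/4 and decays polynomially in t, while Hoeffding's inequality for bounded summands decays exponentially. A minimal sketch (our own numbers, not an example from either book):

```python
import math

def chebyshev_bound(n, t):
    """Two-sided Chebyshev: P(|S - n/2| >= t) <= Var(S) / t^2, with Var(S) = n/4."""
    return (n / 4) / t**2

def hoeffding_bound(n, t):
    """One-sided Hoeffding: P(S - n/2 >= t) <= exp(-2 t^2 / n) for [0, 1]-valued summands."""
    return math.exp(-2 * t**2 / n)

n, t = 1000, 100
print(chebyshev_bound(n, t))   # 0.025      -- polynomial decay in t
print(hoeffding_bound(n, t))   # ~2.1e-9    -- exponential decay in t
```

For n = 1000 flips and a deviation of t = 100, Chebyshev can only certify a tail probability of 2.5%, while Hoeffding certifies about 2 in a billion: roughly seven orders of magnitude tighter from the same boundedness assumption.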