Chapter Summary

Key Points

1. A random variable is a measurable function $X : \Omega \to \mathbb{R}$. The PMF $P(x) = \mathbb{P}(X = x)$ and CDF $F(x) = \mathbb{P}(X \leq x)$ fully characterize the distribution of a discrete RV. Once we have these, we can forget the underlying sample space.

2. Expectation $\mathbb{E}[X] = \sum_x x \, P(x)$ is the probability-weighted average. Its most powerful property is linearity: $\mathbb{E}[\sum a_i X_i] = \sum a_i \mathbb{E}[X_i]$, which holds without any independence assumption. LOTUS, $\mathbb{E}[g(X)] = \sum_x g(x) \, P(x)$, lets us compute $\mathbb{E}[g(X)]$ directly from the PMF of $X$, without first finding the distribution of $g(X)$.

3. Variance $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$ measures spread around the mean. The shortcut formula is almost always easier than the definition. Variance of a sum equals the sum of variances only for independent (or uncorrelated) RVs. (Points 1-3 are checked numerically in the first sketch after this list.)

4. Seven named distributions form a core toolkit: Bernoulli (single trial), Binomial (count of successes), Geometric (wait for first success), Negative Binomial (wait for the $r$-th success), Poisson (rare events), Discrete Uniform (equal likelihood), and Hypergeometric (sampling without replacement). Each is characterized by its PMF, mean, variance, and MGF. (A scipy.stats tour of this toolkit appears after the list.)

5. The Poisson distribution arises as a limit of the binomial ($n \to \infty$, $p \to 0$, $np \to \lambda$). Its signature property is that mean equals variance. It is the default model for counting rare events and the starting point for queueing theory. (The limit is illustrated numerically below.)

6. The geometric distribution is the only memoryless discrete distribution. Given that you have waited $m$ trials without success, the remaining wait has the same distribution as starting fresh. (A short check of this property follows the list.)

7. Shannon entropy $H(X) = -\sum p(x) \log_2 p(x)$ quantifies average uncertainty. It is bounded by $0 \leq H(X) \leq \log_2 |\mathcal{X}|$, with the uniform distribution achieving the maximum. Entropy depends only on probabilities, not on numerical values, and it equals the minimum average description length for a source. (The bound is computed in the final sketch below.)
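
The sketches below make the key points concrete. First, a minimal NumPy check of points 1-3, using a toy PMF whose values and probabilities are arbitrary and chosen only for illustration: the CDF, expectation, linearity, LOTUS, and the variance shortcut.

```python
import numpy as np

# Toy PMF for a discrete RV X on {1, 2, 3, 4} (illustrative values only).
values = np.array([1, 2, 3, 4])
pmf = np.array([0.1, 0.2, 0.3, 0.4])
assert np.isclose(pmf.sum(), 1.0)

# CDF F(x) = P(X <= x): running sums of the PMF.
cdf = np.cumsum(pmf)                      # [0.1, 0.3, 0.6, 1.0]

# Expectation E[X] = sum_x x P(x).
EX = np.sum(values * pmf)                 # 3.0

# Linearity: E[2X + 3] = 2 E[X] + 3, no independence needed.
assert np.isclose(np.sum((2 * values + 3) * pmf), 2 * EX + 3)

# LOTUS with g(x) = x^2: sum_x g(x) P(x), without finding the PMF of X^2.
EX2 = np.sum(values**2 * pmf)             # 10.0

# Variance via the shortcut Var(X) = E[X^2] - (E[X])^2.
var_X = EX2 - EX**2                       # 1.0
print(cdf, EX, EX2, var_X)
```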
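
Point 4's toolkit is available in scipy.stats (assuming SciPy is installed). The sketch below freezes each of the seven distributions with arbitrary illustrative parameters and prints its mean and variance for comparison with the closed-form expressions. One caveat: SciPy's geom counts trials until the first success, while nbinom counts failures before the $r$-th success, so the latter's mean is smaller than the trial-counting convention by $r$.

```python
from scipy import stats

# Arbitrary illustrative parameters (not from the chapter).
p, n, r, lam = 0.3, 10, 4, 2.5              # success prob, trials, target successes, rate
N_pop, K_marked, n_draw = 50, 20, 6         # hypergeometric: population, marked items, draws

toolkit = {
    "Bernoulli(p)":           stats.bernoulli(p),
    "Binomial(n, p)":         stats.binom(n, p),
    "Geometric(p)":           stats.geom(p),       # trials until first success
    "NegativeBinomial(r, p)": stats.nbinom(r, p),  # failures before the r-th success
    "Poisson(lambda)":        stats.poisson(lam),
    "DiscreteUniform(1..6)":  stats.randint(1, 7),
    "Hypergeometric":         stats.hypergeom(N_pop, K_marked, n_draw),
}

for name, dist in toolkit.items():
    mean, var = dist.stats(moments="mv")
    print(f"{name:24s} mean = {float(mean):6.3f}   var = {float(var):6.3f}")
```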
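
For point 5, fix $\lambda$ and let $n$ grow with $p = \lambda/n$: the Binomial$(n, p)$ PMF collapses onto the Poisson$(\lambda)$ PMF. The sketch below (rate and evaluation grid chosen arbitrarily) reports the largest pointwise gap, again assuming SciPy.

```python
import numpy as np
from scipy import stats

lam = 3.0                                 # fixed rate (arbitrary)
ks = np.arange(15)                        # values at which to compare the PMFs
poisson_pmf = stats.poisson(lam).pmf(ks)

for n in (10, 100, 1000, 10_000):
    p = lam / n                           # keep np = lambda fixed
    binom_pmf = stats.binom(n, p).pmf(ks)
    gap = np.max(np.abs(binom_pmf - poisson_pmf))
    print(f"n = {n:6d}   max |Binomial - Poisson| = {gap:.6f}")
# The gap shrinks toward 0 as n grows, illustrating the Poisson limit.
```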
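
For point 6, memorylessness follows from the geometric tail $\mathbb{P}(X > k) = (1 - p)^k$. The plain-Python check below (with an arbitrary $p$) compares the conditional tail after $m$ unsuccessful trials against a fresh start.

```python
p = 0.2                 # arbitrary success probability
m, k = 5, 3             # already waited m trials; ask about k more

def tail(j):
    """P(X > j) for a geometric RV counting trials until the first success."""
    return (1 - p) ** j

# P(X > m + k | X > m) = P(X > m + k) / P(X > m)
conditional = tail(m + k) / tail(m)
fresh_start = tail(k)

print(conditional, fresh_start)   # both equal (1 - p)^k = 0.512
assert abs(conditional - fresh_start) < 1e-12
```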
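
Finally, for point 7, the sketch below computes entropy for an arbitrary skewed PMF and for the uniform PMF on the same four outcomes; the uniform case attains the maximum $\log_2 |\mathcal{X}| = 2$ bits.

```python
import numpy as np

def entropy_bits(pmf):
    """Shannon entropy H(X) = -sum p(x) log2 p(x), with 0 log 0 taken as 0."""
    pmf = np.asarray(pmf, dtype=float)
    nonzero = pmf[pmf > 0]
    return -np.sum(nonzero * np.log2(nonzero))

skewed = [0.7, 0.1, 0.1, 0.1]           # arbitrary non-uniform PMF on 4 outcomes
uniform = [0.25, 0.25, 0.25, 0.25]      # uniform PMF on the same 4 outcomes

print(entropy_bits(skewed))    # about 1.357 bits
print(entropy_bits(uniform))   # exactly log2(4) = 2 bits, the maximum
```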

Looking Ahead

Chapter 6 extends the theory to continuous random variables, where the PMF is replaced by a probability density function (PDF) and sums become integrals. The key distributions (Gaussian, exponential, gamma) are the continuous counterparts of the discrete distributions studied here. The Gaussian distribution will play a central role in the remainder of this book, just as the Poisson distribution dominates discrete modeling.