Chapter Summary

Key Points

  1. Conditional probability is a probability measure. For fixed $B$ with $\mathbb{P}(B) > 0$, $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$ satisfies the Kolmogorov axioms. The multiplication rule and chain rule follow directly: every joint probability factors into a telescoping product of conditional probabilities, $\mathbb{P}(A_1 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\,\mathbb{P}(A_2 \mid A_1) \cdots \mathbb{P}(A_n \mid A_1 \cap \cdots \cap A_{n-1})$. A numerical check of this factorisation appears after this list.

  2. Bayes' theorem inverts the conditioning direction. Given a partition $\{B_i\}$ and likelihoods $\mathbb{P}(A \mid B_i)$, the posterior is $\mathbb{P}(B_k \mid A) \propto \mathbb{P}(A \mid B_k)\,\mathbb{P}(B_k)$. This prior-likelihood-posterior update is the mathematical foundation of MAP detection, Bayesian channel estimation, and belief propagation; a MAP-detection sketch follows this list.

  3. Independence means no information flows. $A \perp B$ iff $\mathbb{P}(A \mid B) = \mathbb{P}(A)$, equivalently $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$. Mutual independence for $n$ events requires $2^n - n - 1$ product conditions; pairwise independence is strictly weaker (the two-coin example after this list exhibits the gap). Disjoint events with positive probability are never independent: if one occurs, the other cannot.

  4. Conditional independence is the language of graphical models. $A \perp B \mid C$ means that $C$ screens off all information flow between $A$ and $B$. Conditioning on a common cause removes correlation; conditioning on a common effect (a collider) introduces it, as the XOR check after this list shows. The Markov chain $A \to B \to C$ encodes $A \perp C \mid B$.

  5. Three distributions from one experiment. The binomial $\text{Bin}(n,p)$, geometric $\text{Geom}(p)$, and negative binomial $\text{NegBin}(r,p)$ distributions all arise from repeated independent Bernoulli$(p)$ trials by asking different questions: the number of successes in $n$ trials, the number of trials until the first success, and the number of trials until the $r$-th success, respectively. The geometric distribution's memoryless property, $\mathbb{P}(T > m + n \mid T > m) = \mathbb{P}(T > n)$, mirrors that of the exponential distribution; the simulation after this list checks it.
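
The chain rule from point 1 can be checked numerically. A minimal Monte Carlo sketch, using three draws from a 52-card deck as an illustrative experiment (the deck and sample size are assumptions for this sketch, not from the chapter):

```python
import random

def three_aces(rng):
    """One trial: are the first three cards drawn (without replacement) all aces?"""
    deck = [1] * 4 + [0] * 48          # 4 aces among 52 cards
    rng.shuffle(deck)
    return deck[0] == deck[1] == deck[2] == 1

rng = random.Random(0)
N = 1_000_000
estimate = sum(three_aces(rng) for _ in range(N)) / N
# Chain rule: P(A1) * P(A2 | A1) * P(A3 | A1, A2)
exact = (4 / 52) * (3 / 51) * (2 / 50)
print(f"Monte Carlo: {estimate:.6f}  chain rule: {exact:.6f}")
```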
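
Point 2's prior-likelihood-posterior update, sketched as MAP detection of one bit over a binary symmetric channel; the prior `prior_one = 0.3` and crossover probability `eps = 0.1` are illustrative values:

```python
def bsc_posterior(received, prior_one=0.3, eps=0.1):
    """Posterior P(bit = b | received) over a binary symmetric channel."""
    prior = {0: 1.0 - prior_one, 1: prior_one}
    # Likelihood: the channel flips the transmitted bit with probability eps.
    lik = {b: (1.0 - eps) if received == b else eps for b in (0, 1)}
    unnorm = {b: lik[b] * prior[b] for b in (0, 1)}
    z = sum(unnorm.values())                   # evidence P(received)
    return {b: unnorm[b] / z for b in (0, 1)}

post = bsc_posterior(received=1)
print(post, "-> MAP decision:", max(post, key=post.get))
```

Dropping the normalisation by `z` leaves the arg-max unchanged, which is why the summary states the posterior only up to proportionality.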
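
For point 3, the classic two-coin example makes the pairwise/mutual gap concrete: with two fair coins, take A = first coin heads, B = second coin heads, C = the coins agree. An exhaustive check:

```python
from itertools import product

omega = list(product((0, 1), repeat=2))        # four equally likely outcomes

def prob(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == 1                        # first coin heads
B = lambda w: w[1] == 1                        # second coin heads
C = lambda w: w[0] == w[1]                     # coins agree

for E, F, name in ((A, B, "A,B"), (A, C, "A,C"), (B, C, "B,C")):
    print(name, prob(lambda w: E(w) and F(w)), "=", prob(E) * prob(F))
# Every pair factors (0.25 = 0.25), yet the triple does not:
print(prob(lambda w: A(w) and B(w) and C(w)), "vs", prob(A) * prob(B) * prob(C))
# 0.25 vs 0.125: pairwise independent but not mutually independent
```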
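
For point 4's collider effect: X and Y are independent fair bits and Z = X XOR Y is their common effect. Exhaustive enumeration shows the factorisation holds unconditionally but fails once we condition on Z:

```python
from itertools import product

outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def prob(event, given=lambda w: True):
    world = [w for w in outcomes if given(w)]
    return sum(1 for w in world if event(w)) / len(world)

z0 = lambda w: w[2] == 0                       # condition on the common effect
print(prob(lambda w: w[0] == 1 and w[1] == 1))             # 0.25 = 0.5 * 0.5
print(prob(lambda w: w[0] == 1 and w[1] == 1, given=z0))   # 0.5, not 0.25
print(prob(lambda w: w[0] == 1, given=z0)
      * prob(lambda w: w[1] == 1, given=z0))               # 0.25: factorisation fails
```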
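
For point 5, one Bernoulli stream answers all three questions: $\text{Bin}(n,p)$ counts successes in the first $n$ trials and $\text{NegBin}(r,p)$ is the sum of $r$ independent $\text{Geom}(p)$ waits. The sketch below samples the geometric waiting time and verifies the memoryless property by simulation; the parameter values are illustrative:

```python
import random

def bernoulli_stream(p, rng):
    """An endless stream of independent Bernoulli(p) trials."""
    while True:
        yield rng.random() < p

def first_success(p, rng):
    """Geometric: number of trials up to and including the first success."""
    for t, hit in enumerate(bernoulli_stream(p, rng), start=1):
        if hit:
            return t

rng = random.Random(1)
p, N = 0.3, 200_000
samples = [first_success(p, rng) for _ in range(N)]

m, n = 2, 3
tail = lambda k: sum(1 for t in samples if t > k) / N
print(f"P(T > m+n | T > m) = {tail(m + n) / tail(m):.4f}")
print(f"P(T > n)           = {tail(n):.4f}   (exact: {(1 - p) ** n:.4f})")
```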

Looking Ahead

Chapter 3 applies the tools of this chapter to reliability and combinatorial probability. Chapter 5 lifts the framework from events to random variables, where conditional probability becomes the conditional distribution and independence becomes the factorisation of joint PMFs. The Markov chain concept returns in Chapter 13 as a model for stochastic processes evolving in discrete time, and Bayes' theorem reappears in Book FSI as the basis of the optimal detection rule.