Chapter Summary

Key Points

  1. Conditional probability is a probability measure. For fixed $B$ with $\mathbb{P}(B) > 0$, $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$ satisfies the Kolmogorov axioms. The multiplication rule and chain rule follow directly: every joint probability factors into a telescoping product of conditional probabilities, $\mathbb{P}(A_1 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\,\mathbb{P}(A_2 \mid A_1) \cdots \mathbb{P}(A_n \mid A_1 \cap \cdots \cap A_{n-1})$. A numerical check of this factorisation appears after this list.

  2. Bayes' theorem inverts the conditioning direction. Given a partition $\{B_i\}$ and likelihoods $\mathbb{P}(A \mid B_i)$, the posterior is $\mathbb{P}(B_k \mid A) \propto \mathbb{P}(A \mid B_k)\,\mathbb{P}(B_k)$. This prior-likelihood-posterior update is the mathematical foundation of MAP detection, Bayesian channel estimation, and belief propagation; a MAP-detection sketch follows this list.

  3. Independence means no information flows. $A \perp B$ iff $\mathbb{P}(A \mid B) = \mathbb{P}(A)$, equivalently $\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B)$. Mutual independence for $n$ events requires $2^n - n - 1$ product conditions; pairwise independence is strictly weaker (the two-coin example after this list exhibits the gap). Disjoint events with positive probability are never independent: if one occurs, the other cannot.

  4. Conditional independence is the language of graphical models. $A \perp B \mid C$ means that $C$ screens off all information flow between $A$ and $B$. Conditioning on a common cause removes correlation; conditioning on a common effect (a collider) introduces it, as the XOR check after this list shows. The Markov chain $A \to B \to C$ encodes $A \perp C \mid B$.

  5. Three distributions from one experiment. The binomial $\text{Bin}(n,p)$, geometric $\text{Geom}(p)$, and negative binomial $\text{NegBin}(r,p)$ distributions all arise from repeated independent Bernoulli$(p)$ trials by asking different questions: the number of successes in $n$ trials, the number of trials until the first success, and the number of trials until the $r$-th success, respectively. The geometric distribution's memoryless property, $\mathbb{P}(T > m + n \mid T > m) = \mathbb{P}(T > n)$, mirrors that of the exponential distribution; the simulation after this list checks it.
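
The chain rule from point 1 can be checked numerically. A minimal Monte Carlo sketch, using three draws from a 52-card deck as an illustrative experiment (the deck and sample size are assumptions for this sketch, not from the chapter):

```python
import random

def three_aces(rng):
    """One trial: are the first three cards drawn (without replacement) all aces?"""
    deck = [1] * 4 + [0] * 48          # 4 aces among 52 cards
    rng.shuffle(deck)
    return deck[0] == deck[1] == deck[2] == 1

rng = random.Random(0)
N = 1_000_000
estimate = sum(three_aces(rng) for _ in range(N)) / N
# Chain rule: P(A1) * P(A2 | A1) * P(A3 | A1, A2)
exact = (4 / 52) * (3 / 51) * (2 / 50)
print(f"Monte Carlo: {estimate:.6f}  chain rule: {exact:.6f}")
```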
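
Point 2's prior-likelihood-posterior update, sketched as MAP detection of one bit over a binary symmetric channel; the prior `prior_one = 0.3` and crossover probability `eps = 0.1` are illustrative values:

```python
def bsc_posterior(received, prior_one=0.3, eps=0.1):
    """Posterior P(bit = b | received) over a binary symmetric channel."""
    prior = {0: 1.0 - prior_one, 1: prior_one}
    # Likelihood: the channel flips the transmitted bit with probability eps.
    lik = {b: (1.0 - eps) if received == b else eps for b in (0, 1)}
    unnorm = {b: lik[b] * prior[b] for b in (0, 1)}
    z = sum(unnorm.values())                   # evidence P(received)
    return {b: unnorm[b] / z for b in (0, 1)}

post = bsc_posterior(received=1)
print(post, "-> MAP decision:", max(post, key=post.get))
```

Dropping the normalisation by `z` leaves the arg-max unchanged, which is why the summary states the posterior only up to proportionality.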
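
For point 3, the classic two-coin example makes the pairwise/mutual gap concrete: with two fair coins, take A = first coin heads, B = second coin heads, C = the coins agree. An exhaustive check:

```python
from itertools import product

omega = list(product((0, 1), repeat=2))        # four equally likely outcomes

def prob(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == 1                        # first coin heads
B = lambda w: w[1] == 1                        # second coin heads
C = lambda w: w[0] == w[1]                     # coins agree

for E, F, name in ((A, B, "A,B"), (A, C, "A,C"), (B, C, "B,C")):
    print(name, prob(lambda w: E(w) and F(w)), "=", prob(E) * prob(F))
# Every pair factors (0.25 = 0.25), yet the triple does not:
print(prob(lambda w: A(w) and B(w) and C(w)), "vs", prob(A) * prob(B) * prob(C))
# 0.25 vs 0.125: pairwise independent but not mutually independent
```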
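
For point 4's collider effect: X and Y are independent fair bits and Z = X XOR Y is their common effect. Exhaustive enumeration shows the factorisation holds unconditionally but fails once we condition on Z:

```python
from itertools import product

outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def prob(event, given=lambda w: True):
    world = [w for w in outcomes if given(w)]
    return sum(1 for w in world if event(w)) / len(world)

z0 = lambda w: w[2] == 0                       # condition on the common effect
print(prob(lambda w: w[0] == 1 and w[1] == 1))             # 0.25 = 0.5 * 0.5
print(prob(lambda w: w[0] == 1 and w[1] == 1, given=z0))   # 0.5, not 0.25
print(prob(lambda w: w[0] == 1, given=z0)
      * prob(lambda w: w[1] == 1, given=z0))               # 0.25: factorisation fails
```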
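
For point 5, one Bernoulli stream answers all three questions: $\text{Bin}(n,p)$ counts successes in the first $n$ trials and $\text{NegBin}(r,p)$ is the sum of $r$ independent $\text{Geom}(p)$ waits. The sketch below samples the geometric waiting time and verifies the memoryless property by simulation; the parameter values are illustrative:

```python
import random

def bernoulli_stream(p, rng):
    """An endless stream of independent Bernoulli(p) trials."""
    while True:
        yield rng.random() < p

def first_success(p, rng):
    """Geometric: number of trials up to and including the first success."""
    for t, hit in enumerate(bernoulli_stream(p, rng), start=1):
        if hit:
            return t

rng = random.Random(1)
p, N = 0.3, 200_000
samples = [first_success(p, rng) for _ in range(N)]

m, n = 2, 3
tail = lambda k: sum(1 for t in samples if t > k) / N
print(f"P(T > m+n | T > m) = {tail(m + n) / tail(m):.4f}")
print(f"P(T > n)           = {tail(n):.4f}   (exact: {(1 - p) ** n:.4f})")
```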

Looking Ahead

Chapter 3 applies the tools of this chapter to reliability and combinatorial probability. Chapter 5 lifts the framework from events to random variables, where conditional probability becomes the conditional distribution and independence becomes the factorisation of joint PMFs. The Markov chain concept returns in Chapter 13 as a model for stochastic processes evolving in discrete time, and Bayes' theorem reappears in Book FSI as the basis of the optimal detection rule.