The Law of Total Probability and Bayes' Theorem
Partitioning the Unknown
Many probabilities are hard to compute directly but become tractable when we condition on an exhaustive set of mutually exclusive scenarios. If we know the probability of an event $A$ under each scenario, and we know the probability of each scenario, we can recover $\mathbb{P}(A)$ as a weighted average. This is the law of total probability, one of the workhorses of applied probability.
Bayes' theorem is the other side of the same coin: given that $A$ occurred, it tells us how to update our beliefs about which scenario was in play. This prior-to-posterior update is the mathematical engine of Bayesian inference, detection theory, and all probabilistic decoding algorithms.
Definition: Partition of the Sample Space
A finite (or countable) collection of events $B_1, B_2, \ldots$ is a partition of the sample space $\Omega$ if:
- Exhaustive: $\bigcup_i B_i = \Omega$.
- Mutually exclusive: $B_i \cap B_j = \emptyset$ for all $i \neq j$.
Every outcome $\omega \in \Omega$ belongs to exactly one $B_i$.
Theorem: Law of Total Probability
Let $B_1, B_2, \ldots$ be a partition of $\Omega$ with $\mathbb{P}(B_i) > 0$ for all $i$. For any event $A$:
$$\mathbb{P}(A) = \sum_i \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i).$$
The event $A$ is split into disjoint pieces $A \cap B_i$, each contained in exactly one scenario $B_i$. The probability of each piece is $\mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)$, and summing over all scenarios recovers $\mathbb{P}(A)$.
Decompose $A$ using the partition
Since $\{B_i\}$ partitions $\Omega$: $A = A \cap \Omega = \bigcup_i (A \cap B_i)$. The events $A \cap B_i$ are pairwise disjoint (they inherit the disjointness of the $B_i$).
Apply countable additivity
By countable additivity of $\mathbb{P}$: $\mathbb{P}(A) = \sum_i \mathbb{P}(A \cap B_i)$.
Apply multiplication rule
For each $i$, $\mathbb{P}(A \cap B_i) = \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)$ (since $\mathbb{P}(B_i) > 0$). Substituting: $\mathbb{P}(A) = \sum_i \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)$.
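As a quick numerical sanity check, here is a minimal Python sketch of the theorem (the function name and the sample numbers are ours, purely illustrative):

```python
# Minimal sketch of the law of total probability.
# priors[i] = P(B_i), likelihoods[i] = P(A | B_i); names are illustrative.

def total_probability(priors, likelihoods):
    """Return P(A) = sum_i P(A | B_i) * P(B_i)."""
    assert abs(sum(priors) - 1.0) < 1e-9, "partition priors must sum to 1"
    return sum(p * l for p, l in zip(priors, likelihoods))

# Three scenarios with priors 0.5, 0.3, 0.2:
print(total_probability([0.5, 0.3, 0.2], [0.1, 0.4, 0.8]))  # 0.05 + 0.12 + 0.16 ~= 0.33
```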
Theorem: Bayes' Theorem
Let $B_1, B_2, \ldots$ be a partition of $\Omega$ with $\mathbb{P}(B_i) > 0$ for all $i$. For any event $A$ with $\mathbb{P}(A) > 0$:
$$\mathbb{P}(B_k \mid A) = \frac{\mathbb{P}(A \mid B_k)\,\mathbb{P}(B_k)}{\sum_i \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)}.$$
The terms have canonical names in Bayesian inference:
- $\mathbb{P}(B_k)$: the prior probability of scenario $B_k$.
- $\mathbb{P}(A \mid B_k)$: the likelihood of observation $A$ under scenario $B_k$.
- $\mathbb{P}(B_k \mid A)$: the posterior probability of scenario $B_k$ given $A$.
- $\mathbb{P}(A) = \sum_i \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)$: the evidence (normalizing constant).
Bayes' theorem reverses the direction of conditioning. We know how to go from scenario to observation (the forward channel $\mathbb{P}(A \mid B_k)$). Bayes tells us how to go the other direction: from observation back to scenario. The prior encodes what we believed before observing $A$; the posterior encodes what we believe after.
Apply definition of conditional probability
$$\mathbb{P}(B_k \mid A) = \frac{\mathbb{P}(B_k \cap A)}{\mathbb{P}(A)}.$$
Use multiplication rule for numerator
$\mathbb{P}(B_k \cap A) = \mathbb{P}(A \mid B_k)\,\mathbb{P}(B_k)$.
Expand denominator by total probability
By the law of total probability, $\mathbb{P}(A) = \sum_i \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)$. Substituting both expressions yields Bayes' theorem.
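The whole derivation fits in a few lines of Python. A minimal sketch, assuming the partition is represented by parallel lists of priors and likelihoods (the function name `bayes_posterior` is ours):

```python
def bayes_posterior(priors, likelihoods):
    """Return [P(B_k | A) for each k], from priors P(B_k) and likelihoods P(A | B_k)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(A), by total probability
    if evidence == 0.0:
        raise ValueError("P(A) = 0: the posterior is undefined")
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

print(bayes_posterior([0.5, 0.5], [0.2, 0.8]))  # [0.2, 0.8]
```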
Historical Note: Thomas Bayes and the Inverse Probability Problem
Thomas Bayes (1702–1761), an English minister and amateur mathematician, posed the following question: given that an event has occurred some number of times, what can be inferred about the underlying probability? His posthumous 1763 essay, edited and communicated by Richard Price to the Royal Society, introduced what we now call Bayes' theorem in the context of a billiard-ball model on a square table.
Bayes' contribution was primarily philosophical: the idea that probability could represent degree of belief rather than mere frequency, and that this belief should be updated rationally in response to evidence. The formalization was refined by Pierre-Simon Laplace, who independently developed the same ideas around 1774. The modern Bayesian-versus-frequentist debate can be traced directly to this 18th-century dispute over the nature of probability.
Example: Binary Symmetric Channel: Posterior Computation
A binary symmetric channel flips each transmitted bit with probability $\epsilon < 1/2$. The transmitter sends $X=0$ or $X=1$ with equal prior probabilities $\mathbb{P}(X=0) = \mathbb{P}(X=1) = 1/2$. The receiver observes $Y=1$. Compute the posterior $\mathbb{P}(X=0 \mid Y=1)$.
Identify the partition and likelihoods
Let $B_0 = \{X=0\}$ and $B_1 = \{X=1\}$, forming a partition of $\Omega$. The channel gives: $\mathbb{P}(Y=1 \mid X=0) = \epsilon$ and $\mathbb{P}(Y=1 \mid X=1) = 1 - \epsilon$.
Compute the evidence
By total probability: $\mathbb{P}(Y=1) = \epsilon \cdot \tfrac{1}{2} + (1-\epsilon) \cdot \tfrac{1}{2} = \tfrac{1}{2}$.
Apply Bayes' theorem
$\mathbb{P}(X=0 \mid Y=1) = \dfrac{\epsilon \cdot \tfrac{1}{2}}{\tfrac{1}{2}} = \epsilon$, and symmetrically $\mathbb{P}(X=1 \mid Y=1) = 1 - \epsilon$. Since $\epsilon < 1/2$, observing $Y=1$ makes it more likely that $X=1$ was sent. This is exactly the Bayes-optimal decoder for this channel.
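A short sketch confirming the arithmetic (we assume $\epsilon = 0.1$ for concreteness; any $\epsilon < 1/2$ behaves the same way):

```python
eps = 0.1  # assumed crossover probability; any eps < 1/2 behaves the same way

priors = [0.5, 0.5]      # P(X=0), P(X=1)
liks = [eps, 1 - eps]    # P(Y=1 | X=0), P(Y=1 | X=1)

p_y1 = sum(p * l for p, l in zip(priors, liks))           # evidence P(Y=1) = 0.5
posterior = [p * l / p_y1 for p, l in zip(priors, liks)]
print(posterior)  # [0.1, 0.9]: P(X=0 | Y=1) = eps, so decode Y=1 as X=1
```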
Example: Two Factories and a Defective Chip
Factory A produces 60% of all chips; factory B produces 40%. Factory A's defect rate is 2%; factory B's defect rate is 5%. A randomly chosen chip is found to be defective. What is the probability that it came from factory A?
Set up partition and priors
Let $B_A$ = "chip from A" and $B_B$ = "chip from B". $\mathbb{P}(B_A) = 0.6$, $\mathbb{P}(B_B) = 0.4$. Let $D$ = "defective".
Likelihoods
$\mathbb{P}(D \mid B_A) = 0.02$, $\mathbb{P}(D \mid B_B) = 0.05$.
Total probability (evidence)
$\mathbb{P}(D) = \mathbb{P}(D \mid B_A)\,\mathbb{P}(B_A) + \mathbb{P}(D \mid B_B)\,\mathbb{P}(B_B) = 0.02 \cdot 0.6 + 0.05 \cdot 0.4 = 0.012 + 0.020 = 0.032$.
Posterior via Bayes
$\mathbb{P}(B_A \mid D) = \dfrac{0.012}{0.032} = 0.375$. Even though A makes most of the chips, the defective chip more likely came from B ($\mathbb{P}(B_B \mid D) = 0.625$) because B's defect rate is higher. This illustrates how the likelihood can reverse the ranking implied by the prior.
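The same computation in a few lines of Python (numbers taken directly from the example):

```python
p_A, p_B = 0.6, 0.4      # priors: market shares of the two factories
d_A, d_B = 0.02, 0.05    # likelihoods: defect rates P(D | factory)

p_D = d_A * p_A + d_B * p_B    # evidence: 0.012 + 0.020 = 0.032
print(d_A * p_A / p_D)         # P(A | D) = 0.375
print(d_B * p_B / p_D)         # P(B | D) = 0.625
```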
Bayesian Posterior Updating
Explore how the posterior $\mathbb{P}(B_1 \mid A)$ evolves as the prior $\mathbb{P}(B_1)$ and the likelihoods $\mathbb{P}(A \mid B_1)$, $\mathbb{P}(A \mid B_2)$ vary (two-hypothesis model).
Law of Total Probability: Partition Visualization
Visualize how $\mathbb{P}(A)$ is decomposed over a partition $\{B_i\}$. Adjust the scenario probabilities $\mathbb{P}(B_i)$ and the conditional probabilities $\mathbb{P}(A \mid B_i)$ to see the weighted average.
Why This Matters: Bayes' Theorem in Digital Detection
In digital communications, the receiver observes $y$ and must decide which symbol $x$ was transmitted. Bayes' theorem gives the maximum a posteriori (MAP) decoder:
$$\hat{x}_{\text{MAP}} = \arg\max_x \mathbb{P}(x \mid y) = \arg\max_x \mathbb{P}(y \mid x)\,\mathbb{P}(x),$$
where the total probability $\mathbb{P}(y)$ cancels in the $\arg\max$. When symbols are equally likely ($\mathbb{P}(x) = 1/M$ for all $x$), MAP reduces to maximum likelihood (ML): $\hat{x}_{\text{ML}} = \arg\max_x \mathbb{P}(y \mid x)$. Bayes' theorem is the precise reason why equal priors make MAP and ML coincide.
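A minimal sketch of both decoders over a finite alphabet (the channel model and numbers are hypothetical; the BSC values echo the earlier example):

```python
def map_decode(y, symbols, prior, likelihood):
    """argmax_x P(y | x) * P(x); the evidence P(y) cancels in the argmax."""
    return max(symbols, key=lambda x: likelihood[x][y] * prior[x])

def ml_decode(y, symbols, likelihood):
    """argmax_x P(y | x); coincides with MAP when the prior is uniform."""
    return max(symbols, key=lambda x: likelihood[x][y])

# Binary symmetric channel with eps = 0.1, as in the earlier example.
eps = 0.1
lik = {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}

uniform = {0: 0.5, 1: 0.5}
assert map_decode(1, [0, 1], uniform, lik) == ml_decode(1, [0, 1], lik) == 1

# A skewed prior can flip the MAP decision while ML is unchanged:
skewed = {0: 0.95, 1: 0.05}
print(map_decode(1, [0, 1], skewed, lik))  # 0 -- the prior dominates
print(ml_decode(1, [0, 1], lik))           # 1
```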
Common Mistake: The Prosecutor's Fallacy
Mistake:
In forensic science (and sometimes in wireless network analysis), evidence is presented as: "The probability of observing this evidence if the defendant is innocent is only $p$," for some tiny $p$. This is then (incorrectly) interpreted as: "The probability that the defendant is innocent given this evidence is $p$."
Correction:
The first quantity is $\mathbb{P}(\text{evidence} \mid \text{innocent})$, the likelihood. The second is $\mathbb{P}(\text{innocent} \mid \text{evidence})$, the posterior. They are related by Bayes' theorem:
$$\mathbb{P}(\text{innocent} \mid \text{evidence}) = \frac{\mathbb{P}(\text{evidence} \mid \text{innocent})\,\mathbb{P}(\text{innocent})}{\mathbb{P}(\text{evidence})}.$$
If the prior $\mathbb{P}(\text{innocent})$ is high (most people are not criminals), the posterior can remain large even when the likelihood is small. The base rate (prior) is crucial and must not be ignored.
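A numeric illustration of the base-rate effect (all numbers hypothetical): a likelihood of one in a thousand still leaves the posterior near one half.

```python
p_innocent = 0.999           # prior: almost everyone is not the culprit
p_ev_innocent = 1e-3         # likelihood of the evidence if innocent (tiny)
p_ev_guilty = 1.0            # simplification: evidence certain if guilty

p_ev = p_ev_innocent * p_innocent + p_ev_guilty * (1 - p_innocent)
print(p_ev_innocent * p_innocent / p_ev)  # P(innocent | evidence) ~= 0.50, not 0.001
```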
Prior and Posterior
In Bayesian inference, the prior $\mathbb{P}(B_k)$ encodes belief about scenario $B_k$ before observing any data. The posterior $\mathbb{P}(B_k \mid A)$ encodes belief after observing event $A$. Bayes' theorem is the update rule that converts prior into posterior via the likelihood $\mathbb{P}(A \mid B_k)$.
Related: Conditional Probability, Bayes' Theorem
Quick Check
A medical test has sensitivity $s = \mathbb{P}(+ \mid D)$ and specificity $c = \mathbb{P}(- \mid D^c)$, where $D$ is the event "patient has the disease." The disease prevalence is $\pi = \mathbb{P}(D)$. A patient tests positive. Which expression correctly gives $\mathbb{P}(D \mid +)$?
Using Bayes: the numerator is $\mathbb{P}(+ \mid D)\,\mathbb{P}(D) = s\pi$. The denominator, by total probability, is $\mathbb{P}(+) = s\pi + (1-c)(1-\pi)$. The posterior is $\mathbb{P}(D \mid +) = \dfrac{s\pi}{s\pi + (1-c)(1-\pi)}$. When prevalence $\pi$ is small, the false-alarm term $(1-c)(1-\pi)$ dominates the denominator: despite the high sensitivity, the low prevalence makes most positives false alarms.
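Plugging in representative numbers (hypothetical: $s = 0.99$, $c = 0.95$, $\pi = 0.001$) shows how stark this gets:

```python
s, c, pi = 0.99, 0.95, 0.001   # sensitivity, specificity, prevalence (hypothetical)

numerator = s * pi                        # P(+ | D) * P(D)
evidence = s * pi + (1 - c) * (1 - pi)    # P(+) by total probability
print(numerator / evidence)               # ~= 0.019: ~98% of positives are false alarms
```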
Key Takeaway
Bayes' theorem converts likelihoods into posteriors. The prior is what we believed before; the likelihood is how consistent the observation is with each hypothesis; the posterior is what we believe after. In detection theory (Book FSI), this update rule is the MAP decoder. In channel estimation (Book MIMO), it is the Bayesian estimator. In message-passing algorithms (belief propagation), it runs on every edge of the factor graph. Bayes' theorem is not merely a formula; it is a way of thinking.