Conditional Probability
Why Conditioning?
Every practical inference problem involves incomplete information. A wireless receiver does not observe the transmitted symbol directly; it observes a noisy version of it and must reason about what was sent. A diagnostic test returns positive or negative, and the clinician must reason about the underlying condition. In each case, new information restricts the sample space, and probability must be reassigned accordingly.
Conditional probability is the mathematical mechanism for this reassignment. It is not merely a definition; it is the operational foundation of Bayes' theorem, the Markov property, factor graphs, and belief propagation. Every major result in detection theory (Book FSI) reduces, at its core, to a computation with conditional probabilities.
Definition: Conditional Probability
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. For events $A, B \in \mathcal{F}$ with $\mathbb{P}(B) > 0$, the conditional probability of $A$ given $B$ is

$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$
The mapping $A \mapsto \mathbb{P}(A \mid B)$ is itself a probability measure on $(\Omega, \mathcal{F})$: it is non-negative, $\mathbb{P}(\Omega \mid B) = 1$, and it is countably additive.
When $\mathbb{P}(B) = 0$ the ratio is undefined. Conditional probability given a zero-probability event requires a more delicate treatment (regular conditional distributions) covered in Chapter 12.
Geometric Interpretation
Conditioning on $B$ is equivalent to restricting the experiment to runs in which $B$ occurred and then renormalizing. The factor $1/\mathbb{P}(B)$ ensures the new measure still sums to 1. The fraction of those runs that also fall in $A$ is exactly $\mathbb{P}(A \mid B)$.
Pictorially: $B$ becomes the new sample space; $A$ shrinks to $A \cap B$; probabilities are rescaled proportionally.
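The restrict-and-renormalize picture can be checked by simulation. The sketch below uses illustrative events not taken from the text (one fair-die roll, $B$ = "roll is even", $A$ = "roll is at least 4"): keep only the runs where $B$ occurred and measure the fraction that also land in $A$.

```python
import random

random.seed(0)

# Conditioning as restrict-and-renormalize: run the experiment many times,
# keep only the runs in which B occurred, and measure the fraction of those
# runs that also fall in A.
N = 100_000
kept = 0   # runs where B occurred
hits = 0   # runs where both A and B occurred
for _ in range(N):
    roll = random.randint(1, 6)
    if roll % 2 == 0:        # event B: roll is even
        kept += 1
        if roll >= 4:        # event A: roll is at least 4
            hits += 1

estimate = hits / kept                # empirical P(A | B)
exact = (2 / 6) / (3 / 6)             # P(A ∩ B)/P(B): {4,6} over {2,4,6} = 2/3
print(f"empirical {estimate:.3f}  vs exact {exact:.3f}")
```

The empirical conditional frequency converges to the ratio $\mathbb{P}(A \cap B)/\mathbb{P}(B)$, which is exactly the renormalization the definition prescribes.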
Theorem: Multiplication Rule
For any events $A, B$ with $\mathbb{P}(B) > 0$:

$$\mathbb{P}(A \cap B) = \mathbb{P}(A \mid B)\,\mathbb{P}(B) = \mathbb{P}(B \mid A)\,\mathbb{P}(A)$$

(the second equality holds when $\mathbb{P}(A) > 0$).
First equality
Rearrange the definition $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$: multiply both sides by $\mathbb{P}(B)$ to obtain $\mathbb{P}(A \cap B) = \mathbb{P}(A \mid B)\,\mathbb{P}(B)$.
Second equality
By symmetry, $\mathbb{P}(B \mid A) = \mathbb{P}(A \cap B)/\mathbb{P}(A)$ when $\mathbb{P}(A) > 0$. Rearranging gives $\mathbb{P}(A \cap B) = \mathbb{P}(B \mid A)\,\mathbb{P}(A)$.
Theorem: Chain Rule (Telescoping Product)
Let $A_1, \dots, A_n$ be events with $\mathbb{P}(A_1 \cap \cdots \cap A_{n-1}) > 0$. Then:

$$\mathbb{P}\left(\bigcap_{i=1}^{n} A_i\right) = \mathbb{P}(A_1)\,\mathbb{P}(A_2 \mid A_1)\,\mathbb{P}(A_3 \mid A_1 \cap A_2)\cdots \mathbb{P}(A_n \mid A_1 \cap \cdots \cap A_{n-1}).$$
The chain rule unpacks the joint probability of $n$ events into a sequence of conditional probabilities, each adding one more event. It is the probabilistic analogue of the product rule in calculus and is the basis for the factorisation of joint distributions, a building block of graphical models.
Base case ($n=2$)
This is the multiplication rule: $\mathbb{P}(A_1 \cap A_2) = \mathbb{P}(A_1)\,\mathbb{P}(A_2 \mid A_1)$.
Inductive step
Assume the formula holds for $n-1$ events. Write $\mathbb{P}(A_1 \cap \cdots \cap A_n) = \mathbb{P}(A_1 \cap \cdots \cap A_{n-1})\,\mathbb{P}(A_n \mid A_1 \cap \cdots \cap A_{n-1})$ by the multiplication rule. Apply the induction hypothesis to the first factor. The result follows by substitution.
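As a concrete check of the telescoping product, the following sketch uses an illustrative scenario not taken from the text: three cards drawn without replacement from a standard deck, with $A_i$ = "the $i$-th card is a heart". The chain-rule product is compared against a direct combinatorial count of the same event.

```python
from math import comb
from fractions import Fraction

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1∩A2)
# for "all three cards drawn (no replacement) are hearts".
chain = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)

# Direct count of the same event: C(13,3) / C(52,3).
direct = Fraction(comb(13, 3), comb(52, 3))

print(chain, direct, chain == direct)   # both reduce to 11/850
```

The two computations agree exactly, which is the content of the chain rule: each conditional factor accounts for the cards already removed from the deck.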
Example: Two Dice: Conditioning on a Sum
Roll two fair dice. Let $A$ be the event that the first die shows 3, and let $B$ be the event that the total exceeds 6. Compute $\mathbb{P}(A \mid B)$.
Sample space
$\Omega = \{1, \dots, 6\}^2$, $|\Omega| = 36$, uniform probability $1/36$ per outcome.
Event $B$
$B = \{(d_1, d_2) : d_1 + d_2 > 6\}$ contains $21$ outcomes (sums $7$ through $12$ contribute $6+5+4+3+2+1$), so $\mathbb{P}(B) = 21/36 = 7/12$.
Event $A \cap B$
A total exceeding $6$ with first die $3$ requires second die $> 3$: $A \cap B = \{(3,4), (3,5), (3,6)\}$, so $\mathbb{P}(A \cap B) = 3/36 = 1/12$.
Apply definition
$\mathbb{P}(A \mid B) = \dfrac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \dfrac{3/36}{21/36} = \dfrac{3}{21} = \dfrac{1}{7}$. By contrast, $\mathbb{P}(B \mid A) = \mathbb{P}(\text{second die} > 3) = 3/6 = 1/2$: the order of conditioning matters.
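The counting above can be verified mechanically by enumerating all 36 equally likely outcomes and applying the ratio definition directly:

```python
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice and apply
# P(A|B) = P(A ∩ B)/P(B) with A = "first die shows 3", B = "total > 6".
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
A = [o for o in outcomes if o[0] == 3]
B = [o for o in outcomes if sum(o) > 6]
A_and_B = [o for o in B if o[0] == 3]

p_A_given_B = Fraction(len(A_and_B), len(B))   # counts cancel the 1/36
p_B_given_A = Fraction(len(A_and_B), len(A))

print(p_A_given_B)   # 1/7
print(p_B_given_A)   # 1/2
```

Because the outcomes are equally likely, the common $1/36$ weight cancels and the conditional probability reduces to a ratio of counts.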
Example: Two-Children Problem
A family has two children. Assume each child is equally likely to be a boy (B) or a girl (G), independently. (a) Given that at least one child is a boy, what is $\mathbb{P}(\text{both are boys})$? (b) Given that the younger child is a boy, what is $\mathbb{P}(\text{both are boys})$?
Setup
$\Omega = \{BB, BG, GB, GG\}$, each with probability $1/4$. (Older child listed first, younger child second.)
Part (a): at least one boy
$A = \{BB\}$ (both boys), $B_1 = \{BB, BG, GB\}$ (at least one boy). $\mathbb{P}(B_1) = 3/4$ and $\mathbb{P}(A \cap B_1) = \mathbb{P}(\{BB\}) = 1/4$, so $\mathbb{P}(A \mid B_1) = \dfrac{1/4}{3/4} = \dfrac{1}{3}$.
Part (b): younger child is a boy
"Younger is a boy" restricts to outcomes whose second (younger) child is B: $B_2 = \{BB, GB\}$, $\mathbb{P}(B_2) = 1/2$. $\mathbb{P}(\text{both boys} \cap B_2) = \mathbb{P}(\{BB\}) = 1/4$, so $\mathbb{P}(\text{both boys} \mid B_2) = \dfrac{1/4}{1/2} = \dfrac{1}{2}$.
Lesson
Parts (a) and (b) ask what appears to be the same question but condition on different information. The more specific information in (b) leaves fewer residual cases and raises the conditional probability from $1/3$ to $1/2$. This illustrates that conditioning is sensitive to the exact information given.
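Both parts can be checked by enumerating the four equally likely families (older child listed first, as in the setup above):

```python
from fractions import Fraction
from itertools import product

# The four equally likely families, older child in slot 0, younger in slot 1.
families = list(product("BG", repeat=2))   # ('B','B'), ('B','G'), ('G','B'), ('G','G')

at_least_one_boy = [f for f in families if "B" in f]
younger_is_boy = [f for f in families if f[1] == "B"]

def p_both_boys(given):
    # P(both boys | given) as a ratio of counts: outcomes are equally likely.
    return Fraction(sum(f == ("B", "B") for f in given), len(given))

print(p_both_boys(at_least_one_boy))   # 1/3
print(p_both_boys(younger_is_boy))     # 1/2
```

The conditioning event in part (a) has three outcomes while the one in part (b) has only two, which is exactly why the answers differ.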
Conditional Probability
The probability of event $A$ given that event $B$ has occurred, defined as $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$ when $\mathbb{P}(B) > 0$. Conditional probability is itself a probability measure on $(\Omega, \mathcal{F})$.
Related: Independence of Events, Bayes' Theorem, Law of Total Probability
Common Mistake: $\mathbb{P}(A \mid B) \neq \mathbb{P}(B \mid A)$ in General
Mistake:
Confusing $\mathbb{P}(A \mid B)$ with $\mathbb{P}(B \mid A)$ is one of the most common errors in applied probability, statistics, and machine learning. For instance: the probability that a person has a disease given a positive test is NOT the same as the probability of a positive test given the disease.
Correction:
The correct relationship is Bayes' theorem:

$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(B \mid A)\,\mathbb{P}(A)}{\mathbb{P}(B)}.$$

The two quantities differ by the ratio $\mathbb{P}(A)/\mathbb{P}(B)$, the prior probability of $A$ relative to $B$. When diseases are rare ($\mathbb{P}(A) \ll \mathbb{P}(B)$), this ratio can make $\mathbb{P}(A \mid B)$ much smaller than $\mathbb{P}(B \mid A)$.
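The size of the gap can be made concrete with illustrative numbers (assumed here for the sketch, not taken from the text): a disease with 1% prevalence, a test with sensitivity $\mathbb{P}(+\mid\text{disease}) = 0.95$ and false-positive rate $\mathbb{P}(+\mid\text{healthy}) = 0.05$.

```python
# Illustrative (assumed) numbers: prevalence 1%, sensitivity 0.95,
# false-positive rate 0.05.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Law of total probability: P(+) over the disease / no-disease partition.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(disease | +) = P(+ | disease) P(disease) / P(+).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(+ | disease) = {p_pos_given_disease:.3f}")
print(f"P(disease | +) = {p_disease_given_pos:.3f}")
```

With these numbers $\mathbb{P}(\text{disease} \mid +)$ comes out near $0.16$ even though $\mathbb{P}(+ \mid \text{disease}) = 0.95$: the small prior $\mathbb{P}(\text{disease})$ dominates the ratio.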
Quick Check
A card is drawn uniformly at random from a standard 52-card deck. Given that the card is red (hearts or diamonds), what is the probability it is a heart?
There are 26 red cards, 13 of which are hearts. .
Conditional Probability as Renormalization
(Interactive figure: visualize $\mathbb{P}(A \mid B)$ as renormalization by adjusting the sizes and overlap of events $A$ and $B$; a slider controls the fraction of $\min(\mathbb{P}(A), \mathbb{P}(B))$ that forms $A \cap B$.)
Key Takeaway
Conditional probability is a probability measure. For fixed $B$ with $\mathbb{P}(B) > 0$, the map $A \mapsto \mathbb{P}(A \mid B)$ satisfies all three Kolmogorov axioms. This means every theorem about probability measures (countable additivity, continuity, inclusion-exclusion) also holds for conditional probability. Conditioning is not a special operation; it is a change of measure.
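These facts can be spot-checked numerically. The sketch below uses illustrative events on a single fair-die roll with $B$ = "roll is even" (choices assumed for the demonstration) and verifies normalization, additivity on disjoint events, and inclusion-exclusion under $\mathbb{P}(\cdot \mid B)$:

```python
from fractions import Fraction

# A single fair-die roll; condition on B = "roll is even".
omega = set(range(1, 7))
B = {2, 4, 6}

def cond(A):
    # P(A | B) on a uniform space reduces to a ratio of counts.
    return Fraction(len(A & B), len(B))

A1, A2 = {1, 2}, {5, 6}      # disjoint events
C, D = {1, 2, 3}, {3, 4}     # overlapping events

assert cond(omega) == 1                                   # normalization
assert cond(A1 | A2) == cond(A1) + cond(A2)               # additivity (disjoint)
assert cond(C | D) == cond(C) + cond(D) - cond(C & D)     # inclusion-exclusion
print("conditional measure satisfies the checked axioms")
```

Each assertion is an instance of a theorem about probability measures holding verbatim for the conditional measure, which is the point of the takeaway above.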