Conditional Probability

Why Conditioning?

Every practical inference problem involves incomplete information. A wireless receiver does not observe the transmitted symbol directly: it observes a noisy version of it and must reason about what was sent. A diagnostic test returns positive or negative, and the clinician must reason about the underlying condition. In each case, new information restricts the sample space, and probability must be reassigned accordingly.

Conditional probability is the mathematical mechanism for this reassignment. It is not merely a definition; it is the operational foundation of Bayes' theorem, the Markov property, factor graphs, and belief propagation. Every major result in detection theory (Book FSI) reduces, at its core, to a computation with conditional probabilities.

Definition:

Conditional Probability

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. For events $A, B \in \mathcal{F}$ with $\mathbb{P}(B) > 0$, the conditional probability of $A$ given $B$ is
$$\mathbb{P}(A \mid B) \triangleq \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$

The mapping $A \mapsto \mathbb{P}(A \mid B)$ is itself a probability measure on $(\Omega, \mathcal{F})$: it is non-negative, $\mathbb{P}(\Omega \mid B) = 1$, and it is countably additive.

When $\mathbb{P}(B) = 0$ the ratio is undefined. Conditional probability given a zero-probability event requires a more delicate treatment (regular conditional distributions) covered in Chapter 12.
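To make the definition concrete, here is a minimal sketch on a finite sample space; the single-die events below are an illustrative choice, not from the text above. It computes $\mathbb{P}(A \mid B)$ directly as the defining ratio.

```python
from fractions import Fraction

# Finite sample space: one fair die (an illustrative choice).
omega = {1, 2, 3, 4, 5, 6}
P = {w: Fraction(1, 6) for w in omega}  # uniform measure

def prob(event):
    """P(event) on the finite space."""
    return sum(P[w] for w in event)

def cond_prob(A, B):
    """P(A | B) = P(A & B) / P(B), defined only when P(B) > 0."""
    pB = prob(B)
    if pB == 0:
        raise ValueError("P(B) = 0: conditional probability undefined")
    return prob(A & B) / pB

A = {2, 4, 6}           # die shows an even number
B = {4, 5, 6}           # die shows at least 4
print(cond_prob(A, B))  # 2/3
```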


Geometric Interpretation

Conditioning on $B$ is equivalent to restricting the experiment to runs in which $B$ occurred and then renormalizing. The factor $1/\mathbb{P}(B)$ ensures the new measure still sums to 1. The fraction of those runs that also fall in $A$ is exactly $\mathbb{P}(A \cap B)/\mathbb{P}(B)$.

Pictorially: $B$ becomes the new sample space; $A$ shrinks to $A \cap B$; probabilities are rescaled proportionally.
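The "restrict and renormalize" reading can be checked by simulation. A minimal sketch, re-using the single-die events from the block above (the sample size is an arbitrary choice): keep only the runs in which $B$ occurred and count how often $A$ also occurred.

```python
import random

random.seed(0)
N = 100_000

kept = hits = 0
for _ in range(N):
    # Experiment: roll one fair die (same events as the block above).
    w = random.randint(1, 6)
    B = w >= 4          # condition: at least 4
    A = w % 2 == 0      # event of interest: even
    if B:               # restrict to runs where B occurred
        kept += 1
        hits += A
print(hits / kept)      # close to 2/3 = P(A | B)
```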

Theorem: Multiplication Rule

For any events $A, B$ with $\mathbb{P}(B) > 0$:
$$\mathbb{P}(A \cap B) = \mathbb{P}(B)\,\mathbb{P}(A \mid B) = \mathbb{P}(A)\,\mathbb{P}(B \mid A)$$
(the second equality holds when $\mathbb{P}(A) > 0$).

Theorem: Chain Rule (Telescoping Product)

Let $A_1, A_2, \ldots, A_n \in \mathcal{F}$ with $\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_{n-1}) > 0$. Then:
$$\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\,\mathbb{P}(A_2 \mid A_1)\,\mathbb{P}(A_3 \mid A_1 \cap A_2) \cdots \mathbb{P}(A_n \mid A_1 \cap \cdots \cap A_{n-1}).$$

The chain rule unpacks the joint probability of $n$ events into a telescoping product of conditional probabilities, each conditioning on all the events before it. It is the basis for the factorisation of joint distributions, a building block of graphical models.
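As a quick instance of both the multiplication rule and the chain rule, consider drawing three cards without replacement and asking for the probability that all three are aces; this is a standard illustration, not an example from the text above. The chain rule gives $\frac{4}{52}\cdot\frac{3}{51}\cdot\frac{2}{50}$, which the sketch below verifies by exhaustive enumeration.

```python
from fractions import Fraction
from itertools import permutations

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2),
# where A_k = "k-th card drawn is an ace", drawing without replacement.
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Brute-force check: enumerate all ordered draws of 3 cards from 52,
# labelling cards 0..3 as the aces.
deck = range(52)
aces = set(range(4))
total = hits = 0
for draw in permutations(deck, 3):
    total += 1
    hits += all(c in aces for c in draw)

assert chain == Fraction(hits, total)
print(chain)  # 1/5525
```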


Example: Two Dice, Conditioning on the First Die

Roll two fair dice. Let $B$ be the event that the first die shows 3, and let $A$ be the event that the total exceeds 6. Compute $\mathbb{P}(A \mid B)$.
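With the first die fixed at 3, the total exceeds 6 exactly when the second die shows 4, 5, or 6, so $\mathbb{P}(A \mid B) = 3/6 = 1/2$. A short enumeration sketch confirming this:

```python
from fractions import Fraction
from itertools import product

# Uniform space: ordered pairs (d1, d2) of fair dice.
omega = list(product(range(1, 7), repeat=2))

B = [(d1, d2) for (d1, d2) in omega if d1 == 3]        # first die shows 3
A_and_B = [(d1, d2) for (d1, d2) in B if d1 + d2 > 6]  # ...and total exceeds 6

# On a uniform space, P(A | B) = |A ∩ B| / |B|.
print(Fraction(len(A_and_B), len(B)))  # 1/2
```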

Example: Two-Children Problem

A family has two children. Assume each child is equally likely to be a boy (B) or a girl (G), independently. (a) Given that at least one child is a boy, what is $\mathbb{P}(\text{both boys})$? (b) Given that the younger child is a boy, what is $\mathbb{P}(\text{both boys})$?
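Restricting to the three equally likely outcomes containing a boy gives $1/3$ for (a); fixing the younger child leaves two outcomes and gives $1/2$ for (b). The contrast between the two conditioning events is the point of the example. An enumeration sketch:

```python
from fractions import Fraction
from itertools import product

# Equally likely, independent sex assignments for (younger, older).
omega = list(product("BG", repeat=2))   # BB, BG, GB, GG

def cond(A, B):
    """P(A | B) on the uniform 4-point space."""
    B_set = [w for w in omega if B(w)]
    return Fraction(sum(A(w) for w in B_set), len(B_set))

both_boys = lambda w: w == ("B", "B")
at_least_one_boy = lambda w: "B" in w
younger_is_boy = lambda w: w[0] == "B"

print(cond(both_boys, at_least_one_boy))  # (a) 1/3
print(cond(both_boys, younger_is_boy))    # (b) 1/2
```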

Conditional Probability

The probability of event $A$ given that event $B$ has occurred, defined as $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$ when $\mathbb{P}(B) > 0$. Conditional probability is itself a probability measure on $(\Omega, \mathcal{F})$.

Related: Independence of Events, Bayes' Theorem, Law of Total Probability

Common Mistake: $\mathbb{P}(A \mid B) \neq \mathbb{P}(B \mid A)$ in General

Mistake:

Confusing $\mathbb{P}(A \mid B)$ with $\mathbb{P}(B \mid A)$ is one of the most common errors in applied probability, statistics, and machine learning. For instance: the probability that a person has a disease given a positive test is NOT the same as the probability of a positive test given the disease.

Correction:

The correct relationship is Bayes' theorem:
$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(B \mid A)\,\mathbb{P}(A)}{\mathbb{P}(B)}.$$
The two conditional probabilities differ by the factor $\mathbb{P}(A)/\mathbb{P}(B)$, the prior probability of $A$ relative to that of $B$. When diseases are rare ($\mathbb{P}(A) \ll \mathbb{P}(B)$), this factor makes $\mathbb{P}(A \mid B)$ much smaller than $\mathbb{P}(B \mid A)$.
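A numerical sketch of the disease example; the prevalence, sensitivity, and false-positive rate below are assumed values chosen for illustration, not figures from the text.

```python
# Bayes' theorem for the disease-testing example.
# Assumed illustrative numbers: 1% prevalence, 99% sensitivity,
# 5% false-positive rate.
p_disease = 0.01              # P(A): prior
p_pos_given_disease = 0.99    # P(B | A): sensitivity
p_pos_given_healthy = 0.05    # P(B | A^c): false-positive rate

# Law of total probability for P(B).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# P(A | B) = P(B | A) P(A) / P(B).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"{p_disease_given_pos:.3f}")  # ~0.167, far below P(B | A) = 0.99
```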

Quick Check

A card is drawn uniformly at random from a standard 52-card deck. Given that the card is red (hearts or diamonds), what is the probability it is a heart?

$1/4$

$1/2$

$1/13$

$13/52$

Conditional Probability as Renormalization

Visualize $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$ by adjusting the sizes and overlap of events $A$ and $B$.

[Interactive demo: parameters $\mathbb{P}(A)$, $\mathbb{P}(B)$, and the fraction of $\min(\mathbb{P}(A), \mathbb{P}(B))$ that forms $A \cap B$; default values 0.5, 0.4, 0.3.]
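A static sketch of the same computation; the mapping of the demo's default values 0.5, 0.4, 0.3 onto $\mathbb{P}(A)$, $\mathbb{P}(B)$, and the overlap fraction is an assumption about the widget.

```python
# Renormalization view of conditioning, mirroring the demo's parameters.
p_A, p_B = 0.5, 0.4    # event probabilities (assumed demo defaults)
overlap_frac = 0.3     # fraction of min(P(A), P(B)) forming A ∩ B

p_AB = overlap_frac * min(p_A, p_B)   # P(A ∩ B) = 0.12
p_A_given_B = p_AB / p_B              # restrict to B, rescale by 1/P(B)
print(p_A_given_B)                    # 0.12 / 0.4 = 0.3
```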

Key Takeaway

Conditional probability is a probability measure. For fixed $B$, the map $A \mapsto \mathbb{P}(A \mid B)$ satisfies all three Kolmogorov axioms. This means every theorem about probability measures (countable additivity, continuity, inclusion-exclusion) also holds for conditional probability. Conditioning is not a special operation; it is a change of measure.
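For completeness, here is the one-line check of countable additivity behind this claim (non-negativity and $\mathbb{P}(\Omega \mid B) = 1$ are immediate from the definition). For disjoint $A_1, A_2, \ldots$, the sets $A_i \cap B$ are also disjoint, so
$$\mathbb{P}\Big(\bigcup_{i} A_i \,\Big|\, B\Big) = \frac{\mathbb{P}\big(\bigcup_i (A_i \cap B)\big)}{\mathbb{P}(B)} = \frac{\sum_i \mathbb{P}(A_i \cap B)}{\mathbb{P}(B)} = \sum_i \mathbb{P}(A_i \mid B).$$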