Conditional Probability

Why Conditioning?

Every practical inference problem involves incomplete information. A wireless receiver does not observe the transmitted symbol directly: it observes a noisy version of it and must reason about what was sent. A diagnostic test returns positive or negative, and the clinician must reason about the underlying condition. In each case, new information restricts the sample space, and probability must be reassigned accordingly.

Conditional probability is the mathematical mechanism for this reassignment. It is not merely a definition; it is the operational foundation of Bayes' theorem, the Markov property, factor graphs, and belief propagation. Every major result in detection theory (Book FSI) reduces, at its core, to a computation with conditional probabilities.

Definition:

Conditional Probability

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. For events $A, B \in \mathcal{F}$ with $\mathbb{P}(B) > 0$, the conditional probability of $A$ given $B$ is
$$\mathbb{P}(A \mid B) \triangleq \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)}.$$

The mapping $A \mapsto \mathbb{P}(A \mid B)$ is itself a probability measure on $(\Omega, \mathcal{F})$: it is non-negative, $\mathbb{P}(\Omega \mid B) = 1$, and it is countably additive.

When $\mathbb{P}(B) = 0$ the ratio is undefined. Conditional probability given a zero-probability event requires a more delicate treatment (regular conditional distributions) covered in Chapter 12.
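To make the definition concrete, here is a minimal sketch on a finite sample space; the single-die events below are an illustrative choice, not from the text above. It computes $\mathbb{P}(A \mid B)$ directly as the defining ratio.

```python
from fractions import Fraction

# Finite sample space: one fair die (an illustrative choice).
omega = {1, 2, 3, 4, 5, 6}
P = {w: Fraction(1, 6) for w in omega}  # uniform measure

def prob(event):
    """P(event) on the finite space."""
    return sum(P[w] for w in event)

def cond_prob(A, B):
    """P(A | B) = P(A & B) / P(B), defined only when P(B) > 0."""
    pB = prob(B)
    if pB == 0:
        raise ValueError("P(B) = 0: conditional probability undefined")
    return prob(A & B) / pB

A = {2, 4, 6}           # die shows an even number
B = {4, 5, 6}           # die shows at least 4
print(cond_prob(A, B))  # 2/3
```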


Geometric Interpretation

Conditioning on $B$ is equivalent to restricting the experiment to runs in which $B$ occurred and then renormalizing. The factor $1/\mathbb{P}(B)$ ensures the new measure still sums to 1. The fraction of those runs that also fall in $A$ is exactly $\mathbb{P}(A \cap B)/\mathbb{P}(B)$.

Pictorially: $B$ becomes the new sample space; $A$ shrinks to $A \cap B$; probabilities are rescaled proportionally.
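The "restrict and renormalize" reading can be checked by simulation. A minimal sketch, re-using the single-die events from the block above (the sample size is an arbitrary choice): keep only the runs in which $B$ occurred and count how often $A$ also occurred.

```python
import random

random.seed(0)
N = 100_000

kept = hits = 0
for _ in range(N):
    # Experiment: roll one fair die (same events as the block above).
    w = random.randint(1, 6)
    B = w >= 4          # condition: at least 4
    A = w % 2 == 0      # event of interest: even
    if B:               # restrict to runs where B occurred
        kept += 1
        hits += A
print(hits / kept)      # close to 2/3 = P(A | B)
```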

Theorem: Multiplication Rule

For any events $A, B$ with $\mathbb{P}(B) > 0$:
$$\mathbb{P}(A \cap B) = \mathbb{P}(B)\,\mathbb{P}(A \mid B) = \mathbb{P}(A)\,\mathbb{P}(B \mid A)$$
(the second equality holds when $\mathbb{P}(A) > 0$).

Theorem: Chain Rule (Telescoping Product)

Let $A_1, A_2, \ldots, A_n \in \mathcal{F}$ with $\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_{n-1}) > 0$. Then:
$$\mathbb{P}(A_1 \cap A_2 \cap \cdots \cap A_n) = \mathbb{P}(A_1)\,\mathbb{P}(A_2 \mid A_1)\,\mathbb{P}(A_3 \mid A_1 \cap A_2) \cdots \mathbb{P}(A_n \mid A_1 \cap \cdots \cap A_{n-1}).$$

The chain rule unpacks the joint probability of $n$ events into a telescoping product of conditional probabilities, each conditioning on all the events before it. It is the basis for the factorisation of joint distributions, a building block of graphical models.
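As a quick instance of both the multiplication rule and the chain rule, consider drawing three cards without replacement and asking for the probability that all three are aces; this is a standard illustration, not an example from the text above. The chain rule gives $\frac{4}{52}\cdot\frac{3}{51}\cdot\frac{2}{50}$, which the sketch below verifies by exhaustive enumeration.

```python
from fractions import Fraction
from itertools import permutations

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2 | A1) P(A3 | A1 ∩ A2),
# where A_k = "k-th card drawn is an ace", drawing without replacement.
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Brute-force check: enumerate all ordered draws of 3 cards from 52,
# labelling cards 0..3 as the aces.
deck = range(52)
aces = set(range(4))
total = hits = 0
for draw in permutations(deck, 3):
    total += 1
    hits += all(c in aces for c in draw)

assert chain == Fraction(hits, total)
print(chain)  # 1/5525
```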


Example: Two Dice, Conditioning on the First Die

Roll two fair dice. Let $B$ be the event that the first die shows 3, and let $A$ be the event that the total exceeds 6. Compute $\mathbb{P}(A \mid B)$.
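With the first die fixed at 3, the total exceeds 6 exactly when the second die shows 4, 5, or 6, so $\mathbb{P}(A \mid B) = 3/6 = 1/2$. A short enumeration sketch confirming this:

```python
from fractions import Fraction
from itertools import product

# Uniform space: ordered pairs (d1, d2) of fair dice.
omega = list(product(range(1, 7), repeat=2))

B = [(d1, d2) for (d1, d2) in omega if d1 == 3]        # first die shows 3
A_and_B = [(d1, d2) for (d1, d2) in B if d1 + d2 > 6]  # ...and total exceeds 6

# On a uniform space, P(A | B) = |A ∩ B| / |B|.
print(Fraction(len(A_and_B), len(B)))  # 1/2
```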

Example: Two-Children Problem

A family has two children. Assume each child is equally likely to be a boy (B) or a girl (G), independently. (a) Given that at least one child is a boy, what is $\mathbb{P}(\text{both boys})$? (b) Given that the younger child is a boy, what is $\mathbb{P}(\text{both boys})$?
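Restricting to the three equally likely outcomes containing a boy gives $1/3$ for (a); fixing the younger child leaves two outcomes and gives $1/2$ for (b). The contrast between the two conditioning events is the point of the example. An enumeration sketch:

```python
from fractions import Fraction
from itertools import product

# Equally likely, independent sex assignments for (younger, older).
omega = list(product("BG", repeat=2))   # BB, BG, GB, GG

def cond(A, B):
    """P(A | B) on the uniform 4-point space."""
    B_set = [w for w in omega if B(w)]
    return Fraction(sum(A(w) for w in B_set), len(B_set))

both_boys = lambda w: w == ("B", "B")
at_least_one_boy = lambda w: "B" in w
younger_is_boy = lambda w: w[0] == "B"

print(cond(both_boys, at_least_one_boy))  # (a) 1/3
print(cond(both_boys, younger_is_boy))    # (b) 1/2
```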

Conditional Probability

The probability of event $A$ given that event $B$ has occurred, defined as $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$ when $\mathbb{P}(B) > 0$. Conditional probability is itself a probability measure on $(\Omega, \mathcal{F})$.

Related: Independence of Events, Bayes' Theorem, Law of Total Probability

Common Mistake: $\mathbb{P}(A \mid B) \neq \mathbb{P}(B \mid A)$ in General

Mistake:

Confusing $\mathbb{P}(A \mid B)$ with $\mathbb{P}(B \mid A)$ is one of the most common errors in applied probability, statistics, and machine learning. For instance: the probability that a person has a disease given a positive test is NOT the same as the probability of a positive test given the disease.

Correction:

The correct relationship is Bayes' theorem:
$$\mathbb{P}(A \mid B) = \frac{\mathbb{P}(B \mid A)\,\mathbb{P}(A)}{\mathbb{P}(B)}.$$
The two conditional probabilities differ by the factor $\mathbb{P}(A)/\mathbb{P}(B)$, the prior probability of $A$ relative to that of $B$. When diseases are rare ($\mathbb{P}(A) \ll \mathbb{P}(B)$), this factor makes $\mathbb{P}(A \mid B)$ much smaller than $\mathbb{P}(B \mid A)$.
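A numerical sketch of the disease example; the prevalence, sensitivity, and false-positive rate below are assumed values chosen for illustration, not figures from the text.

```python
# Bayes' theorem for the disease-testing example.
# Assumed illustrative numbers: 1% prevalence, 99% sensitivity,
# 5% false-positive rate.
p_disease = 0.01              # P(A): prior
p_pos_given_disease = 0.99    # P(B | A): sensitivity
p_pos_given_healthy = 0.05    # P(B | A^c): false-positive rate

# Law of total probability for P(B).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# P(A | B) = P(B | A) P(A) / P(B).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"{p_disease_given_pos:.3f}")  # ~0.167, far below P(B | A) = 0.99
```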

Quick Check

A card is drawn uniformly at random from a standard 52-card deck. Given that the card is red (hearts or diamonds), what is the probability it is a heart?

$1/4$

$1/2$

$1/13$

$13/52$

Conditional Probability as Renormalization

Visualize $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$ by adjusting the sizes and overlap of events $A$ and $B$.

[Interactive demo: parameters $\mathbb{P}(A)$, $\mathbb{P}(B)$, and the fraction of $\min(\mathbb{P}(A), \mathbb{P}(B))$ that forms $A \cap B$; default values 0.5, 0.4, 0.3.]
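A static sketch of the same computation; the mapping of the demo's default values 0.5, 0.4, 0.3 onto $\mathbb{P}(A)$, $\mathbb{P}(B)$, and the overlap fraction is an assumption about the widget.

```python
# Renormalization view of conditioning, mirroring the demo's parameters.
p_A, p_B = 0.5, 0.4    # event probabilities (assumed demo defaults)
overlap_frac = 0.3     # fraction of min(P(A), P(B)) forming A ∩ B

p_AB = overlap_frac * min(p_A, p_B)   # P(A ∩ B) = 0.12
p_A_given_B = p_AB / p_B              # restrict to B, rescale by 1/P(B)
print(p_A_given_B)                    # 0.12 / 0.4 = 0.3
```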

Key Takeaway

Conditional probability is a probability measure. For fixed $B$, the map $A \mapsto \mathbb{P}(A \mid B)$ satisfies all three Kolmogorov axioms. This means every theorem about probability measures (countable additivity, continuity, inclusion-exclusion) also holds for conditional probability. Conditioning is not a special operation; it is a change of measure.
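For completeness, here is the one-line check of countable additivity behind this claim (non-negativity and $\mathbb{P}(\Omega \mid B) = 1$ are immediate from the definition). For disjoint $A_1, A_2, \ldots$, the sets $A_i \cap B$ are also disjoint, so
$$\mathbb{P}\Big(\bigcup_{i} A_i \,\Big|\, B\Big) = \frac{\mathbb{P}\big(\bigcup_i (A_i \cap B)\big)}{\mathbb{P}(B)} = \frac{\sum_i \mathbb{P}(A_i \cap B)}{\mathbb{P}(B)} = \sum_i \mathbb{P}(A_i \mid B).$$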