The Law of Total Probability and Bayes' Theorem

Partitioning the Unknown

Many probabilities are hard to compute directly but become tractable when we condition on an exhaustive set of mutually exclusive scenarios. If we know the probability of an event $A$ under each scenario, and we know the probability of each scenario, we can recover $\mathbb{P}(A)$ as a weighted average. This is the law of total probability, one of the workhorses of applied probability.

Bayes' theorem is the other side of the same coin: given that $A$ occurred, it tells us how to update our beliefs about which scenario was in play. This prior-to-posterior update is the mathematical engine of Bayesian inference, detection theory, and probabilistic decoding algorithms.

Definition: Partition of the Sample Space

A finite (or countable) collection of events $\{B_1, B_2, \ldots, B_n\}$ is a partition of $\Omega$ if:

  1. Exhaustive: $\bigcup_{i=1}^{n} B_i = \Omega$.
  2. Mutually exclusive: $B_i \cap B_j = \emptyset$ for all $i \neq j$.

Every outcome $\omega \in \Omega$ belongs to exactly one $B_i$.

Theorem: Law of Total Probability

Let $\{B_1, \ldots, B_n\}$ be a partition of $\Omega$ with $\mathbb{P}(B_i) > 0$ for all $i$. For any event $A$:

$$\mathbb{P}(A) = \sum_{i=1}^{n} \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i).$$

The event $A$ is split into disjoint pieces $A \cap B_i$, each contained in exactly one scenario $B_i$. The probability of each piece is $\mathbb{P}(A \cap B_i) = \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)$, and summing over all scenarios recovers $\mathbb{P}(A)$.
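The weighted average can be checked numerically. The sketch below uses hypothetical scenario probabilities and likelihoods (not from the text):

```python
# Law of total probability over a hypothetical 3-scenario partition.
priors = [0.5, 0.3, 0.2]       # P(B_i); must sum to 1
likelihoods = [0.9, 0.5, 0.1]  # P(A | B_i)

# P(A) = sum_i P(A | B_i) * P(B_i)
# = 0.9*0.5 + 0.5*0.3 + 0.1*0.2 = 0.62
p_a = sum(l * p for l, p in zip(likelihoods, priors))
print(p_a)
```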


Theorem: Bayes' Theorem

Let $\{B_1, \ldots, B_n\}$ be a partition of $\Omega$ with $\mathbb{P}(B_i) > 0$ for all $i$. For any event $A$ with $\mathbb{P}(A) > 0$:

$$\mathbb{P}(B_k \mid A) = \frac{\mathbb{P}(A \mid B_k)\,\mathbb{P}(B_k)}{\sum_{i=1}^{n} \mathbb{P}(A \mid B_i)\,\mathbb{P}(B_i)}.$$

The terms have canonical names in Bayesian inference:

  • $\mathbb{P}(B_k)$: the prior probability of scenario $k$.
  • $\mathbb{P}(A \mid B_k)$: the likelihood of observation $A$ under scenario $k$.
  • $\mathbb{P}(B_k \mid A)$: the posterior probability of scenario $k$ given $A$.
  • $\mathbb{P}(A)$: the evidence (normalizing constant).

Bayes' theorem reverses the direction of conditioning. We know how to go from scenario to observation (the forward channel $\mathbb{P}(A \mid B_k)$). Bayes tells us how to go the other direction: from observation back to scenario. The prior encodes what we believed before observing $A$; the posterior encodes what we believe after.
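The prior-to-posterior update over a finite partition can be sketched in a few lines; the two-scenario numbers below are hypothetical:

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a finite partition:
    returns P(B_k | A) for each k."""
    # Evidence P(A) via the law of total probability.
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Hypothetical two-scenario example: uniform prior, likelihoods 0.8 and 0.2.
post = posterior([0.5, 0.5], [0.8, 0.2])
# post[0] = 0.8*0.5 / (0.8*0.5 + 0.2*0.5) = 0.8
print(post)
```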


Historical Note: Thomas Bayes and the Inverse Probability Problem


Thomas Bayes (1702–1761), an English minister and amateur mathematician, posed the following question: given that an event has occurred some number of times, what can be inferred about the underlying probability? His posthumous 1763 essay, edited and communicated by Richard Price to the Royal Society, introduced what we now call Bayes' theorem in the context of a billiard-ball model on a square table.

Bayes' contribution was primarily philosophical: the idea that probability could represent degree of belief rather than mere frequency, and that this belief should be updated rationally in response to evidence. The formalization was refined by Pierre-Simon Laplace, who independently developed the same ideas around 1774. The modern Bayesian-versus-frequentist debate can be traced directly to this 18th-century dispute over the nature of probability.

Example: Binary Symmetric Channel: Posterior Computation

A binary symmetric channel flips each transmitted bit with probability $\epsilon \in (0, 1/2)$. The transmitter sends $0$ or $1$ with equal prior probabilities $\mathbb{P}(X=0) = \mathbb{P}(X=1) = 1/2$. The receiver observes $Y = 1$. Compute the posterior $\mathbb{P}(X = 0 \mid Y = 1)$.
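With equal priors the evidence is $\mathbb{P}(Y=1) = \tfrac{1}{2}\epsilon + \tfrac{1}{2}(1-\epsilon) = \tfrac{1}{2}$, so the posterior simplifies to $\mathbb{P}(X=0 \mid Y=1) = \epsilon$. A minimal sketch of that computation:

```python
def bsc_posterior_x0_given_y1(eps):
    """P(X=0 | Y=1) for a binary symmetric channel with
    crossover probability eps and equal priors."""
    p_y1_given_x0 = eps        # bit was flipped
    p_y1_given_x1 = 1 - eps    # bit passed through
    prior = 0.5
    evidence = p_y1_given_x0 * prior + p_y1_given_x1 * prior  # P(Y=1) = 1/2
    return p_y1_given_x0 * prior / evidence

# With equal priors the posterior equals the crossover probability itself.
print(bsc_posterior_x0_given_y1(0.1))
```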

Example: Two Factories and a Defective Chip

Factory A produces 60% of all chips; factory B produces 40%. Factory A's defect rate is 2%; factory B's defect rate is 5%. A randomly chosen chip is found to be defective. What is the probability that it came from factory A?
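The factory example works out to $0.012 / 0.032 = 0.375$, and the arithmetic can be checked directly:

```python
# Two factories: market shares and defect rates from the example.
p_factory = {"A": 0.6, "B": 0.4}          # P(factory)
p_defect_given = {"A": 0.02, "B": 0.05}   # P(defective | factory)

# Evidence: P(defective) by the law of total probability.
p_defect = sum(p_defect_given[f] * p_factory[f] for f in p_factory)

# Posterior: P(factory A | defective) = 0.012 / 0.032 = 0.375.
p_a_given_defect = p_defect_given["A"] * p_factory["A"] / p_defect
print(round(p_a_given_defect, 3))
```

Even though factory A makes more chips, its lower defect rate pulls the posterior below the 0.6 prior.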

Bayesian Posterior Updating

Explore how the posterior $\mathbb{P}(B_1 \mid A)$ evolves as the prior $\mathbb{P}(B_1)$ and likelihoods $\mathbb{P}(A \mid B_1)$, $\mathbb{P}(A \mid B_2)$ vary (two-hypothesis model).


Bayesian Updating: Prior to Posterior

Watch the posterior distribution evolve as sequential observations arrive and the prior is updated one observation at a time.
A binary model with two hypotheses: each observation shifts the posterior. After many observations the posterior concentrates on the true hypothesis, regardless of the initial prior (as long as it is non-zero).
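Sequential updating can be simulated: treat yesterday's posterior as today's prior and apply Bayes' theorem once per observation. The coin-flip model and its parameter values below are hypothetical:

```python
import random

random.seed(0)

# Two hypotheses about a coin's heads-probability.
hypotheses = [0.3, 0.7]     # H1: p = 0.3, H2: p = 0.7 (the true one)
posterior = [0.5, 0.5]      # start from a uniform prior
true_p = 0.7

for _ in range(200):
    heads = random.random() < true_p            # one coin flip
    # Likelihood of this observation under each hypothesis.
    like = [p if heads else 1 - p for p in hypotheses]
    evidence = sum(l * q for l, q in zip(like, posterior))
    # One Bayes update: the old posterior plays the role of the prior.
    posterior = [l * q / evidence for l, q in zip(like, posterior)]

print(posterior)  # mass concentrates on the true hypothesis p = 0.7
```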

Law of Total Probability: Partition Visualization

Visualize how $\mathbb{P}(A)$ is decomposed over a partition $\{B_1, B_2, B_3\}$. Adjust the scenario probabilities and the conditional probabilities $\mathbb{P}(A \mid B_i)$ to see the weighted average.


Why This Matters: Bayes' Theorem in Digital Detection

In digital communications, the receiver observes $y$ and must decide which symbol $s_k$ was transmitted. Bayes' theorem gives the maximum a posteriori (MAP) decoder:

$$\hat{k} = \arg\max_k \mathbb{P}(X = s_k \mid Y = y) = \arg\max_k f(y \mid X = s_k)\,\mathbb{P}(X = s_k),$$

where the total probability $f(y)$ cancels in the $\arg\max$. When symbols are equally likely ($\mathbb{P}(X = s_k) = 1/m$ for all $k$), MAP reduces to maximum likelihood (ML): $\hat{k} = \arg\max_k f(y \mid X = s_k)$. Bayes' theorem is the precise reason why equal priors make MAP and ML coincide.
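A minimal sketch of a MAP decision rule, assuming a Gaussian-noise channel and BPSK-style symbols $\pm 1$ (these modeling choices are illustrative, not from the text):

```python
import math

def map_decode(y, symbols, priors, sigma=1.0):
    """MAP decoder for Y = s_k + Gaussian noise with std sigma:
    pick the index k maximizing f(y | s_k) * P(s_k)."""
    def likelihood(y, s):
        # Unnormalized Gaussian density; the constant cancels in argmax.
        return math.exp(-(y - s) ** 2 / (2 * sigma ** 2))
    scores = [likelihood(y, s) * p for s, p in zip(symbols, priors)]
    return max(range(len(symbols)), key=lambda k: scores[k])

symbols = [-1.0, +1.0]
# Equal priors: MAP coincides with ML, i.e. nearest-symbol decision.
print(map_decode(0.2, symbols, [0.5, 0.5]))    # picks +1
# A strong prior toward -1 can override a weakly positive observation.
print(map_decode(0.2, symbols, [0.95, 0.05]))  # picks -1
```

The second call shows why unequal priors matter: the observation favors $+1$, but the prior tilts the decision back to $-1$.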

Common Mistake: The Prosecutor's Fallacy

Mistake:

In forensic science (and sometimes in wireless network analysis), evidence is presented as: "The probability of observing this evidence if the defendant is innocent is only $0.001$." This is then (incorrectly) interpreted as: "The probability that the defendant is innocent given this evidence is $0.001$."

Correction:

The first quantity is $\mathbb{P}(\text{evidence} \mid \text{innocent})$, the likelihood. The second is $\mathbb{P}(\text{innocent} \mid \text{evidence})$, the posterior. They are related by Bayes' theorem:

$$\mathbb{P}(\text{innocent} \mid \text{evidence}) = \frac{\mathbb{P}(\text{evidence} \mid \text{innocent})\,\mathbb{P}(\text{innocent})}{\mathbb{P}(\text{evidence})}.$$

If the prior $\mathbb{P}(\text{innocent})$ is high (most people are not criminals), the posterior can remain large even when the likelihood is small. The base rate (prior) is crucial and must not be ignored.
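The gap between likelihood and posterior is easy to quantify. The base rate and the likelihood-under-guilt below are hypothetical numbers chosen for illustration:

```python
# Prosecutor's fallacy: likelihood vs. posterior.
p_evidence_given_innocent = 0.001  # the likelihood quoted in court
p_evidence_given_guilty = 1.0      # assume the evidence is certain if guilty
p_innocent = 0.999                 # hypothetical base rate: 1 in 1,000 guilty

# Evidence term by the law of total probability.
p_evidence = (p_evidence_given_innocent * p_innocent
              + p_evidence_given_guilty * (1 - p_innocent))
p_innocent_given_evidence = p_evidence_given_innocent * p_innocent / p_evidence

print(round(p_innocent_given_evidence, 3))  # about 0.5, nowhere near 0.001
```

Despite a 1-in-1,000 likelihood, the posterior probability of innocence is roughly 50%, because guilt itself is rare a priori.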

Prior and Posterior

In Bayesian inference, the prior $\mathbb{P}(B_k)$ encodes belief about scenario $k$ before observing any data. The posterior $\mathbb{P}(B_k \mid A)$ encodes belief after observing event $A$. Bayes' theorem is the update rule that converts prior into posterior via the likelihood $\mathbb{P}(A \mid B_k)$.

Related: Conditional Probability, Bayes' Theorem

Quick Check

A medical test has sensitivity $\mathbb{P}(\text{positive} \mid \text{disease}) = 0.99$ and specificity $\mathbb{P}(\text{negative} \mid \text{no disease}) = 0.95$. The disease prevalence is $\mathbb{P}(\text{disease}) = 0.01$. A patient tests positive. Which expression correctly gives $\mathbb{P}(\text{disease} \mid \text{positive})$?

$0.99 \times 0.01 / (0.99 \times 0.01 + 0.05 \times 0.99) = 1/6 \approx 0.167$
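The quick-check arithmetic can be verified in a few lines:

```python
# Medical test posterior via Bayes' theorem.
sens = 0.99   # P(positive | disease)
spec = 0.95   # P(negative | no disease), so P(positive | no disease) = 0.05
prev = 0.01   # P(disease)

# Evidence: P(positive) by the law of total probability.
p_positive = sens * prev + (1 - spec) * (1 - prev)
p_disease_given_pos = sens * prev / p_positive  # 0.0099 / 0.0594 = 1/6
print(round(p_disease_given_pos, 3))
```

Even with a 99%-sensitive test, a positive result leaves only about a 1-in-6 chance of disease, because the 1% prevalence dominates.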

Key Takeaway

Bayes' theorem converts likelihoods into posteriors. The prior is what we believed before; the likelihood is how consistent the observation is with each hypothesis; the posterior is what we believe after. In detection theory (Book FSI), this update rule is the MAP decoder. In channel estimation (Book MIMO), it is the Bayesian estimator. In message-passing algorithms (belief propagation), it runs on every edge of the factor graph. Bayes' theorem is not merely a formula; it is a way of thinking.