Kolmogorov's Axioms

Axioms: Formalizing Frequency as Measure

Before Kolmogorov's 1933 monograph, probability was defined via frequencies: run an experiment $N$ times, count occurrences $N(A)$ of event $A$, define $\mathbb{P}(A) \approx N(A)/N$. This frequentist intuition is correct but not a definition: it merely says what probability approximates, not what it is.

Kolmogorov's insight was that the empirical frequency satisfies a small set of algebraic rules that can be elevated to axioms. The result is a clean, axiomatic foundation: three rules that characterize every valid probability model, from coin flips to Rayleigh fading to quantum measurements.

The three axioms below are the ones you will use in every proof in this book. Everything else (conditional probability, independence, expectation, the law of large numbers) follows as a consequence.

Historical Note: Andrei Kolmogorov and the Foundations of Probability (1933)


Andrei Nikolaevich Kolmogorov (1903–1987) was one of the most prolific mathematicians of the 20th century, with major contributions to analysis, topology, turbulence, information theory, and algorithmic complexity. His 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability) placed probability on the same rigorous axiomatic footing as geometry and algebra, grounding it in Lebesgue's measure theory.

The axioms he proposed (non-negativity, normalization, and countable additivity) are now so universally accepted that we simply call them "the axioms of probability." Kolmogorov later made foundational contributions to information theory (Kolmogorov complexity) and to the ergodic theory of stochastic processes.

Definition: Probability Measure (Kolmogorov's Axioms)

A probability measure on $(\Omega, \mathcal{F})$ is a set function $\mathbb{P}: \mathcal{F} \to [0,1]$ satisfying:

Axiom 1 (Non-negativity): $\mathbb{P}(A) \geq 0$ for all $A \in \mathcal{F}$.

Axiom 2 (Normalization): $\mathbb{P}(\Omega) = 1$.

Axiom 3 (Countable Additivity): If $A_1, A_2, \ldots \in \mathcal{F}$ are pairwise disjoint ($A_i \cap A_j = \emptyset$ for $i \neq j$), then
$$\mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i).$$

The triple $(\Omega, \mathcal{F}, \mathbb{P})$ is called a probability space.

Axiom 3 with the union restricted to finitely many terms is called finite additivity. Countable additivity is the stronger requirement: it extends the sum to infinity, which is what makes continuity of probability (Lemma 2 below) possible. There are philosophical schools (notably de Finetti) that accept only finite additivity, but for the mathematical development in this book, countable additivity is indispensable.


Example: Constructing Probability Spaces

For each of the following, verify that the proposed $\mathbb{P}$ satisfies Kolmogorov's axioms: (a) $\Omega = \{H, T\}$, $\mathbb{P}(H) = p$, $\mathbb{P}(T) = 1 - p$ for $p \in [0,1]$. (b) $\Omega = \{T, HT, HHT, \ldots\}$ (toss until first Tail), $\mathbb{P}(H^k T) = (1-p)^k p$. (c) $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}([0,1])$, $\mathbb{P}(A) = \mu(A)$ (Lebesgue measure).
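For case (b), the verification reduces to a geometric series. A minimal numerical sketch (the value $p = 0.3$ and the truncation at $k = 200$ are illustrative choices, not from the text):

```python
import math

# Numerical check of Kolmogorov's axioms for case (b):
# Omega = {T, HT, HHT, ...}, P(H^k T) = (1-p)^k * p.
p = 0.3
probs = [(1 - p) ** k * p for k in range(200)]  # truncate the countable sum

assert all(q >= 0 for q in probs)                    # Axiom 1: non-negativity
assert math.isclose(sum(probs), 1.0, abs_tol=1e-9)   # Axiom 2: geometric series sums to 1
# Axiom 3 holds by construction here: the outcomes {H^k T} are disjoint
# singletons, and P of any union is defined as the sum of their probabilities.
print(f"total probability over Omega: {sum(probs):.9f}")
```

The tail $(1-p)^{200}$ is astronomically small, so the truncated sum matches the infinite series to well within the tolerance.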

Theorem: Elementary Consequences of the Axioms

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $A, B \in \mathcal{F}$. Then:

  1. $\mathbb{P}(\emptyset) = 0$.
  2. (Complementation) $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$.
  3. (Monotonicity) If $A \subseteq B$ then $\mathbb{P}(A) \leq \mathbb{P}(B)$.
  4. $0 \leq \mathbb{P}(A) \leq 1$.
  5. (Inclusion-Exclusion, $n = 2$) $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$.

These are the workhorse identities of probability. The complementation rule is used constantly: if $\mathbb{P}(A)$ is hard to compute directly, compute $\mathbb{P}(A^c)$ and subtract from 1. Monotonicity is the foundation of all probability bounds.
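These identities can be checked exactly on any finite probability space. A sketch using a fair die (the events $A$, $B$ are arbitrary illustrative choices) with exact rational arithmetic:

```python
from fractions import Fraction

# Exact check of the elementary consequences on a finite space:
# Omega = {1,...,6} (fair die), P(E) = |E|/6, A = evens, B = {1,2,3}.
Omega = frozenset(range(1, 7))
P = lambda E: Fraction(len(E), len(Omega))
A = frozenset({2, 4, 6})
B = frozenset({1, 2, 3})

assert P(frozenset()) == 0                        # 1. P(empty) = 0
assert P(Omega - A) == 1 - P(A)                   # 2. complementation
assert P(A & B) <= P(B)                           # 3. monotonicity (A∩B ⊆ B)
assert 0 <= P(A) <= 1                             # 4. bounded in [0,1]
assert P(A | B) == P(A) + P(B) - P(A & B)         # 5. inclusion-exclusion, n = 2
print("all five elementary identities verified exactly")
```

Using `Fraction` instead of floats keeps every comparison exact, which is the right tool for identities rather than approximations.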


Theorem: General Inclusion-Exclusion Formula

For events $A_1, \ldots, A_n \in \mathcal{F}$:
$$\mathbb{P}\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} \mathbb{P}(A_i) - \sum_{i < j} \mathbb{P}(A_i \cap A_j) + \sum_{i < j < k} \mathbb{P}(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} \mathbb{P}\left(\bigcap_{i=1}^{n} A_i\right).$$

The formula corrects for overcounting: when we add $\mathbb{P}(A_i)$ for all $i$, points in two-way intersections are counted twice, so we subtract them; but then points in three-way intersections are subtracted once too many, so we add them back, and so on. The resulting alternating sum counts each point in the union exactly once.
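The alternating sum can be verified exactly by iterating over all subsets of events. A sketch on a toy uniform space (the three events are arbitrary illustrative choices):

```python
from itertools import combinations
from fractions import Fraction

# Exact check of general inclusion-exclusion on Omega = {0,...,29}, uniform.
Omega = frozenset(range(30))
P = lambda E: Fraction(len(E), len(Omega))
events = [frozenset(range(0, 15)),                      # A1
          frozenset(range(10, 25)),                     # A2
          frozenset(x for x in Omega if x % 3 == 0)]    # A3

lhs = P(frozenset().union(*events))          # P(A1 ∪ A2 ∪ A3), computed directly

rhs = Fraction(0)                            # alternating sum over all r-way intersections
for r in range(1, len(events) + 1):
    for combo in combinations(events, r):
        inter = frozenset(Omega)
        for E in combo:
            inter &= E
        rhs += (-1) ** (r + 1) * P(inter)

assert lhs == rhs
print(f"P(union) = {lhs}, computed both directly and via inclusion-exclusion")
```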


Theorem: Union Bound (Boole's Inequality)

For any finite or countably infinite collection of events $\{A_k\}$:
$$\mathbb{P}\left(\bigcup_{k} A_k\right) \leq \sum_{k} \mathbb{P}(A_k).$$

The union bound is the most-used probability inequality in communications engineering. It says that the probability that at least one of many bad events occurs is at most the sum of their individual probabilities. In error probability analysis, it bounds the block error probability by the sum of pairwise error probabilities: a simple but powerful tool that leads to the Bhattacharyya and Chernoff bounds in Book FSI.
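A quick Monte Carlo illustration of the slack in the bound, using three overlapping intervals on $\Omega = [0,1)$ with the uniform measure (events chosen for illustration):

```python
import random

# Three overlapping events on [0,1): union is [0, 0.6), so P(union) = 0.6,
# while the union bound gives 0.3 + 0.3 + 0.15 = 0.75 (overlap causes slack).
random.seed(0)
N = 200_000
events = [(0.0, 0.3), (0.2, 0.5), (0.45, 0.6)]   # intervals [a, b)

hits = sum(any(a <= u < b for a, b in events)
           for u in (random.random() for _ in range(N)))
p_union_hat = hits / N
bound = sum(b - a for a, b in events)            # sum of individual probabilities

print(f"estimated P(union) = {p_union_hat:.4f}, union bound = {bound:.2f}")
assert p_union_hat <= bound                      # the inequality holds
```

The bound is loose here precisely because the events overlap; for rare, nearly disjoint events it becomes nearly tight, which is why it works so well in the low-error-probability regime.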

Theorem: Continuity of Probability

Let $\{A_n\}$ be a monotone sequence of events.

Increasing: If $A_1 \subseteq A_2 \subseteq \cdots$, then $\displaystyle\lim_{n\to\infty} \mathbb{P}(A_n) = \mathbb{P}\left(\bigcup_{n=1}^{\infty} A_n\right)$.

Decreasing: If $A_1 \supseteq A_2 \supseteq \cdots$, then $\displaystyle\lim_{n\to\infty} \mathbb{P}(A_n) = \mathbb{P}\left(\bigcap_{n=1}^{\infty} A_n\right)$.

Probability behaves like a continuous function with respect to monotone limits of events: the probability of the limiting event equals the limit of the probabilities. This mirrors the continuity of a real function $f$ along a convergent sequence, $\lim_n f(x_n) = f(\lim_n x_n)$: probability "commutes with limits" along monotone sequences of events.
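A concrete instance with Lebesgue measure on $[0,1]$, where everything is computable in closed form: take $A_n = [0, 1 - 1/n]$, which increases to $[0, 1)$, so $\mathbb{P}(A_n) = 1 - 1/n \to 1 = \mathbb{P}\bigl(\bigcup_n A_n\bigr)$.

```python
# Continuity along an increasing sequence under Lebesgue measure on [0,1]:
# A_n = [0, 1 - 1/n] increases to [0, 1), and P(A_n) = 1 - 1/n -> 1.
probs = [1 - 1 / n for n in range(1, 10_001)]

# The probabilities are monotone because the events are nested...
assert all(p1 <= p2 for p1, p2 in zip(probs, probs[1:]))
# ...and they converge to the probability of the limiting event, P([0,1)) = 1.
assert abs(probs[-1] - 1.0) < 1e-3
print(f"P(A_10000) = {probs[-1]:.4f}  ->  P(union) = 1")
```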


Theorem: First Borel-Cantelli Lemma

Let $\{A_n\}$ be a sequence of events with $\sum_{n=1}^{\infty} \mathbb{P}(A_n) < \infty$. Define the event "$A_n$ occurs infinitely often" as
$$\{A_n \text{ i.o.}\} \triangleq \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$$
Then $\mathbb{P}(A_n \text{ i.o.}) = 0$.

If the sum of probabilities converges, then the probability of infinitely many events occurring is zero. In reliability theory: if component $k$ fails with probability $p_k$ and $\sum_k p_k < \infty$, almost surely only finitely many components ever fail. In coding theory: if codeword $n$ is decoded incorrectly with probability $p_n$ and $\sum_n p_n < \infty$, almost surely only finitely many codewords are in error.
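A simulation sketch of this setting, with independent events $\mathbb{P}(A_n) = 1/n^2$ (the summable series of the packet example below; the run counts and horizon are illustrative). Note that the lemma itself does not require independence; independence is used here only to generate samples.

```python
import random

# Simulate independent events with P(A_n) = 1/n^2 (summable), and record
# the index of the last event that occurs in each run. Borel-Cantelli says
# that almost surely only finitely many A_n occur.
random.seed(1)
N_MAX, RUNS = 10_000, 200

last_occurrences = []
for _ in range(RUNS):
    last = 0                                  # index of the last A_n observed (0 if none)
    for n in range(1, N_MAX + 1):
        if random.random() < 1 / n**2:
            last = n
    last_occurrences.append(last)

# The events dry up quickly: the expected number occurring past n = 100
# is sum_{n>100} 1/n^2, which is about 0.01 per run.
print(f"largest 'last occurrence' index over {RUNS} runs: {max(last_occurrences)}")
```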


Why This Matters: Union Bound in Error Probability Analysis

In digital communications, a message from a codebook of size $M$ is transmitted over a noisy channel. A decoding error occurs if the received signal is closer (in some metric) to a codeword other than the one transmitted. Letting $\mathcal{E}_j$ be the event "the receiver prefers codeword $j$ over the true codeword," the block error probability is
$$P_e = \mathbb{P}\left(\bigcup_{j \neq j^*} \mathcal{E}_j\right) \leq \sum_{j \neq j^*} \mathbb{P}(\mathcal{E}_j).$$
This is the union bound applied to error events. The bound is tight when errors are rare (small $P_e$ regime) and is the starting point for the Bhattacharyya bound and the Chernoff–Gallager error exponent. Every error-probability derivation in Book FSI (Chapters 6–7) uses exactly this step.
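A sketch of this step for a toy "codebook": the four points of a QPSK constellation in the complex plane, with AWGN and nearest-neighbor (ML) decoding. The noise level and sample count are illustrative; the pairwise error probability between codewords at distance $d$ is $Q(d/2\sigma)$, with $Q(x) = \tfrac{1}{2}\mathrm{erfc}(x/\sqrt{2})$.

```python
import cmath, math, random

# Union bound vs. simulated block error probability for M = 4 unit-energy
# QPSK "codewords"; sigma is the noise std per real dimension (illustrative).
random.seed(2)
codebook = [cmath.exp(1j * math.pi * (2 * m + 1) / 4) for m in range(4)]
sigma = 0.4
Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))   # Gaussian tail function

# Union bound: sum of pairwise error probabilities Q(d_ij / (2*sigma)).
tx = codebook[0]
bound = sum(Q(abs(tx - c) / (2 * sigma)) for c in codebook if c is not tx)

# Monte Carlo estimate of the true P_e under nearest-neighbor decoding.
N = 100_000
errors = 0
for _ in range(N):
    y = tx + complex(random.gauss(0, sigma), random.gauss(0, sigma))
    if min(codebook, key=lambda c: abs(y - c)) is not tx:
        errors += 1

print(f"simulated P_e = {errors / N:.4f}, union bound = {bound:.4f}")
assert errors / N <= bound * 1.05   # bound holds (small tolerance for MC noise)
```

The simulated error rate sits just below the bound because the two adjacent-codeword error events overlap only slightly, which is the "rare errors" regime where the union bound is nearly tight.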

Interactive figure: Inclusion-Exclusion for Three Events. Visualize how the inclusion-exclusion formula counts the probability of $A \cup B \cup C$ by alternately adding and subtracting intersection probabilities; adjust the individual and pairwise probabilities to see the formula in action.

Interactive figure: Law of Large Numbers (Empirical Frequency Convergence). The empirical frequency $N(A)/N$ (blue) oscillates initially but converges to the true probability $\mathbb{P}(A) = p$ (red dashed) as $N \to \infty$, visualizing the frequentist motivation for Kolmogorov's axioms. This convergence is the Law of Large Numbers, proved rigorously in Chapter 11.

Interactive simulation: Empirical Frequency of Coin Flips. Simulate $n$ flips of a biased coin with $\mathbb{P}(\text{Heads}) = p$ and watch the empirical frequency converge to $p$ as $n$ increases.
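A plain-Python stand-in for the coin-flip demo (the bias $p$, flip count $n$, and seed are illustrative parameter choices):

```python
import random

# Empirical frequency of Heads for a biased coin, P(Heads) = p.
random.seed(42)
p, n = 0.3, 500

heads = 0
for i in range(1, n + 1):
    heads += random.random() < p            # one Bernoulli(p) flip
    if i in (10, 100, 500):
        print(f"after {i:4d} flips: empirical frequency = {heads / i:.3f}")

# For n = 500 the empirical frequency is already close to p = 0.3.
assert abs(heads / n - p) < 0.1
```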

Common Mistake: Finite Additivity Is Not Enough

Mistake:

A weaker version of Axiom 3 requires additivity only for finitely many disjoint events. It is tempting to assume this is sufficient; after all, every experiment in practice involves finitely many outcomes, doesn't it?

Correction:

Finite additivity is not sufficient to derive the continuity of probability, the Borel-Cantelli lemmas, or any convergence theorem. Without countable additivity, one cannot speak meaningfully about limits of events or define the probability of an event described as the intersection of a countably infinite family (e.g., the event that a random walk never exceeds a threshold). All of probability theory for stochastic processes and convergence, the backbone of communications analysis, requires Axiom 3 in its countably additive form.

🔧 Engineering Note

Choosing a Probability Space for a Channel Model

When modeling a wireless channel, the probability space is rarely written down explicitly, but it is always implicitly present. Consider a flat Rayleigh fading channel with received signal $Y = H \cdot x + W$, where $H \sim \mathcal{CN}(0,1)$ (channel gain) and $W \sim \mathcal{CN}(0, \sigma^2)$ (noise). The implicit probability space is:

  • $\Omega = \mathbb{C}^2$ (pairs of complex numbers $(h, w)$)
  • $\mathcal{F} = \mathcal{B}(\mathbb{C}^2)$ (Borel sets in the complex plane $\times$ complex plane)
  • $\mathbb{P}$ = the product of two complex Gaussian distributions

Axiom 3 ensures we can compute $\mathbb{P}(|Y|^2 > \gamma)$ for any threshold $\gamma$ by integrating over the corresponding Borel set. Without the sigma-algebra structure, the integral would have no meaning.
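For a fixed transmit symbol $x$, this probability has a closed form: $Y$ is then $\mathcal{CN}(0, |x|^2 + \sigma^2)$, so $|Y|^2$ is exponential and $\mathbb{P}(|Y|^2 > \gamma) = e^{-\gamma/(|x|^2 + \sigma^2)}$. A Monte Carlo sketch validating this (the values of $x$, $\sigma$, $\gamma$ are illustrative):

```python
import math, random

# Monte Carlo check of P(|Y|^2 > gamma) for Y = H*x + W with
# H ~ CN(0,1), W ~ CN(0, sigma^2), and x fixed.
random.seed(3)
x, sigma, gamma, N = 1.0, 0.5, 2.0, 200_000

def cn_sample(var):
    """Draw from CN(0, var): independent real/imag parts with variance var/2 each."""
    s = math.sqrt(var / 2)
    return complex(random.gauss(0, s), random.gauss(0, s))

hits = sum(abs(cn_sample(1.0) * x + cn_sample(sigma**2)) ** 2 > gamma
           for _ in range(N))
est = hits / N
exact = math.exp(-gamma / (abs(x) ** 2 + sigma**2))   # exponential tail of |Y|^2

print(f"Monte Carlo: {est:.4f}, closed form: {exact:.4f}")
assert abs(est - exact) < 0.01
```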

Practical Constraints
  • The sample space must be large enough to describe all relevant quantities
  • For analog channels, use $\Omega \subseteq \mathbb{R}^n$ or $\mathbb{C}^n$ with the Borel sigma-algebra
  • For discrete channels, the power set suffices

Probability Space

The triple $(\Omega, \mathcal{F}, \mathbb{P})$ consisting of a sample space, a sigma-algebra of events, and a probability measure satisfying Kolmogorov's three axioms.

Related: Sample Space, Sigma-Algebra

Almost Surely

An event $A$ holds almost surely (a.s.) if $\mathbb{P}(A) = 1$. Equivalently, $A^c$ is a null event: $\mathbb{P}(A^c) = 0$.

Related: Probability Space

Key Takeaway

The probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is the central object of probability theory. Kolmogorov's three axioms (non-negativity, normalization, countable additivity) are the minimal rules that make the entire theory work. Everything else, from conditional probability and random variables to expectations and the law of large numbers, is a derived concept built on this triad.

Quick Check

Which of the following is NOT a consequence of the three Kolmogorov axioms?

$\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$

$\mathbb{P}(A \cup B) \leq \mathbb{P}(A) + \mathbb{P}(B)$

$\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$

$\mathbb{P}(\emptyset) = 0$

Quick Check

If $A_n$ is the event that the $n$-th transmitted packet is corrupted, and $\mathbb{P}(A_n) = 1/n^2$ for all $n$, what does the first Borel-Cantelli lemma say?

With probability 1, infinitely many packets are corrupted.

With probability 1, only finitely many packets are corrupted.

No packets are ever corrupted.

The conclusion requires independence of the $A_n$.