Kolmogorov's Axioms
Axioms: Formalizing Frequency as Measure
Before Kolmogorov's 1933 monograph, probability was defined via frequencies: run an experiment $n$ times, count the number of occurrences $N_n(A)$ of event $A$, and define $\mathbb{P}(A) = \lim_{n \to \infty} N_n(A)/n$. This frequentist intuition is correct but not a definition: it merely says what probability approximates, not what it is.
Kolmogorov's insight was that the empirical frequency $N_n(A)/n$ satisfies a small set of algebraic rules that can be elevated to axioms. The result is a clean, axiomatic foundation: three rules that characterize every valid probability model, from coin flips to Rayleigh fading to quantum measurements.
The three axioms below are the ones you will use in every proof in this book. Everything else (conditional probability, independence, expectation, the law of large numbers) follows as a consequence.
Historical Note: Andrei Kolmogorov and the Foundations of Probability (1933)
Andrei Nikolaevich Kolmogorov (1903–1987) was one of the most prolific mathematicians of the 20th century, with major contributions to analysis, topology, turbulence, information theory, and algorithmic complexity. His 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability) placed probability on the same rigorous axiomatic footing as geometry and algebra, grounding it in Lebesgue's measure theory.
The axioms he proposed (non-negativity, normalization, and countable additivity) are now so universally accepted that we simply call them "the axioms of probability." Kolmogorov later made foundational contributions to information theory (Kolmogorov complexity) and to the ergodic theory of stochastic processes.
Definition: Probability Measure (Kolmogorov's Axioms)
A probability measure on $(\Omega, \mathcal{F})$ is a set function $\mathbb{P} : \mathcal{F} \to \mathbb{R}$ satisfying:
Axiom 1 (Non-negativity): $\mathbb{P}(A) \geq 0$ for all $A \in \mathcal{F}$.
Axiom 2 (Normalization): $\mathbb{P}(\Omega) = 1$.
Axiom 3 (Countable Additivity): If $A_1, A_2, \ldots \in \mathcal{F}$ are pairwise disjoint ($A_i \cap A_j = \emptyset$ for $i \neq j$), then $\mathbb{P}\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i).$
The triple $(\Omega, \mathcal{F}, \mathbb{P})$ is called a probability space.
Axiom 3 with the union restricted to finitely many terms is called finite additivity. Countable additivity is the stronger requirement: it extends the sum to infinity, which is what makes continuity of probability (proved below) possible. There are philosophical schools (notably de Finetti's) that accept only finite additivity, but for the mathematical development in this book, countable additivity is indispensable.
Example: Constructing Probability Spaces
For each of the following, verify that the proposed $\mathbb{P}$ satisfies Kolmogorov's axioms: (a) $\Omega = \{H, T\}$, $\mathcal{F} = 2^{\Omega}$, $\mathbb{P}(\{H\}) = p$, $\mathbb{P}(\{T\}) = 1 - p$ for $p \in [0, 1]$. (b) $\Omega = \{T, HT, HHT, \ldots\}$ (toss until first Tail), $\mathbb{P}(\{\omega_k\}) = 2^{-k}$. (c) $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}([0, 1])$, $\mathbb{P} = \lambda$ (Lebesgue measure).
(a) Biased coin
Non-negativity: $\mathbb{P}(\{H\}) = p \geq 0$ and $\mathbb{P}(\{T\}) = 1 - p \geq 0$, since $p \in [0, 1]$.
Normalization: $\mathbb{P}(\Omega) = p + (1 - p) = 1$.
Countable additivity (finite here): $\mathcal{F} = 2^{\Omega}$ contains only the four events $\emptyset$, $\{H\}$, $\{T\}$, $\Omega$; all disjoint unions are covered by $\mathbb{P}(\emptyset) = 0$ and $\mathbb{P}(\{H\} \cup \{T\}) = p + (1 - p) = \mathbb{P}(\Omega)$.
(b) Geometric probability (toss until Tail)
The outcomes are $\omega_k = \underbrace{H \cdots H}_{k-1}\, T$ for $k = 1, 2, \ldots$, with $\mathbb{P}(\{\omega_k\}) = 2^{-k}$.
Non-negativity: $2^{-k} > 0$ for every $k$.
Normalization: $\sum_{k=1}^{\infty} 2^{-k} = \frac{1/2}{1 - 1/2} = 1.$
Countable additivity follows because the outcomes are distinct singleton events and the probability of any event is defined as the sum of $2^{-k}$ over the outcomes it contains.
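The checks for part (b) can be replicated numerically. A minimal sketch (the truncation at $k = 99$ is an arbitrary cutoff; the discarded tail $2^{-99}$ is far below machine precision):

```python
# Verify Kolmogorov's axioms numerically for the "toss until first Tail"
# space, where outcome omega_k (k-1 Heads, then a Tail) has probability 2^-k.

def p_outcome(k: int) -> float:
    """Probability of the outcome with k-1 Heads followed by a Tail."""
    return 2.0 ** (-k)

# Axiom 1 (non-negativity): every outcome probability is >= 0.
nonneg = all(p_outcome(k) >= 0 for k in range(1, 100))

# Axiom 2 (normalization): the geometric series sums to 1.
total = sum(p_outcome(k) for k in range(1, 100))

# Axiom 3 (additivity on disjoint events): P(k even) + P(k odd) = P(Omega).
p_even = sum(p_outcome(k) for k in range(2, 100, 2))
p_odd = sum(p_outcome(k) for k in range(1, 100, 2))

print(nonneg, round(total, 12), round(p_even + p_odd, 12))
```

The even/odd split is one concrete disjoint partition of $\Omega$; any other partition would serve equally well.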
(c) Uniform on $[0,1]$ (Lebesgue measure)
$\mathbb{P}(A) = \lambda(A)$ = total length of $A \subseteq [0, 1]$.
Non-negativity: length is non-negative.
Normalization: $\lambda([0, 1]) = 1$.
Countable additivity: the Lebesgue measure is countably additive by construction; this is the central theorem of Lebesgue's measure theory. For pairwise disjoint Borel sets $A_1, A_2, \ldots$, the total length of their union equals the sum of their lengths.
Theorem: Elementary Consequences of the Axioms
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $A, B \in \mathcal{F}$. Then:
- $\mathbb{P}(\emptyset) = 0$.
- (Complementation) $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$.
- (Monotonicity) If $A \subseteq B$ then $\mathbb{P}(A) \leq \mathbb{P}(B)$.
- $\mathbb{P}(A) \leq 1$.
- (Inclusion-Exclusion, $n = 2$) $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$.
These are the workhorse identities of probability. The complementation rule is used constantly: if $\mathbb{P}(A)$ is hard to compute directly, compute $\mathbb{P}(A^c)$ and subtract it from 1. Monotonicity is the foundation of all probability bounds.
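All five identities can be checked exactly on a small discrete space. A sketch using a fair six-sided die with the uniform measure (the events $A$ and $B$ below are arbitrary choices):

```python
from fractions import Fraction

# Exact check of the elementary consequences on a fair die.
omega = set(range(1, 7))

def P(event: set) -> Fraction:
    """Uniform probability measure on the die's sample space."""
    return Fraction(len(event & omega), len(omega))

A = {1, 2, 3}
B = {3, 4}

assert P(set()) == 0                        # P(empty) = 0
assert P(omega - A) == 1 - P(A)             # complementation
assert P({1, 2}) <= P(A)                    # monotonicity: {1,2} is a subset of A
assert P(A) <= 1                            # probabilities never exceed 1
assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion, n = 2
print("all identities hold")
```

Using `Fraction` keeps every probability exact, so the identities hold with `==` rather than a floating-point tolerance.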
Property 1: $\mathbb{P}(\emptyset) = 0$
Write $\Omega = \Omega \cup \emptyset \cup \emptyset \cup \cdots$, a countable disjoint union (since $\Omega \cap \emptyset = \emptyset$ and $\emptyset \cap \emptyset = \emptyset$). By Axiom 3: $\mathbb{P}(\Omega) = \mathbb{P}(\Omega) + \sum_{i=1}^{\infty} \mathbb{P}(\emptyset)$. Subtracting $\mathbb{P}(\Omega)$ from both sides: $\sum_{i=1}^{\infty} \mathbb{P}(\emptyset) = 0$, and since every term is $\geq 0$ by Axiom 1, each must vanish. Hence $\mathbb{P}(\emptyset) = 0$.
Property 2: Complementation
Since $A$ and $A^c$ are disjoint with $A \cup A^c = \Omega$: $\mathbb{P}(A) + \mathbb{P}(A^c) = \mathbb{P}(\Omega) = 1$, so $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$.
Property 3: Monotonicity
If $A \subseteq B$, write $B = A \cup (B \setminus A)$, a disjoint union. By finite additivity: $\mathbb{P}(B) = \mathbb{P}(A) + \mathbb{P}(B \setminus A) \geq \mathbb{P}(A)$, since $\mathbb{P}(B \setminus A) \geq 0$ by Axiom 1.
Property 5: Inclusion-exclusion for two sets
Write $A \cup B = A \cup (B \setminus A)$, a disjoint union. By finite additivity: $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B \setminus A)$. Since $B = (A \cap B) \cup (B \setminus A)$ is also a disjoint union, $\mathbb{P}(B \setminus A) = \mathbb{P}(B) - \mathbb{P}(A \cap B)$. Substituting gives the result.
Theorem: General Inclusion-Exclusion Formula
For events $A_1, \ldots, A_n$: $\mathbb{P}\!\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} \mathbb{P}(A_i) - \sum_{i < j} \mathbb{P}(A_i \cap A_j) + \sum_{i < j < k} \mathbb{P}(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1}\, \mathbb{P}(A_1 \cap \cdots \cap A_n).$
The formula corrects for overcounting: when we add $\mathbb{P}(A_i)$ for all $i$, points in two-way intersections are counted twice, so we subtract them; but then points in three-way intersections are subtracted once too many, so we add them back, and so on. The resulting alternating sum counts each point in the union exactly once.
Base case $n = 2$
Already proved in Elementary Consequences of the Axioms, Property 5.
Inductive step
Assume the formula holds for $n - 1$ events. Write $\bigcup_{i=1}^{n} A_i = B \cup A_n$ where $B = \bigcup_{i=1}^{n-1} A_i$. By the two-set formula: $\mathbb{P}(B \cup A_n) = \mathbb{P}(B) + \mathbb{P}(A_n) - \mathbb{P}(B \cap A_n)$. Apply the inductive hypothesis to $B$ (with $n - 1$ events) and to $B \cap A_n$ (again with $n - 1$ events: $B \cap A_n = \bigcup_{i=1}^{n-1} (A_i \cap A_n)$). Collecting terms yields the formula for $n$ events.
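The alternating sum can be verified exactly by brute force on a finite space. A sketch for $n = 4$ events under the uniform measure on $\{0, \ldots, 19\}$ (the four events are arbitrary picks):

```python
from fractions import Fraction
from itertools import combinations

# Exact check of general inclusion-exclusion for n = 4 events.
omega = set(range(20))
events = [set(range(0, 10)), set(range(5, 15)),
          set(range(8, 18)), set(range(3, 20, 3))]

def P(event: set) -> Fraction:
    """Uniform probability measure on {0, ..., 19}."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(sets):
    """Alternating sum of P over all non-empty intersections."""
    total = Fraction(0)
    for r in range(1, len(sets) + 1):
        sign = (-1) ** (r + 1)
        for idx in combinations(range(len(sets)), r):
            inter = set.intersection(*(sets[i] for i in idx))
            total += sign * P(inter)
    return total

lhs = P(set.union(*events))
rhs = inclusion_exclusion(events)
print(lhs == rhs)
```

The `combinations` loop enumerates every non-empty subset of indices, mirroring the $\sum_{i} - \sum_{i<j} + \cdots$ structure of the formula.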
Theorem: Union Bound (Boole's Inequality)
For any finite or countably infinite collection of events $A_1, A_2, \ldots$: $\mathbb{P}\!\left(\bigcup_{i} A_i\right) \leq \sum_{i} \mathbb{P}(A_i).$
The union bound is the most-used probability inequality in communications engineering. It says that the probability that at least one of many bad events occurs is at most the sum of their individual probabilities. In error probability analysis, it bounds the block error probability by the sum of pairwise error probabilities: a simple but powerful tool that leads to the Bhattacharyya and Chernoff bounds in Book FSI.
Finite union
By induction using the two-set formula: $\mathbb{P}(A_1 \cup A_2) = \mathbb{P}(A_1) + \mathbb{P}(A_2) - \mathbb{P}(A_1 \cap A_2) \leq \mathbb{P}(A_1) + \mathbb{P}(A_2)$, since $\mathbb{P}(A_1 \cap A_2) \geq 0$.
Countably infinite union
Let $B_n = \bigcup_{i=1}^{n} A_i$. Then $B_n \uparrow \bigcup_{i=1}^{\infty} A_i$ (an increasing sequence). By the continuity of probability (next theorem) and the finite union bound: $\mathbb{P}\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} \mathbb{P}(B_n) \leq \lim_{n \to \infty} \sum_{i=1}^{n} \mathbb{P}(A_i) = \sum_{i=1}^{\infty} \mathbb{P}(A_i).$
Theorem: Continuity of Probability
Let $A_1, A_2, \ldots$ be a monotone sequence of events.
Increasing: If $A_1 \subseteq A_2 \subseteq \cdots$, then $\mathbb{P}\!\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \mathbb{P}(A_n)$.
Decreasing: If $A_1 \supseteq A_2 \supseteq \cdots$, then $\mathbb{P}\!\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n \to \infty} \mathbb{P}(A_n)$.
Probability behaves like a continuous function with respect to monotone limits of events: the probability of the limiting event equals the limit of the probabilities. This mirrors the statement that for a continuous function $f$ and a monotone real sequence $x_n \to x$, $f(x_n) \to f(x)$; probability "commutes with limits" along monotone sequences.
Increasing case: decompose into disjoint pieces
Define $D_1 = A_1$ and $D_n = A_n \setminus A_{n-1}$ for $n \geq 2$. The $D_n$ are pairwise disjoint, with $\bigcup_{k=1}^{n} D_k = A_n$ and $\bigcup_{k=1}^{\infty} D_k = \bigcup_{k=1}^{\infty} A_k$.
Apply countable additivity
$\mathbb{P}\!\left(\bigcup_{n=1}^{\infty} A_n\right) = \mathbb{P}\!\left(\bigcup_{k=1}^{\infty} D_k\right) = \sum_{k=1}^{\infty} \mathbb{P}(D_k) = \lim_{n \to \infty} \sum_{k=1}^{n} \mathbb{P}(D_k) = \lim_{n \to \infty} \mathbb{P}(A_n).$
Decreasing case: apply to complements
If $A_n \downarrow A$, then $A_n^c \uparrow A^c$. By the increasing case: $\mathbb{P}(A^c) = \lim_{n \to \infty} \mathbb{P}(A_n^c)$. Also $\mathbb{P}(A_n^c) = 1 - \mathbb{P}(A_n)$, so $1 - \mathbb{P}(A) = \lim_{n \to \infty} \left(1 - \mathbb{P}(A_n)\right)$, giving the result.
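Both monotone cases can be illustrated numerically under Lebesgue measure on $[0, 1]$; the specific interval sequences below are our own choices, not part of the theorem:

```python
# Continuity of probability under Lebesgue measure on [0, 1]:
#   A_n = [0, 1 - 1/n] increases to [0, 1), so P(A_n) = 1 - 1/n -> 1;
#   B_n = [0, 1/n]     decreases to {0},    so P(B_n) = 1/n     -> 0.

def length(a: float, b: float) -> float:
    """Lebesgue measure (length) of the interval [a, b]."""
    return max(0.0, b - a)

checkpoints = (1, 10, 100, 10_000)
increasing = [length(0.0, 1.0 - 1.0 / n) for n in checkpoints]
decreasing = [length(0.0, 1.0 / n) for n in checkpoints]

print(increasing)  # climbs toward 1 = P([0, 1))
print(decreasing)  # falls toward 0 = P({0})
```

Note that the limit of the decreasing case is a single point, whose Lebesgue measure is zero, matching the limit of the probabilities.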
Theorem: First Borel-Cantelli Lemma
Let $A_1, A_2, \ldots$ be a sequence of events with $\sum_{n=1}^{\infty} \mathbb{P}(A_n) < \infty$. Define the event "$A_n$ occurs infinitely often" as $\{A_n \text{ i.o.}\} = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$ Then $\mathbb{P}(A_n \text{ i.o.}) = 0$.
If the sum of probabilities converges, then the probability of infinitely many of the events occurring is zero. In reliability theory: if component $n$ fails with probability $p_n$ and $\sum_n p_n < \infty$, almost surely only finitely many components ever fail. In coding theory: if codeword $n$ is decoded incorrectly with probability $P_n$ and $\sum_n P_n < \infty$, almost surely only finitely many codewords are in error.
Define the tail events
Let $B_n = \bigcup_{k=n}^{\infty} A_k$. Then $B_1 \supseteq B_2 \supseteq \cdots$ (decreasing), and $\{A_n \text{ i.o.}\} = \bigcap_{n=1}^{\infty} B_n$.
Bound the tail probability
By the union bound: $\mathbb{P}(B_n) \leq \sum_{k=n}^{\infty} \mathbb{P}(A_k)$. Since $\sum_{k=1}^{\infty} \mathbb{P}(A_k) < \infty$, the tail sum $\sum_{k=n}^{\infty} \mathbb{P}(A_k) \to 0$ as $n \to \infty$. So $\mathbb{P}(B_n) \to 0$.
Apply continuity of probability
By the decreasing case of Continuity of Probability: $\mathbb{P}(A_n \text{ i.o.}) = \mathbb{P}\!\left(\bigcap_{n=1}^{\infty} B_n\right) = \lim_{n \to \infty} \mathbb{P}(B_n) = 0$.
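A Monte Carlo sketch of the lemma's conclusion, under an assumed model of our own: independent events with $\mathbb{P}(A_n) = 2^{-n}$, so $\sum_n \mathbb{P}(A_n) = 1 < \infty$ (the lemma does not need independence, but it makes simulation straightforward). On every simulated path, only finitely many $A_n$ occur, and the last occurrence index stays small:

```python
import random

# First Borel-Cantelli, empirically: with P(A_n) = 2^-n, late events are
# exponentially unlikely, so the latest occurrence across many paths is small.
random.seed(0)

N = 60           # horizon; 2^-n is negligible long before n = 60
TRIALS = 10_000  # number of independent sample paths

max_last = 0     # latest index at which any A_n occurred, over all paths
for _ in range(TRIALS):
    occurred = [n for n in range(1, N + 1) if random.random() < 2.0 ** (-n)]
    if occurred:
        max_last = max(max_last, occurred[-1])

print("latest occurrence over all paths:", max_last)
```

Contrast this with $\mathbb{P}(A_n) = 1/n$, where the sum diverges and (for independent events) the second Borel-Cantelli lemma would give infinitely many occurrences.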
Why This Matters: Union Bound in Error Probability Analysis
In digital communications, a message from a codebook of size $M$ is transmitted over a noisy channel. A decoding error occurs if the received signal is closer (in some metric) to a codeword other than the one transmitted. Letting $E_m$ be the event "the receiver prefers codeword $m$ over the true codeword," the block error probability satisfies $P_e = \mathbb{P}\!\left(\bigcup_{m} E_m\right) \leq \sum_{m} \mathbb{P}(E_m).$ This is the union bound applied to error events. The bound is tight when errors are rare (the small-$P_e$ regime) and is the starting point for the Bhattacharyya bound and the Chernoff–Gallager error exponent. Every error-probability derivation in Book FSI (Chapters 6–7) uses exactly this step.
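A toy Monte Carlo sketch of this step, using an invented scoring model rather than a real decoder: the true codeword's score is Gaussian with mean 1, each of the $M - 1$ competitors' scores is Gaussian with mean 0, and $E_m$ is the event that competitor $m$ outscores the truth. The events share each trial's noise, so they are dependent, which is exactly the situation where the union bound is useful:

```python
import random

# Empirical block error probability vs. the union bound (toy model).
random.seed(1)

M = 8
TRIALS = 200_000

union_count = 0
pair_counts = [0] * (M - 1)
for _ in range(TRIALS):
    true_score = 1.0 + random.gauss(0.0, 1.0)                # true codeword
    competitors = [random.gauss(0.0, 1.0) for _ in range(M - 1)]
    errors = [c > true_score for c in competitors]            # events E_m
    union_count += any(errors)                                # block error
    for m, err in enumerate(errors):
        pair_counts[m] += err                                 # pairwise errors

p_block = union_count / TRIALS                  # estimate of P(union of E_m)
bound = sum(c / TRIALS for c in pair_counts)    # estimate of sum of P(E_m)
print(p_block, "<=", bound)
```

With these (deliberately noisy) parameters the bound exceeds 1 and is far from tight, illustrating the text's caveat: the union bound is informative only when the individual error probabilities are small.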
Inclusion-Exclusion for Three Events
Visualize how the inclusion-exclusion formula counts the probability of $A \cup B \cup C$ by alternately adding and subtracting intersection probabilities. Adjust the individual and pairwise probabilities to see the formula in action.
Law of Large Numbers: Empirical Frequency Convergence
Empirical Frequency of Coin Flips
Simulate $n$ flips of a biased coin with $\mathbb{P}(H) = p$. Watch the empirical frequency $N_n(H)/n$ converge to $p$ as $n$ increases.
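The same experiment can be run in a few lines; a sketch with $p = 0.3$ (an arbitrary choice) and three checkpoints:

```python
import random

# Empirical frequency of Heads for a biased coin, recorded at checkpoints.
random.seed(42)

p = 0.3
heads = 0
freqs = {}
for n in range(1, 100_001):
    heads += random.random() < p          # one coin flip; True counts as 1
    if n in (10, 1_000, 100_000):
        freqs[n] = heads / n              # empirical frequency N_n(H) / n

print(freqs)  # the frequency at n = 100_000 should be close to p
```

At $n = 10$ the frequency can be far from $p$; at $n = 100{,}000$ its standard deviation is $\sqrt{p(1-p)/n} \approx 0.0014$, so it sits within about $0.01$ of $p$ with very high probability.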
Common Mistake: Finite Additivity Is Not Enough
Mistake:
A weaker version of Axiom 3 requires additivity only for finitely many disjoint events. It is tempting to assume this is sufficient; after all, every experiment in practice involves finitely many outcomes, doesn't it?
Correction:
Finite additivity is not sufficient to derive the continuity of probability, the Borel-Cantelli lemmas, or any convergence theorem. Without countable additivity, one cannot speak meaningfully about limits of events or define the probability of an event described as the intersection of a countably infinite family (e.g., the event that a random walk never exceeds a threshold). All of probability theory for stochastic processes and convergence, the backbone of communications analysis, requires Axiom 3 in its countably additive form.
Choosing a Probability Space for a Channel Model
When modeling a wireless channel, the probability space is rarely written down explicitly, but it is always implicitly present. Consider a flat Rayleigh fading channel with received signal $y = hx + n$, where $h \sim \mathcal{CN}(0, 1)$ (channel gain) and $n \sim \mathcal{CN}(0, \sigma^2)$ (noise). The implicit probability space is:
- $\Omega = \mathbb{C}^2$ (pairs of complex numbers $(h, n)$)
- $\mathcal{F} = \mathcal{B}(\mathbb{C}^2)$ (the Borel sets of $\mathbb{C}^2$)
- $\mathbb{P}$ = the product of two complex Gaussian distributions
Axiom 3 ensures we can compute probabilities such as the outage probability $\mathbb{P}(|h|^2 < \gamma)$ for any threshold $\gamma$ by integrating over the corresponding Borel set. Without the sigma-algebra structure, the integral would have no meaning.
- The sample space must be large enough to describe all relevant quantities
- For analog channels, use $\mathbb{R}^n$ or $\mathbb{C}^n$ with the Borel sigma-algebra
- For discrete channels, the power set suffices
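The computability claim can be checked numerically. A sketch under the assumed normalization $h \sim \mathcal{CN}(0, 1)$, i.e. independent real and imaginary parts, each $\mathcal{N}(0, 1/2)$; then $|h|^2$ is exponential with rate 1, and the outage probability has the closed form $\mathbb{P}(|h|^2 < \gamma) = 1 - e^{-\gamma}$:

```python
import math
import random

# Monte Carlo estimate of the Rayleigh-fading outage probability
# P(|h|^2 < gamma) vs. the closed form 1 - exp(-gamma).
random.seed(7)

gamma = 0.5
TRIALS = 500_000

hits = 0
for _ in range(TRIALS):
    re = random.gauss(0.0, math.sqrt(0.5))   # Re(h) ~ N(0, 1/2)
    im = random.gauss(0.0, math.sqrt(0.5))   # Im(h) ~ N(0, 1/2)
    hits += (re * re + im * im) < gamma      # |h|^2 below the threshold?

empirical = hits / TRIALS
exact = 1.0 - math.exp(-gamma)
print(empirical, "vs", exact)
```

The agreement is a direct consequence of the product-measure construction: the event $\{|h|^2 < \gamma\}$ is a Borel set of $\mathbb{C}^2$, so its probability is well defined and computable by integration.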
Probability Space
The triple $(\Omega, \mathcal{F}, \mathbb{P})$ consisting of a sample space, a sigma-algebra of events, and a probability measure satisfying Kolmogorov's three axioms.
Related: Sample Space, Sigma-Algebra
Almost Surely
An event $A$ holds almost surely (a.s.) if $\mathbb{P}(A) = 1$. Equivalently, its complement $A^c$ is a null event: $\mathbb{P}(A^c) = 0$.
Related: Probability Space
Key Takeaway
The probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is the central object of probability theory. Kolmogorov's three axioms (non-negativity, normalization, countable additivity) are the minimal rules that make the entire theory work. Everything else (conditional probability, random variables, expectations, the law of large numbers) is a derived concept built on this triad.
Quick Check
Which of the following is NOT a consequence of the three Kolmogorov axioms?
Conditional probability is a DEFINITION, not a consequence of the axioms. It is introduced in Chapter 2 as a new concept built on top of the axiomatic framework.
Quick Check
If $A_n$ is the event that the $n$-th transmitted packet is corrupted, and $\sum_{n=1}^{\infty} \mathbb{P}(A_n) < \infty$, what does the first Borel-Cantelli lemma say?
With probability 1, infinitely many packets are corrupted.
With probability 1, only finitely many packets are corrupted.
No packets are ever corrupted.
The conclusion requires independence of the $A_n$.
Since $\sum_{n=1}^{\infty} \mathbb{P}(A_n) < \infty$, the first Borel-Cantelli lemma gives $\mathbb{P}(A_n \text{ i.o.}) = 0$: only finitely many packets are corrupted, almost surely.