Probability Spaces and Axioms
Why Probability for Wireless Communications?
Chapter 1 equipped us with the deterministic linear-algebraic machinery to write the MIMO input--output relation
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}.$$
But in any real system, the channel matrix $\mathbf{H}$, the noise vector $\mathbf{n}$, and often the transmit symbol $\mathbf{x}$ itself are random. Wireless channels fade because of multipath propagation; thermal noise is an inherently stochastic phenomenon described by statistical mechanics; and coded data streams are designed to look random to maximise entropy.
A deterministic analysis can tell us what happens for a particular channel realisation, but system design demands answers to questions such as:
- What is the probability that the bit-error rate exceeds a given threshold?
- What average data rate can a fading channel support?
- How does spatial diversity across independent antenna paths reduce outage probability?
These questions are unanswerable without a rigorous probability framework. This section lays the measure-theoretic foundation --- sample spaces, $\sigma$-algebras, and the Kolmogorov axioms --- that underlies every probabilistic statement in the remainder of this text. The payoff is immediate: by the end of this section we will already be applying Bayes' theorem to optimal symbol detection.
Definition: Sample Space
Sample Space
The sample space, denoted $\Omega$, is the set of all possible outcomes of a random experiment. Each element $\omega \in \Omega$ is called a sample point (or outcome).
The sample space may be:
- Finite: e.g., $\Omega = \{+1, -1\}$ for a single BPSK symbol.
- Countably infinite: e.g., $\Omega = \{0, 1, 2, \ldots\}$ for the number of packet arrivals in a time slot.
- Uncountable: e.g., $\Omega = \mathbb{R}$ for a continuous noise sample, or $\mathbb{C}^{N_r \times N_t}$ for the set of all possible MIMO channel realisations.
In wireless communications, common sample spaces include:
| Experiment | Sample space |
|---|---|
| Transmitted BPSK symbol | $\{+1, -1\}$ |
| Transmitted QPSK symbol | $\{(\pm 1 \pm j)/\sqrt{2}\}$ |
| Received baseband sample (AWGN) | $\mathbb{R}$ or $\mathbb{C}$ |
| Flat-fading SISO channel gain | $\mathbb{C}$ |
| MIMO channel matrix ($N_r \times N_t$) | $\mathbb{C}^{N_r \times N_t}$ |
The choice of sample space is a modelling decision. For BPSK over AWGN, one may take $\Omega = \{+1, -1\} \times \mathbb{R}$ (encoding both the transmitted symbol and the received signal) or, if the transmitted symbol is treated as deterministic, simply $\Omega = \mathbb{R}$.
Definition: $\sigma$-Algebra (Sigma-Algebra)
$\sigma$-Algebra (Sigma-Algebra)
Let $\Omega$ be a sample space. A $\sigma$-algebra (or $\sigma$-field) $\mathcal{F}$ on $\Omega$ is a collection of subsets of $\Omega$ satisfying the following three axioms:
1. Contains the sample space: $\Omega \in \mathcal{F}$.
2. Closed under complementation: If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
3. Closed under countable unions: If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
Each element $A \in \mathcal{F}$ is called an event.
Immediate consequences of the axioms:
- $\varnothing \in \mathcal{F}$ (by axioms 1 and 2).
- $\mathcal{F}$ is closed under countable intersections (by De Morgan's laws and axioms 2--3): $\bigcap_{i=1}^{\infty} A_i = \left(\bigcup_{i=1}^{\infty} A_i^c\right)^c \in \mathcal{F}$.
- $\mathcal{F}$ is closed under set differences: $A \setminus B = A \cap B^c \in \mathcal{F}$.
For finite or countable $\Omega$, one typically takes $\mathcal{F} = 2^{\Omega}$ (the power set). For uncountable sample spaces such as $\Omega = \mathbb{R}$, the power set is too large and one uses the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$, generated by all open intervals. This subtlety becomes essential when defining continuous random variables in Section 2.2.
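For a finite sample space the three axioms can be checked mechanically by brute force. The sketch below (plain Python; the helper names `power_set` and `is_sigma_algebra` are illustrative, not from the text) confirms that both the power set and the trivial collection $\{\varnothing, \Omega\}$ are $\sigma$-algebras on the BPSK sample space $\{+1, -1\}$, while a collection missing a complement is not. For a finite collection, closure under pairwise unions suffices in place of countable unions.

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of a finite sample space, as frozensets."""
    items = list(omega)
    return {frozenset(s) for s in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

def is_sigma_algebra(omega, F):
    """Check the three axioms for a finite collection F of frozensets."""
    omega = frozenset(omega)
    if omega not in F:
        return False                      # axiom 1: contains the sample space
    for A in F:
        if omega - A not in F:
            return False                  # axiom 2: closed under complement
    for A in F:
        for B in F:
            if A | B not in F:
                return False              # axiom 3 (pairwise unions suffice here)
    return True

F_power = power_set({-1, +1})                     # the power set 2^Omega
F_trivial = {frozenset(), frozenset({-1, +1})}    # {emptyset, Omega}
F_bad = {frozenset({-1, +1}), frozenset({+1})}    # missing the complement {-1}
```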
Definition: Probability Measure (Kolmogorov Axioms)
Probability Measure (Kolmogorov Axioms)
Let $(\Omega, \mathcal{F})$ be a measurable space (a sample space equipped with a $\sigma$-algebra). A probability measure is a function $P : \mathcal{F} \to [0, 1]$ satisfying the three Kolmogorov axioms:
(K1) Non-negativity. For every event $A \in \mathcal{F}$, $P(A) \ge 0$.
(K2) Normalization. $P(\Omega) = 1$.
(K3) Countable additivity ($\sigma$-additivity). If $A_1, A_2, \ldots \in \mathcal{F}$ are pairwise disjoint (i.e., $A_i \cap A_j = \varnothing$ for $i \neq j$), then
$$P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$
These three axioms, together with the -algebra structure, are sufficient to derive all standard rules of probability.
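As a concrete (illustrative) instance of the axioms, the sketch below builds the uniform measure on a four-outcome sample space with the power set as its $\sigma$-algebra, and verifies (K1), (K2), and finite additivity over every pair of disjoint events; exact `Fraction` arithmetic avoids floating-point noise.

```python
from itertools import chain, combinations
from fractions import Fraction

# Uniform measure on a four-outcome sample space (e.g. the two bits of a QPSK label).
omega = ["00", "01", "10", "11"]
pmf = {w: Fraction(1, 4) for w in omega}

def prob(event):
    """P(A) as the sum of point masses over the outcomes in A."""
    return sum(pmf[w] for w in event)

# Enumerate every event in the power-set sigma-algebra.
events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

nonneg = all(prob(A) >= 0 for A in events)                      # (K1)
normalised = (prob(frozenset(omega)) == 1)                      # (K2)
additive = all(prob(A | B) == prob(A) + prob(B)
               for A in events for B in events if not (A & B))  # (K3), finite form
```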
Countable additivity (K3) is strictly stronger than finite additivity. The distinction matters when taking limits of event sequences --- a situation that arises naturally in coding theory (block length $n \to \infty$) and in ergodic arguments for stochastic processes.
Definition: Probability Space
Probability Space
A probability space is a triple $(\Omega, \mathcal{F}, P)$ where:
- $\Omega$ is a sample space,
- $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$, and
- $P$ is a probability measure on $(\Omega, \mathcal{F})$.
Every probabilistic model in communications theory begins --- at least implicitly --- with the specification of a probability space. The triple $(\Omega, \mathcal{F}, P)$ provides:
- a complete catalogue of what can happen ($\Omega$),
- a specification of which collections of outcomes are "observable" or "measurable" ($\mathcal{F}$), and
- a consistent assignment of likelihoods ($P$).
Example: Probability Space for BPSK over AWGN
Construct an explicit probability space for the experiment of transmitting a single BPSK symbol over an AWGN channel with received signal $Y = X + N$, where $X \in \{+1, -1\}$ and $N \sim \mathcal{N}(0, \sigma^2)$.
Identify the sample space
Each outcome of the experiment is fully described by the pair (transmitted symbol, received signal). Hence
$$\Omega = \{+1, -1\} \times \mathbb{R}.$$
A typical sample point is $\omega = (x, y)$ with $x \in \{+1, -1\}$ and $y \in \mathbb{R}$.
Define the $\sigma$-algebra
We take the product $\sigma$-algebra
$$\mathcal{F} = 2^{\{+1,-1\}} \otimes \mathcal{B}(\mathbb{R}),$$
where $2^{\{+1,-1\}}$ is the power set of the binary set (with four elements: $\varnothing, \{+1\}, \{-1\}, \{+1,-1\}$) and $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra on $\mathbb{R}$.
Typical events include:
- $A = \{+1\} \times \mathbb{R}$ = "symbol $+1$ was sent"
- $B = \{+1, -1\} \times (0, \infty)$ = "the received signal is positive"
- $A \cap B = \{+1\} \times (0, \infty)$ = "symbol $+1$ was sent and $Y > 0$"
Specify the probability measure
Assume equally likely symbols: $P(X = +1) = P(X = -1) = \tfrac{1}{2}$. Given $X = x$, the received signal has conditional density
$$f_{Y|X}(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y - x)^2}{2\sigma^2}\right).$$
The joint probability of any measurable set $\{x\} \times B$ (where $x \in \{+1, -1\}$ and $B \in \mathcal{B}(\mathbb{R})$) is
$$P(\{x\} \times B) = \frac{1}{2} \int_B f_{Y|X}(y \mid x)\, dy.$$
This extends by additivity to all events in $\mathcal{F}$. One can verify that the Kolmogorov axioms (K1)--(K3) are satisfied: non-negativity is immediate, normalisation follows from $\int_{\mathbb{R}} f_{Y|X}(y \mid x)\, dy = 1$ for each $x$, and countable additivity is inherited from the Lebesgue integral.
Summary
The complete probability space is
$$(\Omega, \mathcal{F}, P) = \left(\{+1,-1\} \times \mathbb{R},\; 2^{\{+1,-1\}} \otimes \mathcal{B}(\mathbb{R}),\; P\right),$$
where $P$ is defined via the joint density above. This simple example already captures the essential structure: a discrete "information" component (the symbol) and a continuous "observation" component (the received signal), coupled through the channel law.
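The normalisation claim in this example can be sanity-checked numerically. The sketch below (NumPy; a hand-rolled trapezoidal rule keeps the dependency footprint small) integrates the Gaussian conditional density for each symbol, weights by the $\tfrac{1}{2}$ priors, and confirms that the total measure of $\Omega$ is $1$.

```python
import numpy as np

def f_cond(y, x, sigma):
    """Gaussian conditional density f_{Y|X}(y | x) for Y = X + N, N ~ N(0, sigma^2)."""
    return np.exp(-(y - x) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def trapezoid(fy, y):
    """Trapezoidal-rule integral of samples fy over grid y."""
    return float(np.sum(0.5 * (fy[1:] + fy[:-1]) * np.diff(y)))

sigma = 1.0
y = np.linspace(-10.0, 10.0, 20001)   # grid wide enough to capture both Gaussians

# P(Omega) = sum over x of P({x} x R) = sum over x of (1/2) * integral f(y|x) dy
p_total = sum(0.5 * trapezoid(f_cond(y, x, sigma), y) for x in (+1.0, -1.0))
```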
Theorem: Basic Properties of Probability Measures
Let $(\Omega, \mathcal{F}, P)$ be a probability space. Then:
- $P(\varnothing) = 0$.
- Complement rule: $P(A^c) = 1 - P(A)$ for all $A \in \mathcal{F}$.
- Monotonicity: If $A \subseteq B$, then $P(A) \le P(B)$.
- Sub-additivity (union bound): $P\!\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} P(A_i)$.
These properties follow directly from the three Kolmogorov axioms. The union bound (property 4) is used extensively in communications for bounding error probabilities over constellations --- it is the foundation of the "nearest-neighbour union bound" on symbol-error rate.
For property 1, write $\varnothing = \varnothing \cup \varnothing \cup \cdots$ and apply (K3).
For monotonicity, decompose $B = A \cup (B \setminus A)$.
Property 1: $P(\varnothing) = 0$
Write $\varnothing = \bigcup_{i=1}^{\infty} \varnothing$ (a countable union of pairwise disjoint sets). By (K3),
$$P(\varnothing) = \sum_{i=1}^{\infty} P(\varnothing).$$
Since $P(\varnothing) \ge 0$, the infinite sum must converge, which requires $P(\varnothing) = 0$.
Property 2: Complement rule
$\Omega = A \cup A^c$ with $A \cap A^c = \varnothing$. By (K3) (finite additivity as a special case) and (K2),
$$1 = P(\Omega) = P(A) + P(A^c),$$
giving $P(A^c) = 1 - P(A)$.
Property 3: Monotonicity
If $A \subseteq B$, then $B = A \cup (B \setminus A)$ with $A \cap (B \setminus A) = \varnothing$. By finite additivity,
$$P(B) = P(A) + P(B \setminus A) \ge P(A),$$
since $P(B \setminus A) \ge 0$ by (K1).
Property 4: Union bound
Define $B_1 = A_1$ and $B_i = A_i \setminus \bigcup_{j=1}^{i-1} A_j$ for $i \ge 2$. The sets $B_i$ are pairwise disjoint, $\bigcup_{i} B_i = \bigcup_{i} A_i$, and $B_i \subseteq A_i$, so $P(B_i) \le P(A_i)$ by monotonicity. Then
$$P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(B_i) \le \sum_{i=1}^{\infty} P(A_i).$$
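A Monte Carlo illustration of the union bound (a sketch with assumed interval events on a uniform draw, not from the text): three overlapping events on $U \sim \mathrm{Uniform}[0,1)$ have exact union probability $0.6$, while the union bound gives $0.75$; the empirical counts respect the bound by construction, since every union hit is counted at least once in the sum.

```python
import random

random.seed(0)

# Three overlapping events on U ~ Uniform[0, 1):
#   A1 = {U < 0.3}, A2 = {0.2 <= U < 0.5}, A3 = {0.45 <= U < 0.6}
# Exact values: P(A1 u A2 u A3) = 0.6; union bound = 0.3 + 0.3 + 0.15 = 0.75.
N = 200_000
union_hits = 0
event_hits = [0, 0, 0]
for _ in range(N):
    u = random.random()
    indicators = [u < 0.3, 0.2 <= u < 0.5, 0.45 <= u < 0.6]
    union_hits += any(indicators)
    event_hits = [h + ind for h, ind in zip(event_hits, indicators)]

p_union = union_hits / N                   # estimate of P(union), about 0.6
union_bound = sum(h / N for h in event_hits)  # estimate of the bound, about 0.75
```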
Theorem: Inclusion-Exclusion Principle
Let $(\Omega, \mathcal{F}, P)$ be a probability space. For any two events $A, B \in \mathcal{F}$,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
More generally, for events $A_1, \ldots, A_n$,
$$P\!\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1}\, P(A_1 \cap \cdots \cap A_n).$$
Summing $P(A) + P(B)$ double-counts the overlap $A \cap B$. Subtracting $P(A \cap B)$ corrects this. The general formula alternately adds and subtracts to correct for over- and under-counting of higher-order overlaps.
Decompose $A \cup B$ into three disjoint parts: $A \setminus B$, $A \cap B$, and $B \setminus A$.
Apply finite additivity to this disjoint decomposition.
Disjoint decomposition
Write $A \cup B$ as the disjoint union
$$A \cup B = (A \setminus B) \cup (A \cap B) \cup (B \setminus A).$$
By finite additivity (a consequence of K3),
$$P(A \cup B) = P(A \setminus B) + P(A \cap B) + P(B \setminus A). \quad (1)$$
Express individual probabilities
Similarly, $A = (A \setminus B) \cup (A \cap B)$ is a disjoint union, so
$$P(A) = P(A \setminus B) + P(A \cap B),$$
giving $P(A \setminus B) = P(A) - P(A \cap B)$. Likewise,
$$P(B \setminus A) = P(B) - P(A \cap B).$$
Combine
Substituting into (1):
$$P(A \cup B) = \big[P(A) - P(A \cap B)\big] + P(A \cap B) + \big[P(B) - P(A \cap B)\big] = P(A) + P(B) - P(A \cap B).$$
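The two-event identity can be verified exactly on a small finite space. The sketch below (an illustrative choice of events, not from the text) uses the uniform measure on $\{0, \ldots, 19\}$ with $A$ the even outcomes and $B$ the multiples of three, and checks $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ with exact rational arithmetic.

```python
from fractions import Fraction

# Finite uniform probability space: Omega = {0, ..., 19}, each outcome mass 1/20.
omega = set(range(20))
P = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w % 2 == 0}   # even outcomes, P(A) = 10/20
B = {w for w in omega if w % 3 == 0}   # multiples of 3, P(B) = 7/20

lhs = P(A | B)                         # direct probability of the union
rhs = P(A) + P(B) - P(A & B)           # inclusion-exclusion
```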
Definition: Conditional Probability
Conditional Probability
Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $B \in \mathcal{F}$ with $P(B) > 0$. The conditional probability of an event $A \in \mathcal{F}$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Key properties:
- For fixed $B$ with $P(B) > 0$, the map $A \mapsto P(A \mid B)$ is itself a probability measure on $(\Omega, \mathcal{F})$.
- Multiplication rule: $P(A \cap B) = P(A \mid B)\, P(B)$.
- Chain rule (general): For events $A_1, \ldots, A_n$ with $P(A_1 \cap \cdots \cap A_{n-1}) > 0$,
$$P(A_1 \cap \cdots \cap A_n) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1}).$$
In detection theory, conditioning is the fundamental operation: the receiver observes $Y = y$ and must compute $P(X = x \mid Y = y)$ --- the conditional probability of each hypothesis given the observation.
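The chain rule can be checked exactly by brute-force enumeration. The sketch below is a generic urn-style illustration (not from the text): drawing three symbols without replacement from a pool of two `R` and three `G` symbols, the enumerated probability of drawing three `G`s matches the chain-rule product $P(G_1)\,P(G_2 \mid G_1)\,P(G_3 \mid G_1 \cap G_2)$.

```python
from itertools import permutations
from fractions import Fraction

# Draw 3 symbols without replacement from a pool of 2 'R' and 3 'G' symbols.
pool = ["R", "R", "G", "G", "G"]
ordered_draws = list(permutations(pool, 3))   # 5 * 4 * 3 = 60 equally likely draws

# Direct enumeration of P(all three draws are 'G').
p_enum = Fraction(sum(d == ("G", "G", "G") for d in ordered_draws),
                  len(ordered_draws))

# Chain rule: P(G1) * P(G2 | G1) * P(G3 | G1 and G2).
p_chain = Fraction(3, 5) * Fraction(2, 4) * Fraction(1, 3)
```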
Theorem: Law of Total Probability
Let $B_1, B_2, \ldots, B_n$ be a partition of $\Omega$: the $B_i$ are pairwise disjoint, $\bigcup_{i=1}^{n} B_i = \Omega$, and $P(B_i) > 0$ for each $i$. Then for any event $A \in \mathcal{F}$,
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i).$$
The result extends to countable partitions provided the sum converges.
We "break up" the calculation of $P(A)$ by conditioning on which element of the partition occurred. In communications, the partition is often the set of possible transmitted symbols $\{X = s_i\}$, and $A$ is the event of a detection error.
Write $A = \bigcup_{i} (A \cap B_i)$ and note the pieces are disjoint.
Decompose and apply additivity
Since $\{B_i\}$ partitions $\Omega$,
$$A = A \cap \Omega = A \cap \left(\bigcup_{i} B_i\right) = \bigcup_{i} (A \cap B_i).$$
The sets $A \cap B_i$ are pairwise disjoint (because the $B_i$ are), so by countable additivity (K3),
$$P(A) = \sum_{i} P(A \cap B_i) = \sum_{i} P(A \mid B_i)\, P(B_i),$$
where the last step uses the definition of conditional probability.
Theorem: Bayes' Theorem
Let $B_1, \ldots, B_n$ be a partition of $\Omega$ with $P(B_i) > 0$ for each $i$, and let $A \in \mathcal{F}$ with $P(A) > 0$. Then for each $j$,
$$P(B_j \mid A) = \frac{P(A \mid B_j)\, P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)}.$$
The terms have standard names:
| Term | Name | Role |
|---|---|---|
| $P(B_j)$ | Prior | Belief about $B_j$ before observing $A$ |
| $P(A \mid B_j)$ | Likelihood | How probable $A$ is under hypothesis $B_j$ |
| $P(B_j \mid A)$ | Posterior | Updated belief about $B_j$ after observing $A$ |
| $P(A)$ | Evidence (marginal likelihood) | Normalising constant |
Bayes' theorem "inverts" the direction of conditioning. The channel model gives us $P(A \mid B_j)$ --- the likelihood. The receiver needs $P(B_j \mid A)$ --- the posterior. Bayes' theorem is the bridge.
Start from the definition of $P(B_j \mid A)$ and expand $P(A)$ using total probability.
Apply the definition of conditional probability
$$P(B_j \mid A) = \frac{P(B_j \cap A)}{P(A)} = \frac{P(A \mid B_j)\, P(B_j)}{P(A)}.$$
Expand $P(A)$ via total probability
$$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i),$$
which yields the stated formula.
Example: MAP Detection for BPSK via Bayes' Theorem
A BPSK transmitter sends $X \in \{+1, -1\}$ with equal prior probabilities over an AWGN channel. The receiver observes
$$Y = X + N, \qquad N \sim \mathcal{N}(0, \sigma^2).$$
Using Bayes' theorem, compute the posterior probability $P(X = +1 \mid Y = y)$ and derive the Maximum A Posteriori (MAP) decision rule.
Identify the likelihoods
The conditional density of $Y$ given $X = x$ is
$$f_{Y|X}(y \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y - x)^2}{2\sigma^2}\right).$$
Therefore:
$$f_{Y|X}(y \mid +1) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y-1)^2/(2\sigma^2)}, \qquad f_{Y|X}(y \mid -1) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y+1)^2/(2\sigma^2)}.$$
Apply Bayes' theorem
The posterior probability is (using the continuous analogue of Bayes' theorem):
$$P(X = +1 \mid Y = y) = \frac{f_{Y|X}(y \mid +1)\, P(X = +1)}{f_{Y|X}(y \mid +1)\, P(X = +1) + f_{Y|X}(y \mid -1)\, P(X = -1)}.$$
With equal priors $P(X = +1) = P(X = -1) = \tfrac{1}{2}$, the priors cancel:
$$P(X = +1 \mid Y = y) = \frac{f_{Y|X}(y \mid +1)}{f_{Y|X}(y \mid +1) + f_{Y|X}(y \mid -1)}.$$
Simplify using the log-likelihood ratio
Define the log-likelihood ratio (LLR):
$$L(y) = \ln \frac{f_{Y|X}(y \mid +1)}{f_{Y|X}(y \mid -1)} = \frac{(y+1)^2 - (y-1)^2}{2\sigma^2} = \frac{2y}{\sigma^2}.$$
Then
$$P(X = +1 \mid Y = y) = \frac{1}{1 + e^{-L(y)}},$$
which is the sigmoid (logistic) function of $L(y)$.
Derive the MAP decision rule
The MAP detector chooses the symbol with the highest posterior:
$$\hat{x}_{\text{MAP}} = \arg\max_{x \in \{+1, -1\}} P(X = x \mid Y = y).$$
Since $P(X = +1 \mid Y = y) > \tfrac{1}{2}$ if and only if $L(y) > 0$, i.e., $y > 0$, the MAP rule simplifies to:
$$\hat{x}_{\text{MAP}} = \operatorname{sign}(y).$$
With equal priors, the MAP detector coincides with the maximum-likelihood (ML) detector. The decision boundary is at $y = 0$, which is the midpoint between the two constellation points --- an intuitively satisfying result.
Note: With unequal priors $P(X = +1) \neq P(X = -1)$, the decision boundary shifts to $y^\star = \frac{\sigma^2}{2} \ln \frac{P(X = -1)}{P(X = +1)}$, biasing detection toward the more probable symbol.
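The posterior, the MAP rule, and the prior-induced boundary shift can be sketched in a few lines (the helper names `posterior_plus` and `map_detect` are illustrative, not from the text). With equal priors the detector reduces to $\operatorname{sign}(y)$; with a prior $P(X = +1) = 0.8$ the boundary moves below zero, and the posterior evaluated exactly at the shifted boundary is $\tfrac{1}{2}$.

```python
import math

def posterior_plus(y, sigma, p_plus=0.5):
    """P(X=+1 | Y=y) for BPSK over AWGN, in the LLR/sigmoid form."""
    llr = 2.0 * y / sigma**2 + math.log(p_plus / (1.0 - p_plus))
    return 1.0 / (1.0 + math.exp(-llr))

def map_detect(y, sigma, p_plus=0.5):
    """MAP decision: pick the symbol with the larger posterior."""
    return +1 if posterior_plus(y, sigma, p_plus) > 0.5 else -1

sigma = 1.0

# Equal priors: the rule reduces to sign(y).
d_pos = map_detect(0.3, sigma)    # expect +1
d_neg = map_detect(-0.3, sigma)   # expect -1

# Unequal priors: boundary shifts to y* = (sigma^2 / 2) ln(P(-1)/P(+1)) < 0.
p_plus = 0.8
y_star = (sigma**2 / 2.0) * math.log((1.0 - p_plus) / p_plus)
post_at_boundary = posterior_plus(y_star, sigma, p_plus)   # = 1/2 at the boundary
```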
Bayes' Theorem in Action: BPSK Detection
Definition: Independence of Events
Independence of Events
Let $(\Omega, \mathcal{F}, P)$ be a probability space.
Pairwise independence. Two events $A, B \in \mathcal{F}$ are independent if
$$P(A \cap B) = P(A)\, P(B).$$
Equivalently (when $P(B) > 0$), $A$ and $B$ are independent if and only if $P(A \mid B) = P(A)$: knowing that $B$ occurred does not change the probability of $A$.
Mutual independence. Events $A_1, \ldots, A_n$ are mutually independent if for every subcollection $\{A_{i_1}, \ldots, A_{i_k}\}$ with $2 \le k \le n$,
$$P(A_{i_1} \cap \cdots \cap A_{i_k}) = \prod_{j=1}^{k} P(A_{i_j}).$$
This requires $2^n - n - 1$ equalities to hold (one for each subset of size $\ge 2$), which is strictly stronger than pairwise independence alone (which only requires $\binom{n}{2}$ equalities).
Independence is a modelling assumption, not something derived from the axioms. In wireless, the assumption that fading coefficients across well-separated antennas are independent is justified by physical arguments (sufficient antenna spacing in a rich scattering environment), but must always be stated explicitly.
Why This Matters: Independent Fading and Diversity Gain
The concept of independence is at the heart of diversity in wireless systems. Consider a receiver with $L$ antennas, each observing a faded copy of the transmitted signal:
$$y_\ell = h_\ell x + n_\ell, \qquad \ell = 1, \ldots, L,$$
where $h_\ell$ is the fading coefficient on antenna $\ell$, $x$ is the transmitted symbol, and $n_\ell$ is additive noise.
If the fading coefficients $h_1, \ldots, h_L$ are mutually independent, the probability that all $L$ channels are simultaneously in a deep fade is
$$P(\text{all } L \text{ branches fade}) = \prod_{\ell=1}^{L} P\big(|h_\ell|^2 < \epsilon\big).$$
For Rayleigh fading with $P(|h_\ell|^2 < \epsilon) \approx \epsilon$ at low threshold $\epsilon$, this product scales as $\epsilon^L$, yielding a diversity order of $L$. The error probability at high SNR then decays as $\mathrm{SNR}^{-L}$ --- each additional independent antenna path contributes one order of magnitude faster decay.
Key insight: Without independence, adding antennas may not help. If $h_1 = h_2 = \cdots = h_L$ (fully correlated fading), all antennas fade together and the diversity order remains $1$. Independence is the mechanism that converts extra hardware into reliability.
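A Monte Carlo sketch of this contrast (illustrative parameters, not from the text): for unit-mean Rayleigh fading, $|h_\ell|^2$ is exponentially distributed, so with threshold $\epsilon = 0.1$ the single-branch deep-fade probability is $1 - e^{-\epsilon} \approx 0.095$, while three independent branches fade together with probability $\approx (1 - e^{-\epsilon})^3 \approx 8.6 \times 10^{-4}$. Fully correlated branches retain the single-branch probability.

```python
import math
import random

random.seed(1)
N, L, eps = 200_000, 3, 0.1   # trials, antenna branches, deep-fade threshold

indep_fades = 0   # all L independent branches below threshold
corr_fades = 0    # fully correlated branches: one gain governs every branch

for _ in range(N):
    # Rayleigh fading: |h|^2 is exponentially distributed with unit mean.
    gains = [random.expovariate(1.0) for _ in range(L)]
    indep_fades += all(g < eps for g in gains)
    corr_fades += gains[0] < eps   # h_1 = ... = h_L: a single draw decides all L

p_indep = indep_fades / N   # ~ (1 - e^{-eps})^L ~ eps^L  (diversity order L)
p_corr = corr_fades / N     # ~ 1 - e^{-eps} ~ eps        (diversity order 1)
```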
See full treatment in Chapter 4, Section 3
Historical Note: Kolmogorov's Axiomatization (1933)
Prior to the 20th century, probability was a collection of useful but ad hoc calculation rules. Attempts to ground it rigorously --- by Laplace (equally likely outcomes), von Mises (frequency limits), and others --- each covered only special cases.
In 1933, the Russian mathematician Andrey Nikolaevich Kolmogorov (1903--1987) published Grundbegriffe der Wahrscheinlichkeitsrechnung ("Foundations of the Theory of Probability"), in which he showed that the entire theory could be derived from just three axioms by identifying probability with a normalised measure on a -algebra of events. This measure-theoretic framework unified discrete and continuous probability, resolved paradoxes, and put limit theorems (law of large numbers, central limit theorem) on rigorous footing.
Kolmogorov's axioms are precisely the axioms (K1)--(K3) in the definition of a probability measure above. Nearly a century later, they remain the universally accepted foundation of probability theory --- and, by extension, of all stochastic models in communications, information theory, and signal processing.
Historical irony: Kolmogorov initially trained in history before switching to mathematics. His axiomatization of probability was inspired by Lebesgue's theory of integration --- the same machinery that today underpins the definition of expectation, density functions, and information-theoretic integrals.
Common Mistake: Confusing Pairwise Independence with Mutual Independence
Mistake:
A common error is to assume that if events $A_1, \ldots, A_n$ are pairwise independent (i.e., $P(A_i \cap A_j) = P(A_i)\,P(A_j)$ for every pair $i \neq j$), then they are automatically mutually independent.
Correction:
Pairwise independence does not imply mutual independence.
A classic counterexample uses two fair coin flips. Define:
- = "first coin is heads,"
- = "second coin is heads,"
- = "the two coins show the same face."
One can verify:
- $P(A) = P(B) = P(C) = \tfrac{1}{2}$.
- $P(A \cap B) = \tfrac{1}{4} = P(A)\,P(B)$. (Pairwise independent.)
- $P(A \cap C) = \tfrac{1}{4} = P(A)\,P(C)$. (Pairwise independent.)
- $P(B \cap C) = \tfrac{1}{4} = P(B)\,P(C)$. (Pairwise independent.)
- But $P(A \cap B \cap C) = \tfrac{1}{4} \neq \tfrac{1}{8} = P(A)\,P(B)\,P(C)$.
The triple intersection condition fails. Knowing both coins are heads ($A \cap B$) makes $C$ certain, so the three events are not mutually independent.
In wireless context: When modelling fading across multiple antennas or subcarriers, one must verify (or assume) mutual independence, not merely pairwise independence, for diversity arguments to hold. Physical channel models based on independent scatterers typically guarantee mutual independence, but engineered systems with shared RF components may introduce subtle three-way (or higher) correlations.
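The coin counterexample above can be verified by exhaustive enumeration over the four-outcome sample space, using exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Two fair coins: four equally likely outcomes.
omega = list(product("HT", repeat=2))
P = lambda pred: Fraction(sum(1 for w in omega if pred(w)), len(omega))

A = lambda w: w[0] == "H"      # first coin is heads
B = lambda w: w[1] == "H"      # second coin is heads
C = lambda w: w[0] == w[1]     # the two coins show the same face

# Pairwise independence holds for all three pairs...
pair_ok = (P(lambda w: A(w) and B(w)) == P(A) * P(B)
           and P(lambda w: A(w) and C(w)) == P(A) * P(C)
           and P(lambda w: B(w) and C(w)) == P(B) * P(C))

# ...but the triple-intersection condition fails: 1/4 vs 1/8.
p_triple = P(lambda w: A(w) and B(w) and C(w))
p_product = P(A) * P(B) * P(C)
```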
Quick Check
In a binary communication system, the priors are $P(X=0) = p_0$ and $P(X=1) = p_1 = 1 - p_0$. The channel has crossover probabilities $P(Y=1 \mid X=0) = \epsilon_0$ and $P(Y=0 \mid X=1) = \epsilon_1$. What is $P(X=0 \mid Y=1)$?
By Bayes' theorem: $P(X=0 \mid Y=1) = \dfrac{\epsilon_0\, p_0}{\epsilon_0\, p_0 + (1 - \epsilon_1)\, p_1}$.
Quick Check
Let $A$ and $B$ be independent events with known $P(A)$ and $P(B)$. What is $P(A^c \cap B)$?
If $A$ and $B$ are independent, then $A^c$ and $B$ are also independent. Therefore $P(A^c \cap B) = P(A^c)\,P(B) = (1 - P(A))\,P(B)$.
Quick Check
Events $A$ and $B$ have known $P(A)$, $P(B)$, and $P(A \cup B)$. What is $P(A \cap B)$?
By the inclusion-exclusion principle: $P(A \cap B) = P(A) + P(B) - P(A \cup B)$.
Sample space
The set of all possible outcomes of a random experiment, denoted $\Omega$. Each element $\omega \in \Omega$ is a sample point. The sample space may be finite, countably infinite, or uncountable.
Related: Sample Space, Probability Space
$\sigma$-algebra
A collection $\mathcal{F}$ of subsets of $\Omega$ that contains $\Omega$, is closed under complementation, and is closed under countable unions. Elements of $\mathcal{F}$ are called events. For $\Omega = \mathbb{R}$, the standard choice is the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$.
Related: $\sigma$-Algebra (Sigma-Algebra), Probability Space
Probability measure
A function $P : \mathcal{F} \to [0, 1]$ satisfying the Kolmogorov axioms: non-negativity ($P(A) \ge 0$), normalisation ($P(\Omega) = 1$), and countable additivity for disjoint events.
Related: Probability Measure (Kolmogorov Axioms), Probability Space, Kolmogorov's Axiomatization (1933)
Conditional probability
The probability of event $A$ given that event $B$ has occurred, defined as $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ when $P(B) > 0$. Conditioning is the fundamental operation in Bayesian detection and estimation.
Related: Conditional Probability, Bayes' Theorem, MAP Detection for BPSK via Bayes' Theorem
Independence
Events $A$ and $B$ are independent if $P(A \cap B) = P(A)\,P(B)$. Mutual independence of $n$ events requires the product rule to hold for every subcollection of size $\ge 2$, which is strictly stronger than pairwise independence.
Related: Independence of Events, Confusing Pairwise Independence with Mutual Independence, Independent Fading and Diversity Gain
Key Takeaway
The core messages of this section:
- Everything starts with the triple. The probability space $(\Omega, \mathcal{F}, P)$ is the rigorous foundation for every stochastic statement in communications. The three Kolmogorov axioms --- non-negativity, normalisation, countable additivity --- are minimal yet sufficient to derive all of probability theory.
- Bayes' theorem inverts the channel. The channel model gives us the likelihood $P(y \mid x)$; Bayes' theorem converts this into the posterior $P(x \mid y)$, enabling optimal (MAP) detection. For BPSK with equal priors over AWGN, the MAP rule reduces to $\hat{x} = \operatorname{sign}(y)$.
- Independence enables diversity. When fading paths are mutually independent, the probability of simultaneous deep fades decays as a product, yielding diversity order $L$ with $L$ antennas. Pairwise independence alone is insufficient --- mutual independence is required.