Why Measure Theory
Why Bother with Measure Theory?
Throughout this book we have worked with probability in the way most engineers learn it: sample spaces, PMFs, PDFs, expectations defined as sums or integrals. This machinery works well for discrete random variables and for continuous random variables with densities. But it breaks down, sometimes silently and sometimes spectacularly, in three situations that matter for research:
- Random variables that are neither discrete nor continuous (e.g., a fading channel gain that is zero with positive probability and continuously distributed otherwise).
- Conditioning on events of probability zero (what does $P(A \mid X = x)$ really mean when $P(X = x) = 0$?).
- Infinite-dimensional probability (stochastic processes, random fields, the very notion of "a random function").
Measure theory provides a single, unified framework that handles all three. The payoff is not just rigor for its own sake: it is the language in which the deepest results of probability, information theory, and statistical inference are stated and proved.
Definition: The Riemann Integral (Brief Recap)
The Riemann integral of a bounded function $f : [a, b] \to \mathbb{R}$ is defined through Riemann sums: partition the domain $[a, b]$ into subintervals, form upper and lower sums by taking the supremum and infimum of $f$ on each subinterval, and let the mesh of the partition go to zero. The integral exists when the upper and lower sums converge to a common limit.
A bounded function $f$ on $[a, b]$ is Riemann integrable if and only if its set of discontinuities has Lebesgue measure zero (the Lebesgue criterion).
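The upper and lower sums can be sketched numerically. The example below is only an illustration under a simplifying assumption: the function is non-decreasing, so the infimum and supremum on each subinterval sit at its endpoints. The names `darboux_sums` and `square` are hypothetical helpers introduced here.

```python
def darboux_sums(f, a, b, n):
    """Lower and upper sums of a non-decreasing f on [a, b], n equal subintervals.

    Assumption: f is non-decreasing, so on each subinterval the infimum is
    attained at the left endpoint and the supremum at the right endpoint.
    """
    width = (b - a) / n
    lower = sum(f(a + i * width) for i in range(n)) * width
    upper = sum(f(a + (i + 1) * width) for i in range(n)) * width
    return lower, upper


def square(x):
    return x * x


for n in (10, 100, 1000):
    lower, upper = darboux_sums(square, 0.0, 1.0, n)
    # The gap upper - lower equals (f(b) - f(a)) / n for a monotone f.
    print(n, lower, upper, upper - lower)
```

For this function the gap between the two sums shrinks like $1/n$, which is the sense in which they "agree in the limit" and squeeze the value $1/3$.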
Example: A Function the Riemann Integral Cannot Handle
Consider the Dirichlet function $f(x) = \mathbf{1}_{\mathbb{Q}}(x)$, which equals 1 if $x$ is rational and 0 if $x$ is irrational. Show that this function is not Riemann integrable on $[0, 1]$, yet it has a well-defined Lebesgue integral.
Riemann integral fails
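On any subinterval $[x_{i-1}, x_i]$ of positive length, there exist both rationals and irrationals (density of $\mathbb{Q}$ and of $\mathbb{R} \setminus \mathbb{Q}$). Hence, for every partition $0 = x_0 < x_1 < \cdots < x_n = 1$:
- Upper Riemann sum: $\sum_{i=1}^{n} \bigl(\sup_{[x_{i-1}, x_i]} f\bigr)(x_i - x_{i-1}) = \sum_{i=1}^{n} 1 \cdot (x_i - x_{i-1}) = 1$.
- Lower Riemann sum: $\sum_{i=1}^{n} \bigl(\inf_{[x_{i-1}, x_i]} f\bigr)(x_i - x_{i-1}) = \sum_{i=1}^{n} 0 \cdot (x_i - x_{i-1}) = 0$.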
Since upper and lower sums never agree, the function is not Riemann integrable.
Lebesgue integral works
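The Lebesgue approach partitions the range instead of the domain. We have $f^{-1}(\{1\}) = \mathbb{Q} \cap [0, 1]$, which has Lebesgue measure $0$ (countable sets have measure zero). Similarly, $f^{-1}(\{0\}) = [0, 1] \setminus \mathbb{Q}$ has measure 1. Therefore:
$$\int_{[0,1]} f \, d\lambda = 1 \cdot \lambda(\mathbb{Q} \cap [0, 1]) + 0 \cdot \lambda([0, 1] \setminus \mathbb{Q}) = 0,$$
where $\lambda$ denotes Lebesgue measure.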
The Key Idea: Partition the Range, Not the Domain
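The Riemann integral asks: for each piece of the domain, how large is $f$? The Lebesgue integral asks: for each value $y$, how much of the domain maps to $y$?
Formally, for a non-negative measurable function $f$:
$$\int f \, d\mu = \sup \sum_{i=0}^{n} y_i \, \mu\bigl(\{x : y_i \le f(x) < y_{i+1}\}\bigr),$$
where the supremum is over all finite partitions $0 = y_0 < y_1 < \cdots < y_n$ of the range, with the convention $y_{n+1} = \infty$.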
This is why Lebesgue integration can handle functions that are "too wild" for Riemann: the Dirichlet function takes only two values (0 and 1), and the sets where it takes each value are perfectly measurable, even though they are interleaved in a way that defeats domain-based partitioning.
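The contrast between the two partitioning strategies can be sketched in a few lines. This is a rough numerical illustration, not a definition: the measure of each preimage is only estimated from a grid, and the function names (`riemann_sum`, `range_partition_sum`, `square`) and the choice $f(x) = x^2$ on $[0, 1]$ are assumptions made for the example.

```python
import bisect


def riemann_sum(f, n):
    """Domain partition: midpoint sum over n equal subintervals of [0, 1]."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n


def range_partition_sum(f, levels, n_grid=20_000):
    """Range partition: sum over k of y_k * measure({x : y_k <= f(x) < y_{k+1}}).

    `levels` is an increasing list y_0 = 0 < y_1 < ... < y_m; the last bin is
    {f >= y_m}. Assumes f >= 0. Each preimage's measure is only estimated, as
    the fraction of grid midpoints in [0, 1] whose value lands in the bin.
    """
    counts = [0] * len(levels)
    for i in range(n_grid):
        value = f((i + 0.5) / n_grid)
        k = bisect.bisect_right(levels, value) - 1  # largest k with y_k <= value
        counts[k] += 1
    return sum(y * c / n_grid for y, c in zip(levels, counts))


def square(x):
    return x * x


levels = [k / 200 for k in range(201)]
print(riemann_sum(square, 1000))            # close to 1/3
print(range_partition_sum(square, levels))  # slightly below 1/3
```

The range-partition sum approximates the integral from below, and refining `levels` closes the gap. Nothing in `range_partition_sum` needs the preimages to be intervals, which is the point of the construction.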
Definition: Lebesgue Integral of a Simple Function
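A simple function is a measurable function that takes finitely many values:
$$\varphi = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i},$$
where $a_1, \dots, a_n \ge 0$ are constants and $A_1, \dots, A_n$ are disjoint measurable sets. Its Lebesgue integral with respect to a measure $\mu$ is:
$$\int \varphi \, d\mu = \sum_{i=1}^{n} a_i \, \mu(A_i).$$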
Every non-negative measurable function is the pointwise limit of an increasing sequence of simple functions. The Lebesgue integral of a general non-negative function is defined as the supremum of integrals of simple functions below it.
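One standard way to build such an increasing sequence is the dyadic staircase $\varphi_n = \min(\lfloor 2^n f \rfloor / 2^n,\, n)$. The sketch below illustrates it for the assumed example $f(x) = e^x$ on $[0, 1]$. The helper name `staircase` is hypothetical, and the integral is only estimated on a grid.

```python
import math


def staircase(f, n):
    """n-th dyadic simple function below a non-negative function f.

    phi_n(x) = min(floor(2**n * f(x)) / 2**n, n) takes finitely many values
    and satisfies phi_n <= phi_{n+1} <= f pointwise.
    """
    scale = 2 ** n

    def phi(x):
        return min(math.floor(scale * f(x)) / scale, n)

    return phi


grid = [(i + 0.5) / 1000 for i in range(1000)]  # midpoints of [0, 1]
for n in range(1, 7):
    phi = staircase(math.exp, n)
    estimate = sum(phi(x) for x in grid) / len(grid)  # grid estimate of the integral
    print(n, estimate)
```

The printed estimates increase with `n` and approach the grid estimate of $\int_0^1 e^x \, dx = e - 1$ from below, matching the "supremum over simple functions below $f$" description.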
Definition: Lebesgue Integral of a General Measurable Function
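For a measurable function $f$, write $f = f^+ - f^-$ where $f^+ = \max(f, 0)$ and $f^- = \max(-f, 0)$. Then:
$$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu,$$
provided at least one of the two integrals on the right is finite. When both are finite, we say $f$ is $\mu$-integrable (or simply integrable) and write $f \in L^1(\mu)$.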
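As a small worked illustration (the function and interval are chosen here only for concreteness), take $f(x) = x$ on $[-1, 2]$ with Lebesgue measure $\lambda$:

```latex
\begin{align*}
f^+(x) &= x \,\mathbf{1}_{[0,2]}(x), &
\int f^+ \, d\lambda &= \int_0^2 x \, dx = 2, \\
f^-(x) &= -x \,\mathbf{1}_{[-1,0)}(x), &
\int f^- \, d\lambda &= \int_{-1}^0 (-x) \, dx = \tfrac{1}{2}, \\
&& \int f \, d\lambda &= 2 - \tfrac{1}{2} = \tfrac{3}{2}.
\end{align*}
```

By contrast, for $f(x) = x$ on all of $\mathbb{R}$ both $\int f^+ \, d\lambda$ and $\int f^- \, d\lambda$ are infinite, so the difference is undefined and the integral does not exist.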
Historical Note: Henri Lebesgue and the Birth of Modern Integration
1902--1933
Henri Lebesgue (1875--1941) introduced his theory of integration in his 1902 doctoral thesis at the Sorbonne. The thesis, Intégrale, longueur, aire (Integral, length, area), was one of the most consequential works in the history of mathematics. Lebesgue's insight was to measure the preimages of a function rather than partitioning its domain, an idea that unified integration, probability, and functional analysis.
The impact on probability was crystallized three decades later by Kolmogorov's 1933 Grundbegriffe der Wahrscheinlichkeitsrechnung, which built the entire axiomatic theory of probability on Lebesgue's measure-theoretic foundation.
Theorem: Monotone Convergence Theorem (MCT)
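Let $(f_n)_{n \ge 1}$ be a sequence of measurable functions with $0 \le f_1 \le f_2 \le \cdots$ and $f_n \uparrow f$ pointwise. Then:
$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$
In short: for increasing non-negative sequences, the limit and the integral commute.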
Since $f_n$ increases pointwise, the integrals $\int f_n \, d\mu$ form a non-decreasing sequence of non-negative extended reals, so the limit on the left always exists in $[0, \infty]$. The content of the theorem is that no "mass escapes to infinity": the integral of the limit is the limit of the integrals.
Lower bound
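Since $f_n \le f$ pointwise for every $n$, monotonicity of the integral gives $\int f_n \, d\mu \le \int f \, d\mu$. Taking $n \to \infty$:
$$\lim_{n \to \infty} \int f_n \, d\mu \le \int f \, d\mu.$$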
Upper bound via simple functions
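Let $L = \lim_{n \to \infty} \int f_n \, d\mu$ and let $\varphi$ be any simple function with $0 \le \varphi \le f$. Fix $c \in (0, 1)$ and define $E_n = \{x : f_n(x) \ge c \, \varphi(x)\}$. Then $E_n \uparrow \Omega$ and
$$\int f_n \, d\mu \ge \int_{E_n} f_n \, d\mu \ge c \int_{E_n} \varphi \, d\mu.$$
Taking $n \to \infty$ and using continuity of $\mu$ from below on each level set of $\varphi$: $L \ge c \int \varphi \, d\mu$. Since $c$ was arbitrary, let $c \uparrow 1$ to get $L \ge \int \varphi \, d\mu$. Since $\varphi$ was arbitrary, take the supremum over all such $\varphi$ to get $L \ge \int f \, d\mu$.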
Combine
The two bounds yield $\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu$.
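A numerical illustration of the theorem, under assumptions chosen for this sketch: take $f(x) = x^{-1/2}$ on $(0, 1]$, which is unbounded but has $\int f \, d\lambda = 2$, and the truncations $f_n = \min(f, n)$, which increase to $f$. A direct calculation gives $\int f_n \, d\lambda = 2 - 1/n$. The helper `truncated_integral` is a hypothetical name, and the midpoint rule is only an approximation.

```python
import math


def truncated_integral(n, n_grid=100_000):
    """Midpoint-rule estimate of the integral of min(x**-0.5, n) over (0, 1]."""
    total = 0.0
    for i in range(n_grid):
        x = (i + 0.5) / n_grid
        total += min(1.0 / math.sqrt(x), n)
    return total / n_grid


for n in (1, 2, 5, 10, 50):
    # Second column: numerical estimate; third column: closed form 2 - 1/n.
    print(n, truncated_integral(n), 2.0 - 1.0 / n)
```

The integrals increase with `n` and approach 2, the integral of the limit, as the MCT predicts.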
Theorem: Dominated Convergence Theorem (DCT)
Let $(f_n)_{n \ge 1}$ be a sequence of measurable functions with $f_n \to f$ pointwise (or a.e.). Suppose there exists an integrable function $g \ge 0$ (a "dominating" function) such that $|f_n(x)| \le g(x)$ for all $n$ and a.e. $x$. Then $f$ is integrable and:
$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$
The MCT handles monotone sequences; the DCT handles sequences that are not monotone but are "controlled" by a single integrable envelope. The dominated convergence theorem is the workhorse of measure-theoretic probability: it is the rigorous justification behind every interchange of limit and expectation we have performed informally in earlier chapters.
Apply Fatou's lemma to $g + f_n$ and $g - f_n$
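Since $|f_n| \le g$, both $g + f_n \ge 0$ and $g - f_n \ge 0$; passing to the limit, $|f| \le g$ a.e., so $f$ is integrable. By Fatou's lemma:
$$\int (g + f) \, d\mu \le \liminf_{n \to \infty} \int (g + f_n) \, d\mu = \int g \, d\mu + \liminf_{n \to \infty} \int f_n \, d\mu,$$
$$\int (g - f) \, d\mu \le \liminf_{n \to \infty} \int (g - f_n) \, d\mu = \int g \, d\mu - \limsup_{n \to \infty} \int f_n \, d\mu.$$
Extract the conclusion
Subtracting $\int g \, d\mu$ (which is finite) from the first inequality gives $\int f \, d\mu \le \liminf_n \int f_n \, d\mu$. The second gives $-\int f \, d\mu \le -\limsup_n \int f_n \, d\mu$, i.e., $\limsup_n \int f_n \, d\mu \le \int f \, d\mu$. Combining: $\limsup_n \int f_n \, d\mu \le \int f \, d\mu \le \liminf_n \int f_n \, d\mu$, so the limit exists and equals $\int f \, d\mu$.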
Example: Interchanging Limit and Expectation via DCT
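Let $X_n = n U \, \mathbf{1}\{U \le 1/n\}$ where $U \sim \mathrm{Uniform}[0, 1]$. Show that $\lim_{n \to \infty} \mathbb{E}[X_n] = 0$ using the DCT, even though the crude bound $X_n \le nU$ does not yield an integrable envelope when handled carelessly.
Compute the pointwise limit
For any fixed $U > 0$, eventually $1/n < U$ so $X_n = 0$. At $U = 0$, $X_n = 0$ for all $n$. Hence $X_n \to 0$ a.s.
Find a dominating function
On the event $\{U \le 1/n\}$ we have $nU \le 1$, and off that event $X_n = 0$. So $|X_n| \le 1$ a.s., and the constant $g = 1$ is integrable on $[0, 1]$.
Apply DCT
By the DCT: $\lim_{n \to \infty} \mathbb{E}[X_n] = \mathbb{E}[\lim_{n \to \infty} X_n] = \mathbb{E}[0] = 0$.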
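A numerical sanity check of the role of the envelope, as a sketch under stated assumptions: with $U$ uniform on $(0, 1)$, compare the bounded sequence $X_n = nU\,\mathbf{1}\{U \le 1/n\}$ (dominated by $g = 1$) with the classical sequence $Y_n = n\,\mathbf{1}\{U \le 1/n\}$, whose pointwise supremum behaves like $1/U$ and is not integrable. Both tend to 0 almost surely. The helper names are hypothetical, and expectations are computed with a deterministic midpoint grid rather than random sampling.

```python
def expectation(h, n_grid=100_000):
    """Midpoint-rule estimate of E[h(U)] for U uniform on (0, 1)."""
    return sum(h((i + 0.5) / n_grid) for i in range(n_grid)) / n_grid


def dominated(n):
    """X_n = n*U*1{U <= 1/n}; bounded by the integrable envelope g = 1."""
    return lambda u: n * u if u <= 1.0 / n else 0.0


def undominated(n):
    """Y_n = n*1{U <= 1/n}; no integrable envelope exists for this sequence."""
    return lambda u: float(n) if u <= 1.0 / n else 0.0


for n in (1, 10, 100):
    # E[X_n] = 1/(2n) shrinks to 0, while E[Y_n] stays at 1 for every n.
    print(n, expectation(dominated(n)), expectation(undominated(n)))
```

The first column of expectations converges to the expectation of the limit, as the DCT guarantees. The second does not, which shows that pointwise convergence alone is not enough.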
The Cantor Function (Devil's Staircase)
The Cantor function is continuous, non-decreasing, and maps $[0, 1]$ onto $[0, 1]$, yet it is locally constant on the complement of the Cantor set, so its derivative is zero on a set of Lebesgue measure 1. It is the CDF of a random variable that is neither discrete nor continuous in the sense of having a density: its distribution is singular with respect to Lebesgue measure.
[Interactive figure: the Cantor function after a chosen number of construction steps.]
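The step-by-step construction can be sketched with the usual self-similar recursion, starting from $c_0(x) = x$ and refining the outer thirds at each step. This is a minimal illustration, not the code behind the figure. The name `cantor_approx` and the `depth` parameter are assumptions introduced here.

```python
def cantor_approx(x, depth):
    """depth-th iterate of the Cantor function construction on [0, 1].

    c_0(x) = x; each step halves a rescaled copy on the left and right
    thirds and sets the middle third to the constant 1/2.
    """
    if depth == 0:
        return x
    if x < 1.0 / 3.0:
        return 0.5 * cantor_approx(3.0 * x, depth - 1)
    if x <= 2.0 / 3.0:
        return 0.5
    return 0.5 + 0.5 * cantor_approx(3.0 * x - 2.0, depth - 1)


for depth in (0, 1, 2, 5, 20):
    # x = 1/4 lies in the Cantor set; the iterates approach its limit value 1/3.
    print(depth, cantor_approx(0.25, depth))
```

After `depth` steps the iterate is already flat on every removed middle-third interval of the first `depth` generations, a set of total length $1 - (2/3)^{\text{depth}}$, which is how the flat part comes to fill out measure 1 in the limit.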
Common Mistake: Not Every Lebesgue-Integrable Function is Riemann-Integrable
Mistake:
Assuming that if a function has a finite Lebesgue integral, it must also be Riemann integrable.
Correction:
The Dirichlet function has Lebesgue integral zero on $[0, 1]$ but is not Riemann integrable on any interval of positive length. For a bounded function on a bounded interval, Riemann integrability requires the set of discontinuities to have measure zero, whereas Lebesgue integrability only requires measurability. Conversely, every Riemann-integrable function on a bounded interval is also Lebesgue-integrable, and the two integrals agree.
Quick Check
The Monotone Convergence Theorem requires which condition on the sequence $(f_n)$?
$f_n \to f$ pointwise and $|f_n| \le g$ for some integrable $g$
$0 \le f_n \uparrow f$ pointwise
$f_n$ are all bounded and converge uniformly
The MCT requires a non-negative, non-decreasing sequence. No dominating function is needed.
Lebesgue Integral
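An integral defined by partitioning the range of a function and measuring the preimages, rather than partitioning the domain. For a non-negative measurable function $f$ on a measure space $(\Omega, \mathcal{F}, \mu)$: $\int f \, d\mu = \sup\bigl\{\int \varphi \, d\mu : 0 \le \varphi \le f,\ \varphi \text{ simple}\bigr\}$.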
Related: The Riemann Integral (Brief Recap), Measure
Simple Function
A measurable function taking finitely many values: $\varphi = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$, where $A_1, \dots, A_n$ are disjoint measurable sets.
Related: Lebesgue Integral, Measurable Function (= Random Variable)
Key Takeaway
The Lebesgue integral generalizes the Riemann integral by partitioning the range instead of the domain. This seemingly simple change allows integration of far more functions, provides the monotone and dominated convergence theorems, and, most importantly for us, gives a rigorous foundation for expectation that works for all random variables, not just those with PMFs or PDFs.