Why Measure Theory

Why Bother with Measure Theory?

Throughout this book we have worked with probability in the way most engineers learn it: sample spaces, PMFs, PDFs, and expectations defined as sums or integrals. This machinery works well for discrete random variables and for continuous random variables with densities. But it breaks down, sometimes silently and sometimes spectacularly, in three situations that matter for research:

  1. Random variables that are neither discrete nor continuous (e.g., a fading channel gain that is zero with positive probability and continuously distributed otherwise).
  2. Conditioning on events of probability zero (what does $\mathbb{E}[X \mid Y = y]$ really mean when $P(Y = y) = 0$?).
  3. Infinite-dimensional probability (stochastic processes, random fields, the very notion of "a random function").

Measure theory provides a single, unified framework that handles all three. The payoff is not rigor for its own sake: it is the language in which the deepest results of probability, information theory, and statistical inference are stated and proved.

Definition:

The Riemann Integral (Brief Recap)

The Riemann integral of a bounded function $f : [a, b] \to \mathbb{R}$ is defined as the limit of Riemann sums: partition the domain $[a, b]$ into subintervals, form upper and lower sums from the supremum and infimum of $f$ on each subinterval, and take the limit as the mesh of the partition goes to zero. The function is Riemann integrable when the upper and lower sums converge to the same value.

A bounded function is Riemann integrable if and only if its set of discontinuities has Lebesgue measure zero (the Lebesgue criterion).
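The upper and lower sums described above can be sketched numerically. The following is a minimal illustration, not a general-purpose integrator: it assumes $f$ is non-decreasing on $[a, b]$, so that the infimum and supremum on each subinterval sit at the endpoints. The name `riemann_bounds` and the test function $x^2$ are illustrative choices, not part of the text.

```python
def riemann_bounds(f, a, b, n):
    """Lower and upper Riemann sums of a non-decreasing f on n equal subintervals."""
    width = (b - a) / n
    lower = 0.0
    upper = 0.0
    for i in range(n):
        left = a + i * width
        right = left + width
        # Assumption: f is non-decreasing, so inf f is at the left endpoint
        # and sup f is at the right endpoint of each subinterval.
        lower += f(left) * width
        upper += f(right) * width
    return lower, upper


# As the mesh shrinks, the two sums squeeze the integral of x^2 on [0, 1].
for n in (10, 100, 1000):
    lo, hi = riemann_bounds(lambda x: x * x, 0.0, 1.0, n)
    print(n, lo, hi, hi - lo)
```

For this monotone example the gap between the two sums is $(f(b) - f(a))(b - a)/n$, so it shrinks like $1/n$ and both sums converge to $1/3$.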

Example: A Function the Riemann Integral Cannot Handle

Consider the Dirichlet function $\mathbf{1}_{\mathbb{Q}}(x)$, which equals 1 if $x$ is rational and 0 if $x$ is irrational. Show that this function is not Riemann integrable on $[0, 1]$, yet it has a well-defined Lebesgue integral.
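One way to carry out the argument is sketched below. The symbols $U(f, P)$ and $L(f, P)$ for the upper and lower sums over a partition $P = \{x_0 < x_1 < \cdots < x_m\}$ are notation introduced here for the sketch, and $\mu$ denotes Lebesgue measure.

```latex
% Riemann side: every subinterval [x_{i-1}, x_i] contains both rational
% and irrational points, so sup = 1 and inf = 0 on each piece.
\begin{align*}
U(\mathbf{1}_{\mathbb{Q}}, P) &= \sum_{i=1}^{m} 1 \cdot (x_i - x_{i-1}) = 1, \\
L(\mathbf{1}_{\mathbb{Q}}, P) &= \sum_{i=1}^{m} 0 \cdot (x_i - x_{i-1}) = 0 .
\end{align*}
% The two sums never meet, whatever the partition: not Riemann integrable.

% Lebesgue side: Q \cap [0,1] is countable, hence has Lebesgue measure zero.
\begin{align*}
\int_{[0,1]} \mathbf{1}_{\mathbb{Q}}\, d\mu
  &= 1 \cdot \mu(\mathbb{Q} \cap [0,1]) + 0 \cdot \mu([0,1] \setminus \mathbb{Q}) \\
  &= 1 \cdot 0 + 0 \cdot 1 = 0 .
\end{align*}
```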


The Key Idea: Partition the Range, Not the Domain

The Riemann integral asks: for each piece of the domain, how large is $f$? The Lebesgue integral asks: for each value $y$, how much of the domain maps to $y$?

Formally, for a non-negative measurable function $f$:
$$\int f\, d\mu = \sup \left\{ \sum_{k=1}^{n} y_k \cdot \mu(\{x : y_k \leq f(x) < y_{k+1}\}) \right\},$$
where the supremum is over all finite partitions $0 = y_0 < y_1 < \cdots < y_n$ of the range, with the convention $y_{n+1} = +\infty$ so that the last term covers the set $\{x : f(x) \geq y_n\}$.

This is why Lebesgue integration can handle functions that are "too wild" for Riemann: the Dirichlet function takes only two values (0 and 1), and the sets where it takes each value are perfectly measurable, even though they are interleaved in a way that defeats domain-based partitioning.
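To make the range-partition idea concrete, here is a small sketch for $f(x) = x^2$ on $[0, 1]$ with Lebesgue measure. The example function, the equally spaced levels $y_k = k/n$, and the name `lebesgue_lower_sum` are assumptions made for illustration. For this particular $f$, the slice $\{x : y_k \leq x^2 < y_{k+1}\}$ is an interval of length $\sqrt{y_{k+1}} - \sqrt{y_k}$, and the top set $\{x : f(x) \geq 1\}$ is the single point $x = 1$, which has measure zero and is therefore omitted.

```python
import math


def lebesgue_lower_sum(n):
    """Range-partition lower sum for f(x) = x**2 on [0, 1] with n equal levels."""
    total = 0.0
    for k in range(n):
        y_lo = k / n
        y_hi = (k + 1) / n
        # Lebesgue measure of {x in [0, 1] : y_lo <= x**2 < y_hi}
        measure = math.sqrt(y_hi) - math.sqrt(y_lo)
        # Each slice contributes (lower level) * (size of its preimage).
        total += y_lo * measure
    return total


for n in (10, 100, 1000):
    print(n, lebesgue_lower_sum(n))
```

The sums increase toward $1/3$ as the levels are refined, matching the Riemann value for this well-behaved function. The difference from the domain-based approach only matters for functions such as the Dirichlet function, where the preimages are measurable but not intervals.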

Definition:

Lebesgue Integral of a Simple Function

A simple function is a measurable function that takes finitely many values:
$$\varphi(x) = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}(x),$$
where $A_1, \ldots, A_n$ are disjoint measurable sets. Its Lebesgue integral with respect to a measure $\mu$ is:
$$\int \varphi\, d\mu = \sum_{k=1}^{n} a_k \, \mu(A_k).$$
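The integral of a simple function is just a finite weighted sum, which a few lines of code make explicit. In this sketch a simple function is represented as a list of hypothetical `(a_k, mu_A_k)` pairs (value, measure of the set where it is taken); that representation and the name `integrate_simple` are illustrative assumptions.

```python
from fractions import Fraction


def integrate_simple(terms):
    """Integral of sum_k a_k * 1_{A_k}, given (a_k, mu(A_k)) pairs for disjoint A_k."""
    return sum(a * mu for a, mu in terms)


# phi = 2 on [0, 1/2), 5 on [1/2, 3/4), 0 on [3/4, 1], with Lebesgue measure.
phi = [(2, Fraction(1, 2)), (5, Fraction(1, 4)), (0, Fraction(1, 4))]
print(integrate_simple(phi))  # 2*(1/2) + 5*(1/4) + 0*(1/4) = 9/4

# The Dirichlet function on [0, 1]: value 1 on a set of measure 0,
# value 0 on a set of measure 1.
dirichlet = [(1, 0), (0, 1)]
print(integrate_simple(dirichlet))  # 0
```

Note that only the measures of the sets $A_k$ enter the computation, not their shape, which is why the Dirichlet function poses no difficulty here.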

Every non-negative measurable function is the pointwise limit of an increasing sequence of simple functions. The Lebesgue integral of a general non-negative function is defined as the supremum of integrals of simple functions below it.
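One standard way to build such an increasing sequence is the dyadic construction $\varphi_n(x) = \min\left(\lfloor 2^n f(x) \rfloor / 2^n,\; n\right)$: round $f$ down to the nearest multiple of $2^{-n}$ and cap it at $n$. The sketch below illustrates it pointwise; the helper name `dyadic_approx` and the example $f(x) = x^2$ are assumptions for illustration.

```python
import math


def dyadic_approx(f, n):
    """Return the n-th dyadic simple-function approximation of a non-negative f."""
    def phi(x):
        # Round f(x) down to a multiple of 2**-n, then cap at n so that
        # phi takes only finitely many values.
        return min(math.floor(f(x) * 2**n) / 2**n, n)
    return phi


f = lambda x: x * x
x = 1.3
for n in range(1, 8):
    print(n, dyadic_approx(f, n)(x), f(x))
```

At each point the values are non-decreasing in $n$, never exceed $f(x)$, and come within $2^{-n}$ of $f(x)$ once $n > f(x)$.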

Definition:

Lebesgue Integral of a General Measurable Function

For a measurable function $f : \Omega \to \mathbb{R}$, write $f = f^+ - f^-$ where $f^+(x) = \max(f(x), 0)$ and $f^-(x) = \max(-f(x), 0)$. Then:
$$\int f\, d\mu = \int f^+\, d\mu - \int f^-\, d\mu,$$
provided at least one of the two integrals on the right is finite. When both are finite, we say $f$ is $\mu$-integrable (or simply integrable) and write $f \in L^1(\mu)$.
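Two small cases may help fix the definition. Both examples are chosen here for illustration, with $\mu$ taken to be Lebesgue measure.

```latex
% Integrable case: f(x) = x - 1/2 on [0, 1].
\begin{align*}
\int f^+\, d\mu &= \int_{1/2}^{1} \left(x - \tfrac12\right) dx = \tfrac18, &
\int f^-\, d\mu &= \int_{0}^{1/2} \left(\tfrac12 - x\right) dx = \tfrac18, \\
\int f\, d\mu &= \tfrac18 - \tfrac18 = 0, &
\int |f|\, d\mu &= \tfrac18 + \tfrac18 = \tfrac14 < \infty
  \;\Rightarrow\; f \in L^1(\mu).
\end{align*}

% Undefined case: g(x) = 1/x on [-1, 1] \setminus \{0\}, with g(0) = 0.
\begin{align*}
\int g^+\, d\mu &= \int_{0}^{1} \frac{dx}{x} = \infty, &
\int g^-\, d\mu &= \int_{-1}^{0} \frac{dx}{|x|} = \infty .
\end{align*}
% Both parts are infinite, so \int g\, d\mu is not defined:
% the expression \infty - \infty is never assigned a value.
```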

Historical Note: Henri Lebesgue and the Birth of Modern Integration

1902--1933

Henri Lebesgue (1875--1941) introduced his theory of integration in his 1902 doctoral thesis at the Sorbonne. The thesis, Intégrale, longueur, aire (Integral, length, area), was one of the most consequential works in the history of mathematics. Lebesgue's insight was to partition the range of a function and measure the resulting preimages, rather than partitioning its domain, an idea that unified integration, probability, and functional analysis.

The impact on probability was crystallized three decades later by Kolmogorov's 1933 Grundbegriffe der Wahrscheinlichkeitsrechnung, which built the entire axiomatic theory of probability on Lebesgue's measure-theoretic foundation.

Theorem: Monotone Convergence Theorem (MCT)

Let $\{f_n\}$ be a sequence of measurable functions with $0 \leq f_1 \leq f_2 \leq \cdots$ pointwise. Then:
$$\lim_{n \to \infty} \int f_n\, d\mu = \int \lim_{n \to \infty} f_n\, d\mu.$$
In short: for increasing non-negative sequences, the limit and the integral commute.

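Since $f_n$ increases pointwise, the limit $\lim_{n \to \infty} f_n$ exists in $[0, \infty]$ at every point, and the integrals $\int f_n\, d\mu$ form a non-decreasing sequence of non-negative extended reals, so both sides are well defined (possibly $+\infty$). The content of the theorem is that no mass is lost in the limit: the integral of the limit equals the limit of the integrals.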
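A quick numerical check of the MCT is sketched below, assuming the illustrative sequence $f_n(x) = \min(n, x^{-1/2})$ on $(0, 1]$, which increases pointwise to the unbounded function $x^{-1/2}$. A direct calculation gives $\int_0^1 f_n\, dx = 2 - 1/n$, which increases to $\int_0^1 x^{-1/2}\, dx = 2$. The helper names and the use of a midpoint rule are choices made for this sketch.

```python
def midpoint_integral(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def truncated(n):
    """f_n(x) = min(n, x**-0.5): non-negative and increasing in n."""
    return lambda x: min(n, x ** -0.5)


for n in (1, 2, 5, 10, 50):
    approx = midpoint_integral(truncated(n), 0.0, 1.0)
    closed_form = 2 - 1 / n
    print(n, round(approx, 4), closed_form)
```

Each $f_n$ is bounded, so ordinary numerical integration applies to it; the MCT is what licenses passing to the limit even though the limit function is unbounded.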

Theorem: Dominated Convergence Theorem (DCT)

Let $\{f_n\}$ be a sequence of measurable functions with $f_n \to f$ pointwise (or a.e.). Suppose there exists an integrable function $g$ (a "dominating" function) such that $|f_n(x)| \leq g(x)$ for all $n$ and a.e. $x$. Then $f$ is integrable and:
$$\lim_{n \to \infty} \int f_n\, d\mu = \int f\, d\mu.$$

The MCT handles monotone sequences; the DCT handles sequences that are not monotone but are "controlled" by a single integrable envelope. The dominated convergence theorem is the workhorse of measure-theoretic probability: it is the rigorous justification behind every interchange of limit and expectation we have performed informally in earlier chapters.
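A standard counterexample shows why the integrable envelope cannot be dropped. The sequence below is chosen here for illustration, on $[0, 1]$ with Lebesgue measure.

```latex
% A sequence of tall, thin spikes: f_n = n * 1_{(0, 1/n)}.
\begin{align*}
f_n(x) &= n\, \mathbf{1}_{(0, 1/n)}(x), &
\lim_{n \to \infty} f_n(x) &= 0 \quad \text{for every } x \in [0, 1], \\
\int f_n\, d\mu &= n \cdot \tfrac{1}{n} = 1 \quad \text{for all } n, &
\int \lim_{n \to \infty} f_n\, d\mu &= 0 \neq 1 .
\end{align*}
% No integrable g dominates every f_n: for 0 < x < 1,
%   \sup_n f_n(x) \geq 1/x - 1,
% and x \mapsto 1/x - 1 is not integrable on (0, 1).
```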

Example: Interchanging Limit and Expectation via DCT

Let $X_n = n X \mathbf{1}_{[0, 1/n]}(X)$ where $X \sim \text{Uniform}[0,1]$. Show that $\lim_{n \to \infty} \mathbb{E}[X_n] = 0$ using the DCT. The crude bound $X_n \leq nX$ is useless as a dominating function because it grows with $n$; the task is to find a single integrable envelope that works for every $n$.
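A possible solution is sketched below. The envelope $g \equiv 1$ is one valid choice among many, and the closing direct calculation is included only as a cross-check.

```latex
% Step 1 (envelope): on the event {X <= 1/n} we have nX <= 1, so
\begin{align*}
0 \leq X_n = nX\, \mathbf{1}_{[0, 1/n]}(X) \leq 1 =: g,
\qquad \mathbb{E}[g] = 1 < \infty .
\end{align*}
% Step 2 (pointwise limit): if X = x > 0, then X_n = 0 as soon as n > 1/x;
% if X = 0, then X_n = 0 for all n. Hence X_n -> 0 everywhere.
% Step 3 (DCT):
\begin{align*}
\lim_{n \to \infty} \mathbb{E}[X_n] = \mathbb{E}\Big[\lim_{n \to \infty} X_n\Big] = 0 .
\end{align*}
% Cross-check by direct integration:
\begin{align*}
\mathbb{E}[X_n] = \int_0^{1/n} n x \, dx = \frac{1}{2n} \longrightarrow 0 .
\end{align*}
```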

The Cantor Function (Devil's Staircase)

The Cantor function is continuous, non-decreasing, and maps $[0,1]$ onto $[0,1]$, yet it is locally constant almost everywhere: its derivative is zero on the complement of the Cantor set, an open set of Lebesgue measure 1. It is the CDF of a random variable that is neither discrete (the CDF has no jumps) nor continuous in the sense of having a PDF: its distribution is singular with respect to Lebesgue measure.

[Interactive figure: construction of the Cantor function. Parameter: number of construction steps (default 8).]
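The construction can be sketched in code through the ternary-digit description of the Cantor function: read the base-3 digits of $x$, stop at the first digit 1, replace every 2 by 1, and interpret the result in base 2. The function name `cantor` and the `depth` parameter (number of ternary digits examined, playing the role of the number of construction steps) are assumptions of this sketch.

```python
from fractions import Fraction


def cantor(x, depth=8):
    """Approximate the Cantor function at x in [0, 1] using `depth` ternary digits."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value = Fraction(0)
    weight = Fraction(1, 2)  # binary place value of the current digit
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x lies in a removed middle third, where the function is flat.
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value


for x in (Fraction(1, 9), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)):
    print(x, cantor(x, depth=8))
```

Points in a removed middle third, such as $x = 1/2$, return exactly after finitely many digits, which reflects the flat stretches of the staircase. Points of the Cantor set with infinite ternary expansions, such as $x = 1/4$, are only approximated, with error below $2^{-\text{depth}}$.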

Common Mistake: Not Every Lebesgue-Integrable Function is Riemann-Integrable

Mistake:

Assuming that if a function has a finite Lebesgue integral, it must also be Riemann integrable.

Correction:

The Dirichlet function $\mathbf{1}_{\mathbb{Q}}$ has Lebesgue integral zero on $[0,1]$ but is not Riemann integrable there, because it is discontinuous at every point. For a bounded function on $[a, b]$, Riemann integrability requires the set of discontinuities to have Lebesgue measure zero, whereas Lebesgue integrability only requires measurability. Conversely, every (properly) Riemann-integrable function on $[a, b]$ is also Lebesgue-integrable, and the two integrals agree.

Quick Check

The Monotone Convergence Theorem requires which condition on the sequence $\{f_n\}$?

$f_n \to f$ pointwise and $|f_n| \leq g$ for some integrable $g$

$0 \leq f_1 \leq f_2 \leq \cdots$ pointwise

$f_n$ are all bounded and converge uniformly

Lebesgue Integral

An integral defined by partitioning the range of a function and measuring the preimages, rather than partitioning the domain. For a non-negative measurable function $f$ on a measure space $(\Omega, \mathcal{F}, \mu)$: $\int f\, d\mu = \sup\{\int \varphi\, d\mu : 0 \leq \varphi \leq f, \, \varphi \text{ simple}\}$.

Related: The Riemann Integral (Brief Recap), Measure

Simple Function

A measurable function taking finitely many values: $\varphi = \sum_{k=1}^n a_k \mathbf{1}_{A_k}$ where $\{A_k\}$ are disjoint measurable sets.

Related: Lebesgue Integral, Measurable Function (= Random Variable)

Key Takeaway

The Lebesgue integral generalizes the Riemann integral by partitioning the range instead of the domain. This seemingly simple change allows integration of far more functions, provides the monotone and dominated convergence theorems, and, most importantly for us, gives a rigorous foundation for expectation that works for all random variables, not just those with PMFs or PDFs.