Radon-Nikodym Theorem and Likelihood Ratios
Densities as Derivatives of Measures
We are used to thinking of a PDF as the function you integrate to get probabilities: $P(X \in A) = \int_A f_X(x)\,dx$. But what is this function, really? It is the rate at which the probability measure $P_X$ accumulates mass compared to the Lebesgue measure $\lambda$. In other words, the PDF is a derivative of one measure with respect to another: $f_X = \frac{dP_X}{d\lambda}$.
The Radon-Nikodym theorem makes this precise and vastly generalizes it: any time one measure is "dominated by" another (absolutely continuous), there exists a density function relating them. The payoff for us is immediate: the likelihood ratio used in hypothesis testing is nothing but the Radon-Nikodym derivative — and this viewpoint extends hypothesis testing to settings where densities do not exist (e.g., testing between stochastic processes).
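The "measure ratio" picture can be checked numerically. A minimal sketch, using a standard normal distribution as an illustrative choice: the probability mass $P((x, x+h])$ divided by the Lebesgue length $h$ approaches the PDF as $h \to 0$.

```python
import math

# Standard normal CDF (via erf) and PDF -- an illustrative choice of P.
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# The density is the rate at which P accumulates mass relative to Lebesgue
# measure: f(x) ~ P((x, x+h]) / lambda((x, x+h]) for small h.
h = 1e-6
for x in [-1.0, 0.0, 0.5, 2.0]:
    mass_per_length = (Phi(x + h) - Phi(x)) / h
    assert abs(mass_per_length - pdf(x)) < 1e-4
```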
Definition: Absolute Continuity of Measures
Absolute Continuity of Measures
Let $\mu$ and $\nu$ be measures on a measurable space $(\Omega, \mathcal{F})$. We say $\nu$ is absolutely continuous with respect to $\mu$, written $\nu \ll \mu$, if:
$$\mu(A) = 0 \implies \nu(A) = 0 \quad \text{for all } A \in \mathcal{F}.$$
In words: every $\mu$-null set is also $\nu$-null. The measure $\mu$ "dominates" $\nu$.
If $\nu \ll \mu$ and $\mu \ll \nu$, the two measures are equivalent ($\mu \sim \nu$): they agree on which sets have measure zero.
Definition: Singular Measures
Singular Measures
Two measures $\mu$ and $\nu$ on $(\Omega, \mathcal{F})$ are mutually singular, written $\mu \perp \nu$, if there exists $A \in \mathcal{F}$ with $\mu(A) = 0$ and $\nu(A^c) = 0$. Intuitively, $\mu$ and $\nu$ "live on disjoint sets."
The Cantor distribution is singular with respect to Lebesgue measure: it concentrates all its mass on the Cantor set, which has Lebesgue measure zero. Point masses (discrete distributions) are also singular w.r.t. Lebesgue measure.
Theorem: Lebesgue Decomposition Theorem
Let $\mu$ and $\nu$ be $\sigma$-finite measures on $(\Omega, \mathcal{F})$. Then $\nu$ has a unique decomposition:
$$\nu = \nu_{\mathrm{ac}} + \nu_{\mathrm{s}},$$
where $\nu_{\mathrm{ac}} \ll \mu$ (absolutely continuous part) and $\nu_{\mathrm{s}} \perp \mu$ (singular part).
Any measure $\nu$ can be split into a part that has a density w.r.t. $\mu$ and a part that lives on a $\mu$-null set. For a random variable's distribution w.r.t. Lebesgue measure: the absolutely continuous part is the "continuous density" part, and the singular part includes point masses (discrete component) and singular continuous distributions (like the Cantor distribution).
Construction via Radon-Nikodym
Define $\rho = \mu + \nu$ (a $\sigma$-finite measure). Both $\mu \ll \rho$ and $\nu \ll \rho$. By Radon-Nikodym (applied to the $\rho$-dominated case): $g = \frac{d\mu}{d\rho}$ and $h = \frac{d\nu}{d\rho}$. Set $A = \{g > 0\}$ and define $\nu_{\mathrm{ac}}(E) = \nu(E \cap A)$ and $\nu_{\mathrm{s}}(E) = \nu(E \cap A^c)$. Then $\nu_{\mathrm{ac}}$ is absolutely continuous w.r.t. $\mu$ (with density $h/g$ on $A$), and $\nu_{\mathrm{s}}$ is singular ($\mu(A^c) = \int_{A^c} g \, d\rho = 0$, so $\nu_{\mathrm{s}} \perp \mu$).
Theorem: Radon-Nikodym Theorem
Let $\mu$ be a $\sigma$-finite measure on $(\Omega, \mathcal{F})$ and let $\nu$ be a finite measure with $\nu \ll \mu$. Then there exists a non-negative measurable function $f$ such that:
$$\nu(A) = \int_A f \, d\mu \quad \text{for all } A \in \mathcal{F}.$$
The function $f$ is unique $\mu$-a.e. and is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$, written $f = \frac{d\nu}{d\mu}$.
The Radon-Nikodym derivative is the "density" of $\nu$ relative to $\mu$. When $\mu = \lambda$ (Lebesgue measure) and $\nu = P_X$ (distribution of a continuous random variable), $\frac{dP_X}{d\lambda} = f_X$, the probability density function. The theorem says that this notion of density exists whenever one measure is absolutely continuous with respect to another.
Von Neumann's proof via Hilbert space
Consider the Hilbert space $L^2(\rho)$ where $\rho = \mu + \nu$. The linear functional $\ell(g) = \int g \, d\nu$ is bounded:
$$|\ell(g)| \le \int |g| \, d\nu \le \int |g| \, d\rho \le \rho(\Omega)^{1/2} \, \|g\|_{L^2(\rho)}$$
(by Cauchy-Schwarz, since $\rho$ is finite). By the Riesz representation theorem, there exists $h \in L^2(\rho)$ with $\int g \, d\nu = \int g h \, d\rho$ for all $g \in L^2(\rho)$.
Identify the derivative
Setting $g = \mathbf{1}_A$: $\nu(A) = \int_A h \, d\rho = \int_A h \, d\mu + \int_A h \, d\nu$. Rearranging:
$$\int_A (1 - h) \, d\nu = \int_A h \, d\mu.$$
One shows $0 \le h < 1$ $\rho$-a.e. (the set $\{h \ge 1\}$ must be $\mu$-null, hence $\nu$-null using $\nu \ll \mu$). Define $f = \frac{h}{1 - h}$. Then $\nu(A) = \int_A f \, d\mu$ for all $A \in \mathcal{F}$.
Historical Note: Radon, Nikodym, and the Density Problem
1913--1930. Johann Radon proved the theorem in 1913 for the special case of $\mathbb{R}^n$ with Lebesgue measure. The full abstract version was established by Otton Nikodym in 1930. The theorem resolved a long-standing question: under what conditions does one measure have a "density" with respect to another? The answer — absolute continuity — is both necessary and sufficient, and the result became a cornerstone of modern analysis, probability, and mathematical statistics.
Definition: Likelihood Ratio as Radon-Nikodym Derivative
Likelihood Ratio as Radon-Nikodym Derivative
Let $P_0$ and $P_1$ be two probability measures on $(\Omega, \mathcal{F})$ with $P_1 \ll P_0$. The likelihood ratio is the Radon-Nikodym derivative:
$$L = \frac{dP_1}{dP_0}.$$
When $P_0$ and $P_1$ have densities $p_0$ and $p_1$ with respect to a common dominating measure (e.g., Lebesgue measure), this reduces to the familiar ratio:
$$L(x) = \frac{p_1(x)}{p_0(x)}.$$
The Radon-Nikodym viewpoint is essential when densities do not exist — for example, when testing between two Gaussian processes (the Cameron-Martin-Girsanov theorem) or between discrete and continuous hypotheses.
Theorem: Chain Rule for Radon-Nikodym Derivatives
If $\nu \ll \mu \ll \rho$, then $\nu \ll \rho$ and:
$$\frac{d\nu}{d\rho} = \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\rho} \quad \rho\text{-a.e.}$$
Just like the chain rule for ordinary derivatives: the density of $\nu$ relative to $\rho$ is the product of densities along the "chain." This is used in statistics when changing the reference measure — for example, computing the likelihood ratio under a composite hypothesis by going through a parametric family.
Verify the defining property
For any $A \in \mathcal{F}$:
$$\int_A \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\rho} \, d\rho = \int_A \frac{d\nu}{d\mu} \, d\mu = \nu(A),$$
where the first equality uses the change-of-measure formula for the integral (if $g \ge 0$, then $\int g \, \frac{d\mu}{d\rho} \, d\rho = \int g \, d\mu$) and the second is the defining property of $\frac{d\nu}{d\mu}$. The product thus satisfies the defining property of $\frac{d\nu}{d\rho}$, so by uniqueness they are equal $\rho$-a.e.
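The chain rule can also be checked numerically. A small sketch, using three Gaussian measures on $\mathbb{R}$ as an illustrative choice (all three have strictly positive densities, so each dominates the others):

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Illustrative measures: nu = N(1,1), mu = N(0,1), rho = N(0,4).
# Each Radon-Nikodym derivative is a ratio of Lebesgue densities.
def d_nu_d_mu(x):  return normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 1.0)
def d_mu_d_rho(x): return normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 4.0)
def d_nu_d_rho(x): return normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 4.0)

# Chain rule: d(nu)/d(rho) = d(nu)/d(mu) * d(mu)/d(rho), pointwise.
for x in [-2.0, -0.3, 0.0, 1.7]:
    assert abs(d_nu_d_rho(x) - d_nu_d_mu(x) * d_mu_d_rho(x)) < 1e-12
```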
Example: Gaussian Likelihood Ratio as Radon-Nikodym Derivative
Let $P_0 = \mathcal{N}(0, 1)$ and $P_1 = \mathcal{N}(\mu, 1)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Compute the Radon-Nikodym derivative $\frac{dP_1}{dP_0}$.
Both measures are a.c. w.r.t. Lebesgue
$p_0(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ and $p_1(x) = \frac{1}{\sqrt{2\pi}} e^{-(x-\mu)^2/2}$. Since both are positive everywhere, $P_0 \sim P_1$ (mutually absolutely continuous).
Apply the chain rule
$$\frac{dP_1}{dP_0}(x) = \frac{dP_1/d\lambda}{dP_0/d\lambda}(x) = \frac{p_1(x)}{p_0(x)} = \exp\!\left(\mu x - \frac{\mu^2}{2}\right).$$
Interpretation
This is the familiar likelihood ratio for testing $H_0\colon X \sim \mathcal{N}(0,1)$ vs $H_1\colon X \sim \mathcal{N}(\mu,1)$ with a single Gaussian observation. The Neyman-Pearson lemma says the optimal test compares $L(x)$ to a threshold — or equivalently, compares $x$ to a threshold (since $L(x)$ is monotone in $x$ for $\mu > 0$).
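A quick numerical confirmation of the closed form: for an illustrative choice $\mu = 1.5$, the expression $\exp(\mu x - \mu^2/2)$ agrees with the density ratio $p_1(x)/p_0(x)$ at every point.

```python
import math

def normal_pdf(x, mean):
    # unit-variance Gaussian density
    return math.exp(-(x - mean) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)

mu = 1.5  # mean under H1 (illustrative value)

def likelihood_ratio(x):
    # dP1/dP0 in closed form: exp(mu*x - mu^2/2)
    return math.exp(mu * x - mu * mu / 2.0)

# The closed form matches the pointwise density ratio p1/p0.
for x in [-3.0, -0.5, 0.0, 2.0, 4.0]:
    assert abs(likelihood_ratio(x) - normal_pdf(x, mu) / normal_pdf(x, 0.0)) < 1e-9
```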
Radon-Nikodym Derivative as Density Ratio
Visualize two probability distributions $P$ and $Q$ (both Gaussian with different parameters) and their Radon-Nikodym derivative $\frac{dP}{dQ}$. The derivative shows where $P$ places relatively more mass than $Q$.
The Neyman-Pearson Lemma in Radon-Nikodym Language
The Neyman-Pearson lemma (Book FSI, Chapter 2) states that the most powerful test of $H_0$ versus $H_1$ at level $\alpha$ rejects when $L(x) > \eta$, where $L = \frac{dP_1}{dP_0}$ is the likelihood ratio.
In the Radon-Nikodym framework, this is completely general: it works even when $P_0, P_1$ are measures on infinite-dimensional spaces (e.g., the path space of a stochastic process). This is how one formulates detection of signals in continuous-time noise — the Cameron-Martin theorem gives the explicit form of $\frac{dP_1}{dP_0}$ for Gaussian processes.
Theorem: Change of Measure Formula
If $\nu \ll \mu$ with $f = \frac{d\nu}{d\mu}$, then for any measurable $g \ge 0$:
$$\int g \, d\nu = \int g f \, d\mu.$$
In probability: if $\mathbb{E}_P$ denotes expectation under $P$ and $L = \frac{dP}{dQ}$, then
$$\mathbb{E}_P[g(X)] = \mathbb{E}_Q[g(X) \, L(X)].$$
To compute an expectation under $P$, you can instead compute a weighted expectation under $Q$, where the weight is the likelihood ratio. This is the foundation of importance sampling — a Monte Carlo technique where you sample from a convenient distribution $Q$ and reweight by $\frac{dP}{dQ}$.
Simple functions
For $g = \mathbf{1}_A$:
$$\int \mathbf{1}_A \, d\nu = \nu(A) = \int_A f \, d\mu = \int \mathbf{1}_A f \, d\mu.$$
By linearity, the formula holds for simple functions.
General case via MCT
For $g \ge 0$, approximate from below by simple functions $g_n \uparrow g$. By MCT applied to both sides:
$$\int g \, d\nu = \lim_n \int g_n \, d\nu = \lim_n \int g_n f \, d\mu = \int g f \, d\mu.$$
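The change-of-measure identity can be demonstrated by simulation. A sketch with illustrative choices $P = \mathcal{N}(1,1)$, $Q = \mathcal{N}(0,1)$, and $g(x) = x^2$: here $\frac{dP}{dQ}(x) = e^{x - 1/2}$ and $\mathbb{E}_P[X^2] = 1 + 1^2 = 2$ exactly, so the $Q$-weighted average should land near 2.

```python
import math
import random

random.seed(0)

# P = N(1, 1) (target), Q = N(0, 1) (sampling distribution).
# dP/dQ(x) = exp(x - 1/2), and E_P[X^2] = Var + mean^2 = 2 exactly.
def dP_dQ(x):
    return math.exp(x - 0.5)

n = 100_000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]  # draw from Q

# Weighted expectation under Q reproduces the expectation under P.
estimate = sum(x * x * dP_dQ(x) for x in samples) / n
assert abs(estimate - 2.0) < 0.2
```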
Example: Importance Sampling for Rare Event Estimation
Estimate $p = P(X > 5)$ where $X \sim \mathcal{N}(0, 1)$, using importance sampling with proposal $Q = \mathcal{N}(5, 1)$.
Change of measure
$$p = \mathbb{E}_P[\mathbf{1}\{X > 5\}] = \mathbb{E}_Q\!\left[\mathbf{1}\{X > 5\} \, L(X)\right],$$
where $L(x) = \frac{p(x)}{q(x)} = \exp\!\left(-5x + \frac{25}{2}\right)$.
Monte Carlo estimator
Sample $X_1, \dots, X_n \sim \mathcal{N}(5, 1)$ and compute:
$$\hat{p} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{X_i > 5\} \exp\!\left(-5X_i + \frac{25}{2}\right).$$
Under $Q$, about half the samples exceed 5, so the indicator fires frequently — unlike naive Monte Carlo under $P$, where $P(X > 5) \approx 2.9 \times 10^{-7}$ and you would need billions of samples.
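The estimator above can be run directly. A sketch: with only $10^5$ proposal samples, the importance-sampling estimate lands within a few percent of the true tail probability $0.5\,\mathrm{erfc}(5/\sqrt{2}) \approx 2.87 \times 10^{-7}$.

```python
import math
import random

random.seed(1)

# Goal: p = P(X > 5) for X ~ N(0, 1); true value ~ 2.87e-7.
p_true = 0.5 * math.erfc(5.0 / math.sqrt(2.0))

# Proposal Q = N(5, 1); likelihood ratio dP/dQ(x) = exp(-5x + 12.5).
n = 100_000
total = 0.0
for _ in range(n):
    x = random.gauss(5.0, 1.0)              # sample from Q: half the draws exceed 5
    if x > 5.0:
        total += math.exp(-5.0 * x + 12.5)  # reweight by dP/dQ
p_hat = total / n

# Relative error of a few percent from 1e5 samples; naive Monte Carlo
# under P would need on the order of 1e9 samples to see any hits at all.
assert abs(p_hat - p_true) / p_true < 0.1
```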
Why This Matters: From Radon-Nikodym to Radar Detection
In radar and sonar, the received signal is modeled as a continuous-time stochastic process. Testing whether a target is present (signal + noise vs. noise alone) is a hypothesis test between two measures on the space of sample paths. The Radon-Nikodym derivative for Gaussian processes is given by the Cameron-Martin formula:
$$\frac{dP_1}{dP_0}(Y) = \exp\!\left(\int_0^T s(t) \, dY(t) - \frac{1}{2} \int_0^T s^2(t) \, dt\right),$$
where $s(t)$ is the known signal waveform and $Y$ is the observed process. The sufficient statistic is the correlator output $\int_0^T s(t) \, dY(t)$ — the continuous-time matched filter from Chapter 15, now justified measure-theoretically.
Importance Sampling in BER Estimation
In communication systems, bit error rates (BER) below $10^{-6}$ are common design targets. Naive Monte Carlo simulation requires on the order of $100/p$ samples to estimate a BER of $p$ with reasonable confidence — about $10^8$ samples at $p = 10^{-6}$. Importance sampling, using the change-of-measure formula, shifts the noise distribution to make errors more likely and reweights by $\frac{dP}{dQ}$. This can reduce the required sample count by orders of magnitude.
The optimal importance sampling distribution for Gaussian channels shifts the noise mean to the decision boundary — the theoretical justification comes directly from the Radon-Nikodym theorem.
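A minimal BER sketch under assumed illustrative parameters (BPSK with transmitted symbol $+1$, noise $\mathcal{N}(0, \sigma^2)$ with $\sigma = 0.25$, so the true BER is $Q(4) \approx 3.2 \times 10^{-5}$): shift the noise mean to the decision boundary and reweight each error by $\frac{dP}{dQ}$.

```python
import math
import random

random.seed(2)

# BPSK over AWGN: transmit +1, a bit error occurs when 1 + noise < 0.
# Noise ~ N(0, sigma^2) with sigma = 0.25, so BER = Q(1/sigma) = Q(4).
sigma = 0.25
ber_true = 0.5 * math.erfc((1.0 / sigma) / math.sqrt(2.0))

# Importance sampling: shift the noise mean to -1 (the decision boundary),
# so roughly half the simulated bits are in error, then reweight by
# dP/dQ(n) = exp((2n + 1) / (2 sigma^2)).
n_bits = 100_000
total = 0.0
for _ in range(n_bits):
    noise = random.gauss(-1.0, sigma)  # biased noise distribution Q
    if 1.0 + noise < 0.0:              # bit error under the shifted noise
        total += math.exp((2.0 * noise + 1.0) / (2.0 * sigma * sigma))
ber_hat = total / n_bits

assert abs(ber_hat - ber_true) / ber_true < 0.1
```

With the unshifted noise, only about 3 errors would occur per 100,000 bits; the mean shift makes nearly every sample informative.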
Types of Densities via Radon-Nikodym
| Setting | Dominating measure | Radon-Nikodym derivative |
|---|---|---|
| Continuous RV | Lebesgue measure $\lambda$ | PDF $f_X = \frac{dP_X}{d\lambda}$ |
| Discrete RV | Counting measure $\#$ | PMF $p_X = \frac{dP_X}{d\#}$ |
| Hypothesis testing | $P_0$ (null hypothesis) | Likelihood ratio $L = \frac{dP_1}{dP_0}$ |
| Bayesian posterior | Prior $\pi$ | Posterior density $\frac{d\pi(\cdot \mid x)}{d\pi}$ |
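The discrete row of the table can be made concrete: with counting measure as the dominating measure, "integrating the density" over a set is just summing the PMF. A sketch using a Poisson(2) variable as an illustrative choice:

```python
import math

# PMF of a Poisson(2) random variable -- the Radon-Nikodym derivative of
# its distribution with respect to counting measure on {0, 1, 2, ...}.
def poisson_pmf(k, lam=2.0):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# "Integration" against counting measure is summation: P(A) = sum_{k in A} pmf(k).
A = {0, 1, 2, 3}
prob_A = sum(poisson_pmf(k) for k in A)

# Sanity checks: total mass is 1, and P(X <= 3) for Poisson(2) is ~0.857.
assert abs(sum(poisson_pmf(k) for k in range(60)) - 1.0) < 1e-12
assert 0.85 < prob_A < 0.86
```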
Common Mistake: Likelihood Ratio Undefined When
Mistake:
Computing $L(x) = \frac{p_1(x)}{p_0(x)}$ and ignoring points where $p_0(x) = 0$ but $p_1(x) > 0$.
Correction:
If $P_1$ is not absolutely continuous with respect to $P_0$ (i.e., there exist sets where $P_0$ gives zero probability but $P_1$ does not), the Radon-Nikodym derivative $\frac{dP_1}{dP_0}$ does not exist. In hypothesis testing, this means the two hypotheses are "partially distinguishable with certainty" — you can perfectly detect $H_1$ on the set where $p_0 = 0$ but $p_1 > 0$. The general Lebesgue decomposition handles this case.
Quick Check
The Radon-Nikodym derivative $\frac{dP}{dQ}$ exists when:
$P$ and $Q$ have the same support
$P \ll Q$ ($P$ is absolutely continuous w.r.t. $Q$)
$P \perp Q$ ($P$ and $Q$ are mutually singular)
Absolute continuity ($P \ll Q$) is the necessary and sufficient condition for the Radon-Nikodym derivative to exist.
Radon-Nikodym Derivative
The measurable function $f = \frac{d\nu}{d\mu}$ satisfying $\nu(A) = \int_A f \, d\mu$ for all measurable $A$. Exists when $\nu \ll \mu$; unique $\mu$-a.e. Generalizes the notion of PDF, PMF, and likelihood ratio.
Absolute Continuity (of Measures)
$\nu \ll \mu$ means every $\mu$-null set is also $\nu$-null: $\mu(A) = 0 \implies \nu(A) = 0$. Equivalent to $\nu$ having a density (Radon-Nikodym derivative) with respect to $\mu$.
Related: Radon-Nikodym Theorem, Singular Measures
Key Takeaway
The Radon-Nikodym theorem unifies PDFs, PMFs, and likelihood ratios under a single concept: the derivative of one measure with respect to another. The likelihood ratio is the central object of hypothesis testing and drives the Neyman-Pearson lemma, Wald's SPRT, and importance sampling. The measure-theoretic viewpoint extends all of these to settings — like continuous-time processes and infinite-dimensional spaces — where classical densities do not exist.