Radon-Nikodym Theorem and Likelihood Ratios

Densities as Derivatives of Measures

We are used to thinking of a PDF $f_X(x)$ as the function you integrate to get probabilities: $P(a < X \leq b) = \int_a^b f_X(x)\, dx$. But what is this function, really? It is the rate at which the probability measure $P_X$ accumulates mass relative to the Lebesgue measure $\lambda$. In other words, the PDF is a derivative of one measure with respect to another: $f_X = dP_X / d\lambda$.
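The "rate of accumulation" reading can be checked numerically. The following is a minimal sketch, assuming $X \sim \mathcal{N}(0, 1)$; the helper names are illustrative and not part of the text. The ratio $P_X((x, x+h]) / \lambda((x, x+h])$ should approach $f_X(x)$ as $h$ shrinks.

```python
import math

def std_normal_cdf(x: float) -> float:
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x: float) -> float:
    """PDF of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mass_ratio(x: float, h: float) -> float:
    """P_X((x, x+h]) divided by the Lebesgue measure h of the interval."""
    return (std_normal_cdf(x + h) - std_normal_cdf(x)) / h

x0 = 0.7  # arbitrary evaluation point (an assumption for illustration)
for h in (1e-1, 1e-2, 1e-3):
    # The ratio of masses approaches the density value as h -> 0.
    print(h, mass_ratio(x0, h), std_normal_pdf(x0))
```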

The Radon-Nikodym theorem makes this precise and vastly generalizes it: whenever one measure is "dominated by" another (absolutely continuous with respect to it), there exists a density function relating them. The payoff for us is immediate: the likelihood ratio $L(y) = f_1(y) / f_0(y)$ used in hypothesis testing is nothing but the Radon-Nikodym derivative $dP_1 / dP_0$, and this viewpoint extends hypothesis testing to settings where ordinary densities do not exist (e.g., testing between stochastic processes).


Definition:

Absolute Continuity of Measures

Let $\mu$ and $\nu$ be measures on $(\Omega, \mathcal{F})$. We say $\nu$ is absolutely continuous with respect to $\mu$, written $\nu \ll \mu$, if: $$\mu(A) = 0 \implies \nu(A) = 0 \quad \text{for all } A \in \mathcal{F}.$$ In words: every $\mu$-null set is also $\nu$-null. The measure $\mu$ "dominates" $\nu$.

If $\nu \ll \mu$ and $\mu \ll \nu$, the two measures are equivalent ($\nu \sim \mu$): they agree on which sets have measure zero.

Definition:

Singular Measures

Two measures $\mu$ and $\nu$ on $(\Omega, \mathcal{F})$ are mutually singular, written $\mu \perp \nu$, if there exists $A \in \mathcal{F}$ with $\mu(A) = 0$ and $\nu(A^c) = 0$. Intuitively, $\mu$ and $\nu$ "live on disjoint sets."

The Cantor distribution is singular with respect to Lebesgue measure: it concentrates all its mass on the Cantor set, which has Lebesgue measure zero. Point masses (discrete distributions) are also singular w.r.t. Lebesgue measure.

Theorem: Lebesgue Decomposition Theorem

Let $\mu$ and $\nu$ be $\sigma$-finite measures on $(\Omega, \mathcal{F})$. Then $\nu$ has a unique decomposition: $$\nu = \nu_a + \nu_s,$$ where $\nu_a \ll \mu$ (absolutely continuous part) and $\nu_s \perp \mu$ (singular part).

Any $\sigma$-finite measure can be split into a part that has a density w.r.t. $\mu$ and a part that lives on a $\mu$-null set. For a random variable's distribution w.r.t. Lebesgue measure: the absolutely continuous part is the "continuous density" part, and the singular part includes point masses (discrete component) and singular continuous distributions (like the Cantor distribution).
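On a finite sample space the decomposition can be written down directly. The sketch below assumes measures are stored as dictionaries from outcomes to non-negative masses; this representation and the function name are illustrative choices. The absolutely continuous part keeps the mass of $\nu$ on points that $\mu$ charges, and the singular part keeps the mass sitting on $\mu$-null points.

```python
def lebesgue_decompose(nu, mu):
    """Split nu into (nu_a, nu_s) relative to mu on a finite sample space.

    Measures are dicts mapping outcomes to non-negative masses;
    outcomes missing from a dict carry mass 0.
    """
    # nu_a vanishes wherever mu vanishes, so nu_a << mu.
    nu_a = {w: m for w, m in nu.items() if mu.get(w, 0.0) > 0.0}
    # nu_s is carried by the mu-null set {w : mu(w) = 0}, so nu_s is singular to mu.
    nu_s = {w: m for w, m in nu.items() if mu.get(w, 0.0) == 0.0}
    return nu_a, nu_s

mu = {"a": 0.5, "b": 0.5}            # mu gives no mass to "c"
nu = {"a": 0.2, "b": 0.3, "c": 0.5}  # nu puts half its mass on the mu-null point "c"
nu_a, nu_s = lebesgue_decompose(nu, mu)
print(nu_a)
print(nu_s)
```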

Theorem: Radon-Nikodym Theorem

Let $\mu$ be a $\sigma$-finite measure on $(\Omega, \mathcal{F})$ and let $\nu$ be a finite measure with $\nu \ll \mu$. Then there exists a non-negative measurable function $f$ such that: $$\nu(A) = \int_A f\, d\mu \quad \text{for all } A \in \mathcal{F}.$$ The function $f$ is unique $\mu$-a.e. and is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$, written $f = \frac{d\nu}{d\mu}$.

The Radon-Nikodym derivative is the "density" of $\nu$ relative to $\mu$. When $\mu = \lambda$ (Lebesgue measure) and $\nu = P_X$ (the distribution of an absolutely continuous random variable), $dP_X / d\lambda = f_X$, the probability density function. The theorem says that this notion of density exists whenever one measure is absolutely continuous with respect to another, under the finiteness assumptions stated above.
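To see the defining identity $\nu(A) = \int_A f\, d\mu$ at work, here is a small sketch on a finite space where $\mu$ is the counting measure, so the Radon-Nikodym derivative is simply the PMF. The dictionary representation, the function names, and the loaded-die numbers are assumptions made for illustration.

```python
def rn_derivative(nu, mu):
    """Return f = d(nu)/d(mu) on a finite space, or raise if nu is not << mu."""
    outcomes = set(nu) | set(mu)
    for w in outcomes:
        if mu.get(w, 0.0) == 0.0 and nu.get(w, 0.0) > 0.0:
            raise ValueError(f"nu is not absolutely continuous w.r.t. mu at {w!r}")
    # On mu-null points f is arbitrary (uniqueness holds only mu-a.e.); use 0.
    return {w: (nu.get(w, 0.0) / mu[w] if mu.get(w, 0.0) > 0.0 else 0.0)
            for w in outcomes}

def integrate(f, mu, A):
    """Integral of f over the set A with respect to the discrete measure mu."""
    return sum(f[w] * mu.get(w, 0.0) for w in A)

counting = {k: 1.0 for k in range(1, 7)}                      # counting measure
loaded_die = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}  # hypothetical PMF
f = rn_derivative(loaded_die, counting)

evens = {2, 4, 6}
# Both numbers are nu(evens): once via the integral of f, once directly.
print(integrate(f, counting, evens), sum(loaded_die[k] for k in evens))
```

Raising an error when $\nu \not\ll \mu$ mirrors the hypothesis of the theorem: without absolute continuity no such $f$ exists.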


Historical Note: Radon, Nikodym, and the Density Problem

1913--1930

Johann Radon proved the theorem in 1913 for the special case of $\mathbb{R}^n$ with Lebesgue measure. The full abstract version was established by Otton Nikodym in 1930. The theorem resolved a long-standing question: under what conditions does one measure have a "density" with respect to another? The answer, absolute continuity, is both necessary and sufficient for $\sigma$-finite measures, and the result became a cornerstone of modern analysis, probability, and mathematical statistics.

Definition:

Likelihood Ratio as Radon-Nikodym Derivative

Let $P_0$ and $P_1$ be two probability measures on $(\Omega, \mathcal{F})$ with $P_1 \ll P_0$. The likelihood ratio is the Radon-Nikodym derivative: $$L = \frac{dP_1}{dP_0}.$$ When $P_0$ and $P_1$ have densities $f_0, f_1$ with respect to a common dominating measure $\mu$ (e.g., Lebesgue measure), this reduces to the familiar ratio, defined wherever $f_0(\omega) > 0$: $$L(\omega) = \frac{f_1(\omega)}{f_0(\omega)}.$$

The Radon-Nikodym viewpoint is essential when densities with respect to Lebesgue measure do not exist: for example, when testing between two Gaussian processes (the Cameron-Martin-Girsanov theorem) or between hypotheses whose distributions mix discrete and continuous components.
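The common dominating measure need not be Lebesgue measure. As a minimal sketch, assume i.i.d. Bernoulli observations with success probability $p_0$ under $P_0$ and $p_1$ under $P_1$, both strictly between 0 and 1; the dominating measure is then the counting measure and the likelihood ratio is a ratio of PMFs. The function name and the numbers are illustrative.

```python
import math

def bernoulli_likelihood_ratio(xs, p0, p1):
    """L = dP1/dP0 for i.i.d. Bernoulli observations xs (values 0 or 1).

    Assumes 0 < p0 < 1 and 0 < p1 < 1, so that P1 << P0.
    """
    log_l = 0.0
    for x in xs:
        if x == 1:
            log_l += math.log(p1 / p0)
        else:
            log_l += math.log((1.0 - p1) / (1.0 - p0))
    # Summing logs avoids underflow for long observation sequences.
    return math.exp(log_l)

# Two successes and one failure: (0.7/0.5) * (0.7/0.5) * (0.3/0.5)
print(bernoulli_likelihood_ratio([1, 1, 0], p0=0.5, p1=0.7))
```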


Theorem: Chain Rule for Radon-Nikodym Derivatives

If $\nu \ll \mu \ll \lambda$, then $\nu \ll \lambda$ and: $$\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda} \quad \lambda\text{-a.e.}$$

Just like the chain rule for ordinary derivatives: the density of $\nu$ relative to $\lambda$ is the product of densities along the "chain." This is used in statistics when changing the reference measure, for example when computing the likelihood ratio under a composite hypothesis by going through a parametric family.
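The chain rule can be verified pointwise on a finite space. In the sketch below the three measures are hypothetical numbers chosen so that $\nu \ll \mu \ll \lambda$, and `density` is an illustrative helper that assumes absolute continuity rather than checking it.

```python
def density(nu, mu):
    """d(nu)/d(mu) on a finite space, assuming nu << mu (set to 0 on mu-null points)."""
    return {w: (nu.get(w, 0.0) / m if m > 0.0 else 0.0) for w, m in mu.items()}

lam = {"a": 1.0, "b": 2.0, "c": 1.0}   # reference measure
mu = {"a": 0.5, "b": 0.5, "c": 0.0}    # mu << lam
nu = {"a": 0.25, "b": 0.75, "c": 0.0}  # nu << mu

d_nu_d_mu = density(nu, mu)
d_mu_d_lam = density(mu, lam)
d_nu_d_lam = density(nu, lam)

for w in lam:
    # Left: direct density. Right: product of densities along the chain.
    print(w, d_nu_d_lam[w], d_nu_d_mu[w] * d_mu_d_lam[w])
```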

Example: Gaussian Likelihood Ratio as Radon-Nikodym Derivative

Let $P_0 = \mathcal{N}(0, 1)$ and $P_1 = \mathcal{N}(\mu, 1)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Compute the Radon-Nikodym derivative $dP_1/dP_0$.
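Both measures have Lebesgue densities, so the derivative is the density ratio: $\frac{dP_1}{dP_0}(x) = \exp\left(-\frac{(x-\mu)^2}{2} + \frac{x^2}{2}\right) = \exp\left(\mu x - \frac{\mu^2}{2}\right)$. The short sketch below compares this closed form with the ratio of the two PDFs; the function names and the value $\mu = 1.5$ are illustrative assumptions.

```python
import math

def normal_pdf(x: float, mean: float) -> float:
    """PDF of N(mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def lr_from_densities(x: float, mu: float) -> float:
    """dP1/dP0 computed as the density ratio f1(x) / f0(x)."""
    return normal_pdf(x, mu) / normal_pdf(x, 0.0)

def lr_closed_form(x: float, mu: float) -> float:
    """dP1/dP0 from the simplified expression exp(mu*x - mu^2/2)."""
    return math.exp(mu * x - 0.5 * mu * mu)

mu = 1.5
for x in (-2.0, 0.0, 0.75, 3.0):
    print(x, lr_from_densities(x, mu), lr_closed_form(x, mu))
```

Note that the ratio equals 1 at $x = \mu/2$, the midpoint between the two means, and is increasing in $x$ when $\mu > 0$.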

Interactive figure: Radon-Nikodym Derivative $dP/dQ$ as Density Ratio. The figure shows two Gaussian distributions $P$ and $Q$ with adjustable parameters and their Radon-Nikodym derivative $dP/dQ = f_P(x)/f_Q(x)$. The derivative shows where $P$ places relatively more mass than $Q$.

The Neyman-Pearson Lemma in Radon-Nikodym Language

The Neyman-Pearson lemma (Book FSI, Chapter 2) states that the most powerful test of $H_0: P = P_0$ versus $H_1: P = P_1$ at level $\alpha$ rejects $H_0$ when $L(\omega) > \eta$, where $L = dP_1/dP_0$ is the likelihood ratio and the threshold $\eta$ is chosen so that the test has level $\alpha$ (randomizing on the event $L = \eta$ if necessary).

In the Radon-Nikodym framework, this is completely general: it works even when $P_0, P_1$ are measures on infinite-dimensional spaces (e.g., the path space of a stochastic process). This is how one formulates detection of signals in continuous-time noise: the Cameron-Martin theorem gives the explicit form of $dP_1/dP_0$ for Gaussian processes.
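For a concrete finite-dimensional instance, take the Gaussian shift model $P_0 = \mathcal{N}(0, 1)$, $P_1 = \mathcal{N}(\mu, 1)$ with $\mu > 0$ and a single observation $x$. There $L(x) = \exp(\mu x - \mu^2/2)$ is increasing in $x$, so rejecting when $L > \eta$ is the same as rejecting when $x > \tau$. The sketch below, with an illustrative function name and parameter values, computes the threshold and power at a given level using the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def np_test_gaussian_shift(mu: float, alpha: float):
    """Neyman-Pearson test of N(0,1) vs N(mu,1), mu > 0, from one observation.

    Returns (tau, eta, power): reject H0 when x > tau, equivalently L(x) > eta.
    """
    std = NormalDist()                        # standard normal
    tau = std.inv_cdf(1.0 - alpha)            # P0(X > tau) = alpha
    eta = math.exp(mu * tau - 0.5 * mu * mu)  # same threshold on the likelihood-ratio scale
    power = 1.0 - std.cdf(tau - mu)           # P1(X > tau)
    return tau, eta, power

tau, eta, power = np_test_gaussian_shift(mu=1.0, alpha=0.05)
print(tau, eta, power)
```

Because $L$ has a continuous distribution here, no randomization is needed to hit the level exactly.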

Theorem: Change of Measure Formula

If $\nu \ll \mu$ with $f = d\nu/d\mu$, then for any measurable $g \geq 0$: $$\int g\, d\nu = \int g \cdot f\, d\mu.$$ In probability: if $\mathbb{E}_Q$ denotes expectation under $Q$ and $L = dP/dQ$, then $$\mathbb{E}_P[g(X)] = \mathbb{E}_Q[g(X) \cdot L(X)].$$

To compute an expectation under $P$, you can instead compute a weighted expectation under $Q$, where the weight is the likelihood ratio. This is the foundation of importance sampling, a Monte Carlo technique where you sample from a convenient distribution $Q$ and reweight by $dP/dQ$.

Example: Importance Sampling for Rare Event Estimation

Estimate $P(X > 5)$ where $X \sim \mathcal{N}(0, 1)$ using importance sampling with proposal $Q = \mathcal{N}(5, 1)$.
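One way to carry this out is sketched below. The weight is the density ratio $\frac{dP}{dQ}(y) = \exp\left(-5y + \frac{25}{2}\right)$, and the estimator averages $\mathbf{1}\{Y > 5\}\, \frac{dP}{dQ}(Y)$ over draws $Y \sim Q$. The function name, sample size, and seed are illustrative assumptions; the exact value used for comparison comes from the complementary error function.

```python
import math
import random

def importance_sampling_tail(threshold: float, n_samples: int, seed: int = 0):
    """Estimate P(X > threshold), X ~ N(0,1), with proposal Q = N(threshold, 1).

    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        y = rng.gauss(threshold, 1.0)  # draw from Q
        if y > threshold:
            # Weight dP/dQ(y) = phi(y) / phi(y - c) = exp(-c*y + c^2/2)
            w = math.exp(-threshold * y + 0.5 * threshold ** 2)
            total += w
            total_sq += w * w
    estimate = total / n_samples
    variance = total_sq / n_samples - estimate ** 2
    return estimate, math.sqrt(variance / n_samples)

estimate, std_err = importance_sampling_tail(5.0, 100_000)
exact = 0.5 * math.erfc(5.0 / math.sqrt(2.0))  # exact Gaussian tail probability
print(estimate, std_err, exact)
```

About half of the proposal draws land in the rare region, which is why a modest sample size already gives a small relative error. A naive simulation of the same size would typically observe no exceedances at all.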

Why This Matters: From Radon-Nikodym to Radar Detection

In radar and sonar, the received signal is modeled as a continuous-time stochastic process. Testing whether a target is present (signal + noise vs. noise alone) is a hypothesis test between two measures on the space of sample paths. For a known signal in white Gaussian noise (so that $X$ is a standard Wiener process under $P_0$), the Radon-Nikodym derivative $dP_1/dP_0$ is given by the Cameron-Martin formula: $$\frac{dP_1}{dP_0} = \exp\left(\int_0^T s(t)\, dX(t) - \frac{1}{2}\int_0^T s^2(t)\, dt\right),$$ where $s(t)$ is the known signal waveform and $X(t)$ is the observed process. The sufficient statistic is the correlator output $\int_0^T s(t)\, dX(t)$, which is the continuous-time matched filter from Chapter 15, now justified measure-theoretically.
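A discretized version of the formula makes the correlator structure visible. The sketch below is only an approximation under stated assumptions: the stochastic integral is replaced by a sum over increments $\Delta X_k$ on a uniform grid, the noise is a standard Wiener process (increments $\mathcal{N}(0, \Delta t)$), and the sinusoidal waveform is a made-up example.

```python
import math
import random

def log_likelihood_ratio(s, dX, dt):
    """Discretized log dP1/dP0: sum_k s_k * dX_k - 0.5 * sum_k s_k^2 * dt."""
    correlator = sum(sk * dxk for sk, dxk in zip(s, dX))  # approximates int s dX
    energy = sum(sk * sk for sk in s) * dt                # approximates int s^2 dt
    return correlator - 0.5 * energy

def simulate_increments(s, dt, signal_present, rng):
    """Increments of X: s(t) dt when the signal is present, plus N(0, dt) noise."""
    sd = math.sqrt(dt)
    return [(sk * dt if signal_present else 0.0) + rng.gauss(0.0, sd) for sk in s]

T, n = 1.0, 1000
dt = T / n
s = [3.0 * math.sin(2.0 * math.pi * 5.0 * k * dt) for k in range(n)]  # hypothetical waveform

rng = random.Random(1)
print(log_likelihood_ratio(s, simulate_increments(s, dt, True, rng), dt))   # under H1
print(log_likelihood_ratio(s, simulate_increments(s, dt, False, rng), dt))  # under H0
```

In this discretized model the log-likelihood ratio is Gaussian with mean $+E/2$ under $H_1$ and $-E/2$ under $H_0$, where $E = \sum_k s_k^2\, \Delta t$ is the signal energy, so detectability depends on the waveform only through its energy.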

⚠️Engineering Note

Importance Sampling in BER Estimation

In communication systems, bit error rates (BER) below $10^{-6}$ are common design targets. Naive Monte Carlo simulation requires $\sim 10^8$ samples to estimate a BER of $10^{-6}$ with reasonable confidence (about 100 observed errors, i.e., roughly 10% relative standard error). Importance sampling, using the change-of-measure formula, shifts the noise distribution to make errors more likely and reweights by $dP/dQ$. This can reduce the required sample count by orders of magnitude.

For Gaussian channels, a standard and near-optimal choice among mean-shifted proposals moves the noise mean to the decision boundary. The reweighting that keeps the estimator unbiased follows directly from the change-of-measure formula, and hence from the Radon-Nikodym theorem.
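The sample-count claims can be made quantitative in a simplified model. The sketch below assumes the error event is $\{N > c\}$ with $N \sim \mathcal{N}(0, 1)$ and $P(N > c) = p$, and that the proposal is $\mathcal{N}(c, 1)$. Under those assumptions the second moment of the weighted indicator is $e^{c^2} Q(2c)$, where $Q$ is the Gaussian tail function. The function names and the 10% target are illustrative.

```python
import math
from statistics import NormalDist

def q_function(x: float) -> float:
    """Gaussian tail probability P(N > x) for N ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def naive_mc_samples(p: float, rel_err: float) -> float:
    """Samples needed so a Bernoulli(p) average has the given relative std. error."""
    return (1.0 - p) / (p * rel_err ** 2)

def mean_shift_is_samples(p: float, rel_err: float) -> float:
    """Same target for importance sampling with the noise mean shifted to c."""
    c = NormalDist().inv_cdf(1.0 - p)                    # decision boundary with tail mass p
    second_moment = math.exp(c * c) * q_function(2.0 * c)
    rel_var = second_moment / p ** 2 - 1.0               # per-sample relative variance
    return rel_var / rel_err ** 2

p, rel_err = 1e-6, 0.1
print(naive_mc_samples(p, rel_err))
print(mean_shift_is_samples(p, rel_err))
```

In this idealized scalar setting the reduction is several orders of magnitude. Real receivers with coding, memory, or non-Gaussian impairments generally need more careful proposal design, so the numbers should be read as indicative only.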

Types of Densities via Radon-Nikodym

| Setting | Dominating measure $\mu$ | Radon-Nikodym derivative $d\nu/d\mu$ |
| --- | --- | --- |
| Continuous RV | Lebesgue measure $\lambda$ | PDF $f_X(x)$ |
| Discrete RV | Counting measure | PMF $p_X(x)$ |
| Hypothesis testing | $P_0$ (null hypothesis) | Likelihood ratio $L = f_1/f_0$ |
| Bayesian posterior | Prior $\pi$ | Posterior-to-prior ratio $\propto L$ (so the posterior density is $\propto L \cdot \pi$) |

Common Mistake: Likelihood Ratio Undefined When $P_1 \not\ll P_0$

Mistake:

Computing $L(x) = f_1(x)/f_0(x)$ and ignoring points where $f_0(x) = 0$ but $f_1(x) > 0$.

Correction:

If $P_1$ is not absolutely continuous with respect to $P_0$ (i.e., there exist sets where $P_0$ gives zero probability but $P_1$ does not), the Radon-Nikodym derivative $dP_1/dP_0$ does not exist. In hypothesis testing, this means the two hypotheses are "partially distinguishable with certainty": an observation falling in a $P_0$-null set that $P_1$ charges identifies $H_1$ without error. The general Lebesgue decomposition $P_1 = (P_1)_a + (P_1)_s$ handles this case.
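A simple illustration, assuming $P_0 = \text{Uniform}(0, 1)$ and $P_1 = \text{Uniform}(0, 2)$: here $(P_1)_s$ is the half of the $P_1$ mass on $(1, 2]$, and $(P_1)_a$ has density $1/2$ with respect to $P_0$ on $[0, 1]$. The sketch below adopts the common convention of reporting an infinite likelihood ratio on the singular part. That convention and the function names are assumptions, not something the Radon-Nikodym theorem itself provides.

```python
import math

def uniform_pdf(x: float, lo: float, hi: float) -> float:
    """PDF of Uniform(lo, hi)."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def extended_likelihood_ratio(x, f0, f1):
    """f1/f0 where f0 > 0; +inf where only f1 is positive; NaN where both vanish."""
    a, b = f0(x), f1(x)
    if a > 0.0:
        return b / a          # absolutely continuous part: ordinary ratio
    if b > 0.0:
        return math.inf       # singular part: H1 is certain
    return float("nan")       # neither hypothesis can produce x

f0 = lambda x: uniform_pdf(x, 0.0, 1.0)
f1 = lambda x: uniform_pdf(x, 0.0, 2.0)

print(extended_likelihood_ratio(0.4, f0, f1))  # 0.5
print(extended_likelihood_ratio(1.5, f0, f1))  # inf
```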

Quick Check

The Radon-Nikodym derivative $dP/dQ$ exists when:

$P$ and $Q$ have the same support

$P \ll Q$ ($P$ is absolutely continuous w.r.t. $Q$)

$P \perp Q$ ($P$ and $Q$ are mutually singular)

Radon-Nikodym Derivative

The measurable function $f = d\nu/d\mu$ satisfying $\nu(A) = \int_A f\, d\mu$ for all measurable $A$. Exists when $\nu \ll \mu$ (with $\mu$ $\sigma$-finite and $\nu$ finite); unique $\mu$-a.e. Generalizes the notion of PDF, PMF, and likelihood ratio.

Related: Absolute Continuity of Measures, Likelihood Ratio

Absolute Continuity (of Measures)

$\nu \ll \mu$ means every $\mu$-null set is also $\nu$-null: $\mu(A) = 0 \Rightarrow \nu(A) = 0$. For $\sigma$-finite measures, this is equivalent to $\nu$ having a density (Radon-Nikodym derivative) with respect to $\mu$.

Related: Radon-Nikodym Theorem, Singular Measures

Key Takeaway

The Radon-Nikodym theorem unifies PDFs, PMFs, and likelihood ratios under a single concept: the derivative of one measure with respect to another. The likelihood ratio $dP_1/dP_0$ is the central object of hypothesis testing and drives the Neyman-Pearson lemma, Wald's SPRT, and importance sampling. The measure-theoretic viewpoint extends all of these to settings, such as continuous-time processes and infinite-dimensional spaces, where classical densities do not exist.