Ferkans — Interactive Telecom Tutor

Densities as Derivatives of Measures

We are used to thinking of a PDF $f_X(x)$ as the function you integrate to get probabilities: $P(a < X \leq b) = \int_a^b f_X(x)\, dx$. But what is this function, really? It is the rate at which the probability measure $P_X$ accumulates mass relative to Lebesgue measure $\lambda$. In other words, the PDF is a derivative of one measure with respect to another: $f_X = dP_X / d\lambda$.

The Radon-Nikodym theorem makes this precise and vastly generalizes it: whenever one measure is absolutely continuous with respect to ("dominated by") a $\sigma$-finite measure, there exists a density function relating them. The payoff for us is immediate: the likelihood ratio $L(y) = f_1(y) / f_0(y)$ used in hypothesis testing is nothing but the Radon-Nikodym derivative $dP_1 / dP_0$, and this viewpoint extends hypothesis testing to settings where ordinary densities do not exist (e.g., testing between stochastic processes).

Definition: Absolute Continuity of Measures

Let $\mu$ and $\nu$ be measures on $(\Omega, \mathcal{F})$ . We say $\nu$ is absolutely continuous with respect to $\mu$ , written $\nu \ll \mu$ , if: $\mu(A) = 0 \implies \nu(A) = 0 \quad \text{for all } A \in \mathcal{F}.$ In words: every $\mu$ -null set is also $\nu$ -null. The measure $\mu$ "dominates" $\nu$ .

If $\nu \ll \mu$ and $\mu \ll \nu$ , the two measures are equivalent ( $\nu \sim \mu$ ): they agree on which sets have measure zero.
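On a finite sample space the definition can be checked mechanically. The sketch below is a minimal illustration, assuming measures are stored as dictionaries mapping outcomes to non-negative masses; the helper names `is_abs_continuous` and `are_equivalent` are hypothetical and not taken from any library.

```python
def is_abs_continuous(nu, mu):
    """Return True if nu << mu on a finite sample space.

    Both measures are dicts {outcome: mass}; missing outcomes have mass 0.
    On a finite space it is enough to check singletons: every outcome with
    mu-mass zero must also have nu-mass zero.
    """
    outcomes = set(nu) | set(mu)
    return all(nu.get(w, 0.0) == 0.0 for w in outcomes if mu.get(w, 0.0) == 0.0)


def are_equivalent(nu, mu):
    """Return True if nu ~ mu, i.e. nu << mu and mu << nu."""
    return is_abs_continuous(nu, mu) and is_abs_continuous(mu, nu)


mu = {"a": 0.5, "b": 0.5, "c": 0.0}
nu = {"a": 0.9, "b": 0.1, "c": 0.0}     # no mass on the mu-null outcome "c"
kappa = {"a": 0.5, "b": 0.0, "c": 0.5}  # puts mass on the mu-null outcome "c"

print(is_abs_continuous(nu, mu))     # True
print(is_abs_continuous(kappa, mu))  # False
print(are_equivalent(nu, mu))        # True
```

Note that `nu` and `mu` assign different masses to "a" and "b" yet are equivalent: equivalence only asks that they agree on which sets are null.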

Definition: Singular Measures

Two measures $\mu$ and $\nu$ on $(\Omega, \mathcal{F})$ are mutually singular, written $\mu \perp \nu$ , if there exists $A \in \mathcal{F}$ with $\mu(A) = 0$ and $\nu(A^c) = 0$ . Intuitively, $\mu$ and $\nu$ "live on disjoint sets."

The Cantor distribution is singular with respect to Lebesgue measure: it concentrates all its mass on the Cantor set, which has Lebesgue measure zero. Point masses (discrete distributions) are also singular w.r.t. Lebesgue measure.

Theorem: Lebesgue Decomposition Theorem

Let $\mu$ and $\nu$ be $\sigma$ -finite measures on $(\Omega, \mathcal{F})$ . Then $\nu$ has a unique decomposition: $\nu = \nu_a + \nu_s,$ where $\nu_a \ll \mu$ (absolutely continuous part) and $\nu_s \perp \mu$ (singular part).

Any $\sigma$-finite measure $\nu$ can be split into a part that has a density w.r.t. $\mu$ and a part that lives on a $\mu$-null set. For a random variable's distribution w.r.t. Lebesgue measure: the absolutely continuous part is the "continuous density" part, and the singular part includes point masses (discrete component) and singular continuous distributions (like the Cantor distribution).

Proof

Construction via Radon-Nikodym

Define $\rho = \mu + \nu$ (a $\sigma$-finite measure). Both $\mu \ll \rho$ and $\nu \ll \rho$. By Radon-Nikodym (applied to the $\rho$-dominated case): $d\mu = f\, d\rho$ and $d\nu = g\, d\rho$. Set $A = \{f > 0\}$. Then $\nu_s = \nu|_{A^c}$ is singular w.r.t. $\mu$, because $\mu(A^c) = \int_{A^c} f\, d\rho = 0$. And $\nu_a = \nu|_A$ is absolutely continuous w.r.t. $\mu$: if $\mu(B) = 0$, then $\int_B f\, d\rho = 0$ forces $\rho(B \cap A) = 0$, hence $\nu(B \cap A) = 0$. $\blacksquare$
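The construction in the proof can be traced on a finite sample space. The following is a minimal sketch under that assumption, with measures stored as dictionaries of point masses; the function name `lebesgue_decomposition` is hypothetical. On a finite space $f = d\mu/d\rho$ is just the ratio of point masses, so $A = \{f > 0\}$ is the set of outcomes that $\mu$ charges.

```python
def lebesgue_decomposition(nu, mu):
    """Split nu into (nu_a, nu_s) with nu_a << mu and nu_s singular to mu.

    Follows the proof: rho = mu + nu, f = d(mu)/d(rho), A = {f > 0},
    nu_a = nu restricted to A, nu_s = nu restricted to the complement of A.
    """
    nu_a, nu_s = {}, {}
    for w in sorted(set(nu) | set(mu)):
        m, n = mu.get(w, 0.0), nu.get(w, 0.0)
        rho = m + n
        f = m / rho if rho > 0 else 0.0  # density of mu w.r.t. rho at w
        in_A = f > 0
        nu_a[w] = n if in_A else 0.0
        nu_s[w] = 0.0 if in_A else n
    return nu_a, nu_s


mu = {1: 0.5, 2: 0.5, 3: 0.0}
nu = {1: 0.2, 2: 0.3, 3: 0.5}
nu_a, nu_s = lebesgue_decomposition(nu, mu)
print(nu_a)  # {1: 0.2, 2: 0.3, 3: 0.0}
print(nu_s)  # {1: 0.0, 2: 0.0, 3: 0.5}
```

Here the point mass of $\nu$ at outcome 3, which $\mu$ does not charge, is exactly the singular part.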

Theorem: Radon-Nikodym Theorem

Let $\mu$ be a $\sigma$ -finite measure on $(\Omega, \mathcal{F})$ and let $\nu$ be a finite measure with $\nu \ll \mu$ . Then there exists a non-negative measurable function $f$ such that: $\nu(A) = \int_A f\, d\mu \quad \text{for all } A \in \mathcal{F}.$ The function $f$ is unique $\mu$ -a.e. and is called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ , written $f = \frac{d\nu}{d\mu}$ .

The Radon-Nikodym derivative is the "density" of $\nu$ relative to $\mu$ . When $\mu = \lambda$ (Lebesgue measure) and $\nu = P_X$ (distribution of a continuous random variable), $dP_X / d\lambda = f_X$ , the probability density function. The theorem says that this notion of density exists whenever one measure is absolutely continuous with respect to another.

Proof

Von Neumann's proof via Hilbert space

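Assume first that $\mu$ is finite; the $\sigma$-finite case follows by applying the argument on each piece of a countable partition of $\Omega$ into sets of finite $\mu$-measure. Consider $L^2(\Omega, \mathcal{F}, \mu + \nu)$. The linear functional $\Lambda(g) = \int g\, d\nu$ is bounded: $|\Lambda(g)| \leq \nu(\Omega)^{1/2}\, \|g\|_{L^2(\nu)} \leq \nu(\Omega)^{1/2}\, \|g\|_{L^2(\mu + \nu)}$ (by Cauchy-Schwarz, since $\nu$ is finite). By the Riesz representation theorem, there exists $h \in L^2(\mu + \nu)$ with $\int g\, d\nu = \int gh\, d(\mu + \nu)$ for all $g \in L^2(\mu + \nu)$.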

Identify the derivative

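Setting $g = \mathbf{1}_A$: $\nu(A) = \int_A h\, d\mu + \int_A h\, d\nu$. Rearranging: $\int_A (1 - h)\, d\nu = \int_A h\, d\mu$. One shows $0 \leq h \leq 1$ $(\mu + \nu)$-a.e. Taking $A = \{h = 1\}$ in the rearranged identity gives $\mu(\{h = 1\}) = 0$, hence $\nu(\{h = 1\}) = 0$ because $\nu \ll \mu$, so $0 \leq h < 1$ $(\mu + \nu)$-a.e. Define $f = h / (1 - h)$. The rearranged identity extends from indicators to non-negative measurable $g$ by monotone convergence; applying it to $g = \mathbf{1}_A / (1 - h)$ gives $\nu(A) = \int_A f\, d\mu$ for all $A \in \mathcal{F}$. $\blacksquare$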
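To see the recipe $f = h/(1-h)$ at work, the sketch below repeats it on a finite sample space, where the Riesz representer reduces to the pointwise ratio $h = \nu/(\mu + \nu)$. It assumes both measures are dictionaries over the same outcomes; the function name `rn_derivative_via_riesz` is hypothetical.

```python
import math
from itertools import combinations


def rn_derivative_via_riesz(nu, mu):
    """Return f = d(nu)/d(mu) on a finite space, built as h / (1 - h).

    h is the density of nu with respect to mu + nu. Assumes nu << mu and
    that both dicts share the same keys.
    """
    f = {}
    for w in mu:
        m, n = mu[w], nu[w]
        if m == 0.0:
            if n != 0.0:
                raise ValueError("nu is not absolutely continuous w.r.t. mu")
            f[w] = 0.0        # any value is allowed on a mu-null point
        else:
            h = n / (m + n)   # 0 <= h < 1 because m > 0
            f[w] = h / (1.0 - h)
    return f


mu = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 0.0}
nu = {"a": 1.0, "b": 3.0, "c": 0.0, "d": 0.0}
f = rn_derivative_via_riesz(nu, mu)

# Check the defining property nu(A) = sum over A of f * mu for every event A.
points = list(mu)
for r in range(len(points) + 1):
    for A in combinations(points, r):
        lhs = sum(nu[w] for w in A)
        rhs = sum(f[w] * mu[w] for w in A)
        assert math.isclose(lhs, rhs, abs_tol=1e-12)
```

Algebraically $h/(1-h) = \nu/\mu$ at every point with $\mu > 0$, which is the discrete density one would write down directly.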

Historical Note: Radon, Nikodym, and the Density Problem

1913--1930

Johann Radon proved the theorem in 1913 for the special case of $\mathbb{R}^n$ with Lebesgue measure. The full abstract version was established by Otton Nikodym in 1930. The theorem resolved a long-standing question: under what conditions does one measure have a "density" with respect to another? The answer — absolute continuity — is both necessary and sufficient, and the result became a cornerstone of modern analysis, probability, and mathematical statistics.

Definition: Likelihood Ratio as Radon-Nikodym Derivative

Let $P_0$ and $P_1$ be two probability measures on $(\Omega, \mathcal{F})$ with $P_1 \ll P_0$ . The likelihood ratio is the Radon-Nikodym derivative: $L = \frac{dP_1}{dP_0}.$ When $P_0$ and $P_1$ have densities $f_0, f_1$ with respect to a common dominating measure $\mu$ (e.g., Lebesgue measure), this reduces to the familiar ratio: $L(\omega) = \frac{f_1(\omega)}{f_0(\omega)}.$

The Radon-Nikodym viewpoint is essential when Lebesgue densities do not exist, for example when testing between two Gaussian processes (the Cameron-Martin-Girsanov theorem) or between distributions that have both discrete and continuous components.

Theorem: Chain Rule for Radon-Nikodym Derivatives

If $\nu \ll \mu \ll \lambda$ , then $\nu \ll \lambda$ and: $\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda} \quad \lambda\text{-a.e.}$

Just like the chain rule for ordinary derivatives: the density of $\nu$ relative to $\lambda$ is the product of densities along the "chain." This is used in statistics when changing the reference measure — for example, computing the likelihood ratio under a composite hypothesis by going through a parametric family.

Proof

Verify the defining property

For any $A \in \mathcal{F}$ : $\nu(A) = \int_A \frac{d\nu}{d\mu}\, d\mu = \int_A \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda}\, d\lambda,$ where the second equality uses the change-of-measure formula for the integral (if $d\mu = g\, d\lambda$ , then $\int_A h\, d\mu = \int_A hg\, d\lambda$ ). The result $\frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda}$ satisfies the defining property of $\frac{d\nu}{d\lambda}$ , so by uniqueness they are equal a.e. $\blacksquare$
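A quick numerical check of the chain rule on a finite sample space, as a minimal sketch. It assumes counting measure as the reference $\lambda$ and uses a hypothetical helper `density` that divides point masses; the specific masses are illustrative.

```python
import math


def density(nu, mu):
    """Pointwise d(nu)/d(mu) on a finite space, assuming nu << mu.

    The value on mu-null points is arbitrary; 0 is used here.
    """
    return {w: (nu[w] / mu[w] if mu[w] > 0 else 0.0) for w in mu}


lam = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}  # counting measure (reference)
mu = {"a": 0.5, "b": 0.25, "c": 0.25, "d": 0.0}  # mu << lam
nu = {"a": 0.1, "b": 0.9, "c": 0.0, "d": 0.0}    # nu << mu

dnu_dmu = density(nu, mu)
dmu_dlam = density(mu, lam)
dnu_dlam = density(nu, lam)

# Chain rule: d(nu)/d(lam) = d(nu)/d(mu) * d(mu)/d(lam) at every point.
for w in lam:
    assert math.isclose(dnu_dlam[w], dnu_dmu[w] * dmu_dlam[w], abs_tol=1e-15)
```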

Example: Gaussian Likelihood Ratio as Radon-Nikodym Derivative

Let $P_0 = \mathcal{N}(0, 1)$ and $P_1 = \mathcal{N}(\mu, 1)$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ . Compute the Radon-Nikodym derivative $dP_1/dP_0$ .

Solution

Both measures are a.c. w.r.t. Lebesgue

$dP_0/d\lambda = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ and $dP_1/d\lambda = \frac{1}{\sqrt{2\pi}} e^{-(x-\mu)^2/2}$ . Since both are positive everywhere, $P_0 \sim P_1$ (mutually absolutely continuous).

Apply the chain rule

$\frac{dP_1}{dP_0}(x) = \frac{dP_1/d\lambda}{dP_0/d\lambda}(x) = \frac{e^{-(x-\mu)^2/2}}{e^{-x^2/2}} = e^{\mu x - \mu^2/2}.$

Interpretation

This is the familiar likelihood ratio for testing $H_0: \theta = 0$ vs $H_1: \theta = \mu$ with a single Gaussian observation. The Neyman-Pearson lemma says the optimal test compares $L(x) = e^{\mu x - \mu^2/2}$ to a threshold $\eta$. Since $L$ is strictly monotone in $x$ for $\mu \neq 0$, this is equivalent to comparing $x$ to the threshold $(\ln \eta + \mu^2/2)/\mu$: reject when $x$ exceeds it if $\mu > 0$, and when $x$ falls below it if $\mu < 0$.
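The closed form can be sanity-checked numerically. The sketch below uses only the Python standard library, with hypothetical function names and an arbitrary illustrative shift of 1.5. It compares $e^{\mu x - \mu^2/2}$ with the ratio of the two Gaussian PDFs and confirms that the derivative integrates to 1 under $P_0$, as any $dP_1/dP_0$ between probability measures must.

```python
import math


def gaussian_pdf(x, mean):
    """Density of N(mean, 1) with respect to Lebesgue measure."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def likelihood_ratio(x, mu):
    """dP1/dP0 at x for P0 = N(0, 1) and P1 = N(mu, 1)."""
    return math.exp(mu * x - 0.5 * mu ** 2)


mu = 1.5  # illustrative value of the mean under H1

# The closed form agrees with the ratio of the two densities.
for x in (-2.0, 0.0, 0.75, 3.0):
    ratio = gaussian_pdf(x, mu) / gaussian_pdf(x, 0.0)
    assert math.isclose(likelihood_ratio(x, mu), ratio, rel_tol=1e-9)

# E_{P0}[L] = 1, approximated by a Riemann sum on [-12, 12).
dx = 1e-3
total = sum(
    likelihood_ratio(-12.0 + k * dx, mu) * gaussian_pdf(-12.0 + k * dx, 0.0) * dx
    for k in range(24000)
)
print(round(total, 6))
```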

Radon-Nikodym Derivative $dP/dQ$ as Density Ratio

Visualize two probability distributions $P$ and $Q$ (both Gaussian with different parameters) and their Radon-Nikodym derivative $dP/dQ = f_P(x)/f_Q(x)$ . The derivative shows where $P$ places relatively more mass than $Q$ .

Default parameters: $\mu_P = 1$, $\sigma_P = 1$, $\mu_Q = 0$, $\sigma_Q = 1$.

The Neyman-Pearson Lemma in Radon-Nikodym Language

The Neyman-Pearson lemma (Book FSI, Chapter 2) states that the most powerful test of $H_0: P = P_0$ versus $H_1: P = P_1$ at level $\alpha$ rejects $H_0$ when $L(\omega) > \eta$ , where $L = dP_1/dP_0$ is the likelihood ratio.

In the Radon-Nikodym framework, this is completely general: it works even when $P_0, P_1$ are measures on infinite-dimensional spaces (e.g., the path space of a stochastic process). This is how one formulates detection of signals in continuous-time noise — the Cameron-Martin theorem gives the explicit form of $dP_1/dP_0$ for Gaussian processes.
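For the scalar Gaussian pair $P_0 = \mathcal{N}(0,1)$, $P_1 = \mathcal{N}(\mu,1)$ with $\mu > 0$, the level-$\alpha$ threshold can be computed explicitly. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `np_test_gaussian` and the values $\alpha = 0.05$, $\mu = 2$ are illustrative assumptions.

```python
import math
from statistics import NormalDist


def np_test_gaussian(alpha, mu):
    """Thresholds and power of the level-alpha test of N(0,1) vs N(mu,1), mu > 0.

    Returns (tau, eta, power): reject H0 when x > tau, equivalently when
    L(x) = exp(mu*x - mu^2/2) > eta.
    """
    if mu <= 0:
        raise ValueError("this sketch assumes mu > 0")
    tau = NormalDist().inv_cdf(1.0 - alpha)     # P0(X > tau) = alpha
    eta = math.exp(mu * tau - 0.5 * mu ** 2)    # same threshold on the LR scale
    power = 1.0 - NormalDist(mu, 1.0).cdf(tau)  # P1(X > tau)
    return tau, eta, power


tau, eta, power = np_test_gaussian(alpha=0.05, mu=2.0)
print(tau, eta, power)
```

Because $L$ is increasing in $x$ here, thresholding $L$ at `eta` and thresholding $x$ at `tau` define the same rejection region.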

Theorem: Change of Measure Formula

If $\nu \ll \mu$ with $f = d\nu/d\mu$ , then for any measurable $g \geq 0$ : $\int g\, d\nu = \int g \cdot f\, d\mu.$ In probability: if $\mathbb{E}_Q$ denotes expectation under $Q$ and $L = dP/dQ$ , then $\mathbb{E}_P[g(X)] = \mathbb{E}_Q[g(X) \cdot L(X)].$

To compute an expectation under $P$ , you can instead compute a weighted expectation under $Q$ , where the weight is the likelihood ratio. This is the foundation of importance sampling — a Monte Carlo technique where you sample from a convenient distribution $Q$ and reweight by $dP/dQ$ .

Proof

Simple functions

For $g = \mathbf{1}_A$ : $\int \mathbf{1}_A\, d\nu = \nu(A) = \int_A f\, d\mu = \int \mathbf{1}_A f\, d\mu$ . By linearity, the formula holds for simple functions.

General case via MCT

For $g \geq 0$ , approximate $g$ from below by simple functions $g_n \uparrow g$ . By MCT applied to both sides: $\int g\, d\nu = \lim_n \int g_n\, d\nu = \lim_n \int g_n f\, d\mu = \int gf\, d\mu$ . $\blacksquare$
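On a finite sample space the identity $\mathbb{E}_P[g(X)] = \mathbb{E}_Q[g(X) \cdot L(X)]$ is a one-line computation. The sketch below checks it for illustrative point masses `P` and `Q` and a test function $g(x) = x^2$; all names and values are assumptions chosen for the example.

```python
import math

P = {0: 0.7, 1: 0.2, 2: 0.1}
Q = {0: 0.2, 1: 0.3, 2: 0.5}  # Q > 0 wherever P > 0, so P << Q


def expectation(g, measure, weight=lambda x: 1.0):
    """Sum of g(x) * weight(x) * mass(x) over the sample space."""
    return sum(g(x) * weight(x) * m for x, m in measure.items())


def L(x):
    """Radon-Nikodym derivative dP/dQ at the point x."""
    return P[x] / Q[x]


def g(x):
    return x ** 2


lhs = expectation(g, P)            # E_P[g(X)]
rhs = expectation(g, Q, weight=L)  # E_Q[g(X) L(X)]
assert math.isclose(lhs, rhs)
```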

Example: Importance Sampling for Rare Event Estimation

Estimate $P(X > 5)$ where $X \sim \mathcal{N}(0, 1)$ using importance sampling with proposal $Q = \mathcal{N}(5, 1)$ .

Solution

Change of measure

$P(X > 5) = \mathbb{E}_P[\mathbf{1}_{X > 5}] = \mathbb{E}_Q[\mathbf{1}_{X > 5} \cdot L(X)]$ where $L(x) = \frac{dP}{dQ}(x) = e^{-(x^2/2 - (x-5)^2/2)} = e^{-5x + 25/2}$ .

Monte Carlo estimator

Sample $X_1, \ldots, X_n \sim \mathcal{N}(5, 1)$ and compute: $\hat{p} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{X_i > 5} \cdot e^{-5X_i + 25/2}.$ Under $Q = \mathcal{N}(5,1)$ , about half the samples exceed 5, so the indicator fires frequently — unlike naive Monte Carlo under $P$ where $P(X > 5) \approx 2.87 \times 10^{-7}$ and you would need billions of samples.
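The estimator above takes a few lines of standard-library Python. This is a minimal sketch, not a tuned implementation: the function name, the sample size of 200,000, and the fixed seed are assumptions made for reproducibility. The exact tail probability is obtained from `math.erfc` for comparison.

```python
import math
import random


def tail_prob_importance_sampling(n, threshold=5.0, seed=0):
    """Estimate P(X > threshold), X ~ N(0,1), sampling from Q = N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal Q
        if x > threshold:
            # Weight dP/dQ(x) = exp(-threshold*x + threshold^2/2).
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


exact = 0.5 * math.erfc(5.0 / math.sqrt(2.0))  # P(X > 5) for X ~ N(0,1)
estimate = tail_prob_importance_sampling(200_000)
print(exact, estimate)
```

With this proposal the per-sample relative standard deviation of the weighted indicator is of order 1, so a few hundred thousand samples already give relative accuracy well under a few percent.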

Why This Matters: From Radon-Nikodym to Radar Detection

In radar and sonar, the received signal is modeled as a continuous-time stochastic process. Testing whether a target is present (signal + noise vs. noise alone) is a hypothesis test between two measures on the space of sample paths. The Radon-Nikodym derivative $dP_1/dP_0$ for Gaussian processes is given by the Cameron-Martin formula: $\frac{dP_1}{dP_0} = \exp\left(\int_0^T s(t)\, dX(t) - \frac{1}{2}\int_0^T s^2(t)\, dt\right),$ where $s(t)$ is the known signal waveform and $X(t)$ is the observed process. The sufficient statistic is the correlator output $\int_0^T s(t)\, dX(t)$ — the continuous-time matched filter from Chapter 15, now justified measure-theoretically.
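In simulation the two integrals are replaced by sums over a time grid. The sketch below is a discretized version of the exponent of the formula, assuming a uniform grid, left-endpoint (Ito-type) sums, and noise that is a standard Brownian motion under $P_0$; the function name and the sinusoidal test waveform are hypothetical choices for illustration.

```python
import math


def log_likelihood_ratio(increments, signal, dt):
    """Discretized Cameron-Martin log-likelihood ratio.

    increments[k] approximates X(t_{k+1}) - X(t_k) and signal[k] = s(t_k).
    Returns sum_k s_k * dX_k - 0.5 * sum_k s_k^2 * dt.
    """
    correlator = sum(s * dx for s, dx in zip(signal, increments))
    energy = sum(s * s for s in signal) * dt
    return correlator - 0.5 * energy


N, T = 1000, 1.0
dt = T / N
signal = [math.sin(2.0 * math.pi * k * dt) for k in range(N)]

# Noise-free observation under H1: dX = s dt, so the correlator equals the
# signal energy (0.5 for this waveform) and the log-LR is half of it.
noise_free = [s * dt for s in signal]
llr = log_likelihood_ratio(noise_free, signal, dt)
print(llr)
```

The first sum is the correlator (matched filter) output; the second depends only on the known waveform, so it acts as a fixed offset that can be absorbed into the detection threshold.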

⚠️Engineering Note

Importance Sampling in BER Estimation

In communication systems, bit error rates (BER) below $10^{-6}$ are common design targets. Naive Monte Carlo simulation requires $\sim 10^8$ samples to estimate a BER of $10^{-6}$ with reasonable confidence. Importance sampling, using the change-of-measure formula, shifts the noise distribution to make errors more likely and reweights by $dP/dQ$ . This can reduce the required sample count by orders of magnitude.

For Gaussian channels, a common and effective choice of importance sampling distribution shifts the noise mean to (or near) the decision boundary; the reweighting by $dP/dQ$ that keeps the estimator unbiased is justified directly by the change-of-measure formula.
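As a concrete illustration, the sketch below estimates the bit error rate of a simple link by mean-shift importance sampling. The channel model is an assumption introduced here, not taken from the text: BPSK over AWGN with unit-variance noise, received value $y = a + w$ with $a = \sqrt{2 E_b/N_0}$, and an error when $y < 0$. The function name, the 10 dB operating point, and the sample size are likewise illustrative.

```python
import math
import random


def ber_bpsk_importance_sampling(snr_db, n, seed=0):
    """Estimate BPSK bit error rate over AWGN by shifting the noise mean.

    Assumed model: y = a + w, w ~ N(0,1) under P, a = sqrt(2 * Eb/N0).
    The proposal Q draws w ~ N(-a, 1), centred on the decision boundary.
    """
    a = math.sqrt(2.0 * 10.0 ** (snr_db / 10.0))
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.gauss(-a, 1.0)  # biased noise sample from Q
        if a + w < 0.0:         # bit error event
            # Weight dP/dQ(w) = exp(a*w + a^2/2).
            total += math.exp(a * w + 0.5 * a * a)
    return total / n


snr_db = 10.0
a = math.sqrt(2.0 * 10.0 ** (snr_db / 10.0))
exact = 0.5 * math.erfc(a / math.sqrt(2.0))  # Q-function value Q(a)
estimate = ber_bpsk_importance_sampling(snr_db, 200_000)
print(exact, estimate)
```

Under the shifted noise roughly half of the trials produce an error, and each error contributes a small weight, which is what makes the estimator efficient at low BER.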

Types of Densities via Radon-Nikodym

Setting	Dominating measure $\mu$	Radon-Nikodym derivative $d\nu/d\mu$
Continuous RV	Lebesgue measure $\lambda$	PDF $f_X(x)$
Discrete RV	Counting measure	PMF $p_X(x)$
Hypothesis testing	$P_0$ (null hypothesis)	Likelihood ratio $L = f_1/f_0$
Bayesian posterior	Prior $\pi$	Normalized likelihood $L(\theta) / \int L\, d\pi$

Common Mistake: Likelihood Ratio Undefined When $P_1 \not\ll P_0$

Mistake:

Computing $L(x) = f_1(x)/f_0(x)$ and ignoring points where $f_0(x) = 0$ but $f_1(x) > 0$ .

Correction:

If $P_1$ is not absolutely continuous with respect to $P_0$ (i.e., there exist sets to which $P_0$ assigns zero probability but $P_1$ does not), the Radon-Nikodym derivative $dP_1/dP_0$ does not exist. In hypothesis testing, this means the two hypotheses are "partially distinguishable with certainty": an observation that lands in a $P_0$-null set with positive $P_1$-probability reveals $H_1$ without error. The general Lebesgue decomposition $P_1 = (P_1)_a + (P_1)_s$ handles this case.
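A small worked case makes the decomposition concrete. The uniform distributions below are chosen purely for illustration and do not come from the text above:

```latex
P_0 = \mathrm{Unif}[0,1], \quad P_1 = \mathrm{Unif}[0,2]:
\qquad f_0(x) = \mathbf{1}_{[0,1]}(x), \quad f_1(x) = \tfrac{1}{2}\,\mathbf{1}_{[0,2]}(x).

P_0\big((1,2]\big) = 0 \quad \text{but} \quad P_1\big((1,2]\big) = \tfrac{1}{2}
\quad \Longrightarrow \quad P_1 \not\ll P_0.

(P_1)_a(A) = P_1\big(A \cap [0,1]\big) = \int_A \tfrac{1}{2}\, dP_0, \qquad
(P_1)_s(A) = P_1\big(A \cap (1,2]\big), \qquad (P_1)_s \perp P_0.
```

On $[0,1]$ the ratio $f_1/f_0 = 1/2$ is the density of the absolutely continuous part; an observation in $(1,2]$ can only occur under $H_1$, and that event carries the remaining $P_1$-mass of $1/2$.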

Quick Check

The Radon-Nikodym derivative $dP/dQ$ exists when:

$P$ and $Q$ have the same support

$P \ll Q$ (P is absolutely continuous w.r.t. Q)

$P \perp Q$ (P and Q are mutually singular)

Correct answer: $P \ll Q$ (P is absolutely continuous w.r.t. Q)

Absolute continuity is the necessary and sufficient condition for the Radon-Nikodym derivative to exist.

Radon-Nikodym Derivative

The measurable function $f = d\nu/d\mu$ satisfying $\nu(A) = \int_A f\, d\mu$ for all measurable $A$ . Exists when $\nu \ll \mu$ ; unique $\mu$ -a.e. Generalizes the notion of PDF, PMF, and likelihood ratio.

Absolute Continuity (of Measures)

$\nu \ll \mu$ means every $\mu$-null set is also $\nu$-null: $\mu(A) = 0 \Rightarrow \nu(A) = 0$. For $\sigma$-finite measures this is equivalent to $\nu$ having a density (Radon-Nikodym derivative) with respect to $\mu$.

Key Takeaway

The Radon-Nikodym theorem unifies PDFs, PMFs, and likelihood ratios under a single concept: the derivative of one measure with respect to another. The likelihood ratio $dP_1/dP_0$ is the central object of hypothesis testing and drives the Neyman-Pearson lemma, Wald's SPRT, and importance sampling. The measure-theoretic viewpoint extends all of these to settings — like continuous-time processes and infinite-dimensional spaces — where classical densities do not exist.
