The Van Trees Inequality

Why Another Lower Bound?

We already have the Cramer-Rao lower bound (CRLB), derived in Chapter 18. It gives us $\mathrm{Var}(\hat\theta) \geq 1/I_F(\theta)$ for any unbiased estimator of a deterministic parameter $\theta$. Excellent --- except for two well-known caveats. First, the CRLB is pointwise: it depends on the true $\theta$, so to use it as a design target one has to either commit to a worst case or average over $\theta$ somehow. Second, the inequality only binds unbiased estimators --- and in many problems of interest (low SNR, small samples, peaked priors) the best estimator is biased, and the CRLB simply does not apply.

The Van Trees inequality fixes both defects at once. The parameter is modelled as a random variable with a prior density $\pi(\theta)$. The result is a scalar lower bound on the Bayesian mean-squared error --- the MSE averaged over both the observation and the prior --- that applies to every estimator, biased or unbiased. The bound has the shape
$$\mathbb{E}_{\theta,y}\!\left[(\hat\theta(y)-\theta)^2\right] \;\geq\; \frac{1}{I_B}, \qquad I_B = \mathbb{E}_\pi[I_F(\theta)] + I_P.$$
The Bayesian information $I_B$ decomposes into the average Fisher information $\mathbb{E}_\pi[I_F(\theta)]$ (what the data tells us) plus the prior information $I_P$ (what we already knew before the data arrived). When the prior is flat, $I_P = 0$ and we recover an averaged CRLB; when the prior is peaked, $I_P$ dominates and the bound is tighter.

The proof is a beautiful one-line application of Cauchy-Schwarz, and the result is our go-to lower bound whenever we have a Bayesian model and want an inequality that holds without any regularity assumption on the estimator.

Definition: Prior Information

Let $\pi(\theta)$ be a prior density on $\theta \in \mathbb{R}$, assumed differentiable and vanishing at the boundary of its support. The prior information is
$$I_P \;=\; \mathbb{E}_\pi\!\left[ \left(\frac{\partial}{\partial\theta} \log \pi(\theta)\right)^2 \right] \;=\; \int \frac{\big(\pi'(\theta)\big)^2}{\pi(\theta)}\,d\theta.$$
The integrand is the squared score function of the prior, measured in the same units as the Fisher information of the likelihood.

For a Gaussian prior $\pi(\theta) = \mathcal{N}(\mu_0, \sigma_0^2)$, a direct calculation gives $I_P = 1/\sigma_0^2$ --- exactly the inverse prior variance. This is the same object that enters the posterior precision in conjugate Gaussian updates (Chapter 7).
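As a quick numerical sanity check, one can integrate $(\pi')^2/\pi$ directly. A minimal sketch using SciPy quadrature (the prior width $\sigma_0 = 0.5$ is an arbitrary choice for the check):

```python
import numpy as np
from scipy.integrate import quad

def gaussian_prior_information(mu0, sigma0):
    """Numerically evaluate I_P = integral of (pi'(t))^2 / pi(t) dt
    for a Gaussian prior N(mu0, sigma0^2)."""
    def pdf(t):
        return np.exp(-0.5 * ((t - mu0) / sigma0) ** 2) / (sigma0 * np.sqrt(2 * np.pi))

    def integrand(t):
        # For a Gaussian, d/dt log pi(t) = -(t - mu0) / sigma0^2,
        # so (pi')^2 / pi equals score^2 * pi.
        score = -(t - mu0) / sigma0**2
        return score**2 * pdf(t)

    val, _ = quad(integrand, mu0 - 10 * sigma0, mu0 + 10 * sigma0)
    return val

sigma0 = 0.5  # arbitrary width for the check
print(gaussian_prior_information(0.0, sigma0))  # numerical: ~4.0
print(1 / sigma0**2)                            # analytic inverse variance: 4.0
```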

Definition: Bayesian Information

For a Bayesian estimation problem with prior $\pi(\theta)$, likelihood $f(y \mid \theta)$, and Fisher information $I_F(\theta) = \mathbb{E}\!\left[\left(\partial_\theta \log f(Y\mid\theta)\right)^2 \mid \theta\right]$, the Bayesian information is
$$I_B \;=\; \mathbb{E}_\pi[I_F(\theta)] \;+\; I_P.$$
This is the quantity that appears in the denominator of the Van Trees bound. It is always non-negative, and strictly positive whenever either the data or the prior carries information about $\theta$.

Theorem: Van Trees Inequality (Bayesian CRLB)

Let $\theta$ have a differentiable prior $\pi(\theta)$ on $\mathbb{R}$ with $\pi(\theta) \to 0$ at $\pm\infty$, let $f(y\mid\theta)$ be the likelihood with Fisher information $I_F(\theta)$, and let $\hat\theta(Y)$ be any (possibly biased, possibly non-smooth) estimator with finite Bayesian MSE. Then
$$\mathbb{E}_{\theta,Y}\!\left[(\hat\theta(Y) - \theta)^2\right] \;\geq\; \frac{1}{I_B}, \qquad I_B \;=\; \mathbb{E}_\pi[I_F(\theta)] + I_P.$$

The right-hand side combines the information in the data ($\mathbb{E}_\pi[I_F(\theta)]$, averaged because we no longer condition on a fixed $\theta$) with the information already in the prior ($I_P$). With more of either, the bound tightens.
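For completeness, here is a sketch of the one-line Cauchy-Schwarz proof mentioned earlier (under the stated regularity; this is where the boundary condition on $\pi$ earns its keep). Let $\psi(\theta,y) = \partial_\theta \log\big(f(y\mid\theta)\,\pi(\theta)\big)$ be the joint score. Integration by parts in $\theta$, using $\pi(\theta) \to 0$ at $\pm\infty$, gives $\mathbb{E}_{\theta,Y}\big[(\hat\theta(Y)-\theta)\,\psi\big] = 1$ for any estimator, so Cauchy-Schwarz yields
$$1 \;=\; \Big(\mathbb{E}\big[(\hat\theta(Y)-\theta)\,\psi\big]\Big)^2 \;\leq\; \mathbb{E}\big[(\hat\theta(Y)-\theta)^2\big]\,\mathbb{E}\big[\psi^2\big].$$
Because the likelihood score has zero conditional mean given $\theta$, the cross term in $\mathbb{E}[\psi^2]$ vanishes, leaving $\mathbb{E}[\psi^2] = \mathbb{E}_\pi[I_F(\theta)] + I_P = I_B$; rearranging gives the bound.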


Key Takeaway

The Van Trees bound replaces the pointwise $1/I_F(\theta)$ of the CRLB with $1/I_B$, where $I_B = \mathbb{E}_\pi[I_F(\theta)] + I_P$. Prior information adds to data information in a literally additive way. This is why a concentrated prior always tightens the achievable MSE, and why the "effective sample size" of a Bayesian experiment equals the classical sample size plus the prior precision measured in units of the per-sample information.
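In the Gaussian location model of the example below, for instance, $I_B = n/\sigma^2 + 1/\sigma_0^2 = \big(n + \sigma^2/\sigma_0^2\big)/\sigma^2$, so the experiment behaves as if it had $n_{\mathrm{eff}} = n + \sigma^2/\sigma_0^2$ samples: the prior precision $1/\sigma_0^2$, divided by the per-sample information $1/\sigma^2$, adds directly to $n$.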

Example: Van Trees for the Gaussian Location Model

Let $\theta \sim \mathcal{N}(0,\sigma_0^2)$ and observe $n$ i.i.d. samples $Y_i = \theta + W_i$ with $W_i \sim \mathcal{N}(0,\sigma^2)$. Compute the Van Trees bound and compare with the Bayesian MMSE of the posterior mean.
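Working it out: each sample contributes Fisher information $1/\sigma^2$, so $I_F(\theta) = n/\sigma^2$ (constant in $\theta$), and the Gaussian prior contributes $I_P = 1/\sigma_0^2$. Hence
$$I_B \;=\; \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}, \qquad \mathbb{E}\big[(\hat\theta-\theta)^2\big] \;\geq\; \frac{1}{I_B} \;=\; \frac{\sigma^2}{n}\cdot\frac{\sigma_0^2}{\sigma_0^2 + \sigma^2/n}.$$
The posterior mean in this conjugate model has Bayesian MSE equal to the posterior variance $(n/\sigma^2 + 1/\sigma_0^2)^{-1}$, so the Van Trees bound is achieved with equality here.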

Van Trees Bound vs. Classical CRLB vs. Empirical MSE

Vary the prior width $\sigma_0$ and the sample size $n$ in the Gaussian location model. The classical CRLB ($\sigma^2/n$) ignores the prior; the Van Trees bound shrinks it by a factor $\sigma_0^2/(\sigma_0^2 + \sigma^2/n)$. Empirical MSE of the MMSE estimator is overlaid from Monte Carlo.

[Interactive plot. Parameters: prior standard deviation $\sigma_0 = 1$, noise standard deviation $\sigma = 1$, maximum number of samples shown $n = 50$.]
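A minimal Monte Carlo sketch of the same comparison, assuming the default parameters above (NumPy only; the trial count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma0, sigma = 1.0, 1.0   # prior and noise standard deviations (defaults above)
n_trials = 20_000          # Monte Carlo replications per sample size

for n in (1, 5, 10, 50):
    # Draw (theta, Y-bar) jointly from the Bayesian model; the sample mean
    # is sufficient, with conditional distribution N(theta, sigma^2 / n).
    theta = rng.normal(0.0, sigma0, size=n_trials)
    ybar = theta + rng.normal(0.0, sigma / np.sqrt(n), size=n_trials)

    # MMSE estimator = posterior mean of the conjugate Gaussian update.
    shrink = (n / sigma**2) / (n / sigma**2 + 1 / sigma0**2)
    theta_hat = shrink * ybar

    mse = np.mean((theta_hat - theta) ** 2)
    crlb = sigma**2 / n                                  # classical CRLB
    van_trees = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)   # Van Trees bound
    print(f"n={n:3d}  empirical MSE={mse:.4f}  "
          f"Van Trees={van_trees:.4f}  CRLB={crlb:.4f}")
```

The empirical MSE should hug the Van Trees line for every $n$ (the bound is tight in this model), while the classical CRLB sits above both for small $n$.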

Vector Extension

For a vector parameter $\boldsymbol\theta \in \mathbb{R}^d$, the Van Trees inequality generalises to a matrix statement: for any estimator $\hat{\boldsymbol\theta}(Y)$,
$$\mathbb{E}\!\left[(\hat{\boldsymbol\theta}-\boldsymbol\theta) (\hat{\boldsymbol\theta}-\boldsymbol\theta)^T\right] \;\succeq\; \mathbf{I}_B^{-1},$$
where $\mathbf{I}_B = \mathbb{E}_\pi[\mathbf{J}(\boldsymbol\theta)] + \mathbf{I}_P$ is the sum of the averaged Fisher information matrix and the prior information matrix $\mathbf{I}_P = \mathbb{E}_\pi[\nabla\log\pi\,\nabla\log\pi^T]$. Taking traces yields a scalar bound on the total MSE, $\mathbb{E}\|\hat{\boldsymbol\theta}-\boldsymbol\theta\|^2 \geq \operatorname{tr}\big(\mathbf{I}_B^{-1}\big)$.
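For a concrete feel, here is a small numerical sketch (the 2-D numbers are hypothetical, chosen only for illustration); it uses the fact that a Gaussian prior $\mathcal{N}(\boldsymbol\mu_0, \boldsymbol\Sigma_0)$ has prior information matrix $\boldsymbol\Sigma_0^{-1}$:

```python
import numpy as np

# Hypothetical 2-D example: an averaged Fisher information matrix J_bar
# and a Gaussian prior N(0, Sigma0), whose prior information matrix
# is inv(Sigma0). All numbers below are illustrative, not from the text.
J_bar = np.array([[8.0, 2.0],
                  [2.0, 5.0]])
Sigma0 = np.diag([0.5**2, 2.0**2])

I_B = J_bar + np.linalg.inv(Sigma0)   # Bayesian information matrix
bound_matrix = np.linalg.inv(I_B)     # matrix lower bound on the error covariance
total_mse_bound = np.trace(bound_matrix)

print(bound_matrix)
print(f"total MSE >= {total_mse_bound:.4f}")
```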

Common Mistake: Prior Must Vanish at the Boundary

Mistake:

Applying Van Trees with a uniform prior on $[a,b]$ and concluding that $I_P = 0$ because $\partial_\theta \log \pi = 0$ in the interior.

Correction:

The integration-by-parts step in the proof requires $\pi(\theta) \to 0$ at the boundary of its support. A uniform prior on $[a,b]$ does not satisfy this --- it is discontinuous at $a$ and $b$ --- so the standard Van Trees bound does not apply. A cleaner way to handle a compactly supported prior is to replace it with a smooth approximation (e.g., the uniform density convolved with a narrow Gaussian, which bleeds slightly outside $[a,b]$ and vanishes smoothly) and take a limit, or use the constrained-Van-Trees variant of Gill and Levit (1995) that handles compact supports directly.
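A short numerical sketch of the smoothing route (assumptions: $[a,b] = [0,1]$ and smoothing by convolution with a Gaussian of width $h$; SciPy quadrature is used). Note that $I_P$ of the smoothed prior grows without bound as $h \to 0$ --- it does not tend to the naive value $0$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def smoothed_uniform_prior_information(a=0.0, b=1.0, h=0.05):
    """I_P of Uniform[a, b] convolved with N(0, h^2)."""
    def pdf(t):
        # Convolution of the uniform density with a Gaussian kernel.
        return (norm.cdf((t - a) / h) - norm.cdf((t - b) / h)) / (b - a)

    def dpdf(t):
        return (norm.pdf((t - a) / h) - norm.pdf((t - b) / h)) / (h * (b - a))

    def integrand(t):
        p = pdf(t)
        return dpdf(t) ** 2 / p if p > 1e-300 else 0.0

    val, _ = quad(integrand, a - 10 * h, b + 10 * h, limit=200)
    return val

for h in (0.2, 0.1, 0.05, 0.025):
    print(f"h = {h:5.3f}   I_P = {smoothed_uniform_prior_information(h=h):8.2f}")
```

All of the prior information lives in the two boundary layers of width $\sim h$, which is exactly the contribution the "$\partial_\theta \log \pi = 0$ in the interior" argument throws away.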

Historical Note: From Radar Estimation to a Textbook Bound

1960s-1990s

Harry Van Trees introduced the inequality in his 1968 textbook Detection, Estimation, and Modulation Theory, Part I, in the context of estimating a random parameter (e.g., target range, velocity) from a noisy radar waveform. The bound was part of a larger programme to produce performance benchmarks that hold uniformly over all estimators --- a concern that arose naturally in a radar community that was beginning to use maximum-a-posteriori and nonlinear least-squares estimators whose biases could not be bounded by the classical CRLB.

The bound circulated in the radar literature for almost three decades before statisticians picked it up. Gill and Levit's 1995 paper in Bernoulli gave the first fully rigorous treatment, identified the boundary condition on π\pi, extended the bound to multiple parameters, and proved asymptotic efficiency results. Today the Van Trees inequality appears as the canonical "Bayesian CRLB" in every serious textbook on parameter estimation.

⚠️ Engineering Note

Van Trees for Wireless Positioning

The Van Trees inequality is the workhorse bound in GNSS, UWB, and 5G/6G positioning analysis. The parameter is the user's position $\mathbf{p}\in\mathbb{R}^3$; the prior $\pi(\mathbf{p})$ captures coarse information from the network (cell association, a previous fix, a road network constraint); the Fisher information is built from the effective signal bandwidth and the geometry of the anchors. The engineer's dashboard typically reports two bounds side by side: the classical CRLB (showing what a fresh estimate can achieve with no prior) and the Van Trees bound (showing what a tracked estimate can achieve with the previous-fix prior).

Practical Constraints
  • The prior must be smooth and properly normalised --- a Dirac mass destroys the bound

  • When $\mathbf{p}$ is constrained to a road, the prior is effectively 1-D and $I_P$ can be enormous

  • In dense multipath, the Fisher information degrades and the bound is dominated by $I_P$ (the tracker 'coasts' on the prior)

📋 Ref: 3GPP TR 38.857 (NR positioning)

Quick Check

In the Gaussian location model with $\theta \sim \mathcal{N}(0,\sigma_0^2)$ and $n$ i.i.d. observations of variance $\sigma^2$, which statement about the Van Trees bound $1/I_B$ is TRUE as $\sigma_0 \to \infty$ (flat prior)?

  • The bound diverges because $I_P \to 0$.

  • The bound approaches the classical CRLB $\sigma^2/n$.

  • The bound approaches zero because the prior becomes very broad.

  • The bound becomes undefined because the prior is no longer proper.

Bayesian CRLB

A lower bound on the Bayesian MSE of any estimator, given by the inverse of the Bayesian information $I_B = \mathbb{E}_\pi[I_F(\theta)] + I_P$. Also known as the Van Trees inequality.

Related: Cramer (1946), Rao (1945), and a Near-Simultaneous Discovery, Fisher Information, Prior Information

Prior Information

The scalar $I_P = \mathbb{E}_\pi[(\partial_\theta \log \pi(\theta))^2]$ quantifying how peaked the prior is around its support. Plays the same role as Fisher information but for the prior density.

Related: Van Trees Inequality (Bayesian CRLB), Fisher Information