The Van Trees Inequality

Why Another Lower Bound?

We already have the Cramer-Rao lower bound (CRLB), derived in Chapter 18. It gives us $\mathrm{Var}(\hat\theta) \geq 1/I_F(\theta)$ for any unbiased estimator of a deterministic parameter $\theta$. Excellent --- except for two well-known caveats. First, the CRLB is pointwise: it depends on the true $\theta$, so to use it as a design target one has to either commit to a worst case or average over $\theta$ somehow. Second, the inequality only binds unbiased estimators --- and in many problems of interest (low SNR, small samples, peaked priors) the best estimator is biased, and the CRLB simply does not apply.

The Van Trees inequality fixes both defects at once. The parameter is modelled as a random variable with a prior density $\pi(\theta)$. The result is a scalar lower bound on the Bayesian mean-squared error --- the MSE averaged over both the observation and the prior --- that applies to every estimator, biased or unbiased. The bound has the shape
$$\mathbb{E}_{\theta,y}\!\left[(\hat\theta(y)-\theta)^2\right] \;\geq\; \frac{1}{I_B}, \qquad I_B = \mathbb{E}_\pi[I_F(\theta)] + I_P.$$
The Bayesian information $I_B$ decomposes into the average Fisher information $\mathbb{E}_\pi[I_F(\theta)]$ (what the data tells us) plus the prior information $I_P$ (what we already knew before the data arrived). When the prior is flat, $I_P = 0$ and we recover an averaged CRLB; when the prior is peaked, $I_P$ dominates and the bound is tighter.

The proof is a beautiful one-line application of Cauchy-Schwarz, and the result is our go-to lower bound whenever we have a Bayesian model and want an inequality that holds without any regularity assumption on the estimator.

Definition: Prior Information

Let $\pi(\theta)$ be a prior density on $\theta \in \mathbb{R}$, assumed differentiable and vanishing at the boundary of its support. The prior information is
$$I_P \;=\; \mathbb{E}_\pi\!\left[ \left(\frac{\partial}{\partial\theta} \log \pi(\theta)\right)^2 \right] \;=\; \int \frac{\big(\pi'(\theta)\big)^2}{\pi(\theta)}\,d\theta.$$
The integrand is the squared score function of the prior, measured in the same units as the Fisher information of the likelihood.

For a Gaussian prior $\pi(\theta) = \mathcal{N}(\mu_0, \sigma_0^2)$, a direct calculation gives $I_P = 1/\sigma_0^2$ --- exactly the inverse prior variance. This is the same object that enters the posterior precision in conjugate Gaussian updates (Chapter 7).
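As a quick numerical sanity check, one can integrate $(\pi')^2/\pi$ directly. A minimal sketch using SciPy quadrature (the prior width $\sigma_0 = 0.5$ is an arbitrary choice for the check):

```python
import numpy as np
from scipy.integrate import quad

def gaussian_prior_information(mu0, sigma0):
    """Numerically evaluate I_P = integral of (pi'(t))^2 / pi(t) dt
    for a Gaussian prior N(mu0, sigma0^2)."""
    def pdf(t):
        return np.exp(-0.5 * ((t - mu0) / sigma0) ** 2) / (sigma0 * np.sqrt(2 * np.pi))

    def integrand(t):
        # For a Gaussian, d/dt log pi(t) = -(t - mu0) / sigma0^2,
        # so (pi')^2 / pi equals score^2 * pi.
        score = -(t - mu0) / sigma0**2
        return score**2 * pdf(t)

    val, _ = quad(integrand, mu0 - 10 * sigma0, mu0 + 10 * sigma0)
    return val

sigma0 = 0.5  # arbitrary width for the check
print(gaussian_prior_information(0.0, sigma0))  # numerical: ~4.0
print(1 / sigma0**2)                            # analytic inverse variance: 4.0
```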

Definition: Bayesian Information

For a Bayesian estimation problem with prior $\pi(\theta)$, likelihood $f(y \mid \theta)$, and Fisher information $I_F(\theta) = \mathbb{E}\!\left[\left(\partial_\theta \log f(Y\mid\theta)\right)^2 \mid \theta\right]$, the Bayesian information is
$$I_B \;=\; \mathbb{E}_\pi[I_F(\theta)] \;+\; I_P.$$
This is the quantity that appears in the denominator of the Van Trees bound. It is always non-negative, and strictly positive whenever either the data or the prior carries information about $\theta$.

Theorem: Van Trees Inequality (Bayesian CRLB)

Let $\theta$ have a differentiable prior $\pi(\theta)$ on $\mathbb{R}$ with $\pi(\theta) \to 0$ at $\pm\infty$, let $f(y\mid\theta)$ be the likelihood with Fisher information $I_F(\theta)$, and let $\hat\theta(Y)$ be any (possibly biased, possibly non-smooth) estimator with finite Bayesian MSE. Then
$$\mathbb{E}_{\theta,Y}\!\left[(\hat\theta(Y) - \theta)^2\right] \;\geq\; \frac{1}{I_B}, \qquad I_B \;=\; \mathbb{E}_\pi[I_F(\theta)] + I_P.$$

The right-hand side combines the information in the data ($\mathbb{E}_\pi[I_F(\theta)]$, averaged because we no longer condition on a fixed $\theta$) with the information already in the prior ($I_P$). With more of either, the bound tightens.
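For completeness, here is a sketch of the one-line Cauchy-Schwarz proof mentioned earlier (under the stated regularity; this is where the boundary condition on $\pi$ earns its keep). Let $\psi(\theta,y) = \partial_\theta \log\big(f(y\mid\theta)\,\pi(\theta)\big)$ be the joint score. Integration by parts in $\theta$, using $\pi(\theta) \to 0$ at $\pm\infty$, gives $\mathbb{E}_{\theta,Y}\big[(\hat\theta(Y)-\theta)\,\psi\big] = 1$ for any estimator, so Cauchy-Schwarz yields
$$1 \;=\; \Big(\mathbb{E}\big[(\hat\theta(Y)-\theta)\,\psi\big]\Big)^2 \;\leq\; \mathbb{E}\big[(\hat\theta(Y)-\theta)^2\big]\,\mathbb{E}\big[\psi^2\big].$$
Because the likelihood score has zero conditional mean given $\theta$, the cross term in $\mathbb{E}[\psi^2]$ vanishes, leaving $\mathbb{E}[\psi^2] = \mathbb{E}_\pi[I_F(\theta)] + I_P = I_B$; rearranging gives the bound.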


Key Takeaway

The Van Trees bound replaces the pointwise $1/I_F(\theta)$ of the CRLB with $1/I_B$, where $I_B = \mathbb{E}_\pi[I_F(\theta)] + I_P$. Prior information adds to data information in a literally additive way. This is why a concentrated prior always tightens the achievable MSE, and why the "effective sample size" of a Bayesian experiment equals the classical sample size plus the prior precision measured in units of the per-sample information.
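In the Gaussian location model of the example below, for instance, $I_B = n/\sigma^2 + 1/\sigma_0^2 = \big(n + \sigma^2/\sigma_0^2\big)/\sigma^2$, so the experiment behaves as if it had $n_{\mathrm{eff}} = n + \sigma^2/\sigma_0^2$ samples: the prior precision $1/\sigma_0^2$, divided by the per-sample information $1/\sigma^2$, adds directly to $n$.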

Example: Van Trees for the Gaussian Location Model

Let $\theta \sim \mathcal{N}(0,\sigma_0^2)$ and observe $n$ i.i.d. samples $Y_i = \theta + W_i$ with $W_i \sim \mathcal{N}(0,\sigma^2)$. Compute the Van Trees bound and compare with the Bayesian MMSE of the posterior mean.
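Working it out: each sample contributes Fisher information $1/\sigma^2$, so $I_F(\theta) = n/\sigma^2$ (constant in $\theta$), and the Gaussian prior contributes $I_P = 1/\sigma_0^2$. Hence
$$I_B \;=\; \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}, \qquad \mathbb{E}\big[(\hat\theta-\theta)^2\big] \;\geq\; \frac{1}{I_B} \;=\; \frac{\sigma^2}{n}\cdot\frac{\sigma_0^2}{\sigma_0^2 + \sigma^2/n}.$$
The posterior mean in this conjugate model has Bayesian MSE equal to the posterior variance $(n/\sigma^2 + 1/\sigma_0^2)^{-1}$, so the Van Trees bound is achieved with equality here.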

Van Trees Bound vs. Classical CRLB vs. Empirical MSE

Vary the prior width $\sigma_0$ and the sample size $n$ in the Gaussian location model. The classical CRLB ($\sigma^2/n$) ignores the prior; the Van Trees bound shrinks it by a factor $\sigma_0^2/(\sigma_0^2 + \sigma^2/n)$. Empirical MSE of the MMSE estimator is overlaid from Monte Carlo.

[Interactive plot. Parameters: prior standard deviation $\sigma_0 = 1$, noise standard deviation $\sigma = 1$, maximum number of samples shown $n = 50$.]
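A minimal Monte Carlo sketch of the same comparison, assuming the default parameters above (NumPy only; the trial count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma0, sigma = 1.0, 1.0   # prior and noise standard deviations (defaults above)
n_trials = 20_000          # Monte Carlo replications per sample size

for n in (1, 5, 10, 50):
    # Draw (theta, Y-bar) jointly from the Bayesian model; the sample mean
    # is sufficient, with conditional distribution N(theta, sigma^2 / n).
    theta = rng.normal(0.0, sigma0, size=n_trials)
    ybar = theta + rng.normal(0.0, sigma / np.sqrt(n), size=n_trials)

    # MMSE estimator = posterior mean of the conjugate Gaussian update.
    shrink = (n / sigma**2) / (n / sigma**2 + 1 / sigma0**2)
    theta_hat = shrink * ybar

    mse = np.mean((theta_hat - theta) ** 2)
    crlb = sigma**2 / n                                  # classical CRLB
    van_trees = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)   # Van Trees bound
    print(f"n={n:3d}  empirical MSE={mse:.4f}  "
          f"Van Trees={van_trees:.4f}  CRLB={crlb:.4f}")
```

The empirical MSE should hug the Van Trees line for every $n$ (the bound is tight in this model), while the classical CRLB sits above both for small $n$.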

Vector Extension

For a vector parameter $\boldsymbol\theta \in \mathbb{R}^d$, the Van Trees inequality generalises to a matrix statement: for any estimator $\hat{\boldsymbol\theta}(Y)$,
$$\mathbb{E}\!\left[(\hat{\boldsymbol\theta}-\boldsymbol\theta) (\hat{\boldsymbol\theta}-\boldsymbol\theta)^T\right] \;\succeq\; \mathbf{I}_B^{-1},$$
where $\mathbf{I}_B = \mathbb{E}_\pi[\mathbf{J}(\boldsymbol\theta)] + \mathbf{I}_P$ is the sum of the averaged Fisher information matrix and the prior information matrix $\mathbf{I}_P = \mathbb{E}_\pi[\nabla\log\pi\,\nabla\log\pi^T]$. Taking traces yields a scalar bound on the total MSE, $\mathbb{E}\|\hat{\boldsymbol\theta}-\boldsymbol\theta\|^2 \geq \operatorname{tr}\big(\mathbf{I}_B^{-1}\big)$.
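For a concrete feel, here is a small numerical sketch (the 2-D numbers are hypothetical, chosen only for illustration); it uses the fact that a Gaussian prior $\mathcal{N}(\boldsymbol\mu_0, \boldsymbol\Sigma_0)$ has prior information matrix $\boldsymbol\Sigma_0^{-1}$:

```python
import numpy as np

# Hypothetical 2-D example: an averaged Fisher information matrix J_bar
# and a Gaussian prior N(0, Sigma0), whose prior information matrix
# is inv(Sigma0). All numbers below are illustrative, not from the text.
J_bar = np.array([[8.0, 2.0],
                  [2.0, 5.0]])
Sigma0 = np.diag([0.5**2, 2.0**2])

I_B = J_bar + np.linalg.inv(Sigma0)   # Bayesian information matrix
bound_matrix = np.linalg.inv(I_B)     # matrix lower bound on the error covariance
total_mse_bound = np.trace(bound_matrix)

print(bound_matrix)
print(f"total MSE >= {total_mse_bound:.4f}")
```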

Common Mistake: Prior Must Vanish at the Boundary

Mistake:

Applying Van Trees with a uniform prior on $[a,b]$ and concluding that $I_P = 0$ because $\partial_\theta \log \pi = 0$ in the interior.

Correction:

The integration-by-parts step in the proof requires $\pi(\theta) \to 0$ at the boundary of its support. A uniform prior on $[a,b]$ does not satisfy this --- it is discontinuous at $a$ and $b$ --- so the standard Van Trees bound does not apply. A cleaner way to handle a compactly supported prior is to replace it with a smooth approximation (e.g., the uniform density convolved with a narrow Gaussian, which bleeds slightly outside $[a,b]$ and vanishes smoothly) and take a limit, or use the constrained-Van-Trees variant of Gill and Levit (1995) that handles compact supports directly.
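A short numerical sketch of the smoothing route (assumptions: $[a,b] = [0,1]$ and smoothing by convolution with a Gaussian of width $h$; SciPy quadrature is used). Note that $I_P$ of the smoothed prior grows without bound as $h \to 0$ --- it does not tend to the naive value $0$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def smoothed_uniform_prior_information(a=0.0, b=1.0, h=0.05):
    """I_P of Uniform[a, b] convolved with N(0, h^2)."""
    def pdf(t):
        # Convolution of the uniform density with a Gaussian kernel.
        return (norm.cdf((t - a) / h) - norm.cdf((t - b) / h)) / (b - a)

    def dpdf(t):
        return (norm.pdf((t - a) / h) - norm.pdf((t - b) / h)) / (h * (b - a))

    def integrand(t):
        p = pdf(t)
        return dpdf(t) ** 2 / p if p > 1e-300 else 0.0

    val, _ = quad(integrand, a - 10 * h, b + 10 * h, limit=200)
    return val

for h in (0.2, 0.1, 0.05, 0.025):
    print(f"h = {h:5.3f}   I_P = {smoothed_uniform_prior_information(h=h):8.2f}")
```

All of the prior information lives in the two boundary layers of width $\sim h$, which is exactly the contribution the "$\partial_\theta \log \pi = 0$ in the interior" argument throws away.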

Historical Note: From Radar Estimation to a Textbook Bound

1960s-1990s

Harry Van Trees introduced the inequality in his 1968 textbook Detection, Estimation, and Modulation Theory, Part I, in the context of estimating a random parameter (e.g., target range, velocity) from a noisy radar waveform. The bound was part of a larger programme to produce performance benchmarks that hold uniformly over all estimators --- a concern that arose naturally in a radar community that was beginning to use maximum-a-posteriori and nonlinear least-squares estimators whose biases could not be bounded by the classical CRLB.

The bound circulated in the radar literature for almost three decades before statisticians picked it up. Gill and Levit's 1995 paper in Bernoulli gave the first fully rigorous treatment, identified the boundary condition on π\pi, extended the bound to multiple parameters, and proved asymptotic efficiency results. Today the Van Trees inequality appears as the canonical "Bayesian CRLB" in every serious textbook on parameter estimation.

⚠️ Engineering Note

Van Trees for Wireless Positioning

The Van Trees inequality is the workhorse bound in GNSS, UWB, and 5G/6G positioning analysis. The parameter is the user's position $\mathbf{p}\in\mathbb{R}^3$; the prior $\pi(\mathbf{p})$ captures coarse information from the network (cell association, a previous fix, a road network constraint); the Fisher information is built from the effective signal bandwidth and the geometry of the anchors. The engineer's dashboard typically reports two bounds side by side: the classical CRLB (showing what a fresh estimate can achieve with no prior) and the Van Trees bound (showing what a tracked estimate can achieve with the previous-fix prior).

Practical Constraints
  • The prior must be smooth and properly normalised --- a Dirac mass destroys the bound

  • When $\mathbf{p}$ is constrained to a road, the prior is effectively 1-D and $I_P$ can be enormous

  • In dense multipath, the Fisher information degrades and the bound is dominated by $I_P$ (the tracker 'coasts' on the prior)

📋 Ref: 3GPP TR 38.857 (NR positioning)

Quick Check

In the Gaussian location model with $\theta \sim \mathcal{N}(0,\sigma_0^2)$ and $n$ i.i.d. observations of variance $\sigma^2$, which statement about the Van Trees bound $1/I_B$ is TRUE as $\sigma_0 \to \infty$ (flat prior)?

  • The bound diverges because $I_P \to 0$.

  • The bound approaches the classical CRLB $\sigma^2/n$.

  • The bound approaches zero because the prior becomes very broad.

  • The bound becomes undefined because the prior is no longer proper.

Bayesian CRLB

A lower bound on the Bayesian MSE of any estimator, given by the inverse of the Bayesian information $I_B = \mathbb{E}_\pi[I_F(\theta)] + I_P$. Also known as the Van Trees inequality.

Related: Cramer (1946), Rao (1945), and a Near-Simultaneous Discovery, Fisher Information, Prior Information

Prior Information

The scalar $I_P = \mathbb{E}_\pi[(\partial_\theta \log \pi(\theta))^2]$ quantifying how peaked the prior is around its support. Plays the same role as Fisher information but for the prior density.

Related: Van Trees Inequality (Bayesian CRLB), Fisher Information