The Van Trees Inequality
Why Another Lower Bound?
We already have the Cramér-Rao lower bound (CRLB), derived in Chapter 18. It gives us $\operatorname{Var}_\theta(\hat\theta) \ge 1/I(\theta)$ for any unbiased estimator $\hat\theta$ of a deterministic parameter $\theta$. Excellent, except for two well-known caveats. First, the CRLB is pointwise: it depends on the true $\theta$, so to use it as a design target one has to either commit to a worst case or average over $\theta$ somehow. Second, the inequality only binds unbiased estimators, and in many problems of interest (low SNR, small samples, peaked priors) the best estimator is biased, so the CRLB simply does not apply.
The Van Trees inequality fixes both defects at once. The parameter is modelled as a random variable $\theta$ with a prior density $\pi(\theta)$. The result is a scalar lower bound on the Bayesian mean-squared error (the MSE averaged over both the observation and the prior) that applies to every estimator, biased or unbiased. The bound has the shape
$$\mathbb{E}\big[(\hat\theta(X) - \theta)^2\big] \;\ge\; \frac{1}{J}, \qquad J = \mathbb{E}_\pi[I(\theta)] + J_\pi.$$
The Bayesian information $J$ decomposes into the average Fisher information $\mathbb{E}_\pi[I(\theta)]$ (what the data tells us) plus the prior information $J_\pi$ (what we already knew before the data arrived). When the prior is flat, $J_\pi \approx 0$ and we recover an averaged CRLB; when the prior is peaked, $J_\pi$ dominates and the bound is tighter.
The proof is a beautiful one-line application of Cauchy-Schwarz, and the result is our go-to lower bound whenever we have a Bayesian model and want an inequality that holds without any regularity assumption on the estimator.
Definition: Prior Information
Let $\pi$ be a prior density on $\mathbb{R}$, assumed differentiable and vanishing at the boundary of its support. The prior information is
$$J_\pi = \int \frac{[\pi'(\theta)]^2}{\pi(\theta)}\,d\theta = \mathbb{E}_\pi\!\left[\left(\frac{d}{d\theta}\log\pi(\theta)\right)^{\!2}\right].$$
The integrand is the squared score function of the prior, measured in the same units as the Fisher information of the likelihood.
For a Gaussian prior $\mathcal{N}(\mu_0, \sigma_0^2)$, a direct calculation gives $J_\pi = 1/\sigma_0^2$, exactly the inverse prior variance. This is the same object that enters the posterior precision in conjugate Gaussian updates (Chapter 7).
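As a numerical sanity check, here is a minimal sketch (assuming NumPy and SciPy are available; all names are illustrative) that evaluates the defining integral by quadrature for a Gaussian prior and compares it with the closed form $1/\sigma_0^2$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def prior_information(pdf, dpdf, lo, hi):
    """J_pi = integral of pdf'(t)^2 / pdf(t) over the support."""
    val, _ = quad(lambda t: dpdf(t) ** 2 / pdf(t), lo, hi)
    return val

mu0, sigma0 = 1.0, 0.5
pdf = lambda t: norm.pdf(t, mu0, sigma0)
# Derivative of the Gaussian density: pdf'(t) = -((t - mu0) / sigma0**2) * pdf(t).
dpdf = lambda t: -((t - mu0) / sigma0**2) * pdf(t)

# Integrate over +/- 10 prior standard deviations to avoid 0/0 in the far tails.
J_pi = prior_information(pdf, dpdf, mu0 - 10 * sigma0, mu0 + 10 * sigma0)
print(J_pi)            # ~4.0 (quadrature)
print(1 / sigma0**2)   # 4.0 (closed form)
```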
Definition: Bayesian Information
For a Bayesian estimation problem with prior $\pi(\theta)$, likelihood $f(x \mid \theta)$, and Fisher information $I(\theta)$, the Bayesian information is
$$J = \mathbb{E}_\pi[I(\theta)] + J_\pi.$$
This is the quantity that appears in the denominator of the Van Trees bound. It is always non-negative, and strictly positive whenever either the data or the prior carries information about $\theta$.
Theorem: Van Trees Inequality (Bayesian CRLB)
Let $\theta$ have a differentiable prior $\pi$ on $\mathbb{R}$ with $\pi(\theta) \to 0$ as $|\theta| \to \infty$, let $f(x \mid \theta)$ be the likelihood with Fisher information $I(\theta)$, and let $\hat\theta(X)$ be any (possibly biased, possibly non-smooth) estimator with finite Bayesian MSE. Then
$$\mathbb{E}\big[(\hat\theta(X) - \theta)^2\big] \;\ge\; \frac{1}{\mathbb{E}_\pi[I(\theta)] + J_\pi}.$$
The right-hand side combines the information in the data ($\mathbb{E}_\pi[I(\theta)]$, averaged because we no longer condition on a fixed $\theta$) with the information already in the prior ($J_\pi$). With more of either, the bound tightens.
Setup: the score of the joint density
Let $p(x, \theta) = f(x \mid \theta)\,\pi(\theta)$ be the joint density of $(X, \theta)$. Define the joint score
$$s(x, \theta) = \frac{\partial}{\partial\theta}\log p(x, \theta) = \frac{\partial}{\partial\theta}\log f(x \mid \theta) + \frac{\partial}{\partial\theta}\log\pi(\theta).$$
The two terms are uncorrelated (the conditional score has zero mean given $\theta$), so
$$\mathbb{E}\big[s(X, \theta)^2\big] = \mathbb{E}_\pi[I(\theta)] + J_\pi = J.$$
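Spelling out why the cross term vanishes (the prior score depends only on $\theta$, and the conditional score has zero mean given $\theta$):
$$\mathbb{E}\!\left[\frac{\partial \log f(X \mid \theta)}{\partial\theta}\cdot\frac{\partial \log \pi(\theta)}{\partial\theta}\right] = \mathbb{E}_\pi\!\left[\frac{\partial \log \pi(\theta)}{\partial\theta}\;\mathbb{E}\!\left[\frac{\partial \log f(X \mid \theta)}{\partial\theta}\,\Big|\,\theta\right]\right] = 0.$$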
The key identity via integration by parts
For any estimator $\hat\theta(x)$ with finite second moment, integration by parts in $\theta$ (using $\pi(\theta) \to 0$ at the boundary) yields
$$\mathbb{E}\big[(\hat\theta(X) - \theta)\,s(X, \theta)\big] = 1,$$
because $\hat\theta(x)$ does not depend on $\theta$, so its partial derivative vanishes, leaving $\partial_\theta(\hat\theta(x) - \theta) = -1$. The boundary term vanishes because $\pi$ does.
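In detail, using $s(x, \theta)\,p(x, \theta) = \partial_\theta p(x, \theta)$ and integrating by parts in $\theta$ for each fixed $x$:
$$\int (\hat\theta(x) - \theta)\,\frac{\partial p(x, \theta)}{\partial\theta}\,d\theta = \Big[(\hat\theta(x) - \theta)\,p(x, \theta)\Big]_{\theta=-\infty}^{\infty} + \int p(x, \theta)\,d\theta = \int p(x, \theta)\,d\theta,$$
and integrating over $x$ gives $\mathbb{E}\big[(\hat\theta(X) - \theta)\,s(X, \theta)\big] = \iint p(x, \theta)\,d\theta\,dx = 1$.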
Applying Cauchy-Schwarz
Cauchy-Schwarz on the inner product $\langle U, V\rangle = \mathbb{E}[UV]$ gives
$$1 = \mathbb{E}\big[(\hat\theta(X) - \theta)\,s(X, \theta)\big]^2 \;\le\; \mathbb{E}\big[(\hat\theta(X) - \theta)^2\big]\,\mathbb{E}\big[s(X, \theta)^2\big] = \mathbb{E}\big[(\hat\theta(X) - \theta)^2\big]\,J.$$
Rearranging gives the claim. Equality holds iff $\hat\theta(X) - \theta$ and $s(X, \theta)$ are proportional $P$-almost surely, which generally requires the likelihood and prior to be conjugate Gaussian.
Key Takeaway
The Van Trees bound replaces the pointwise $1/I(\theta)$ of the CRLB with $1/J$, where $J = \mathbb{E}_\pi[I(\theta)] + J_\pi$. Prior information adds to data information in a literally additive way. This is why a concentrated prior always tightens the achievable MSE, and why the "effective sample size" of a Bayesian experiment equals the classical sample size plus the prior precision.
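In the Gaussian location model of the example below, this additivity is literal: with per-sample information $1/\sigma^2$,
$$J = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} = \frac{1}{\sigma^2}\left(n + \frac{\sigma^2}{\sigma_0^2}\right),$$
so the prior is worth an extra $\sigma^2/\sigma_0^2$ samples' worth of information.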
Example: Van Trees for the Gaussian Location Model
Let $\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)$ and observe $n$ i.i.d. samples $X_i \mid \theta \sim \mathcal{N}(\theta, \sigma^2)$ with $\sigma^2$ known. Compute the Van Trees bound and compare with the Bayesian MMSE of the posterior mean.
Fisher information of the likelihood
For each sample, $I_1(\theta) = 1/\sigma^2$ (constant in $\theta$). For $n$ i.i.d. samples, $I(\theta) = n/\sigma^2$, also constant in $\theta$, so $\mathbb{E}_\pi[I(\theta)] = n/\sigma^2$.
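For the record, the single-sample computation:
$$\log f(x \mid \theta) = -\frac{(x - \theta)^2}{2\sigma^2} + \text{const}, \qquad \frac{\partial}{\partial\theta}\log f(x \mid \theta) = \frac{x - \theta}{\sigma^2}, \qquad I_1(\theta) = \mathbb{E}\!\left[\frac{(X - \theta)^2}{\sigma^4}\,\Big|\,\theta\right] = \frac{1}{\sigma^2}.$$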
Prior information
The Gaussian prior has $\log\pi(\theta) = -(\theta - \mu_0)^2/(2\sigma_0^2) + \text{const}$, so $\frac{d}{d\theta}\log\pi(\theta) = -(\theta - \mu_0)/\sigma_0^2$ and $J_\pi = \mathbb{E}_\pi[(\theta - \mu_0)^2]/\sigma_0^4 = 1/\sigma_0^2$.
Van Trees bound
Adding, $J = n/\sigma^2 + 1/\sigma_0^2$, so the Van Trees bound reads
$$\mathbb{E}\big[(\hat\theta - \theta)^2\big] \;\ge\; \frac{1}{n/\sigma^2 + 1/\sigma_0^2} = \frac{\sigma^2\sigma_0^2}{n\sigma_0^2 + \sigma^2}.$$
Comparison with the MMSE
The posterior mean $\mathbb{E}[\theta \mid X_1, \dots, X_n]$ has Bayesian MSE exactly $\sigma^2\sigma_0^2/(n\sigma_0^2 + \sigma^2)$: the bound is achieved with equality. In the Gaussian-Gaussian conjugate case, Van Trees is tight, as promised by the equality condition in Cauchy-Schwarz.
Van Trees Bound vs. Classical CRLB vs. Empirical MSE
[Interactive figure: vary the prior width $\sigma_0$ and the sample size $n$ in the Gaussian location model. The classical CRLB $\sigma^2/n$ ignores the prior; the Van Trees bound shrinks it by the factor $n\sigma_0^2/(n\sigma_0^2 + \sigma^2)$. The empirical MSE of the MMSE estimator is overlaid from Monte Carlo. Controls: prior standard deviation $\sigma_0$, noise standard deviation $\sigma$, maximum number of samples shown.]
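In place of the interactive figure, here is a minimal Monte Carlo sketch (NumPy only; the parameter values are illustrative, not the widget's defaults) that reproduces the comparison of empirical MSE, Van Trees bound, and classical CRLB:

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma0, sigma = 0.0, 1.0, 2.0   # prior mean/sd and noise sd (illustrative)
n_trials = 50_000

for n in [1, 5, 20, 100]:
    # Draw theta from the prior, then n noisy observations around each draw.
    theta = rng.normal(mu0, sigma0, size=n_trials)
    x = theta[:, None] + rng.normal(0.0, sigma, size=(n_trials, n))
    # Posterior mean for the conjugate Gaussian model: precision-weighted average.
    w = (n / sigma**2) / (n / sigma**2 + 1 / sigma0**2)
    theta_hat = w * x.mean(axis=1) + (1 - w) * mu0
    mse = np.mean((theta_hat - theta) ** 2)
    van_trees = 1.0 / (n / sigma**2 + 1.0 / sigma0**2)
    crlb = sigma**2 / n
    print(f"n={n:4d}  MSE={mse:.4f}  VanTrees={van_trees:.4f}  CRLB={crlb:.4f}")
```

The empirical MSE should sit on top of the Van Trees values (the bound is tight here) and below the CRLB whenever the prior contributes meaningfully.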
Vector Extension
For a vector parameter $\boldsymbol\theta \in \mathbb{R}^d$, the Van Trees inequality generalises to a matrix statement: for any estimator $\hat{\boldsymbol\theta}(X)$,
$$\mathbb{E}\big[(\hat{\boldsymbol\theta} - \boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta)^\top\big] \;\succeq\; \mathbf{J}^{-1},$$
where $\mathbf{J} = \mathbb{E}_\pi[\mathbf{I}(\boldsymbol\theta)] + \mathbf{J}_\pi$ is the sum of the averaged Fisher information matrix and the prior information matrix $\mathbf{J}_\pi = \mathbb{E}_\pi\big[\nabla\log\pi(\boldsymbol\theta)\,\nabla\log\pi(\boldsymbol\theta)^\top\big]$. Taking traces recovers the scalar bound on the total MSE.
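Since $A \succeq B$ implies $\operatorname{tr} A \ge \operatorname{tr} B$, the trace step reads
$$\mathbb{E}\big[\|\hat{\boldsymbol\theta} - \boldsymbol\theta\|^2\big] = \operatorname{tr}\,\mathbb{E}\big[(\hat{\boldsymbol\theta} - \boldsymbol\theta)(\hat{\boldsymbol\theta} - \boldsymbol\theta)^\top\big] \;\ge\; \operatorname{tr}\big(\mathbf{J}^{-1}\big).$$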
Common Mistake: Prior Must Vanish at the Boundary
Mistake:
Applying Van Trees with a uniform prior on $[a, b]$ and concluding that $J_\pi = 0$ because $\pi'(\theta) = 0$ in the interior.
Correction:
The integration-by-parts step in the proof requires $\pi(\theta) \to 0$ at the boundary of its support. A uniform prior on $[a, b]$ does not satisfy this (it is discontinuous at $a$ and $b$), so the standard Van Trees bound does not apply. A cleaner way to handle a compactly supported prior is to replace it with a smooth approximation (e.g., a truncated Gaussian that bleeds out) and take a limit, or use the constrained-Van-Trees variant of Gill and Levit (1995) that handles compact supports directly.
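For illustration, one standard smooth choice on a compact interval is the cosine-squared prior on $[-1, 1]$ (the value $J_\pi = \pi^2$ below is a direct calculation):
$$\pi(\theta) = \cos^2\!\left(\frac{\pi\theta}{2}\right), \qquad \frac{d}{d\theta}\log\pi(\theta) = -\pi\tan\!\left(\frac{\pi\theta}{2}\right), \qquad J_\pi = \pi^2\!\int_{-1}^{1}\sin^2\!\left(\frac{\pi\theta}{2}\right)d\theta = \pi^2.$$
It vanishes at $\pm 1$, so the boundary term in the proof disappears and the standard bound applies.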
Historical Note: From Radar Estimation to a Textbook Bound
1960s-1990s: Harry Van Trees introduced the inequality in his 1968 textbook Detection, Estimation, and Modulation Theory, Part I, in the context of estimating a random parameter (e.g., target range, velocity) from a noisy radar waveform. The bound was part of a larger programme to produce performance benchmarks that hold uniformly over all estimators, a concern that arose naturally in a radar community that was beginning to use maximum-a-posteriori and nonlinear least-squares estimators whose biases could not be bounded by the classical CRLB.
The bound circulated in the radar literature for almost three decades before statisticians picked it up. Gill and Levit's 1995 paper in Bernoulli gave the first fully rigorous treatment, identified the boundary condition on $\pi$, extended the bound to multiple parameters, and proved asymptotic efficiency results. Today the Van Trees inequality appears as the canonical "Bayesian CRLB" in every serious textbook on parameter estimation.
Van Trees for Wireless Positioning
The Van Trees inequality is the workhorse bound in GNSS, UWB, and 5G/6G positioning analysis. The parameter is the user's position $\boldsymbol\theta$; the prior captures coarse information from the network (cell association, a previous fix, a road-network constraint); the Fisher information is built from the effective signal bandwidth and the geometry of the anchors. The engineer's dashboard typically reports two bounds side by side: the classical CRLB (showing what a fresh estimate can achieve with no prior) and the Van Trees bound (showing what a tracked estimate can achieve with the previous-fix prior).
- The prior must be smooth and properly normalised; a Dirac mass destroys the bound.
- When $\boldsymbol\theta$ is constrained to a road, the prior is effectively 1-D and $J_\pi$ can be enormous.
- In dense multipath, the Fisher information degrades and the bound is dominated by $J_\pi$ (the tracker 'coasts' on the prior).
Quick Check
In the Gaussian location model with $\theta \sim \mathcal{N}(\mu_0, \sigma_0^2)$ and $n$ i.i.d. observations of variance $\sigma^2$, which statement about the Van Trees bound is TRUE as $\sigma_0 \to \infty$ (flat prior)?
The bound diverges because $J_\pi \to 0$.
The bound approaches the classical CRLB $\sigma^2/n$.
The bound approaches zero because the prior becomes very broad.
The bound becomes undefined because the prior is no longer proper.
As $\sigma_0 \to \infty$, $J_\pi = 1/\sigma_0^2 \to 0$, so the bound tends to $1/\mathbb{E}_\pi[I(\theta)] = \sigma^2/n$, the standard CRLB for an unbiased estimator of a deterministic $\theta$.
Bayesian CRLB
A lower bound on the Bayesian MSE of any estimator, given by the inverse of the Bayesian information $J = \mathbb{E}_\pi[I(\theta)] + J_\pi$. Also known as the Van Trees inequality.
Related: Cramer (1946), Rao (1945), and a Near-Simultaneous Discovery, Fisher Information, Prior Information
Prior Information
The scalar $J_\pi$ quantifying how peaked the prior density is. Plays the same role as Fisher information, but for the prior density.
Related: Van Trees Inequality (Bayesian CRLB), Fisher Information