Exercises
ex-ch24-01
Easy. Compute the prior information $J_P = \mathbb{E}\big[(\tfrac{d}{d\theta}\ln p(\theta))^2\big]$ for each of the following priors on $\theta \in \mathbb{R}$: (a) Gaussian $\mathcal{N}(0,\sigma_P^2)$; (b) Laplace with scale $b$; (c) Cauchy with scale $\gamma$. For each, check the Van Trees boundary condition at $\theta \to \pm\infty$ and comment on any subtlety.
Use $J_P = \int_{-\infty}^{\infty} \frac{(p'(\theta))^2}{p(\theta)}\,d\theta$, where the prime means $d/d\theta$; the integrand is the score squared times the density.
For the Laplace prior $p(\theta) = \tfrac{1}{2b}e^{-|\theta|/b}$, the score is piecewise constant.
For the Cauchy prior, compute the score $-2\theta/(\theta^2+\gamma^2)$ and integrate.
Gaussian prior
$p(\theta) \propto e^{-\theta^2/2\sigma_P^2}$, so the score is $-\theta/\sigma_P^2$ and $J_P = 1/\sigma_P^2$. The prior vanishes smoothly at infinity, so Van Trees applies directly.
Laplace prior
$p(\theta) = \tfrac{1}{2b}e^{-|\theta|/b}$, so the score is $-\operatorname{sign}(\theta)/b$ and $J_P = 1/b^2$. The density has a kink at $\theta = 0$ but is continuous and vanishes at $\pm\infty$, so the integration-by-parts boundary condition holds.
Cauchy prior
$p(\theta) = \frac{\gamma}{\pi(\theta^2+\gamma^2)}$, so the score is $-2\theta/(\theta^2+\gamma^2)$ and $J_P = \frac{4\gamma}{\pi}\int_{-\infty}^{\infty}\frac{\theta^2}{(\theta^2+\gamma^2)^3}\,d\theta = \frac{1}{2\gamma^2}.$ The Cauchy density vanishes at $\pm\infty$ (polynomially), so Van Trees still applies. Note that $\operatorname{Var}(\theta) = \infty$ yet $J_P$ is finite: the prior information does not require finite prior variance.
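The three closed forms can be checked by direct quadrature of $J_P = \int (p')^2/p$. A minimal numpy sketch (the scale values are arbitrary illustrations, and `prior_information` is a helper name introduced here):

```python
import numpy as np

def trapezoid(y, x):
    # version-safe trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def prior_information(score, pdf, grid):
    # J_P = E[score(theta)^2] = integral of score^2 * p over the grid
    return trapezoid(score(grid) ** 2 * pdf(grid), grid)

sigma, b, gam = 1.5, 0.7, 2.0                     # illustrative scales
theta = np.linspace(-200.0, 200.0, 2_000_001)

J_gauss = prior_information(
    lambda t: -t / sigma**2,
    lambda t: np.exp(-t**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2),
    theta)
J_laplace = prior_information(
    lambda t: -np.sign(t) / b,
    lambda t: np.exp(-np.abs(t) / b) / (2 * b),
    theta)
J_cauchy = prior_information(
    lambda t: -2 * t / (t**2 + gam**2),
    lambda t: gam / (np.pi * (t**2 + gam**2)),
    theta)
```

The three results should land on $1/\sigma_P^2$, $1/b^2$ and $1/(2\gamma^2)$ respectively, up to quadrature error.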
ex-ch24-02
Easy. In the Gaussian location model $y_i = \theta + w_i$, $i = 1,\dots,n$, with i.i.d. observations $w_i \sim \mathcal{N}(0,\sigma^2)$ and prior $\theta \sim \mathcal{N}(0,\sigma_P^2)$, the Van Trees bound is $\mathrm{MSE} \ge \big(n/\sigma^2 + 1/\sigma_P^2\big)^{-1}$. Show that the "effective sample size" of the Bayesian experiment equals the classical sample size $n$ plus a constant term, and identify that constant.
Write the bound in the form $\sigma^2/n_{\mathrm{eff}}$ and solve for $n_{\mathrm{eff}}$.
The prior-precision term is $1/\sigma_P^2$, which has units of inverse variance; it adds to $n/\sigma^2$ in the denominator of the bound.
Rewrite the bound
$\mathrm{MSE} \ge \big(\tfrac{n}{\sigma^2} + \tfrac{1}{\sigma_P^2}\big)^{-1}$. Factor out $\sigma^2$: $\mathrm{MSE} \ge \frac{\sigma^2}{n + \sigma^2/\sigma_P^2} = \frac{\sigma^2}{n_{\mathrm{eff}}}, \qquad n_{\mathrm{eff}} = n + \frac{\sigma^2}{\sigma_P^2}.$
Interpretation
The additive constant is $\sigma^2/\sigma_P^2$: the ratio of the noise variance to the prior variance. When the prior is as sharp as a single noisy measurement ($\sigma_P^2 = \sigma^2$), it counts like one extra observation. When the prior is $k$ times sharper, it counts like $k$ extra observations. This is exactly the sense in which "a tight prior is worth $\sigma^2/\sigma_P^2$ samples."
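In this conjugate Gaussian model the posterior mean attains the bound with equality, so a short Monte Carlo can confirm both the $n_{\mathrm{eff}}$ rewriting and the bound itself. A sketch, with illustrative values of $\sigma^2$, $\sigma_P^2$ and $n$:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, sigma_p2, n, trials = 0.5, 0.1, 20, 200_000

vt_bound = 1.0 / (n / sigma2 + 1.0 / sigma_p2)    # Van Trees bound
n_eff = n + sigma2 / sigma_p2                     # claimed effective sample size

# conjugate model: the posterior mean attains the bound exactly
theta = rng.normal(0.0, np.sqrt(sigma_p2), trials)
ybar = theta + rng.normal(0.0, np.sqrt(sigma2 / n), trials)  # sample-mean statistic
post_mean = (n / sigma2) * ybar / (n / sigma2 + 1.0 / sigma_p2)
mse = float(np.mean((post_mean - theta) ** 2))
```

`vt_bound` equals $\sigma^2/n_{\mathrm{eff}}$ algebraically, and the empirical MSE of the posterior mean matches it to Monte Carlo accuracy.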
ex-ch24-03
Easy. Consider the translation-invariant time-of-arrival problem with uniform prior on $[0, A]$ and binary-error probability $P_{\min}(h) = Q(\alpha h)$ for a constant $\alpha$ (the effective per-unit-lag SNR). Using the uniform-prior form of the Ziv-Zakai bound, $\mathrm{MSE} \ge \frac{1}{A}\int_0^A h\,(A-h)\,\mathcal{V}\{P_{\min}(h)\}\,dh$, argue that valley-filling has no effect here, and write the bound as a single integral.
Is $P_{\min}(h) = Q(\alpha h)$ monotonic in $h$?
If the integrand is already non-increasing, the valley-filling operator $\mathcal{V}$ is the identity.
Monotonicity of $P_{\min}$
The Q-function $Q(x)$ is strictly decreasing in $x$. Since $\alpha h$ is increasing in $h$, $P_{\min}(h) = Q(\alpha h)$ is strictly decreasing in $h$. Therefore $\mathcal{V}\{P_{\min}\} = P_{\min}$.
Resulting integral
$\mathrm{MSE} \ge \frac{1}{A}\int_0^A h\,(A-h)\,Q(\alpha h)\,dh.$ This matches the Bellini-Tartara form for a Gaussian-shift problem with no autocorrelation sidelobes, which is expected: a pure translation model has no ambiguity-function structure other than its main lobe.
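Both claims are easy to confirm numerically: the sampled $Q(\alpha h)$ is non-increasing (so $\mathcal{V}$ is the identity), and when $\alpha A \gg 1$ the integral sits just below its broad-prior limit $1/(4\alpha^2)$. A sketch with illustrative $A$ and $\alpha$:

```python
import numpy as np
from math import erfc

def Q(x):
    # Gaussian tail function
    return 0.5 * erfc(x / np.sqrt(2.0))

A, alpha = 2.0, 5.0                       # prior width and per-unit-lag SNR
h = np.linspace(0.0, A, 200_001)
Pmin = np.vectorize(Q)(alpha * h)

monotone = bool(np.all(np.diff(Pmin) <= 0))   # V{P_min} = P_min

integrand = h * (A - h) * Pmin
zzb = float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(h)) / 2.0) / A
limit = 1.0 / (4 * alpha**2)              # broad-prior (high-SNR) limit
```

The finite-$A$ bound is slightly below $1/(4\alpha^2)$ because of the $(A-h)/A$ taper.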
ex-ch24-04
Easy. Verify the I-MMSE identity for the Gaussian input $X \sim \mathcal{N}(0,1)$ on the channel $Y = \sqrt{\gamma}\,X + N$, $N \sim \mathcal{N}(0,1)$. That is, compute $I(\gamma) = I(X;Y)$, compute $\mathrm{mmse}(\gamma)$, and confirm $\frac{dI}{d\gamma} = \tfrac{1}{2}\,\mathrm{mmse}(\gamma)$.
Use $I(\gamma) = \tfrac{1}{2}\ln(1+\gamma)$ (nats) for Gaussian input.
The MMSE estimator of $X$ given $Y$ for jointly Gaussian variables is linear; its error variance is $1/(1+\gamma)$.
Mutual information
For Gaussian input on a Gaussian channel, $I(\gamma) = \tfrac{1}{2}\ln(1+\gamma)$ nats.
MMSE
The MMSE estimator is $\hat{X} = \frac{\sqrt{\gamma}}{1+\gamma}\,Y$ with error variance $\mathrm{mmse}(\gamma) = \frac{1}{1+\gamma}$.
Verification
$\frac{dI}{d\gamma} = \frac{1}{2(1+\gamma)} = \tfrac{1}{2}\,\mathrm{mmse}(\gamma)$. The identity holds exactly.
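The identity can also be confirmed with a central finite difference, a useful template when $I(\gamma)$ has no closed form. A minimal sketch:

```python
import numpy as np

def I_nats(g):
    # mutual information of Gaussian input on Y = sqrt(g) X + N, in nats
    return 0.5 * np.log1p(g)

def mmse(g):
    return 1.0 / (1.0 + g)

gammas = np.linspace(0.1, 10.0, 50)
d = 1e-6
dI = (I_nats(gammas + d) - I_nats(gammas - d)) / (2 * d)   # central difference
gap = float(np.max(np.abs(dI - 0.5 * mmse(gammas))))
```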
ex-ch24-05
Easy. For the narrowband ISAC angle-estimation model with a uniform linear array of $N$ elements, unit transmit covariance $\mathbf{Q} = \mathbf{I}$, $L$ snapshots, complex reflectivity $\alpha$ and per-antenna noise variance $\sigma^2$, the angle CRB (in radians squared) reads $\mathrm{CRB}(\theta) \propto \frac{\sigma^2}{L\,|\alpha|^2\,\|\partial\mathbf{a}/\partial\theta\|^2}$ up to a constant. Using the standard steering-vector derivative for a half-wavelength ULA, show that $\mathrm{CRB}(\theta) = \Theta(N^{-3})$ in the large-array limit.
The steering vector is $\mathbf{a}(\theta) = [1,\, e^{j\pi\sin\theta},\, \ldots,\, e^{j\pi(N-1)\sin\theta}]^T$.
Compute $\partial\mathbf{a}/\partial\theta$ and use $\sum_{n=0}^{N-1} n^2 \approx N^3/3$.
Steering-vector derivative
$\frac{\partial\mathbf{a}}{\partial\theta} = j\pi\cos\theta\,[0,\,1,\,\ldots,\,N-1]^T \odot \mathbf{a}(\theta)$.
Squared norm
$\big\|\partial\mathbf{a}/\partial\theta\big\|^2 = \pi^2\cos^2\theta\sum_{n=0}^{N-1} n^2 \approx \frac{\pi^2\cos^2\theta}{3}\,N^3$ for large $N$.
CRB scaling
With $\sum_{n=0}^{N-1} n^2 \approx N^3/3$, $\|\partial\mathbf{a}/\partial\theta\|^2 \approx \tfrac{\pi^2\cos^2\theta}{3}N^3$, and the CRB inversely proportional to this norm. Plugging in: $\mathrm{CRB}(\theta) \propto \frac{3\,\sigma^2}{\pi^2\cos^2\theta\,L\,|\alpha|^2\,N^3} = \Theta(N^{-3}).$ The cubic gain is the classical super-resolution scaling of ULAs.
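The $N^3$ law is visible numerically: doubling $N$ multiplies $\|\partial\mathbf{a}/\partial\theta\|^2$ by roughly $8$, and the exact sum tracks $\pi^2\cos^2\theta\,N^3/3$. A sketch (the angle $\theta = 0.3$ is arbitrary):

```python
import numpy as np

def deriv_norm_sq(N, theta):
    # squared norm of da/dtheta for a half-wavelength ULA
    n = np.arange(N)
    a = np.exp(1j * np.pi * n * np.sin(theta))
    da = 1j * np.pi * np.cos(theta) * n * a      # elementwise derivative
    return float(np.vdot(da, da).real)

theta = 0.3
Ns = np.array([64, 128, 256, 512])
vals = np.array([deriv_norm_sq(N, theta) for N in Ns])
ratios = vals[1:] / vals[:-1]                    # doubling N -> approx 2^3 = 8
pred = (np.pi * np.cos(theta)) ** 2 * Ns.astype(float) ** 3 / 3.0
```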
ex-ch24-06
Medium. A uniform prior on $[-A, A]$ violates the boundary condition of the Van Trees inequality. Consider the smoothed approximation $p_\varepsilon(\theta) = c_\varepsilon$ for $|\theta| \le A$ and $p_\varepsilon(\theta) = c_\varepsilon\,e^{-(|\theta|-A)^2/2\varepsilon^2}$ for $|\theta| > A$, where $\varepsilon > 0$ and $c_\varepsilon$ normalises. Compute $J_P(\varepsilon)$ for this smoothed prior and show it diverges as $\varepsilon \to 0$. Interpret what this means for the Van Trees bound on a compactly-supported prior.
Outside $[-A, A]$ the prior is Gaussian-like with variance $\varepsilon^2$.
The score is nonzero only for $|\theta| > A$ and grows as $(|\theta|-A)/\varepsilon^2$.
Compute $J_P(\varepsilon)$ and track its $\varepsilon$-scaling.
Score of the smoothed prior
For $|\theta| > A$, the score is $-\operatorname{sign}(\theta)\,(|\theta|-A)/\varepsilon^2$. For $|\theta| \le A$, the score is zero. So $J_P(\varepsilon) = 2\int_A^\infty \frac{(\theta-A)^2}{\varepsilon^4}\,p_\varepsilon(\theta)\,d\theta.$
Asymptotic evaluation
Substitute $u = \theta - A$ in the right tail. The tail probability mass is $\Theta(\varepsilon/A)$ (boundary-layer width $\varepsilon$ against support $2A$), and within that layer the squared score is $\Theta(1/\varepsilon^2)$, so $J_P(\varepsilon) = \Theta\big(1/(A\varepsilon)\big) \to \infty$ as $\varepsilon \to 0$.
Interpretation
The prior information diverges as the prior becomes sharper at the boundary, which drives the Van Trees bound $\big(\bar{I}_F + J_P(\varepsilon)\big)^{-1}$ to zero: the bound collapses to a trivial statement. This is the mathematical signature of an ill-posed boundary condition: the smooth Van Trees bound is not the right tool for hard-support priors, and one must instead use the Gill-Levit constrained version or the Ziv-Zakai bound, which handle bounded supports natively.
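The $\Theta(1/(A\varepsilon))$ divergence can be checked by quadrature: halving $\varepsilon$ should roughly double $J_P(\varepsilon)$. A sketch with $A = 1$ and an illustrative $\varepsilon$ ladder:

```python
import numpy as np

def trapz(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def J_smoothed(A, eps):
    # score is zero on [-A, A]; on the tails p is proportional to exp(-(|t|-A)^2/2 eps^2)
    u = np.linspace(0.0, 12.0 * eps, 200_001)          # tail coordinate u = theta - A
    tail = np.exp(-u**2 / (2.0 * eps**2))
    Z = 2.0 * A + 2.0 * trapz(tail, u)                 # flat part + two tails
    Jt = trapz((u / eps**2) ** 2 * tail, u)            # tail score^2 integral
    return 2.0 * Jt / Z

A = 1.0
eps_list = [0.1, 0.05, 0.025]
Js = [J_smoothed(A, e) for e in eps_list]
ratios = [Js[i + 1] / Js[i] for i in range(len(Js) - 1)]
```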
ex-ch24-07
Medium. For a vector parameter $\boldsymbol\theta \in \mathbb{R}^d$ with prior $p(\boldsymbol\theta)$ and likelihood $p(\mathbf{y}|\boldsymbol\theta)$, derive the matrix form of the Van Trees inequality: $\mathbb{E}\big[(\hat{\boldsymbol\theta}-\boldsymbol\theta)(\hat{\boldsymbol\theta}-\boldsymbol\theta)^T\big] \succeq \big(\mathbb{E}[\mathbf{I}_F(\boldsymbol\theta)] + \mathbf{J}_P\big)^{-1}$, where $\mathbf{J}_P = \mathbb{E}\big[\nabla_{\boldsymbol\theta}\ln p(\boldsymbol\theta)\,\nabla_{\boldsymbol\theta}\ln p(\boldsymbol\theta)^T\big]$. Outline the Cauchy-Schwarz step that produces the PSD inequality.
Start with the joint score $\mathbf{s}(\mathbf{y},\boldsymbol\theta) = \nabla_{\boldsymbol\theta}\ln p(\mathbf{y},\boldsymbol\theta)$.
Show $\mathbb{E}\big[(\hat{\boldsymbol\theta}-\boldsymbol\theta)\,\mathbf{s}^T\big] = \mathbf{I}$ via vector integration by parts.
Apply the matrix Cauchy-Schwarz: for zero-mean random vectors $\mathbf{u}, \mathbf{v}$, $\mathbb{E}[\mathbf{u}\mathbf{u}^T] \succeq \mathbb{E}[\mathbf{u}\mathbf{v}^T]\,\mathbb{E}[\mathbf{v}\mathbf{v}^T]^{-1}\,\mathbb{E}[\mathbf{v}\mathbf{u}^T]$.
Joint score
Let $\mathbf{s}(\mathbf{y},\boldsymbol\theta) = \nabla_{\boldsymbol\theta}\ln p(\mathbf{y},\boldsymbol\theta)$. The joint score splits as $\mathbf{s} = \nabla_{\boldsymbol\theta}\ln p(\mathbf{y}|\boldsymbol\theta) + \nabla_{\boldsymbol\theta}\ln p(\boldsymbol\theta)$. The two terms are uncorrelated (the conditional score has zero mean given $\boldsymbol\theta$), so $\mathbb{E}[\mathbf{s}\mathbf{s}^T] = \mathbb{E}[\mathbf{I}_F(\boldsymbol\theta)] + \mathbf{J}_P$.
Identity via integration by parts
For each pair of coordinates $(i,k)$, $\mathbb{E}\big[(\hat\theta_i - \theta_i)\,s_k\big] = \delta_{ik}$, since $\hat{\boldsymbol\theta}(\mathbf{y})$ does not depend on $\boldsymbol\theta$. The boundary term from integration by parts vanishes by the assumed decay of $p(\boldsymbol\theta)$. Stacking: $\mathbb{E}\big[(\hat{\boldsymbol\theta}-\boldsymbol\theta)\,\mathbf{s}^T\big] = \mathbf{I}$.
Matrix Cauchy-Schwarz
With $\mathbf{u} = \hat{\boldsymbol\theta}-\boldsymbol\theta$ and $\mathbf{v} = \mathbf{s}$, the matrix Cauchy-Schwarz inequality gives $\mathbb{E}[\mathbf{u}\mathbf{u}^T] \succeq \mathbf{I}\,\big(\mathbb{E}[\mathbf{I}_F] + \mathbf{J}_P\big)^{-1}\mathbf{I} = \big(\mathbb{E}[\mathbf{I}_F] + \mathbf{J}_P\big)^{-1}.$ Equality holds iff $\mathbf{u}$ and $\mathbf{v}$ are (matrix-) proportional a.s.
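In the linear Gaussian conjugate case the matrix bound holds with equality, which makes a quick Monte Carlo sanity check possible. A sketch with arbitrary diagonal covariances:

```python
import numpy as np

rng = np.random.default_rng(1)
d, trials = 3, 400_000

S0 = np.diag([1.0, 0.5, 2.0])        # prior covariance  (J_P = S0^{-1})
Sn = np.diag([0.3, 0.4, 0.2])        # observation noise covariance (I_F = Sn^{-1})
bound = np.linalg.inv(np.linalg.inv(S0) + np.linalg.inv(Sn))  # (I_F + J_P)^{-1}

theta = rng.normal(size=(trials, d)) * np.sqrt(np.diag(S0))
y = theta + rng.normal(size=(trials, d)) * np.sqrt(np.diag(Sn))
est = y @ (bound @ np.linalg.inv(Sn)).T   # posterior mean (conjugate case)
err = est - theta
emp = err.T @ err / trials                # empirical error covariance matrix
```

The empirical error covariance should match the matrix bound, and in particular `emp - bound` should have no significantly negative eigenvalue.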
ex-ch24-08
Medium. Consider TOA estimation with a pulse of RMS bandwidth $\beta$ (rad/s) and energy $E$, noise PSD $N_0/2$, and uniform prior on $[0, A]$. The Q-function argument at small $h$ is $\beta h\sqrt{E/(2N_0)}$. Show that as $E/N_0 \to \infty$, the uniform-prior ZZB converges to the CRLB $\frac{N_0}{2E\beta^2}$.
At high SNR, $Q\big(\beta h\sqrt{E/(2N_0)}\big)$ is sharply peaked at small $h$, so the factor $(A-h)/A$ is $\approx 1$ and the integration range extends to $\infty$.
Change variables $u = \beta h\sqrt{E/(2N_0)}$ and use $\int_0^\infty u\,Q(u)\,du = 1/4$.
High-SNR simplification
At high SNR, $Q\big(\beta h\sqrt{E/(2N_0)}\big)$ is supported on $h = O\big(1/(\beta\sqrt{E/N_0})\big) \ll A$, so $(A-h)/A \approx 1$ and we can extend the upper limit to $\infty$: $\mathrm{ZZB} \approx \int_0^\infty h\,Q\big(\beta h\sqrt{E/(2N_0)}\big)\,dh.$
Change of variables
Let $u = \beta h\sqrt{E/(2N_0)}$, so $h = \frac{u}{\beta}\sqrt{2N_0/E}$ and $dh = \frac{du}{\beta}\sqrt{2N_0/E}$. Hence $\mathrm{ZZB} \approx \frac{2N_0}{E\beta^2}\int_0^\infty u\,Q(u)\,du.$
Evaluating the integral
Integration by parts gives $\int_0^\infty u\,Q(u)\,du = \tfrac{1}{2}\int_0^\infty u^2\phi(u)\,du = \tfrac{1}{4}$. Therefore $\mathrm{ZZB} \to \frac{2N_0}{E\beta^2}\cdot\frac{1}{4} = \frac{N_0}{2E\beta^2},$ which is exactly the CRLB for TOA estimation. The ZZB reduces to the CRLB in the high-SNR regime, as required.
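The limit is easy to reproduce numerically: at high $E/N_0$ the truncated integral agrees with $N_0/(2E\beta^2)$ to within quadrature error. A sketch with illustrative $\beta$ and $E/N_0$:

```python
import numpy as np
from math import erfc

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

beta = 2 * np.pi * 1.0e6          # RMS bandwidth in rad/s (illustrative)
E_over_N0 = 100.0                 # high SNR
a = beta * np.sqrt(E_over_N0 / 2.0)   # Q-argument slope from the derivation

h = np.linspace(0.0, 12.0 / a, 400_001)
integ = h * np.vectorize(Q)(a * h)
zzb = float(np.sum((integ[1:] + integ[:-1]) * np.diff(h)) / 2.0)
crlb = 1.0 / (2.0 * E_over_N0 * beta**2)   # N0/(2 E beta^2), E/N0 grouped
```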
ex-ch24-09
Medium. Sketch the MMSE curve $\mathrm{mmse}(\gamma)$ for BPSK input ($X = \pm 1$ equiprobable) on the channel $Y = \sqrt{\gamma}X + N$, and derive its limits at $\gamma \to 0$ and $\gamma \to \infty$. Using I-MMSE, express the BPSK capacity as an integral of the MMSE.
The MMSE at $\gamma = 0$ equals $\operatorname{Var}(X) = 1$; at $\gamma \to \infty$ it equals $0$.
The posterior mean is $\mathbb{E}[X\mid Y=y] = \tanh(\sqrt{\gamma}\,y)$; find $\mathrm{mmse}(\gamma) = 1 - \mathbb{E}[\tanh^2(\sqrt{\gamma}\,Y)]$ at the limits.
Low-SNR limit
$\mathrm{mmse}(0) = \operatorname{Var}(X) = 1$. Near $\gamma = 0$, a Taylor expansion of $\tanh^2$ gives $\mathrm{mmse}(\gamma) = 1 - \gamma + O(\gamma^2)$.
High-SNR limit
As $\gamma \to \infty$, $\tanh(\sqrt{\gamma}\,Y) \to X$ almost surely, so $\mathrm{mmse}(\gamma) \to 0$ exponentially fast. Specifically $\mathrm{mmse}(\gamma) = e^{-\gamma/2 + o(\gamma)}$ as $\gamma \to \infty$, governed by the Gaussian tail probability of sign-flip events.
Capacity via I-MMSE
By the I-MMSE identity, $C_{\mathrm{BPSK}}(\gamma) = \tfrac{1}{2}\int_0^\gamma \mathrm{mmse}(s)\,ds$. Numerically integrating the sigmoidal MMSE curve produces the BPSK capacity, saturating at $\ln 2$ nats. The sharp sigmoid transition in $\mathrm{mmse}(\gamma)$ corresponds to the steep rise of $I(\gamma)$ from zero toward its saturation.
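The limits and the capacity integral can be evaluated with Gauss-Hermite quadrature (conditioning on $X = +1$ by symmetry). A sketch; the quadrature order and SNR grid are arbitrary choices:

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(120)   # rule for exp(-x^2) weight

def mmse_bpsk(g):
    # mmse(g) = 1 - E[tanh^2(sqrt(g) Y)]; by +-1 symmetry condition on X = +1
    y = np.sqrt(g) + np.sqrt(2.0) * nodes               # Y = sqrt(g) + N at the nodes
    return 1.0 - float(np.sum(weights * np.tanh(np.sqrt(g) * y) ** 2)) / np.sqrt(np.pi)

low = mmse_bpsk(1e-4)              # near Var(X) = 1
high = mmse_bpsk(25.0)             # exponentially small

gs = np.linspace(1e-6, 25.0, 5001)
m = np.array([mmse_bpsk(g) for g in gs])
trap = float(np.sum((m[1:] + m[:-1]) * np.diff(gs)) / 2.0)
C = 0.5 * trap                     # capacity in nats, should approach ln 2
```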
ex-ch24-10
Medium. A sparse Bernoulli-Gaussian input has $X = B\,G$ with $B \sim \mathrm{Bern}(\rho)$ and $G \sim \mathcal{N}(0, 1/\rho)$ independent (so $\mathbb{E}[X^2] = 1$). Compute $\mathrm{mmse}(0)$. Argue heuristically why $\mathrm{mmse}(\gamma)$ has an L-shape: a long plateau at low SNR followed by a sharp drop.
At $\gamma = 0$ the MMSE equals the variance of $X$: use the $1/\rho$ scaling chosen.
The detector must first "detect" whether $B = 1$ before estimating $G$; this is a phase transition in $\gamma$.
MMSE at zero SNR
$\mathrm{mmse}(0) = \operatorname{Var}(X) = \mathbb{E}[B]\cdot\operatorname{Var}(G) = \rho\cdot\tfrac{1}{\rho} = 1$.
L-shape heuristic
When $\gamma$ is small, the posterior probability of $\{B=1\}$ is flat across noise realisations and the estimator essentially guesses $\hat{X} \approx 0$, giving MMSE near $1$. Only when the per-active-component SNR $\gamma/\rho$ exceeds roughly $\ln(1/\rho)$ does the likelihood ratio for $B = 1$ exceed the noise floor: at that point, $B$ becomes reliably detectable and the posterior concentrates on the correct support. The MMSE then drops rapidly as the conditional Gaussian estimate of $G$ takes over. The plateau width scales like $\rho\ln(1/\rho)$ and the drop steepens as $\rho \to 0$, giving the characteristic L-shape of sparse inputs.
Connection to compressed sensing
This L-shape is the single-letter version of the phase transition observed in compressed-sensing reconstruction: at measurement density below a critical threshold, the MSE is stuck at the prior variance; above threshold, it drops to a denoising-limited floor. The I-MMSE integral converts this shape into a rate-versus-SNR curve with a matching phase transition.
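The plateau and drop can be computed exactly from the scalar posterior mean, since $p(y)$ is a two-component Gaussian mixture. A sketch with $\rho = 0.05$ and two illustrative SNRs, one below and one above the heuristic threshold $\rho\ln(1/\rho)$:

```python
import numpy as np

def mmse_bg(gamma, rho):
    # X = B*G, B ~ Bern(rho), G ~ N(0, 1/rho); mmse = 1 - int E[X|y]^2 p(y) dy
    v = 1.0 / rho
    s2 = 1.0 + gamma * v                          # Var(Y | B = 1)
    ymax = 10.0 * np.sqrt(s2)
    y = np.linspace(-ymax, ymax, 400_001)
    p0 = np.exp(-y**2 / 2.0) / np.sqrt(2 * np.pi)              # p(y | B = 0)
    p1 = np.exp(-y**2 / (2.0 * s2)) / np.sqrt(2 * np.pi * s2)  # p(y | B = 1)
    p = (1 - rho) * p0 + rho * p1
    post1 = rho * p1 / p                           # P(B = 1 | y)
    cond_mean = post1 * np.sqrt(gamma) * v * y / s2   # E[X | y]
    sq = cond_mean**2 * p
    return 1.0 - float(np.sum((sq[1:] + sq[:-1]) * np.diff(y)) / 2.0)

rho = 0.05
plateau = mmse_bg(0.05, rho)   # below the heuristic threshold: near the prior variance
dropped = mmse_bg(5.0, rho)    # well above threshold: near the genie-aided floor
```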
ex-ch24-11
Medium. A monostatic ISAC transmitter with $N$ antennas transmits a waveform with sample covariance $\mathbf{Q} \succeq 0$, $\operatorname{tr}(\mathbf{Q}) \le P$. For a single target at angle $\theta$, the angular Fisher information scales like $\dot{\mathbf{a}}^H\mathbf{Q}\,\dot{\mathbf{a}}$ with $\dot{\mathbf{a}} = \partial\mathbf{a}/\partial\theta$. For a single-user communication channel $\mathbf{h}$, the rate is $\log_2\!\big(1 + \mathbf{h}^H\mathbf{Q}\,\mathbf{h}/\sigma^2\big)$. Pose the rate-CRB Pareto optimisation as a Lagrangian in $\mathbf{Q}$. Identify the rank-1 extremes.
Maximise $(1-\lambda)\,R(\mathbf{Q}) + \lambda\,\dot{\mathbf{a}}^H\mathbf{Q}\,\dot{\mathbf{a}}$ subject to the power constraint.
At $\lambda = 0$, this is pure waterfilling toward $\mathbf{h}$: rank 1 along $\mathbf{h}$.
At $\lambda = 1$, it aligns along $\dot{\mathbf{a}}$: rank 1 along the angle-gradient.
Lagrangian
Maximise $\mathcal{L}(\mathbf{Q}) = (1-\lambda)\log_2\!\big(1 + \mathbf{h}^H\mathbf{Q}\mathbf{h}/\sigma^2\big) + \lambda\,\dot{\mathbf{a}}^H\mathbf{Q}\,\dot{\mathbf{a}} - \mu\big(\operatorname{tr}(\mathbf{Q}) - P\big)$ over $\mathbf{Q} \succeq 0$.
Communication extreme
At $\lambda = 0$, the objective depends only on $\mathbf{h}^H\mathbf{Q}\mathbf{h}$. By the rank-1 optimality of power-constrained quadratic forms, $\mathbf{Q}^\star = P\,\mathbf{h}\mathbf{h}^H/\|\mathbf{h}\|^2$. The transmitter puts all its power in the user's direction.
Sensing extreme
At $\lambda = 1$, only $\dot{\mathbf{a}}^H\mathbf{Q}\,\dot{\mathbf{a}}$ matters. Again rank-1 is optimal: $\mathbf{Q}^\star = P\,\dot{\mathbf{a}}\dot{\mathbf{a}}^H/\|\dot{\mathbf{a}}\|^2$. Power is steered along the angle-gradient direction, which is orthogonal to the main-lobe direction and maximises the slope of the beampattern (optimal for angular discrimination).
Interior of the tradeoff
For $0 < \lambda < 1$, the optimal $\mathbf{Q}$ has rank up to $2$ (a mixture of the two rank-1 solutions). Sweeping $\lambda$ from $0$ to $1$ traces out the Pareto boundary, morphing smoothly from the communication beam toward the sensing beam.
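The Pareto path can be traced by sweeping the mixture weight between the two rank-1 extremes. A sketch with random draws standing in for $\mathbf{h}$ and $\dot{\mathbf{a}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P, sigma2 = 8, 1.0, 0.1
h = rng.normal(size=N) + 1j * rng.normal(size=N)        # comm channel
adot = rng.normal(size=N) + 1j * rng.normal(size=N)     # angle-gradient direction

Qc = P * np.outer(h, h.conj()) / np.vdot(h, h).real                 # comm extreme
Qs = P * np.outer(adot, adot.conj()) / np.vdot(adot, adot).real     # sensing extreme

rates, fis = [], []
for lam in np.linspace(0.0, 1.0, 11):
    Q = (1 - lam) * Qc + lam * Qs                        # rank <= 2 mixture
    rates.append(np.log2(1 + (h.conj() @ Q @ h).real / sigma2))
    fis.append((adot.conj() @ Q @ adot).real)            # proportional to angular FI
rates, fis = np.array(rates), np.array(fis)
```

As expected, the rate is monotonically traded against the angular Fisher information along the sweep.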
ex-ch24-12
Medium. Show that for a scalar Gaussian channel $Y = \sqrt{\gamma}\,X + N$ with $X$ having variance $\sigma_X^2$ (arbitrary distribution), $\mathrm{mmse}(\gamma) \le \frac{\sigma_X^2}{1+\gamma\sigma_X^2}$: the MMSE of any input is at most the Gaussian MMSE at the same power. Conclude that $I(X;Y) \le \tfrac{1}{2}\ln(1+\gamma\sigma_X^2)$ and identify this as the Gaussian-input upper bound.
The LMMSE estimator of $X$ from $Y$ gives an upper bound on the conditional-mean MMSE, since the LMMSE is suboptimal in general.
Integrate the MMSE bound over $\gamma$ and use I-MMSE to get an upper bound on $I(X;Y)$.
LMMSE as upper bound
The linear MMSE estimator $\hat{X} = \frac{\sqrt{\gamma}\,\sigma_X^2}{1+\gamma\sigma_X^2}\,Y$ achieves error variance $\frac{\sigma_X^2}{1+\gamma\sigma_X^2}$ for any input with variance $\sigma_X^2$. The posterior-mean (MMSE) estimator is optimal, so its error variance is at most the LMMSE error variance: $\mathrm{mmse}(\gamma) \le \frac{\sigma_X^2}{1+\gamma\sigma_X^2}$.
Integrate via I-MMSE
$I(X;Y) = \frac{1}{2}\int_0^\gamma \mathrm{mmse}(s)\,ds \le \frac{1}{2}\int_0^\gamma \frac{\sigma_X^2}{1+s\sigma_X^2}\,ds = \frac{1}{2}\ln\big(1+\gamma\sigma_X^2\big).$
Gaussian-input is the maximiser
Equality holds iff $\mathrm{mmse}(s) = \frac{\sigma_X^2}{1+s\sigma_X^2}$ for all $s \le \gamma$, which by the uniqueness of the LMMSE-attaining posterior mean implies $X$ is jointly Gaussian with $Y$, hence $X$ itself is Gaussian. This is the I-MMSE proof of the statement that Gaussian inputs maximise mutual information under a power constraint, a slick alternative to the standard Shannon-Gelfand-Yaglom derivation.
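The pointwise dominance $\mathrm{mmse}(\gamma) \le \sigma_X^2/(1+\gamma\sigma_X^2)$ can be spot-checked for a concrete non-Gaussian input, here unit-variance BPSK via Gauss-Hermite quadrature. A sketch:

```python
import numpy as np

nodes, weights = np.polynomial.hermite.hermgauss(100)

def mmse_bpsk(g):
    # unit-variance BPSK input on Y = sqrt(g) X + N, conditioning on X = +1
    y = np.sqrt(g) + np.sqrt(2.0) * nodes
    return 1.0 - float(np.sum(weights * np.tanh(np.sqrt(g) * y) ** 2)) / np.sqrt(np.pi)

gs = np.linspace(0.05, 8.0, 160)
gauss = 1.0 / (1.0 + gs)                     # Gaussian mmse at the same unit power
bpsk = np.array([mmse_bpsk(g) for g in gs])
gap = gauss - bpsk                           # should be nonnegative everywhere
```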
ex-ch24-13
Medium. In a single-target ISAC setting, the FIM for $(\tau, \theta, \nu)$ (delay, angle, Doppler) has a specific block structure: diagonal entries scale with the squared RMS bandwidth, squared array aperture, and squared coherent duration respectively, while cross-terms arise from waveform time-frequency-space coupling. Describe qualitatively (one sentence each) how each of the following waveform choices affects the three CRBs and their cross-terms: (a) narrowband single-tone (no bandwidth), (b) random OFDM symbol (spread bandwidth, random symbols), (c) up-chirp LFM (deterministic time-frequency coupling).
Delay precision needs bandwidth; Doppler needs duration; angle needs array aperture.
Cross-terms reflect how the waveform structure couples two or more parameters.
Narrowband single-tone
No bandwidth means zero delay information: the delay diagonal of the FIM is zero (infinite delay CRB) and delay-angle / delay-Doppler cross-terms vanish. Angle and Doppler precision remain fine. This is the classic "Doppler-only" radar mode.
Random OFDM symbol
Spread bandwidth gives good delay precision; random symbols decorrelate the waveform across subcarriers, killing cross-terms (the FIM becomes nearly diagonal). This is the ideal for joint estimation: each parameter is decoupled from the others and the CRBs on all three are small.
Up-chirp LFM
Good delay and Doppler precision (wideband, long-duration), but the linear time-frequency coupling produces a strong delay-Doppler cross-term: a shift in $\tau$ produces the same first-order change in the template as a shift in $\nu$, so the two are ambiguous. Marginal CRBs look fine; joint CRBs (on linear combinations of $\tau$ and $\nu$) are excellent only along one diagonal of the $(\tau,\nu)$ plane and poor along the other. This is the classic LFM range-Doppler coupling.
ex-ch24-14
Hard. Consider the phase estimation problem $y_k = \sin\theta + w_k$, $w_k \sim \mathcal{N}(0,\sigma^2)$, $k = 1,\dots,n$, with $\theta$ uniform on $[-\pi,\pi)$ (wrapped: treat the support as the circle). The CRLB at fixed $\theta$ is $\sigma^2/(n\cos^2\theta)$, which diverges at $\theta = \pm\pi/2$. Use the Ziv-Zakai bound (in its translation form on the circle) to produce a finite bound that does not blow up at these points.
The binary error depends on the signal separation $|\sin(\theta+h)-\sin\theta|$, but on the circle the relevant error distance is the wrapped offset $h$ itself.
Marginalise over the uniform circular prior; the overlap factor $(A-h)/A$ of the linear case is simply $1$ for the circle.
The valley-filled binary error is bounded by $1/2$, so the ZZB is capped on the order of the circular variance $\pi^2/3$.
Binary error on the circle
For a uniform $\theta$ and $n$ observations, the minimum binary error probability between $\theta$ and $\theta+h$ under the Bayes detector is $P_e(\theta,\theta+h) = Q\!\big(\tfrac{\sqrt{n}}{2\sigma}\,|\sin(\theta+h)-\sin\theta|\big)$. After averaging over uniform $\theta$, this is a deterministic function $\bar{P}_{\min}(h)$ of $h$.
ZZB integrand
With the uniform-prior form, $\mathrm{MSE} \ge \int_0^{\pi} h\,\mathcal{V}\{\bar{P}_{\min}(h)\}\,dh$ (using the half-interval because of the $\pm h$ symmetry of the circle). Valley-filling makes the integrand non-increasing; at offsets $h$ for which some $\theta$ leaves $\sin$ nearly unchanged (near $\theta = \pm\pi/2$) the raw $P_e$ is near $1/2$, and valley-filling propagates this plateau across the integrand.
Bound is finite
Because $\bar{P}_{\min}(h) \le 1/2$ for all $h$, the ZZB is bounded by $\tfrac{1}{2}\int_0^\pi h\,dh = \pi^2/4$, on the order of the circular prior variance $\pi^2/3$. The CRLB singularity at $\theta = \pm\pi/2$ reflects that a pointwise estimator is hopeless near these angles (a tiny jitter wraps around), but the averaged ZZB is bounded because the prior averages over the bad regions. This is precisely why the ZZB is the standard bound for circular / phase / angle estimation with singular CRLBs.
Engineering takeaway
Any estimator of a circular parameter that quotes only the CRLB is misleading at the singular points. Radar and positioning engineers dealing with angle-of-arrival near endfire ($\theta = \pm\pi/2$) always use the ZZB for an honest accuracy bound.
ex-ch24-15
Hard. Using I-MMSE, prove the MMSE monotonicity property: for the Gaussian channel $Y = \sqrt{\gamma}\,X + N$, $\mathrm{mmse}(\gamma)$ is a strictly decreasing, convex function of $\gamma$ for any non-degenerate input $X$.
Write $I(\gamma) = \tfrac{1}{2}\int_0^\gamma \mathrm{mmse}(s)\,ds$.
By the data-processing inequality, $I(\gamma)$ is non-decreasing in $\gamma$; this gives the sign of $\mathrm{mmse}$ indirectly. But to show strict decrease and convexity, examine $d\,\mathrm{mmse}/d\gamma$ directly using Stein-type identities.
Guo-Shamai-Verdu show $\frac{d\,\mathrm{mmse}}{d\gamma} = -\mathbb{E}\big[\operatorname{Var}(X\mid Y)^2\big] \le 0$, with strict inequality unless $X$ is degenerate.
Monotonicity
By I-MMSE, $\mathrm{mmse}(\gamma) = 2\,dI/d\gamma$. The mutual information is non-decreasing in $\gamma$ (more SNR cannot remove information), and strictly increasing whenever $X$ is non-degenerate. Hence $\mathrm{mmse}(\gamma) > 0$ for all finite $\gamma$. This is consistent with, but does not directly show, monotonicity of $\mathrm{mmse}$.
Derivative identity
Guo, Shamai and Verdu (2005, Theorem 2) prove the higher-order identity $\frac{d\,\mathrm{mmse}}{d\gamma} = -\mathbb{E}\big[\operatorname{Var}(X\mid Y)^2\big].$ The right-hand side is non-positive, and zero iff $\operatorname{Var}(X\mid Y) = 0$ almost surely, which requires $X$ to be a deterministic function of $Y$: impossible for non-degenerate $X$ and finite $\gamma$. Hence $d\,\mathrm{mmse}/d\gamma < 0$: $\mathrm{mmse}$ is strictly decreasing.
Convexity
A similar computation (Guo-Shamai-Verdu, Theorem 3) expresses $d^2\,\mathrm{mmse}/d\gamma^2$ as the expectation of a non-negative bracket of higher posterior moments. Therefore $\mathrm{mmse}(\gamma)$ is convex in $\gamma$.
Geometric interpretation
Up to the factor $2$, $\mathrm{mmse}(\gamma)$ is the slope of $I(\gamma)$. Strict positivity and strict decrease of $\mathrm{mmse}$ say that $I(\gamma)$ is increasing and strictly concave, consistent with the well-known concavity of mutual information in SNR for any fixed input distribution. I-MMSE gives a clean, estimation-theoretic proof of a long-standing information-theoretic property.
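Both properties can be spot-checked numerically: the derivative identity in closed form for the Gaussian input (where $\operatorname{Var}(X\mid Y)$ is deterministic), and decrease/convexity on the sampled BPSK curve. A sketch:

```python
import numpy as np

# (i) Gaussian input: dmmse/dg = -E[Var(X|Y)^2] with mmse = 1/(1+g)
g0, d = 1.7, 1e-5
lhs = (1 / (1 + g0 + d) - 1 / (1 + g0 - d)) / (2 * d)  # finite-difference dmmse/dg
rhs = -1.0 / (1 + g0) ** 2                             # Var(X|Y) is deterministic here

# (ii) BPSK input: strict decrease and convexity of the sampled mmse curve
nodes, weights = np.polynomial.hermite.hermgauss(120)

def mmse_bpsk(g):
    y = np.sqrt(g) + np.sqrt(2.0) * nodes
    return 1.0 - float(np.sum(weights * np.tanh(np.sqrt(g) * y) ** 2)) / np.sqrt(np.pi)

gs = np.linspace(0.05, 6.0, 120)
m = np.array([mmse_bpsk(g) for g in gs])
first = np.diff(m)        # expected strictly negative
second = np.diff(m, 2)    # expected nonnegative (convexity)
```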
ex-ch24-16
Hard. Consider joint TOA-AOA estimation with a wideband waveform over a ULA. The parameter is $\boldsymbol\eta = (\tau, \theta)$ with diagonal Fisher information $\mathbf{I}_F = \operatorname{diag}(I_\tau, I_\theta)$. Assuming a uniform prior on a box in $(\tau,\theta)$, derive the vector Van Trees bound and compare to the trace of the inverse Fisher. Discuss when the prior term dominates.
Compute $\mathbf{J}_P$ for a box-like smoothed prior (or treat it as the sum of two scalar prior informations).
For a Gaussian-smoothed box of half-width $\Delta$, $J_P \approx 3/\Delta^2$ (a Gaussian matched to the box variance $\Delta^2/3$) when the box is tight enough.
Compare $\operatorname{tr}\big((\bar{\mathbf{I}}_F + \mathbf{J}_P)^{-1}\big)$ with $\operatorname{tr}(\mathbf{I}_F^{-1})$.
Assembling the Bayesian information
With a separable prior on $(\tau,\theta)$ approximated by independent Gaussians of variances $(\sigma_\tau^2, \sigma_\theta^2)$, $\mathbf{J}_P = \operatorname{diag}(1/\sigma_\tau^2,\, 1/\sigma_\theta^2)$. The data-averaged Fisher information is $\bar{\mathbf{I}}_F = \operatorname{diag}(I_\tau, I_\theta)$ (assuming negligible cross-terms for a wideband uncorrelated waveform). Thus $\mathbb{E}\big[(\hat{\boldsymbol\eta}-\boldsymbol\eta)(\hat{\boldsymbol\eta}-\boldsymbol\eta)^T\big] \succeq \operatorname{diag}\!\Big(\big(I_\tau + \sigma_\tau^{-2}\big)^{-1},\; \big(I_\theta + \sigma_\theta^{-2}\big)^{-1}\Big).$
Trace of the inverse
The Van Trees trace $\big(I_\tau + \sigma_\tau^{-2}\big)^{-1} + \big(I_\theta + \sigma_\theta^{-2}\big)^{-1}$ is always below $\operatorname{tr}(\mathbf{I}_F^{-1}) = 1/I_\tau + 1/I_\theta$, with the gap shrinking as the priors widen.
Prior-dominated regime
The prior dominates a coordinate when its prior precision exceeds the data Fisher information, i.e., when $\sigma_\tau^{-2} \gg I_\tau$ (and similarly for $\theta$). For a tight delay prior of half-width $\Delta_\tau$ (say, a previous ranging fix accurate to one tenth of the pulse width), the delay Van Trees bound is dominated by the prior whenever $I_\tau \ll 3/\Delta_\tau^2$, i.e., at low SNR. This is the coasting regime in which the tracker is "believing the prior more than the data."
Engineering interpretation
For 5G NR positioning with large bandwidth and small array, the delay dimension enters the prior-dominated regime long before the angle dimension. This explains why network-based tracking (with history-informed priors) beats single-shot CRLB predictions by large factors at low SINR.
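The trace comparison and the prior-dominated regime are one-liners to verify numerically. A sketch with illustrative Fisher informations and prior variances:

```python
import numpy as np

I_tau, I_theta = 1.0e4, 4.0e2          # data Fisher informations (illustrative)

def traces(s_tau2, s_theta2):
    # Van Trees trace vs. trace of the inverse Fisher
    vt = 1.0 / (I_tau + 1.0 / s_tau2) + 1.0 / (I_theta + 1.0 / s_theta2)
    crlb = 1.0 / I_tau + 1.0 / I_theta
    return vt, crlb

vt_wide, crlb = traces(1.0e-2, 1.0)      # broad priors: VT close to the CRLB trace
vt_tight, _ = traces(1.0e-6, 1.0)        # tight delay prior: prior-dominated in tau
delay_term = 1.0 / (I_tau + 1.0e6)       # approx sigma_tau^2 when prior dominates
```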
ex-ch24-17
Hard. A deterministic waveform with sample covariance $\mathbf{Q} = P\,\mathbf{h}\mathbf{h}^H/\|\mathbf{h}\|^2$ (communication-optimal, rank-1) gives a rate $R = \log_2\!\big(1 + P\|\mathbf{h}\|^2/\sigma^2\big)$ but zero sensing information along $\dot{\mathbf{a}}$ whenever $\dot{\mathbf{a}} \perp \mathbf{h}$. Propose a rank-2 transmit covariance that achieves (i) the same rate and (ii) non-zero angular Fisher information. What is the price?
To preserve the rate, the projection of $\mathbf{Q}$ onto $\mathbf{h}$ must still carry full power along $\mathbf{h}$.
A rank-2 $\mathbf{Q}$ with a small fraction $\varepsilon$ of power along $\dot{\mathbf{a}}$ trades rate for sensing.
To achieve the same rate, one needs a rank-2 $\mathbf{Q}$ that preserves $\mathbf{h}^H\mathbf{Q}\mathbf{h} = P\|\mathbf{h}\|^2$, which is impossible in general if $\operatorname{tr}(\mathbf{Q}) \le P$.
The constraint analysis
Preserving the rate requires $\mathbf{h}^H\mathbf{Q}\mathbf{h} = P\|\mathbf{h}\|^2$. The power constraint requires $\operatorname{tr}(\mathbf{Q}) \le P$. Cauchy-Schwarz gives $\mathbf{h}^H\mathbf{Q}\mathbf{h} \le \|\mathbf{h}\|^2\operatorname{tr}(\mathbf{Q}) \le P\|\mathbf{h}\|^2$, with equality iff $\mathbf{Q}$ is rank-1 aligned with $\mathbf{h}$. So achieving (i) exactly forces rank-1 along $\mathbf{h}$: one cannot get extra sensing for free.
The rank-2 tradeoff
One must accept a rate loss. Let $\mathbf{Q} = (1-\varepsilon)P\,\frac{\mathbf{h}\mathbf{h}^H}{\|\mathbf{h}\|^2} + \varepsilon P\,\frac{\dot{\mathbf{a}}\dot{\mathbf{a}}^H}{\|\dot{\mathbf{a}}\|^2}$ with $0 < \varepsilon < 1$. Then $R(\varepsilon) = \log_2\!\big(1 + (1-\varepsilon)P\|\mathbf{h}\|^2/\sigma^2\big)$ (assuming orthogonality $\dot{\mathbf{a}} \perp \mathbf{h}$) and the angular Fisher information is proportional to $\varepsilon P\|\dot{\mathbf{a}}\|^2$.
The price
A fractional rate loss of $R(0) - R(\varepsilon)$ buys angular Fisher information proportional to $\varepsilon$. At high SNR the loss tends to $\log_2\frac{1}{1-\varepsilon}$ bits/symbol, so even a small $\varepsilon$ buys significant sensing while costing little rate: the standard "free lunch at the Pareto frontier" result.
Connection to ISAC design
Commercial ISAC systems typically set $\varepsilon \approx 0.1$-$0.3$ in high-SNR regimes: keep most power in the data direction, allocate 10-30% to the sensing direction, and accept a ~0.5 bit/symbol rate loss for non-degenerate target estimation.
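The rate-loss arithmetic is easy to tabulate: at high SNR the loss approaches $\log_2\frac{1}{1-\varepsilon}$, about half a bit at $\varepsilon = 0.3$. A sketch with an illustrative SNR:

```python
import numpy as np

snr = 100.0                                  # P ||h||^2 / sigma^2 (high SNR)
eps = np.linspace(0.0, 0.5, 51)              # fraction of power steered to a-dot
rate = np.log2(1.0 + (1.0 - eps) * snr)
loss = rate[0] - rate                        # rate paid for sensing
fisher = eps                                 # angular FI proportional to eps (normalised)
hi_snr = np.log2(1.0 / (1.0 - eps[1:]))      # high-SNR approximation of the loss
loss_30 = loss[30]                           # eps = 0.30
```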
ex-ch24-18
Hard. Consider a toy ISAC capacity-distortion problem in the style of Kobayashi-Caire-Kramer: a memoryless channel $p(y|x,s)$ with i.i.d. state $S$, and the transmitter observing causal feedback from which it estimates $S$. The achievable rate-distortion region is characterised by $C(D) = \max\{\,I(X;Y|S) : \mathbb{E}[d(S,\hat{S})] \le D\,\}$, the maximum over input distributions meeting the estimation-distortion constraint. Explain why the Pareto tradeoff between rate and distortion in this joint formulation can differ from a naive CRB-rate tradeoff, and give an intuition for when the two coincide.
In the joint formulation, the input distribution $p(x)$ is optimised for both communication and estimation; in the CRB-rate tradeoff, only the covariance of $X$ matters for sensing.
The joint bound is tight ($I(X;Y|S)$ exploits the receiver's knowledge of $S$); the CRB is a local, pointwise bound that ignores distribution-level structure.
They coincide when the distortion is squared error, the channel is Gaussian, and the prior on $S$ is matched to the CRB regime.
Information-theoretic vs. estimation-theoretic view
The Kobayashi-Caire-Kramer formulation maximises the joint information-distortion tradeoff: the input distribution controls both the rate $I(X;Y|S)$ and the achievable distortion $D$. Both depend on the full distribution of $X$, not just its second moment. In contrast, the Fisher-information-based rate-CRB tradeoff depends on the input only through its sample covariance $\mathbf{Q}$.
Where they differ
Any non-Gaussian input achieves the same CRB as a Gaussian input with the same covariance (the CRB is a second-order functional), but different non-Gaussian inputs achieve different rates $I(X;Y|S)$ and different distortions $D$. So the CRB-rate frontier is strictly below the full information-distortion frontier whenever non-Gaussian inputs can do better: for instance, when the channel benefits from discrete modulation or when the distortion function is non-MSE.
Where they coincide
They coincide asymptotically when: (i) the distortion is squared error, (ii) the channel is Gaussian with Gaussian state, (iii) the SNR is high enough that Gaussian input is near-optimal for communication, and (iv) the state prior is broad enough that the CRB is tight (no ambiguity regime). Under these conditions, both frameworks reduce to the same quadratic Pareto tradeoff.
Research takeaway
Kobayashi-Caire-Kramer (2018) and Xiong-Liu-Cui-Yuan-Han-Caire (2023) use the information-theoretic formulation to characterise the fundamental capacity-distortion region for ISAC. The CRB-rate region is a second-order surrogate that is correct in the high-SNR Gaussian regime and misleading outside it. Research aimed at discrete-modulation ISAC designs or non-MSE distortion must use the joint bound.