Exercises

ex-fsi-ch23-01

Easy

Show that the Huber loss $\rho_\delta(r)$ is continuously differentiable everywhere, including at $|r| = \delta$.
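A quick numerical sanity check (a NumPy sketch; the value $\delta = 1.345$ and the probe points are illustrative) compares one-sided difference quotients of $\rho_\delta$ on either side of $r = \delta$:

```python
import numpy as np

def huber_loss(r, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond."""
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2,
                    delta * (np.abs(r) - 0.5 * delta))

delta, eps = 1.345, 1e-6
# One-sided difference quotients just inside and just outside r = delta.
left  = (huber_loss(delta, delta) - huber_loss(delta - eps, delta)) / eps
right = (huber_loss(delta + eps, delta) - huber_loss(delta, delta)) / eps
print(left, right)  # both approach delta, so rho' is continuous at |r| = delta
```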

ex-fsi-ch23-02

Easy

Compute the breakdown point of the trimmed mean that discards the $k$ smallest and $k$ largest of $n$ samples. Express it in terms of $k$ and $n$.
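A small empirical probe (a sketch; the sample and the contamination value $10^9$ are arbitrary): the trimmed mean absorbs $k$ corrupted samples but not $k + 1$, which is exactly the behavior your formula should capture:

```python
import numpy as np

def trimmed_mean(x, k):
    """Mean after discarding the k smallest and k largest samples."""
    xs = np.sort(x)
    return xs[k:len(xs) - k].mean()

rng = np.random.default_rng(0)
x = rng.normal(size=20)             # n = 20 clean samples
k = 3
for m in (k, k + 1):                # corrupt m samples with a huge value
    bad = x.copy()
    bad[:m] = 1e9
    print(m, trimmed_mean(bad, k))  # bounded for m = k, blows up for m = k + 1
```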

ex-fsi-ch23-03

Easy

Verify that the influence function of the sample mean under the standard Gaussian is $\mathrm{IF}(x) = x$. Interpret the unbounded growth.
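The finite-sample analogue, the sensitivity curve, can be computed directly (a sketch; the sample size and probe points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)          # sample from the standard Gaussian
eps = 1.0 / (len(x) + 1)

for pt in (-3.0, 0.0, 3.0, 30.0):
    # Sensitivity curve: rescaled effect of adding one observation at `pt`.
    sc = (np.mean(np.append(x, pt)) - np.mean(x)) / eps
    print(pt, sc)                    # tracks IF(x) = x, growing without bound
```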

ex-fsi-ch23-04

Easy

Write the Gaussian KDE with bandwidth $h$ on samples $x_1, \dots, x_n$ in closed form. Verify that it integrates to $1$.
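A normalization check by numerical quadrature (a sketch; the sample size, bandwidth, and integration grid are arbitrary):

```python
import numpy as np

def gaussian_kde(t, x, h):
    """p_hat(t) = (1/(n*h)) * sum_i phi((t - x_i)/h), phi = standard normal pdf."""
    z = (t[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
t = np.linspace(-10, 10, 4001)              # grid wide enough to capture the mass
dt = t[1] - t[0]
print(gaussian_kde(t, x, h=0.3).sum() * dt)  # Riemann sum ~= 1.0
```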

ex-fsi-ch23-05

Easy

State Mercer's condition for a function $k(x, x')$ to be a valid positive-definite kernel.
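A finite-dimensional consequence worth probing numerically (a sketch using the Gaussian RBF kernel on random points, both arbitrary choices): every Gram matrix of a Mercer kernel must be positive semidefinite:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
eigs = np.linalg.eigvalsh(rbf_kernel(X))
print(eigs.min())   # >= 0 up to round-off: the Gram matrix is PSD
```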

ex-fsi-ch23-06

Medium

Derive the IRLS update for the Huber M-estimator of location. Show that each IRLS iteration is a weighted-least-squares problem with weights $w_i = \min(1, \delta/|r_i|)$.
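A minimal IRLS sketch for the location case (the data, $\delta$, and tolerance are illustrative; the weight formula is the one the exercise asks you to derive):

```python
import numpy as np

def huber_location_irls(x, delta=1.345, tol=1e-10, max_iter=100):
    """IRLS for the Huber M-estimate of location."""
    mu = np.median(x)                       # robust starting point
    for _ in range(max_iter):
        r = x - mu
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
        mu_new = (w * x).sum() / w.sum()    # weighted least-squares step
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

x = np.concatenate([np.random.default_rng(0).normal(size=100), [50.0, 60.0]])
print(np.mean(x), huber_location_irls(x))   # the M-estimate resists the outliers
```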

ex-fsi-ch23-07

Medium

For the Gaussian KDE with bandwidth $h$, the asymptotic mean integrated squared error is $\mathrm{AMISE}(h) = \dfrac{R(K)}{nh} + \dfrac{h^4 \mu_2(K)^2 R(p'')}{4}$. Find the optimal bandwidth $h^\star$ and the resulting AMISE rate.
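Once you have a candidate closed form for $h^\star$, a brute-force minimization can confirm it (a sketch; the constants use the Gaussian kernel, $R(K) = 1/(2\sqrt{\pi})$ and $\mu_2(K) = 1$, with an arbitrary value for $R(p'')$):

```python
import numpy as np

n = 1000
RK, mu2, Rp2 = 1 / (2 * np.sqrt(np.pi)), 1.0, 0.5   # Gaussian kernel; R(p'') arbitrary

def amise(h):
    return RK / (n * h) + h**4 * mu2**2 * Rp2 / 4

h_grid = np.linspace(1e-3, 1.0, 100_000)
h_num = h_grid[np.argmin(amise(h_grid))]
h_closed = (RK / (n * mu2**2 * Rp2)) ** 0.2          # candidate h* from the derivation
print(h_num, h_closed)                               # should agree to grid resolution
```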

ex-fsi-ch23-08

Medium

Prove that kernel ridge regression $\min_{f \in \mathcal{H}} \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \|f\|_{\mathcal{H}}^2$ has solution $f^\star(x) = \mathbf{k}(x)^T (\mathbf{K} + \lambda \mathbf{I})^{-1} \mathbf{y}$.
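The closed form is easy to exercise numerically (a sketch with an RBF kernel; the data, $\gamma$, and $\lambda$ are arbitrary):

```python
import numpy as np

def rbf(A, B, gamma=10.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=40)

lam = 1e-2
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)   # (K + lam*I)^{-1} y

X_test = np.linspace(0, 1, 5)[:, None]
f_star = rbf(X_test, X) @ alpha                        # f*(x) = k(x)^T alpha
print(f_star)
```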

ex-fsi-ch23-09

Medium

Compute the posterior mean and variance of a GP at a test point $x_*$, given training data $(\mathbf{X}, \mathbf{y})$, noise variance $\sigma_n^2$, and covariance function $k$.
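The standard equations written out directly (a sketch; the kernel, length scale, and noise level are arbitrary, and a Cholesky solve replaces the explicit inverse):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def rbf(A, B, ell=0.2):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ell**2)

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.normal(size=30)
sigma_n2 = 0.05**2

K = rbf(X, X) + sigma_n2 * np.eye(len(X))
c = cho_factor(K)
Xs = np.array([[0.5]])
ks = rbf(Xs, X)                                   # k(x*, X)
mean = ks @ cho_solve(c, y)                       # k*^T (K + sigma_n^2 I)^{-1} y
var = rbf(Xs, Xs) - ks @ cho_solve(c, ks.T)       # k(x*,x*) - k*^T (...)^{-1} k*
print(mean, var)
```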

ex-fsi-ch23-10

Medium

The Tukey biweight loss has derivative $\psi_c(r) = r\,(1 - (r/c)^2)^2$ for $|r| \le c$ and $0$ otherwise. Show that the corresponding $\rho$ is not convex and give one consequence.
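Since $\rho'' = \psi_c'$, finding a region where $\psi_c' < 0$ settles non-convexity. A finite-difference check (a sketch; $c = 4.685$ is the conventional tuning constant, but any $c > 0$ works):

```python
import numpy as np

def psi(r, c):
    """Tukey biweight psi: r*(1 - (r/c)^2)^2 inside [-c, c], zero outside."""
    return np.where(np.abs(r) <= c, r * (1 - (r / c)**2)**2, 0.0)

c, eps = 4.685, 1e-6
r = np.linspace(-c, c, 2001)
psi_prime = (psi(r + eps, c) - psi(r - eps, c)) / (2 * eps)  # = rho''(r)
print(psi_prime.min())   # < 0 on part of [-c, c], so rho is not convex
```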

ex-fsi-ch23-11

Medium

Show the "kernel trick" for the polynomial kernel k(x,x)=(1+x,x)dk(x,x')=(1+\langle x,x'\rangle)^d in R2\mathbb{R}^2, d=2d=2: write the explicit feature map ϕ\phi such that k(x,x)=ϕ(x),ϕ(x)k(x,x')=\langle\phi(x),\phi(x')\rangle.

ex-fsi-ch23-12

Medium

For ISTA with step size $\eta$ and soft threshold $\tau$, one iteration is $\mathbf{x}^{(t+1)} = \mathcal{S}_\tau\big(\mathbf{x}^{(t)} + \eta\, \mathbf{A}^T (\mathbf{y} - \mathbf{A} \mathbf{x}^{(t)})\big)$. Rewrite LISTA as stacked layers with learnable $(\mathbf{W}_e^{(t)}, \mathbf{W}_s^{(t)}, \tau^{(t)})$ per layer, and give the number of parameters.
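A minimal unrolled forward pass (a NumPy sketch; the weights are set to their ISTA-style initialization $\mathbf{W}_e = \eta \mathbf{A}^T$, $\mathbf{W}_s = \mathbf{I} - \eta \mathbf{A}^T \mathbf{A}$ rather than trained, and the dimensions are arbitrary). The comment records the per-layer parameter count this parameterization implies:

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lista_forward(y, We_list, Ws_list, tau_list):
    """Unrolled LISTA: x <- S_tau_t(We_t @ y + Ws_t @ x), one layer per t."""
    x = np.zeros(We_list[0].shape[0])
    for We, Ws, tau in zip(We_list, Ws_list, tau_list):
        x = soft_threshold(We @ y + Ws @ x, tau)
    return x

rng = np.random.default_rng(0)
m, n, T, eta = 20, 50, 5, 0.1
A = rng.normal(size=(m, n)) / np.sqrt(m)
# ISTA-style initialization; training would adapt these. Each layer holds
# m*n (We) + n*n (Ws) + 1 (tau) learnable scalars.
We_list = [eta * A.T] * T
Ws_list = [np.eye(n) - eta * A.T @ A] * T
tau_list = [0.05] * T

x_true = np.zeros(n); x_true[rng.choice(n, 3, replace=False)] = 1.0
x_hat = lista_forward(A @ x_true, We_list, Ws_list, tau_list)
print(np.linalg.norm(x_hat - x_true))
```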

ex-fsi-ch23-13

Hard

Prove that at the minimax point, the Huber loss corresponds to ML estimation under the density proportional to $\exp(-\rho_\delta(r))$. Identify this density (hint: Gaussian center, Laplace tails).
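The shape of this density can be inspected numerically, up to normalization (a sketch; $\delta = 1$ is arbitrary): the log-density has constant curvature $-1$ inside $[-\delta, \delta]$ and zero curvature outside:

```python
import numpy as np

def huber_rho(r, delta=1.0):
    return np.where(np.abs(r) <= delta,
                    0.5 * r**2,
                    delta * (np.abs(r) - 0.5 * delta))

r = np.linspace(-8, 8, 16001)
log_p = -huber_rho(r)                      # log-density up to normalization
d2 = np.diff(log_p, 2) / (r[1] - r[0])**2  # second difference of log p
# ~ -1 near r = 0 (Gaussian center), ~ 0 in the tails (Laplace tails)
print(d2[len(d2) // 2], d2[100])
```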

ex-fsi-ch23-14

Hard

For the Nadaraya–Watson estimator with Gaussian kernel and bandwidth $h$, derive the leading-order bias at an interior point $x_0$ and show that it depends on $p'(x_0)$, $m'(x_0)$, and $m''(x_0)$, where $m(x) = \mathbb{E}[Y \mid X = x]$.
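A Monte Carlo check of the $O(h^2)$ bias (a sketch; the Gaussian design, the regression function $m = \sin$, and the bandwidths are arbitrary; the design density is non-uniform, so the $p'/p$ term is active):

```python
import numpy as np

def nw(x0, X, Y, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)
    return (w * Y).sum() / w.sum()

rng = np.random.default_rng(0)
m, x0 = np.sin, 1.0
for h in (0.4, 0.2, 0.1):
    est = []
    for _ in range(200):                      # average over replications
        X = rng.normal(size=5000)             # non-uniform design density
        Y = m(X) + 0.1 * rng.normal(size=5000)
        est.append(nw(x0, X, Y, h))
    print(h, np.mean(est) - m(x0))            # bias shrinks roughly like h^2
```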

ex-fsi-ch23-15

Hard

Let $\hat{\theta}_n$ be the MSE-trained neural estimator from $n$ i.i.d. pairs $(\mathbf{y}, \boldsymbol{\theta})$. Under universal approximation and $n \to \infty$, prove that $\hat{\theta}_n$ converges to the MMSE estimator $\mathbb{E}[\boldsymbol{\theta} \mid \mathbf{y}]$ almost surely.

ex-fsi-ch23-16

Hard

Suppose a LISTA network trained on $\mathbf{A}_{\text{train}}$ is deployed where $\mathbf{A}_{\text{test}} = \mathbf{A}_{\text{train}} + \Delta$ with $\|\Delta\|_2 = \eta$. Give an upper bound on the recovery error in terms of $\eta$ and the operator norm of the learned weights.
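An empirical look at the sensitivity (a self-contained sketch with ISTA-initialized, untrained weights and arbitrary dimensions; a trained network would behave analogously, since either way the network is a fixed Lipschitz map of $\mathbf{y}$):

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def lista(y, We, Ws, tau, T=16):
    """Unrolled iterations with weights tied across layers."""
    x = np.zeros(Ws.shape[0])
    for _ in range(T):
        x = soft(We @ y + Ws @ x, tau)
    return x

rng = np.random.default_rng(0)
m, n, step, tau = 30, 60, 0.1, 0.02
A = rng.normal(size=(m, n)) / np.sqrt(m)
We, Ws = step * A.T, np.eye(n) - step * A.T @ A   # weights tied to A_train

x_true = np.zeros(n); x_true[rng.choice(n, 4, replace=False)] = 1.0
for eta_pert in (0.0, 0.01, 0.05, 0.1):
    Delta = rng.normal(size=(m, n))
    Delta *= eta_pert / np.linalg.norm(Delta, 2)  # spectral norm = eta_pert
    y = (A + Delta) @ x_true                      # data from the shifted operator
    err = np.linalg.norm(lista(y, We, Ws, tau) - x_true)
    print(eta_pert, err)   # error grows with eta on top of the baseline error
```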