Exercises
ex-ch02-01
Easy: Consider the forward operator given by the matrix $A$.
For each of Hadamard's three conditions (existence, uniqueness, stability), determine whether it holds for the equation $A\mathbf{x} = \mathbf{y}$.
Compute the rank and null space of $A$.
In finite dimensions, stability is automatic: the pseudoinverse (restricted to the range) is always bounded.
Existence
$A$ has full row rank, so $\operatorname{ran}(A)$ is the whole data space. Existence holds for every $\mathbf{y}$.
Uniqueness
By rank-nullity, $\dim \mathcal{N}(A) = n - \operatorname{rank}(A) > 0$, so the null space is nontrivial. Uniqueness fails: any two vectors $\mathbf{x}$ and $\mathbf{x} + \mathbf{z}$ with $\mathbf{z} \in \mathcal{N}(A)$ give the same $\mathbf{y}$.
Stability
On $\operatorname{ran}(A)$, the pseudoinverse $A^\dagger$ is a bounded linear map (any linear map on a finite-dimensional space is bounded). Stability holds.
The moral: in finite dimensions, the Hadamard stability condition is automatically satisfied. Ill-conditioning is a continuous-limit phenomenon captured by the condition number, not by Hadamard's framework itself.
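These checks can be run mechanically with NumPy. A minimal sketch (the exercise's matrix is not reproduced above, so the matrix below is a hypothetical stand-in consistent with the solution: full row rank, nontrivial null space):

```python
# Hypothetical stand-in matrix: surjective (full row rank) but not injective.
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

m, n = A.shape
rank = np.linalg.matrix_rank(A)
print("existence:", rank == m)        # full row rank: solvable for every y
print("uniqueness:", rank == n)       # would require a trivial null space

# null space basis: right singular vectors beyond the rank
_, s, Vt = np.linalg.svd(A)
print("null space basis:\n", Vt[rank:])

# stability: the pseudoinverse is a finite matrix, hence bounded
print("||A^+||_2 =", np.linalg.norm(np.linalg.pinv(A), 2))
```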
ex-ch02-02
Easy: An operator has singular values $\sigma_k$ and left singular vectors $\mathbf{u}_k$. For which of the following data $\mathbf{y}$ does the Picard condition hold?
(a)
(b)
(c)
The Picard condition requires $\sum_k \sigma_k^{-2}\, |\langle \mathbf{y}, \mathbf{u}_k\rangle|^2 < \infty$.
Check case (a)
The Picard series converges, so the condition holds.
Check case (b)
The Picard series diverges, so the condition fails.
Check case (c)
The Picard series converges, so the condition holds. (The alternating sign does not affect convergence.)
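Since the coefficient sequences are not reproduced above, the following sketch illustrates the check with assumed decays ($\sigma_k = 1/k$ and three sample sequences); bounded, flattening partial sums indicate a convergent Picard series.

```python
# Assumed data: sigma_k = 1/k and three illustrative coefficient decays.
import numpy as np

k = np.arange(1, 10_001)
sigma = 1.0 / k
cases = {"fast decay k^-2": k**-2.0,
         "slow decay k^-1": k**-1.0,
         "alternating (-1)^k k^-2": (-1.0)**k * k**-2.0}
for label, yk in cases.items():
    partial = np.cumsum((np.abs(yk) / sigma)**2)
    print(f"{label:26s} partial sums: "
          f"{partial[99]:.3f} {partial[999]:.3f} {partial[9999]:.3f}")
```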
ex-ch02-03
Easy: Compute the Tikhonov regularised solution $\mathbf{x}_\alpha$ for the problem $A\mathbf{x} = \mathbf{y}$, where $A$ is a matrix with singular values $\sigma_1 = 3$ and $\sigma_2 = 0.01$, for $\alpha = 0.01$. Compare with the naive inverse $A^{-1}\mathbf{y}$ and explain the difference.
Use the normal equation $(A^\top A + \alpha I)\,\mathbf{x}_\alpha = A^\top \mathbf{y}$.
The Tikhonov filter for the $k$-th component is $\sigma_k^2/(\sigma_k^2 + \alpha)$.
Naive inverse
The naive inverse applies $1/\sigma_k$ to each data component: the second component is amplified by $1/\sigma_2 = 1/0.01 = 100$, so any noise in that component is magnified a hundredfold.
Tikhonov solution
For $\sigma_1 = 3$: the filter is $\sigma_1^2/(\sigma_1^2 + \alpha) = 9/9.01 \approx 1$, so the first component is essentially untouched. For $\sigma_2 = 0.01$: the filter is $\sigma_2^2/(\sigma_2^2 + \alpha) = 0.0001/0.0101 \approx 0.01$, so the second component is strongly damped instead of amplified by $100$. Tikhonov trades a small bias in the well-conditioned component for a large gain in stability on the ill-conditioned one.
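A minimal numerical check, assuming (consistently with the singular values above) $A = \operatorname{diag}(3, 0.01)$ and data $\mathbf{y} = (1, 1)^\top$:

```python
import numpy as np

A = np.diag([3.0, 0.01])
y = np.array([1.0, 1.0])
alpha = 0.01

x_naive = np.linalg.solve(A, y)   # [0.333..., 100.0]: 2nd component blown up
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(2), A.T @ y)
print("naive:   ", x_naive)
print("tikhonov:", x_tik)         # [~0.333, ~0.99]: 2nd component stabilised

s = np.array([3.0, 0.01])
print("filters: ", s**2 / (s**2 + alpha))   # [~0.9989, ~0.0099]
```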
ex-ch02-04
Medium: An operator has singular values $\sigma_k = k^{-1}$ for $k = 1, 2, \ldots$. The noisy data has exact coefficients $\langle \mathbf{y}^\dagger, \mathbf{u}_k\rangle = k^{-2}$ and noise coefficients $n_k$ with $|n_k| \leq \delta = 10^{-2}$ for all $k$.
(a) Find the optimal truncation level $K^*$ for TSVD.
(b) Find the optimal Tikhonov parameter $\alpha^*$.
(c) Compare the two minimum errors.
For TSVD, the squared error is $\sum_{k > K} |x_k|^2 + \delta^2 \sum_{k \leq K} \sigma_k^{-2}$ (bias plus variance).
The solution coefficients are $x_k = \langle \mathbf{y}^\dagger, \mathbf{u}_k\rangle / \sigma_k$.
Identify solution coefficients
$x_k = \langle \mathbf{y}^\dagger, \mathbf{u}_k\rangle / \sigma_k = k^{-2} \cdot k = k^{-1}$.
TSVD error and optimal $K$
Bias$^2 = \sum_{k > K} k^{-2} \approx 1/K$ and variance$^2 = \delta^2 \sum_{k \leq K} k^2 \approx 10^{-4} K^3/3$. Minimising $E(K) \approx 1/K + 10^{-4} K^3/3$ by setting $E'(K) = 0$ gives $1/K^2 \approx 10^{-4} K^2$, i.e. $K^4 \approx 10^4$ and $K^* \approx 10$.
Tikhonov optimal $\alpha$ and comparison
The optimal Tikhonov cutoff occurs at $\sigma_k^2 \approx \alpha$, corresponding to $k \approx K^* = 10$ and $\sigma_{K^*} = 0.1$, so $\alpha^* \approx \sigma_{K^*}^2 = 10^{-2}$.
Numerically, the TSVD error is slightly smaller than Tikhonov's because TSVD has infinite qualification while Tikhonov's qualification saturates at $\mu_0 = 2$. For this problem, with solution coefficients $x_k = k^{-1} = \sigma_k$ (smoothness of order roughly $\mu = 1$), both methods are near-optimal and the difference is modest.
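The optimisation in (a)–(c) can be verified numerically. A sketch under the model above ($\sigma_k = k^{-1}$, $x_k = k^{-1}$, worst-case noise $|n_k| = \delta = 10^{-2}$, truncated at 2000 modes):

```python
import numpy as np

kmax = 2000
k = np.arange(1, kmax + 1)
sigma, x, delta = 1.0 / k, 1.0 / k, 1e-2

def tsvd_err2(K):      # squared error: variance (k <= K) + bias (k > K)
    return np.sum((delta / sigma[:K])**2) + np.sum(x[K:]**2)

def tik_err2(alpha):   # Tikhonov filter f = s^2 / (s^2 + alpha)
    f = sigma**2 / (sigma**2 + alpha)
    return np.sum(((f - 1.0) * x)**2) + np.sum((f * delta / sigma)**2)

Ks = np.arange(1, 100)
K_star = Ks[np.argmin([tsvd_err2(K) for K in Ks])]
alphas = np.logspace(-5, 0, 300)
a_star = alphas[np.argmin([tik_err2(a) for a in alphas])]
print(f"TSVD:     K* = {K_star}, error = {np.sqrt(tsvd_err2(K_star)):.4f}")
print(f"Tikhonov: alpha* = {a_star:.3g}, error = {np.sqrt(tik_err2(a_star)):.4f}")
```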
ex-ch02-05
Medium: For the discrete Tikhonov problem with SVD $A = U \Sigma V^\top$, show that the discrepancy function $\varphi(\alpha) = \|A\mathbf{x}_\alpha - \mathbf{y}^\delta\|^2$ is monotonically increasing in $\alpha$.
Use this to prove that the equation $\varphi(\alpha) = \delta^2$ has a unique solution whenever $0 < \delta < \|\mathbf{y}^\delta\|$.
Express $\varphi(\alpha)$ explicitly in terms of the SVD components.
Differentiate each term with respect to $\alpha$ and show positivity.
Derive the residual formula
$\varphi(\alpha) = \sum_k \left(\frac{\alpha}{\sigma_k^2 + \alpha}\right)^2 |\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle|^2.$
Monotonicity
Each term has derivative
$\frac{\mathrm{d}}{\mathrm{d}\alpha}\left[\frac{\alpha^2}{(\sigma_k^2+\alpha)^2}\right] |\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle|^2 = \frac{2\alpha\,\sigma_k^2}{(\sigma_k^2+\alpha)^3}\, |\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle|^2 \geq 0,$ with equality only if $\alpha = 0$ or $\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle = 0$. Since $\alpha > 0$ (and not all coefficients vanish), we have $\varphi'(\alpha) > 0$.
Existence and uniqueness
$\varphi(\alpha) \to 0$ as $\alpha \to 0$ (exact data case, $A$ of full rank) and $\varphi(\alpha) \to \|\mathbf{y}^\delta\|^2$ as $\alpha \to \infty$. By the intermediate value theorem and strict monotonicity, the equation $\varphi(\alpha) = \delta^2$ has exactly one solution whenever $0 < \delta < \|\mathbf{y}^\delta\|$.
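A sketch verifying the monotonicity and solving $\varphi(\alpha) = \delta^2$ by bisection, on a random square (hence, almost surely, full-rank) test problem:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
x_true = rng.standard_normal(10)
e = rng.standard_normal(10)
delta = 0.1
e *= delta / np.linalg.norm(e)          # noise scaled so that ||e|| = delta
y = A @ x_true + e

U, s, _ = np.linalg.svd(A)
c = U.T @ y                              # coefficients <y, u_k>

def phi(alpha):
    return float(np.sum((alpha / (s**2 + alpha))**2 * c**2))

vals = [phi(a) for a in np.logspace(-8, 8, 100)]
assert all(b >= a for a, b in zip(vals, vals[1:]))   # monotone increasing

lo, hi = 1e-12, 1e12                     # bisection for phi(alpha) = delta^2
for _ in range(200):
    mid = np.sqrt(lo * hi)
    lo, hi = (mid, hi) if phi(mid) < delta**2 else (lo, mid)
print(f"discrepancy-principle alpha ~ {mid:.4g}")
```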
ex-ch02-06
Medium: Prove that the Landweber filter function after $N$ iterations with step size $\tau \leq 1/\sigma_1^2$ is $f_N(\sigma) = 1 - (1 - \tau\sigma^2)^N$,
and show that $f_N(\sigma)/\sigma \leq \sqrt{N\tau}$ for all $\sigma > 0$.
Write the iteration in the SVD basis and use the recurrence relation.
For the bound, use $1 - (1 - x)^N \leq Nx$ for $0 \leq x \leq 1$.
Derive the filter
In the SVD basis, the $k$-th component $c_n = \langle \mathbf{x}^{(n)}, \mathbf{v}_k\rangle$ satisfies the recurrence (starting from $c_0 = 0$): $c_{n+1} = (1 - \tau\sigma_k^2)\, c_n + \tau\sigma_k \langle \mathbf{y}, \mathbf{u}_k\rangle,$
where $\tau$ is the step size. Summing the geometric series gives $c_N = \frac{1 - (1 - \tau\sigma_k^2)^N}{\sigma_k}\, \langle \mathbf{y}, \mathbf{u}_k\rangle$, confirming the filter $f_N(\sigma) = 1 - (1 - \tau\sigma^2)^N$.
Bound the filter magnitude
We bound $f_N(\sigma)/\sigma$. Using $1 - (1 - x)^N \leq Nx$ for $0 \leq x \leq 1$ (with $x = \tau\sigma^2$), together with $f_N(\sigma) \leq 1$: $f_N(\sigma)/\sigma \leq \min(N\tau\sigma,\ 1/\sigma)$. Maximising over $\sigma$ gives the maximum at $\sigma = 1/\sqrt{N\tau}$, yielding $f_N(\sigma)/\sigma \leq \sqrt{N\tau}$. A tighter analysis using calculus improves the constant in front of $\sqrt{N\tau}$.
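Both the closed-form filter and the bound can be checked on a small diagonal problem (a sketch; the singular values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
s = np.array([3.0, 1.0, 0.3, 0.01])        # illustrative singular values
A = np.diag(s)
y = rng.standard_normal(4)
tau = 1.0 / s.max()**2                     # step size with tau * sigma_1^2 <= 1
N = 50

x = np.zeros(4)
for _ in range(N):
    x = x + tau * A.T @ (y - A @ x)        # Landweber iteration

f = 1.0 - (1.0 - tau * s**2)**N            # closed-form filter
assert np.allclose(x, f / s * y)           # x_N = f_N(sigma)/sigma * <y, u_k>

grid = np.linspace(1e-6, s.max(), 100_000)
g = (1.0 - (1.0 - tau * grid**2)**N) / grid
print(f"max f_N(s)/s = {g.max():.4f}  <=  sqrt(N*tau) = {np.sqrt(N*tau):.4f}")
```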
ex-ch02-07
Medium: For the integral operator on $L^2$ with singular values $\sigma_k$, determine which of the following functions satisfy a source condition of order $\mu$:
(a)
(b)
(c)
A source condition of order $\mu$ means $x = (A^*A)^{\mu/2} w$ with $w \in \mathcal{X}$, so $\langle x, v_k\rangle = \sigma_k^\mu \langle w, v_k\rangle$.
Check whether $(\langle x, v_k\rangle / \sigma_k^\mu)_k$ is in $\ell^2$.
Identify the required decay
We need $(\langle x, v_k\rangle / \sigma_k^\mu)_k$ to be in $\ell^2$, i.e., $\sum_k \sigma_k^{-2\mu} |\langle x, v_k\rangle|^2 < \infty$.
Check all cases
(a) The series $\sum_k \sigma_k^{-2\mu} |\langle x, v_k\rangle|^2$ converges. Satisfies the source condition.
(b) The series diverges. Does not satisfy the source condition.
(c) The series converges. Satisfies the source condition. (In fact, case (c) satisfies a higher-order source condition as well.)
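Since the specific functions are not reproduced above, here is a hedged numerical version of the check with assumed coefficients ($\sigma_k = 1/k$, $\mu = 1$, three sample decays); bounded partial sums signal that the source condition holds.

```python
# Assumed: sigma_k = 1/k, mu = 1, and three illustrative coefficient decays.
import numpy as np

k = np.arange(1, 10_001)
sigma, mu = 1.0 / k, 1.0
for label, xk in [("x_k = k^-2", k**-2.0),
                  ("x_k = k^-1", k**-1.0),
                  ("x_k = k^-3", k**-3.0)]:
    partial = np.cumsum(sigma**(-2.0 * mu) * np.abs(xk)**2)
    print(f"{label:12s} partial sums: "
          f"{partial[99]:.2f} {partial[999]:.2f} {partial[9999]:.2f}")
```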
ex-ch02-08
Medium: Show that the LASSO solution $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \tfrac{1}{2}\|A\mathbf{x} - \mathbf{y}\|^2 + \lambda \|\mathbf{x}\|_1$ satisfies: if $|\mathbf{a}_i^\top(\mathbf{y} - A\hat{\mathbf{x}})| < \lambda$, then $\hat{x}_i = 0$.
Use this to argue that the threshold $\lambda = \sigma\sqrt{2\log n}$ (universal threshold) suppresses all noise-only components with high probability when the noise is i.i.d. Gaussian with standard deviation $\sigma$.
Start from the KKT condition $\mathbf{a}_i^\top(\mathbf{y} - A\hat{\mathbf{x}}) = \lambda s_i$ with $s_i \in \partial |\hat{x}_i|$.
The maximum absolute value of $n$ standard Gaussians concentrates near $\sqrt{2\log n}$.
KKT conditions imply sparsity
The KKT condition at component $i$ is $\mathbf{a}_i^\top(\mathbf{y} - A\hat{\mathbf{x}}) = \lambda s_i$, where $s_i \in \partial |\hat{x}_i|$. If $\hat{x}_i \neq 0$, then $s_i = \operatorname{sign}(\hat{x}_i)$ and $|\mathbf{a}_i^\top(\mathbf{y} - A\hat{\mathbf{x}})| = \lambda$.
Conversely, if $|\mathbf{a}_i^\top(\mathbf{y} - A\hat{\mathbf{x}})| < \lambda$, we cannot have $\hat{x}_i \neq 0$ (which would require $|s_i| = 1$ and hence a correlation of exactly $\lambda$). Hence $\hat{x}_i = 0$.
Universal threshold argument
The noise contribution to $A^\top \boldsymbol{\varepsilon}$ has each component approximately $\mathcal{N}(0, \sigma^2)$ (for near-isometric $A$).
By the maximum of $n$ i.i.d. Gaussians: $\mathbb{P}\big(\max_i |z_i| > \sigma\sqrt{2\log n}\big) \to 0$ as $n \to \infty$.
Choosing $\lambda = \sigma\sqrt{2\log n}$ therefore ensures all noise-only components are suppressed with high probability.
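A quick simulation of the universal threshold (a sketch with $n = 10^4$ pure-noise components):

```python
import numpy as np

rng = np.random.default_rng(0)
n, noise_sigma, trials = 10_000, 1.0, 200
lam = noise_sigma * np.sqrt(2 * np.log(n))     # universal threshold

survived = [np.any(np.abs(noise_sigma * rng.standard_normal(n)) > lam)
            for _ in range(trials)]
print(f"lambda = {lam:.3f}")
print("fraction of trials with a surviving noise component:",
      np.mean(survived))   # small, and -> 0 as n grows
```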
ex-ch02-09
Hard: Derive the curvature formula for the L-curve. For Tikhonov regularization, let $\rho(\alpha) = \|A\mathbf{x}_\alpha - \mathbf{y}^\delta\|^2$ and $\eta(\alpha) = \|\mathbf{x}_\alpha\|^2$.
Compute $\rho'(\alpha)$ and $\eta'(\alpha)$ in terms of the SVD components, and write the curvature $\kappa(\alpha)$ of the curve $(\log\rho, \log\eta)$ in closed form.
Define $\hat\rho = \log \rho$ and $\hat\eta = \log \eta$.
Use the chain rule to relate the derivatives of $(\hat\rho, \hat\eta)$ to those of $(\rho, \eta)$.
Derivative of residual norm squared
$\rho'(\alpha) = 2\alpha \sum_k \frac{\sigma_k^2}{(\sigma_k^2 + \alpha)^3}\, |\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle|^2.$
Derivative of solution norm squared
$\eta'(\alpha) = -2 \sum_k \frac{\sigma_k^2}{(\sigma_k^2 + \alpha)^3}\, |\langle \mathbf{y}^\delta, \mathbf{u}_k\rangle|^2, \qquad \text{so that } \rho'(\alpha) = -\alpha\, \eta'(\alpha).$
Log-scale derivatives and curvature
With $\hat\rho = \log\rho$ and $\hat\eta = \log\eta$, we have $\hat\rho' = \rho'/\rho$ and $\hat\eta' = \eta'/\eta$, and the plane-curve curvature is $\kappa(\alpha) = \frac{\hat\rho'\,\hat\eta'' - \hat\rho''\,\hat\eta'}{\big((\hat\rho')^2 + (\hat\eta')^2\big)^{3/2}};$ substituting $\rho' = -\alpha\eta'$ expresses everything through the SVD sums above. $\blacksquare$
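The closed form can be cross-checked numerically: evaluate $\rho, \eta$ from the SVD sums and differentiate in $\log\alpha$ by finite differences (a sketch on a random test problem):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
x_true = rng.standard_normal(50)
e = rng.standard_normal(50)
e *= 0.05 / np.linalg.norm(e)
y = A @ x_true + e

U, s, _ = np.linalg.svd(A)
c = U.T @ y

def log_rho_eta(alpha):
    rho = np.sum((alpha / (s**2 + alpha))**2 * c**2)
    eta = np.sum((s / (s**2 + alpha))**2 * c**2)
    return np.log(rho), np.log(eta)

t = np.linspace(np.log(1e-6), np.log(1e4), 400)      # t = log(alpha)
r, h = np.array([log_rho_eta(np.exp(ti)) for ti in t]).T
dr, dh = np.gradient(r, t), np.gradient(h, t)
d2r, d2h = np.gradient(dr, t), np.gradient(dh, t)
kappa = (dr * d2h - d2r * dh) / (dr**2 + dh**2)**1.5
print(f"L-curve corner at alpha ~ {np.exp(t[np.argmax(kappa)]):.3g}")
```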
ex-ch02-10
Hard: Prove that Tikhonov regularization has qualification $\mu_0 = 2$ by showing:
(a) For $\mu \leq 2$, the optimal rate $O(\delta^{\mu/(\mu+1)})$ is achieved with $\alpha \sim \delta^{2/(\mu+1)}$.
(b) For $\mu > 2$, the rate cannot exceed $O(\delta^{2/3})$ regardless of the choice of $\alpha$.
For the bias: bound $\sup_{\sigma > 0} \frac{\alpha\, \sigma^\mu}{\sigma^2 + \alpha}$ and show it equals $O(\alpha^{\mu/2})$ for $\mu \leq 2$.
For saturation: show the bias is $\Theta(\alpha)$ (not $O(\alpha^{\mu/2})$) when $\mu > 2$.
Bias bound for general $\mu$
The bias for the $k$-th mode is $\frac{\alpha}{\sigma_k^2 + \alpha}\, \sigma_k^\mu\, |w_k|$. We need $\sup_{\sigma > 0} \frac{\alpha\, \sigma^\mu}{\sigma^2 + \alpha}$.
For $\mu < 2$: the maximum occurs at $\sigma^2 = \frac{\mu}{2 - \mu}\,\alpha$ (or, for $\mu = 2$, as the supremum $\sigma \to \infty$), giving $\sup_\sigma \frac{\alpha \sigma^\mu}{\sigma^2 + \alpha} = C_\mu\, \alpha^{\mu/2}$.
For $\mu > 2$: $\sigma \mapsto \frac{\alpha \sigma^\mu}{\sigma^2 + \alpha}$ is monotone increasing, so on a bounded spectrum $\sigma \leq \sigma_1$ the supremum is $\frac{\alpha \sigma_1^\mu}{\sigma_1^2 + \alpha} = \Theta(\alpha)$, and the bias is $\Theta(\alpha)$.
Rate for $\mu \leq 2$
With bias $O(\alpha^{\mu/2})$ and variance $O(\delta/\sqrt{\alpha})$ (from $\sup_\sigma \frac{\sigma}{\sigma^2 + \alpha} = \frac{1}{2\sqrt{\alpha}}$), optimising $\alpha^{\mu/2} \sim \delta/\sqrt{\alpha}$ gives $\alpha \sim \delta^{2/(\mu+1)}$ and rate $O(\delta^{\mu/(\mu+1)})$.
Saturation for $\mu > 2$
With bias $\Theta(\alpha)$ and variance $O(\delta/\sqrt{\alpha})$, optimising gives $\alpha \sim \delta^{2/3}$ and rate $O(\delta^{2/3})$. More carefully (working with squared errors): bias$^2 = \Theta(\alpha^2)$, variance$^2 = O(\delta^2/\alpha)$. Optimising $\alpha^2 \sim \delta^2/\alpha$ gives $\alpha \sim \delta^{2/3}$ and squared error $O(\delta^{4/3})$, which in the norm gives $O(\delta^{2/3})$ (the exponent $\mu_0/(\mu_0 + 1)$ with $\mu_0 = 2$).
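The saturation effect is easy to observe numerically. The sketch below assumes $\sigma_k = 1/k$ and a source element $w_k = k^{-0.51}$ (just inside $\ell^2$), uses the worst-case variance bound $\delta \sup_k f_k/\sigma_k$, and fits the rate exponent from two noise levels:

```python
import numpy as np

k = np.arange(1, 5001)
sigma = 1.0 / k
w = k**-0.51                       # source element, barely in l^2

def best_err(mu, delta):
    x = sigma**mu * w              # source condition of order mu
    errs = []
    for a in np.logspace(-10, 1, 250):
        f = sigma**2 / (sigma**2 + a)
        bias = np.linalg.norm((f - 1.0) * x)
        var = delta * np.max(f / sigma)    # worst-case noise, ||e|| <= delta
        errs.append(bias + var)
    return min(errs)

for mu in (1.0, 2.0, 4.0):
    d1, d2 = 1e-4, 1e-6
    rate = np.log(best_err(mu, d1) / best_err(mu, d2)) / np.log(d1 / d2)
    theory = min(mu, 2.0) / (min(mu, 2.0) + 1.0)
    print(f"mu = {mu}: observed exponent ~ {rate:.2f}, theory {theory:.2f}")
```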
ex-ch02-11
Hard: The SURE functional for Tikhonov regularization is $\mathrm{SURE}(\alpha) = \|A\mathbf{x}_\alpha - \mathbf{y}^\delta\|^2 - n\sigma^2 + 2\sigma^2 \operatorname{tr}\big(A(A^\top A + \alpha I)^{-1} A^\top\big).$
(a) Show that $\mathbb{E}[\mathrm{SURE}(\alpha)] = \mathbb{E}\,\|A\mathbf{x}_\alpha - A\mathbf{x}^\dagger\|^2$, i.e., SURE is an unbiased estimate of the prediction error.
(b) Find the $\alpha$ that minimises SURE for the simple case of a single component ($n = 1$, $A = 1$) with $y = x + \varepsilon$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.
Stein's lemma: for $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ and differentiable $h$: $\mathbb{E}[\varepsilon\, h(\varepsilon)] = \sigma^2\, \mathbb{E}[h'(\varepsilon)]$.
The trace term $\operatorname{tr}\big(A(A^\top A + \alpha I)^{-1} A^\top\big) = \sum_k \sigma_k^2/(\sigma_k^2 + \alpha)$ is the effective number of degrees of freedom.
Apply Stein's identity
Write the estimator as $\hat{\mathbf{y}} = H_\alpha \mathbf{y}^\delta$, where $H_\alpha = A(A^\top A + \alpha I)^{-1} A^\top$. The residual is $(I - H_\alpha)\mathbf{y}^\delta$.
By Stein's lemma, $\mathbb{E}\,\langle \boldsymbol{\varepsilon}, H_\alpha \mathbf{y}^\delta\rangle = \sigma^2 \operatorname{tr}(H_\alpha)$.
Rearranging recovers the SURE formula. The expectation of SURE equals the prediction MSE.
Scalar case optimisation
For $A = 1$, $x_\alpha = y/(1 + \alpha)$: $\mathrm{SURE}(\alpha) = \dfrac{\alpha^2}{(1+\alpha)^2}\, y^2 - \sigma^2 + \dfrac{2\sigma^2}{1+\alpha}$.
Setting the derivative to zero and solving gives
$\alpha^* = \dfrac{\sigma^2}{y^2 - \sigma^2}$ when $y^2 > \sigma^2$ (the data exceeds the noise). For $y^2 \leq \sigma^2$ (pure noise), $\alpha^* = \infty$ and the solution is set to zero.
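A numerical confirmation of the scalar minimiser (illustrative values $\sigma^2 = 1$, $y = 2.5$):

```python
import numpy as np

sigma2, y = 1.0, 2.5                 # noise variance and data, y^2 > sigma^2
alpha = np.logspace(-4, 4, 400_000)
sure = (alpha / (1 + alpha))**2 * y**2 - sigma2 + 2 * sigma2 / (1 + alpha)
print("numeric argmin:", alpha[np.argmin(sure)])
print("closed form:   ", sigma2 / (y**2 - sigma2))   # 0.1905...
```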
ex-ch02-12
Hard: Consider the Born iterative method for 2D microwave tomography. The true permittivity contrast is $\chi(\mathbf{r})$. The forward operator at the $n$-th iteration is the linearised map $(A_n\, \delta\chi)(\mathbf{r}_m) = k^2 \int_D G(\mathbf{r}_m, \mathbf{r}')\, E_n(\mathbf{r}')\, \delta\chi(\mathbf{r}')\, \mathrm{d}\mathbf{r}', \quad m = 1, \dots, M,$ where $G$ is the background Green's function, $E_n$ the total field at iteration $n$, and $\mathbf{r}_m$ the receiver positions.
(a) Write the adjoint $A_n^*$ for $A_n$.
(b) Show that the IRGNM normal equation at step $n$ is equivalent to a Tikhonov regularised linear system with matrix entries that can be computed as integrals involving $G$ and $E_n$.
(c) Compare the cost of applying the adjoint with that of the forward map over the $M$ receivers, and explain why working with forward/adjoint pairs is cheaper than forming the full Jacobian.
The adjoint of multiplication by $E_n$ is multiplication by $\overline{E_n}$.
The adjoint of the Green's function integral in $A_n$ is a backpropagation operation.
Derive the adjoint
For a residual vector $\mathbf{r} = (r_1, \dots, r_M)$ at the receivers: $(A_n^* \mathbf{r})(\mathbf{r}') = k^2\, \overline{E_n(\mathbf{r}')}\, \sum_{m=1}^{M} \overline{G(\mathbf{r}_m, \mathbf{r}')}\; r_m.$
This is a backpropagation: the field recorded at the receivers is backpropagated into the domain via the conjugate Green's function, then weighted by $\overline{E_n}$.
Normal equation structure
The IRGNM normal equation is $(A_n^* A_n + \alpha_n I)\, \delta\chi = A_n^* \mathbf{r}_n$, with $\mathbf{r}_n$ the current data residual.
The entries of the system matrix $A_n^* A_n$ are volume integrals of $\overline{E_n}\, E_n$ against the kernel $\sum_m \overline{G(\mathbf{r}_m, \cdot)}\, G(\mathbf{r}_m, \cdot)$, i.e., a sum over receivers of outer products of the forward Green's functions.
Computational advantage of adjoint
The forward computation evaluates the scattered field at $M$ receiver positions, each requiring $O(N)$ work (one Green's function integral over the $N$ domain points): total $O(MN)$.
The adjoint maps a vector of length $M$ to a function on $D$ of size $N$: total $O(MN)$ as well, i.e. the same cost.
The key is that one forward/adjoint pair costs $O(MN)$, whereas forming the full Jacobian (or the normal matrix $A_n^* A_n$) explicitly costs a factor of order $N$ more. This is the basis of the efficient IRGNM implementation.
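In code, the forward/adjoint pair is best validated with a dot-product test, $\langle \mathbf{r}, A_n\,\delta\chi\rangle = \langle A_n^*\mathbf{r}, \delta\chi\rangle$. A sketch for a hypothetical discretisation (random stand-ins for $G$ and $E_n$):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, k2, dV = 8, 200, 1.0, 1e-3        # receivers, pixels, k^2, cell volume
G = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
E = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def forward(dchi):           # O(M*N): one Green's integral per receiver
    return k2 * dV * (G @ (E * dchi))

def adjoint(r):              # O(M*N): backpropagate, then weight by conj(E)
    return k2 * dV * np.conj(E) * (np.conj(G).T @ r)

dchi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
r = rng.standard_normal(M) + 1j * rng.standard_normal(M)
lhs = np.vdot(r, forward(dchi))         # <r, A dchi>
rhs = np.vdot(adjoint(r), dchi)         # <A* r, dchi>
print("relative adjoint-test error:", abs(lhs - rhs) / abs(lhs))
```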
ex-ch02-13
Challenge: Consider the linear inverse problem $\mathbf{y}^\delta = A\mathbf{x} + \boldsymbol{\varepsilon}$ with Gaussian noise $\boldsymbol{\varepsilon} \sim \mathcal{N}(0, \sigma^2 I)$ and Gaussian prior $\mathbf{x} \sim \mathcal{N}(\mathbf{x}_0, \Gamma)$, where the prior covariance $\Gamma$ has eigenfunctions $\mathbf{v}_k$ (aligned with the SVD of $A$) and eigenvalues $\gamma_k^2$.
(a) Derive the MAP estimate and show it is a generalised Tikhonov solution with component-dependent regularisation parameter $\alpha_k = \sigma^2/\gamma_k^2$.
(b) The posterior covariance is $\Gamma_{\mathrm{post}} = \big(\sigma^{-2} A^\top A + \Gamma^{-1}\big)^{-1}$. Compute its eigenvalues and interpret: which components of $\mathbf{x}$ are data-determined and which are prior-determined?
(c) For a flat prior ($\gamma_k^2 \to \infty$ for all $k$), show that the MAP estimate reduces to the pseudoinverse solution $A^\dagger \mathbf{y}^\delta$.
The posterior is Gaussian with mean $\mathbf{x}_{\mathrm{MAP}}$ and covariance $\Gamma_{\mathrm{post}}$.
Component $k$ is data-determined when $\sigma_k^2 \gamma_k^2 \gg \sigma^2$.
MAP estimate
The negative log-posterior is $\frac{1}{2\sigma^2}\|A\mathbf{x} - \mathbf{y}^\delta\|^2 + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^\top \Gamma^{-1} (\mathbf{x} - \mathbf{x}_0) + \text{const}.$
In the SVD basis, the $k$-th component satisfies: $x_k = \frac{\sigma_k \langle \mathbf{y}^\delta, \mathbf{u}_k\rangle + \alpha_k\, x_{0,k}}{\sigma_k^2 + \alpha_k}, \qquad \alpha_k = \frac{\sigma^2}{\gamma_k^2}.$
This is a weighted average of the data-driven and prior-mean estimates.
Posterior covariance eigenvalues
The $k$-th eigenvalue of $\Gamma_{\mathrm{post}}$ is $\lambda_k = \Big(\frac{\sigma_k^2}{\sigma^2} + \frac{1}{\gamma_k^2}\Big)^{-1} = \frac{\sigma^2 \gamma_k^2}{\sigma_k^2 \gamma_k^2 + \sigma^2}.$
When $\sigma_k^2 \gamma_k^2 \gg \sigma^2$ (strong signal): $\lambda_k \approx \sigma^2/\sigma_k^2$, i.e. data-determined, with uncertainty set by the noise.
When $\sigma_k^2 \gamma_k^2 \ll \sigma^2$ (weak signal): $\lambda_k \approx \gamma_k^2$, i.e. prior-determined; the data provides no information.
Flat prior limit
As $\gamma_k^2 \to \infty$, the regularisation parameter $\alpha_k = \sigma^2/\gamma_k^2 \to 0$. The MAP estimate becomes $x_k = \langle \mathbf{y}^\delta, \mathbf{u}_k\rangle / \sigma_k$, recovering the pseudoinverse.
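A component-wise sketch of (a)–(c), with assumed values $\sigma_k = 1/k$, $\gamma_k^2 = 1$, noise $\sigma = 0.1$, and hypothetical data coefficients:

```python
import numpy as np

k = np.arange(1, 16)
s = 1.0 / k                        # singular values sigma_k
gamma2 = np.ones_like(s)           # prior variances gamma_k^2
noise2 = 0.1**2                    # noise variance sigma^2
alpha_k = noise2 / gamma2          # component-wise regularisation parameter
yk = 0.5 * s                       # hypothetical data coefficients <y, u_k>
x0 = 0.0                           # prior mean

x_map = (s * yk + alpha_k * x0) / (s**2 + alpha_k)
lam = noise2 * gamma2 / (s**2 * gamma2 + noise2)    # posterior variances
for i in range(len(k)):
    regime = "data" if s[i]**2 * gamma2[i] > noise2 else "prior"
    print(f"k={k[i]:2d}  x_map={x_map[i]:.4f}  post.var={lam[i]:.4f}  "
          f"({regime}-determined)")
```

The printed transition (around $\sigma_k^2 \gamma_k^2 = \sigma^2$, i.e. $k = 10$ here) separates the data-determined from the prior-determined components.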
ex-ch02-14
Challenge: Consider the 2D total variation denoising problem (ROF model): $\min_u\ \tfrac{1}{2}\|u - f\|_{L^2}^2 + \lambda\, \mathrm{TV}(u).$
For the 1D case with a single step edge of height $h$ at position $t$ ($u^\dagger = 0$ for $x < t$, $u^\dagger = h$ for $x \geq t$):
(a) Show that the ROF solution is piecewise constant with a single jump at $t$, and that the jump height is a soft thresholding of the edge height $h$.
(b) What is the minimum edge height that survives TV regularisation?
(c) Explain why this edge-preservation property makes TV superior to Tikhonov for piecewise-constant scenes in radar imaging.
The TV of a step function with a single jump of height $h$ is $|h|$.
Compute the ROF functional for plateau values $a$ and $b$ around the step at position $t$ and minimise over $(a, b)$.
Reduce to scalar problem
For a single step of height $h$ at position $t$, any piecewise-constant solution taking value $a$ on $[0, t)$ and $b$ on $[t, 1]$ has $\mathrm{TV}(u) = |b - a|$.
The ROF functional becomes $J(a, b) = \tfrac{t}{2}\,a^2 + \tfrac{1-t}{2}\,(b - h)^2 + \lambda\, |b - a|$.
The two plateaus couple only through the jump term $\lambda|b - a|$. Stationarity for $b > a \geq 0$ gives $a = \lambda/t$ (pulled up from $0$ by the jump penalty) and $b = h - \lambda/(1 - t)$.
Solve the scalar problem
The jump height $s = b - a$ is a soft thresholding of $h$ with threshold $\lambda\big(\tfrac{1}{t} + \tfrac{1}{1-t}\big)$: the solution is $s^* = \max\!\big(h - \lambda(\tfrac{1}{t} + \tfrac{1}{1-t}),\ 0\big)$.
Therefore the solution keeps a (reduced) jump at $t$ when $h > \lambda\big(\tfrac{1}{t} + \tfrac{1}{1-t}\big)$ and collapses to a constant otherwise.
Edge preservation and radar interpretation
The minimum surviving edge height is $h_{\min} = \lambda\big(\tfrac{1}{t} + \tfrac{1}{1-t}\big)$. Edges with $h > h_{\min}$ are preserved (with height reduced by $h_{\min}$); weaker features are suppressed.
For Tikhonov regularisation, the step would be blurred over many pixels: the edge is not preserved but spread out. For radar scenes with discrete targets or building edges, this blurring destroys the spatial precision needed for target localisation. TV preserves the edge location exactly.
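The two-plateau reduction can be checked by brute force: minimise $J(a, b)$ on a grid and compare the recovered jump with the soft-threshold prediction (illustrative values $t = 1/2$, $\lambda = 0.05$):

```python
import numpy as np

t, lam = 0.5, 0.05
thresh = lam * (1.0 / t + 1.0 / (1.0 - t))      # minimum surviving edge height

for h in (0.1, 0.3, 1.0):                       # below and above the threshold
    grid = np.linspace(-0.5, 1.5, 801)
    A, B = np.meshgrid(grid, grid, indexing="ij")   # A = left, B = right plateau
    J = 0.5 * t * A**2 + 0.5 * (1 - t) * (B - h)**2 + lam * np.abs(B - A)
    i, j = np.unravel_index(np.argmin(J), J.shape)
    print(f"h = {h}: jump = {grid[j] - grid[i]:.3f}, "
          f"predicted = {max(h - thresh, 0.0):.3f}")
```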
ex-ch02-15
Challenge: The Inverse Crime. Consider a simulated microwave tomography experiment where you:
- Generate synthetic data using a finite-element solver on a grid.
- Add noise with noise level $\delta$.
- Reconstruct using IRGNM with the same grid and solver.
(a) Explain why this simulation setup constitutes an "inverse crime" and why the resulting reconstruction error will be misleadingly small.
(b) Propose two ways to avoid the inverse crime while keeping computation manageable.
(c) For the corrected simulation, describe how the choice of inner discretisation (forward model) vs. outer discretisation (reconstruction grid) affects the regularization parameter and the achievable reconstruction quality.
Consider what information is implicitly shared between data generation and reconstruction when the same numerical solver is used.
Think about the discretization error vs. the noise level.
The crime and its consequences
Using the same grid and solver means the synthetic data lies exactly in the range of the discrete forward operator $F_h$. The reconstruction algorithm can therefore find a parameter $\chi_h$ such that $F_h(\chi_h)$ matches the (noise-free) data exactly (to machine precision), regardless of whether this corresponds to a physically correct solution.
The inverse crime inflates performance: discretization errors (which would appear in reality when the forward model is imperfect) are absent, giving a reconstruction quality that far exceeds what any real system could achieve. Published error metrics from such simulations are misleading.
Remedies
- Different grids: Generate data on a fine grid and reconstruct on a coarser grid. The grid mismatch introduces a model error comparable to measurement noise.
- Different solvers: Use FEM for data generation and a simpler Born series for reconstruction. The model mismatch reflects the real-world imperfection of the forward model used in practice.
- Measured calibration data: Use a calibrated measurement setup rather than a pure simulation.
Grid effects on regularization
With a fine forward model (high-accuracy data generation) and a coarser reconstruction grid of spacing $h$, the mismatch error is $O(h^p)$ for a $p$-th order solver. If this dominates the measurement noise $\delta$, the effective noise level seen by the reconstruction algorithm is $\delta_{\mathrm{eff}} \approx \delta + C h^p$, which requires a larger regularisation parameter than the measurement noise alone would suggest.
In practice, the regularization parameter $\alpha$ must be tuned to balance both measurement noise and model error, not just noise.
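A 1D toy version of the whole experiment (a sketch, with an integration operator standing in for the tomography solver and all values illustrative): data generated with the same discretisation as the reconstruction looks spuriously accurate compared with data generated on a finer grid and sampled down.

```python
import numpy as np

def forward_matrix(n):                     # cumulative (integration) operator
    return np.tril(np.ones((n, n))) / n

def reconstruct(y, n, alpha=1e-5):         # Tikhonov normal equations
    A = forward_matrix(n)
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

rng = np.random.default_rng(0)
n, factor = 32, 4
x = lambda s: np.sin(2 * np.pi * s)        # true solution
coarse = (np.arange(n) + 0.5) / n
fine = (np.arange(n * factor) + 0.5) / (n * factor)

y_crime = forward_matrix(n) @ x(coarse)                 # same grid and solver
y_fair = (forward_matrix(n * factor) @ x(fine))[factor - 1 :: factor]
noise = 1e-5 * rng.standard_normal(n)

for label, y in (("inverse crime ", y_crime), ("fine-grid data", y_fair)):
    xr = reconstruct(y + noise, n)
    err = np.linalg.norm(xr - x(coarse)) / np.linalg.norm(x(coarse))
    print(f"{label}: relative error {err:.4f}")
```

The extra error in the second line is pure discretisation mismatch; it plays the role of the model error $O(h^p)$ discussed above.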