Spectral Regularization Methods

Spectral Methods — Filtering in the SVD Domain

The pseudoinverse amplifies noise because it divides each SVD coefficient by $\sigma_k$, which tends to zero. The simplest remedy is to damp these divisions: replace $1/\sigma_k$ with a bounded quantity $\sigma_k\, F_\alpha(\sigma_k^2)$ that approximates $1/\sigma_k$ for large $\sigma_k$ but suppresses the dangerous small-$\sigma_k$ components.

This leads to the general class of spectral regularization methods:

$$x_\alpha^\delta = \sum_{k=1}^{\infty} F_\alpha(\sigma_k^2)\,\sigma_k\, \langle y^\delta, u_k \rangle\, v_k.$$

Different choices of the filter function $F_\alpha$ yield different methods: truncated SVD, Tikhonov, Landweber, and many others. This unified view is powerful: it allows comparative analysis, comparison of qualifications, and the systematic construction of new methods.
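To make the framework concrete, here is a minimal NumPy sketch of generic spectral filtering. The Hilbert-matrix test problem, the noise level, and the helper name `spectral_reconstruct` are illustrative assumptions, not part of the theory above.

```python
import numpy as np

def spectral_reconstruct(A, y_delta, filter_fn):
    """Generic spectral regularization: x = sum_k F(s_k^2) * s_k * <y, u_k> * v_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = U.T @ y_delta                        # <y_delta, u_k>
    return Vt.T @ (filter_fn(s**2) * s * coeffs)  # F(s_k^2) * s_k * coeff

# Illustrative ill-conditioned problem (Hilbert matrix) with synthetic noise
rng = np.random.default_rng(0)
n = 50
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.sin(np.linspace(0, np.pi, n))
y_delta = A @ x_true + 1e-4 * rng.standard_normal(n)

# One concrete filter choice: Tikhonov, F_alpha(s^2) = 1/(s^2 + alpha)
x_rec = spectral_reconstruct(A, y_delta, lambda s2: 1.0 / (s2 + 1e-6))
```

Swapping `filter_fn` switches between TSVD, Tikhonov, and Landweber without touching the rest of the code, which is exactly the practical payoff of the unified view.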

Definition:

Spectral Filter Function

A spectral filter is a family of functions $\{F_\alpha \colon (0, \|\mathcal{A}\|^2] \to \mathbb{R}\}_{\alpha > 0}$ satisfying:

  1. Boundedness: For each $\alpha > 0$, there exists $C(\alpha)$ such that $|\sigma \cdot F_\alpha(\sigma^2)| \leq C(\alpha)$ for all $\sigma > 0$.

  2. Pointwise convergence: For each fixed $\sigma > 0$,

    $$\lim_{\alpha \to 0} \sigma^2\, F_\alpha(\sigma^2) = 1, \quad \text{i.e.,} \quad F_\alpha(\sigma^2) \to 1/\sigma^2 \text{ as } \alpha \to 0.$$

The spectral regularization associated with $F_\alpha$ is

$$R_\alpha y = \sum_{k} F_\alpha(\sigma_k^2)\,\sigma_k\, \langle y, u_k \rangle\, v_k = F_\alpha(\mathcal{A}^*\mathcal{A})\,\mathcal{A}^* y.$$

This framework unifies many seemingly different methods under a single umbrella. The filter function encodes exactly how each method treats the different singular value components.

Definition:

Truncated SVD (TSVD)

The truncated SVD retains only the first $K$ singular components:

$$x_K^\delta = \sum_{k=1}^{K} \frac{1}{\sigma_k}\, \langle y^\delta, u_k \rangle\, v_k.$$

The filter function is the sharp cutoff:

$$F_K(\sigma^2) = \begin{cases} 1/\sigma^2 & \text{if } \sigma \geq \sigma_K, \\ 0 & \text{if } \sigma < \sigma_K. \end{cases}$$

The regularization parameter is the truncation level $K$; smaller $K$ means more regularization.

TSVD has infinite qualification: it achieves the minimax optimal convergence rate $O(\delta^{2\mu/(2\mu+1)})$ for any source condition order $\mu > 0$, provided $K = K(\delta)$ is chosen optimally. The optimal truncation level satisfies approximately $\sigma_K \approx \delta / E$, where $E$ is the source condition bound. Components with $\sigma_k < \delta/E$ are dominated by noise and should be discarded.
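A minimal TSVD sketch, assuming the matrix `A`, data `y_delta`, noise level `delta`, and source bound `E` are given; the cutoff rule implements the $\sigma_k \geq \delta/E$ heuristic above, and both helper names are ours.

```python
import numpy as np

def tsvd_reconstruct(A, y_delta, K):
    """Truncated SVD: invert only the K largest singular components."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U.T @ y_delta)[:K] / s[:K]   # (1/s_k) <y_delta, u_k> for k <= K
    return Vt[:K].T @ coeffs

def choose_K(s, delta, E):
    """Keep components with s_k >= delta/E; the rest are noise-dominated."""
    return max(1, int(np.sum(s >= delta / E)))
```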


Definition:

Tikhonov Regularization

The Tikhonov regularized solution is defined as the minimizer of the functional

$$x_\alpha^\delta = \arg\min_{x \in \mathcal{X}} \left\{ \|\mathcal{A}x - y^\delta\|^2 + \alpha\,\|x\|^2 \right\},$$

where $\alpha > 0$ is the regularization parameter.

The first term $\|\mathcal{A}x - y^\delta\|^2$ is the data fidelity (how well $x$ explains the data). The second term $\alpha\|x\|^2$ is the penalty (how large $x$ is). The parameter $\alpha$ controls the trade-off between fitting the data and keeping the solution small.

The Tikhonov filter function is

$$F_\alpha^{\mathrm{Tikh}}(\sigma^2) = \frac{1}{\sigma^2 + \alpha}.$$

More generally, one can replace $\|x\|^2$ with $\|Lx\|^2$ for a penalty operator $L$ (e.g., a derivative operator to penalize oscillations). This is generalized Tikhonov regularization. The non-quadratic generalization (replacing $\|x\|^2$ by $R(x)$) is treated in Section 2.6.


Theorem: Normal Equation for Tikhonov Regularization

The Tikhonov minimizer $x_\alpha^\delta$ satisfies the normal equation

$$(\mathcal{A}^*\mathcal{A} + \alpha I)\, x_\alpha^\delta = \mathcal{A}^* y^\delta.$$

The operator $\mathcal{A}^*\mathcal{A} + \alpha I$ is boundedly invertible for all $\alpha > 0$, so

$$x_\alpha^\delta = (\mathcal{A}^*\mathcal{A} + \alpha I)^{-1} \mathcal{A}^* y^\delta.$$
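In finite dimensions the normal equation is a single linear solve. Here is a sketch, together with an equivalent stacked least-squares form that is often numerically safer; both helper names are our own choices.

```python
import numpy as np

def tikhonov_normal_eq(A, y_delta, alpha):
    """Solve (A^T A + alpha I) x = A^T y_delta directly."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)

def tikhonov_augmented(A, y_delta, alpha):
    """Equivalent stacked least-squares: min || [A; sqrt(alpha) I] x - [y; 0] ||."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(n)])
    y_aug = np.concatenate([y_delta, np.zeros(n)])
    return np.linalg.lstsq(A_aug, y_aug, rcond=None)[0]
```

The augmented form avoids forming $\mathcal{A}^*\mathcal{A}$ explicitly, which squares the condition number; for well-conditioned small problems the direct solve is fine.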

Theorem: SVD Form of Tikhonov Regularization

In terms of the singular system $\{(\sigma_k, v_k, u_k)\}$ of $\mathcal{A}$, the Tikhonov solution is

$$x_\alpha^\delta = \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2 + \alpha}\, \langle y^\delta, u_k \rangle\, v_k.$$

The Tikhonov filter function is

$$F_\alpha^{\mathrm{Tikh}}(\sigma^2) = \frac{1}{\sigma^2 + \alpha} = \frac{1}{\sigma^2} \cdot \frac{\sigma^2}{\sigma^2 + \alpha}.$$

The factor $\sigma^2/(\sigma^2 + \alpha)$ acts as a smooth low-pass filter: it passes components with $\sigma_k^2 \gg \alpha$ and damps those with $\sigma_k^2 \ll \alpha$.
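The SVD form and the normal-equation form agree, which gives a quick sanity check. A sketch with arbitrary random dimensions and data, assumed purely for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
y_delta = rng.standard_normal(30)
alpha = 0.1

# SVD-filter form: coefficients s_k / (s_k^2 + alpha)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((s / (s**2 + alpha)) * (U.T @ y_delta))

# Normal-equation form, for comparison
x_ne = np.linalg.solve(A.T @ A + alpha * np.eye(20), A.T @ y_delta)
assert np.allclose(x_svd, x_ne)
```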


Theorem: Convergence Rate for Tikhonov Regularization

Let $x^\dagger = (\mathcal{A}^*\mathcal{A})^{\mu/2} w$ with $\|w\| \leq E$ (source condition of order $\mu$), and let $\|y^\delta - y\| \leq \delta$. Then:

(a) For $0 < \mu \leq 2$ and $\alpha \sim (\delta/E)^{2/(2\mu+1)}$:

$$\|x_\alpha^\delta - x^\dagger\| = O\bigl(\delta^{2\mu/(2\mu+1)}\, E^{1/(2\mu+1)}\bigr).$$

(b) For $\mu > 2$, the rate saturates at $O(\delta^{4/5})$ regardless of how smooth $x^\dagger$ is.

Tikhonov regularization has qualification $\mu_0 = 2$.

Saturation for $\mu > 2$ occurs because the residual of the Tikhonov filter, $1 - \sigma^2/(\sigma^2 + \alpha) = \alpha/(\sigma^2 + \alpha)$, vanishes only to first order in $\alpha$. The filter therefore cannot fully exploit the rapid decay of $\langle x^\dagger, v_k\rangle$ for very smooth solutions. Higher-order iterated Tikhonov and TSVD overcome this limitation.

Theorem: Convergence Rate for Truncated SVD

Let $x^\dagger = (\mathcal{A}^*\mathcal{A})^{\mu/2} w$ with $\|w\| \leq E$ (source condition of order $\mu$). Choose the truncation level $K = K(\delta)$ such that $\sigma_K^{2\mu+1} \approx \delta/E$. Then

$$\|x_K^\delta - x^\dagger\| = O\bigl(\delta^{2\mu/(2\mu+1)}\, E^{1/(2\mu+1)}\bigr).$$

This rate is minimax optimal for all $\mu > 0$.

Definition:

Landweber Iteration as Spectral Regularization

The Landweber iteration for $\mathcal{A}x = y^\delta$ is

$$x_{n+1} = x_n + \omega\,\mathcal{A}^*(y^\delta - \mathcal{A}x_n), \qquad x_0 = 0,$$

where $\omega \in (0, 2/\|\mathcal{A}\|^2)$ is a step-size parameter. After $n$ iterations, the spectral filter function is

$$F_n(\sigma^2) = \frac{1}{\sigma^2}\bigl[1 - (1 - \omega\sigma^2)^n\bigr].$$

The iteration count $n$ plays the role of $1/\alpha$: early stopping provides regularization. As $n \to \infty$, $F_n(\sigma^2) \to 1/\sigma^2$ and we recover the pseudoinverse (which diverges for noisy data).

Landweber iteration is computationally attractive because it requires only applications of $\mathcal{A}$ and $\mathcal{A}^*$, with no matrix factorizations. This is crucial for large-scale imaging problems where $\mathcal{A}$ is available only as a matrix-free operator (e.g., a fast forward model). It has infinite qualification, matching TSVD in theoretical performance, but may converge slowly in practice. The first Landweber iterate is $x_1 = \omega\mathcal{A}^* y^\delta$, which is simply the backprojection image, the standard starting point in SAR and CT imaging.
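A matrix-free Landweber sketch follows; the explicit test matrix, step count, and helper name are illustrative assumptions, and any pair of forward/adjoint callbacks works in their place.

```python
import numpy as np

def landweber(apply_A, apply_At, y_delta, omega, n_iters):
    """x_{n+1} = x_n + omega * A^*(y_delta - A x_n), starting from x_0 = 0.
    Only the forward and adjoint operators are needed, never A itself."""
    x = np.zeros_like(apply_At(y_delta))
    for _ in range(n_iters):
        x = x + omega * apply_At(y_delta - apply_A(x))
    return x

# Illustrative use with an explicit matrix standing in for a matrix-free operator
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 25)) / 10
y_delta = A @ np.ones(25) + 0.01 * rng.standard_normal(40)
omega = 1.0 / np.linalg.norm(A, 2) ** 2   # safely inside (0, 2/||A||^2)
x_rec = landweber(lambda v: A @ v, lambda v: A.T @ v, y_delta, omega, n_iters=200)
```

Note that `n_iters` is the regularization parameter here: running the loop to convergence would reproduce the noise-amplifying pseudoinverse.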

Example: Bayesian Interpretation of Tikhonov Regularization

Show that the Tikhonov solution $x_\alpha^\delta$ is the maximum a posteriori (MAP) estimate under Gaussian assumptions:

  • Likelihood: $y^\delta \mid x \sim \mathcal{N}(\mathcal{A}x,\ \sigma_n^2 I)$
  • Prior: $x \sim \mathcal{N}(0,\ \sigma_p^2 I)$

Identify the relationship between $\alpha$ and the noise/prior variances.
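A sketch of the derivation, taken directly from the two Gaussian densities above:

```latex
% Negative log-posterior (Bayes' rule, dropping x-independent constants):
-\log p(x \mid y^\delta)
  = \frac{1}{2\sigma_n^2}\,\|\mathcal{A}x - y^\delta\|^2
  + \frac{1}{2\sigma_p^2}\,\|x\|^2 + \text{const}.
% Scaling by 2\sigma_n^2 does not move the minimizer, so
\hat{x}_{\mathrm{MAP}}
  = \arg\min_{x} \left\{ \|\mathcal{A}x - y^\delta\|^2
  + \frac{\sigma_n^2}{\sigma_p^2}\,\|x\|^2 \right\}
  = x_\alpha^\delta
  \quad\text{with}\quad \alpha = \frac{\sigma_n^2}{\sigma_p^2}.
```

So a strong prior (small $\sigma_p^2$) or heavy noise (large $\sigma_n^2$) both push $\alpha$ up, exactly as intuition suggests.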

Spectral Filter Functions Compared

This figure compares the spectral filter functions of three regularization methods applied to the same ill-posed problem.

Top-left: The filter $\sigma_k \cdot F_\alpha(\sigma_k^2)$ vs. singular value index $k$. The ideal (pseudoinverse) filter is $1/\sigma_k$ (shown dashed), which diverges. Each method approximates this curve for large $\sigma_k$ but deviates for small ones.

Top-right: SVD coefficients of the exact data, the noisy data, and the regularized reconstruction. The regularized coefficients should follow the exact data for small $k$ and suppress noise for large $k$.

Bottom: The reconstruction compared to the true solution.

Truncated SVD gives a sharp cutoff (all-or-nothing); Tikhonov gives a smooth roll-off; Landweber (with $n = 1/\alpha$ iterations) gives a polynomial filter that gradually engages higher frequencies.


Tikhonov Reconstruction — Bias–Variance Trade-Off

Sweep the regularization parameter $\alpha$ to observe the bias–variance trade-off directly.

Top panel: True solution (blue), reconstruction (red), noisy data (gray dots).

Bottom-left: Reconstruction error $\|x_\alpha^\delta - x^\dagger\|$ vs. $\alpha$ (log scale), showing the characteristic U-shape with a minimum at the optimal $\alpha$.

Bottom-right: The Tikhonov filter factor $\sigma_k^2/(\sigma_k^2 + \alpha)$ for each singular value component.

For very small $\alpha$ the reconstruction is noisy (variance-dominated); for very large $\alpha$ it is over-smoothed (bias-dominated). The optimal $\alpha$ sits at the bottom of the U-curve, as the sketch below illustrates.
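A minimal sketch of the sweep behind the U-curve; the test problem, noise level, and $\alpha$ grid are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.sin(np.linspace(0, np.pi, n))
y_delta = A @ x_true + 0.05 * rng.standard_normal(n)

# Tikhonov reconstruction error over a logarithmic grid of alpha values
U, s, Vt = np.linalg.svd(A)
errors = {}
for alpha in np.logspace(-10, 0, 50):
    x_alpha = Vt.T @ ((s / (s ** 2 + alpha)) * (U.T @ y_delta))
    errors[alpha] = np.linalg.norm(x_alpha - x_true)

alpha_opt = min(errors, key=errors.get)   # bottom of the U-curve
```

In practice $x^\dagger$ is unknown, so the sweep is only diagnostic; parameter choice rules such as the discrepancy principle replace it on real data.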


Spectral Regularization Methods Compared

| Property | Truncated SVD | Tikhonov | Landweber |
|---|---|---|---|
| Filter $\sigma^2 F_\alpha(\sigma^2)$ | Sharp cutoff at $\sigma_K$ | Smooth roll-off $\sigma^2/(\sigma^2+\alpha)$ | Polynomial $1-(1-\omega\sigma^2)^n$ |
| Qualification | Infinite | $\mu_0 = 2$ | Infinite |
| Parameter | Truncation level $K$ | $\alpha > 0$ | Iteration count $n$ |
| Computation | Full SVD, $O(mn^2)$ | Normal-equation solve | Matrix-free, $O(n)$ per step |
| Large-scale problems | Not suitable | Moderate | Ideal |
| Sensitivity to parameter | High (near gaps) | Smooth | Moderate (semi-convergence) |

Common Mistake: TSVD Is Sensitive to Singular Value Clustering

Mistake:

Choosing the truncation level $K$ for TSVD based solely on the noise level, without examining the singular value distribution.

Correction:

When singular values form clusters separated by gaps, truncating within a cluster can lead to artefacts. The optimal truncation should respect the natural gaps in the singular value spectrum.

For example, if $\sigma_{10} = 0.05$ and $\sigma_{11} = 0.049$ (a tight cluster), truncating at $K = 10$ discards a component almost as strong as those retained. A better strategy is to look for a gap in the singular value spectrum and truncate there, or to use Tikhonov regularization, which provides a smooth transition.

In imaging practice, the singular value spectrum often has a knee point where the decay steepens; this is typically the optimal truncation region. A simple gap-snapping heuristic is sketched below.
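This is a hypothetical heuristic sketching the advice above: it snaps a noise-based candidate $K$ to the nearest large relative gap. The helper name, window width, and tie-breaking are our choices, not a standard algorithm.

```python
import numpy as np

def truncate_at_gap(s, K_noise, window=5):
    """Snap a noise-based candidate K to the largest relative gap
    s_k / s_{k+1} within +/- window indices (s sorted descending)."""
    lo = max(1, K_noise - window)
    hi = min(len(s) - 1, K_noise + window)
    if hi <= lo:
        return K_noise
    ratios = s[lo:hi] / s[lo + 1:hi + 1]    # large ratio => gap after index k
    return lo + int(np.argmax(ratios)) + 1  # keep components 0..K-1
```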

Common Mistake: Tikhonov Cannot Exploit High-Smoothness Solutions

Mistake:

Applying standard Tikhonov regularization to an inverse problem where the true solution is very smooth ($\mu > 2$), expecting to achieve faster convergence rates than $O(\delta^{4/5})$.

Correction:

Standard Tikhonov has qualification $\mu_0 = 2$: for solutions satisfying a source condition of order $\mu > 2$, the rate saturates at $O(\delta^{4/5})$ regardless of how much smoothness is available.

For such problems, use iterated Tikhonov (apply the normal equation $m$ times, increasing the qualification to $m \cdot \mu_0$), TSVD, or Landweber iteration with early stopping; all of these have infinite qualification and can exploit arbitrary smoothness. A sketch of iterated Tikhonov follows.
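A minimal sketch of $m$-fold iterated Tikhonov under the definitions above; the helper name is ours. Each pass reuses the same system matrix, so the extra cost over standard Tikhonov is small.

```python
import numpy as np

def iterated_tikhonov(A, y_delta, alpha, m):
    """Iterated Tikhonov: (A^T A + alpha I) x_j = A^T y_delta + alpha x_{j-1},
    with x_0 = 0. Raises the qualification from 2 to 2*m in this text's convention."""
    n = A.shape[1]
    M = A.T @ A + alpha * np.eye(n)
    rhs = A.T @ y_delta
    x = np.zeros(n)
    for _ in range(m):
        x = np.linalg.solve(M, rhs + alpha * x)
    return x
```

With $m = 1$ this reduces exactly to the standard normal-equation solve.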

Quick Check

For Tikhonov regularization with parameter $\alpha = 0.04$, what fraction of the signal energy is passed for a component with $\sigma_k = 0.2$?

100%

50%

25%

0%

Key Takeaway

All spectral regularization methods modify the pseudoinverse by replacing $1/\sigma_k$ with a bounded quantity $\sigma_k\, F_\alpha(\sigma_k^2)$. Truncated SVD uses a sharp cutoff (infinite qualification; optimal but sensitive to the spectrum). Tikhonov uses a smooth roll-off at $\sigma \approx \sqrt{\alpha}$ (finite qualification $\mu_0 = 2$; closed-form solution via the normal equation). Landweber iteration uses a polynomial filter with the iteration count as regularization parameter (infinite qualification; only needs matrix-vector products, which makes it ideal for large-scale imaging). The Bayesian interpretation identifies Tikhonov as MAP estimation with a Gaussian prior: $\alpha = \sigma_n^2/\sigma_p^2$.