Spectral Regularization Methods

Spectral Methods — Filtering in the SVD Domain

The pseudoinverse amplifies noise because it divides each SVD coefficient by $\sigma_k$, which tends to zero. The simplest remedy is to damp these divisions: replace $1/\sigma_k$ with a bounded quantity $\sigma_k\, F_\alpha(\sigma_k^2)$ that approximates $1/\sigma_k$ for large $\sigma_k$ but suppresses the dangerous small-$\sigma_k$ components.

This leads to the general class of spectral regularization methods:

$$x_\alpha^\delta = \sum_{k=1}^{\infty} F_\alpha(\sigma_k^2)\,\sigma_k\, \langle y^\delta, u_k \rangle\, v_k.$$

Different choices of the filter function $F_\alpha$ yield different methods: truncated SVD, Tikhonov, Landweber, and many others. This unified view is powerful: it allows comparative analysis, comparison of qualifications, and the systematic construction of new methods.
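To make the framework concrete, here is a minimal NumPy sketch of generic spectral filtering. The Hilbert-matrix test problem, the noise level, and the helper name `spectral_reconstruct` are illustrative assumptions, not part of the theory above.

```python
import numpy as np

def spectral_reconstruct(A, y_delta, filter_fn):
    """Generic spectral regularization: x = sum_k F(s_k^2) * s_k * <y, u_k> * v_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = U.T @ y_delta                        # <y_delta, u_k>
    return Vt.T @ (filter_fn(s**2) * s * coeffs)  # F(s_k^2) * s_k * coeff

# Illustrative ill-conditioned problem (Hilbert matrix) with synthetic noise
rng = np.random.default_rng(0)
n = 50
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.sin(np.linspace(0, np.pi, n))
y_delta = A @ x_true + 1e-4 * rng.standard_normal(n)

# One concrete filter choice: Tikhonov, F_alpha(s^2) = 1/(s^2 + alpha)
x_rec = spectral_reconstruct(A, y_delta, lambda s2: 1.0 / (s2 + 1e-6))
```

Swapping `filter_fn` switches between TSVD, Tikhonov, and Landweber without touching the rest of the code, which is exactly the practical payoff of the unified view.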

Definition:

Spectral Filter Function

A spectral filter is a family of functions $\{F_\alpha \colon (0, \|\mathcal{A}\|^2] \to \mathbb{R}\}_{\alpha > 0}$ satisfying:

  1. Boundedness: For each $\alpha > 0$, there exists $C(\alpha)$ such that $|\sigma \cdot F_\alpha(\sigma^2)| \leq C(\alpha)$ for all $\sigma > 0$.

  2. Pointwise convergence: For each fixed $\sigma > 0$,

    $$\lim_{\alpha \to 0} \sigma^2\, F_\alpha(\sigma^2) = 1, \quad \text{i.e.,} \quad F_\alpha(\sigma^2) \to 1/\sigma^2 \text{ as } \alpha \to 0.$$

The spectral regularization associated with $F_\alpha$ is

$$R_\alpha y = \sum_{k} F_\alpha(\sigma_k^2)\,\sigma_k\, \langle y, u_k \rangle\, v_k = F_\alpha(\mathcal{A}^*\mathcal{A})\,\mathcal{A}^* y.$$

This framework unifies many seemingly different methods under a single umbrella. The filter function encodes exactly how each method treats the different singular value components.

Definition:

Truncated SVD (TSVD)

The truncated SVD retains only the first $K$ singular components:

$$x_K^\delta = \sum_{k=1}^{K} \frac{1}{\sigma_k}\, \langle y^\delta, u_k \rangle\, v_k.$$

The filter function is the sharp cutoff:

$$F_K(\sigma^2) = \begin{cases} 1/\sigma^2 & \text{if } \sigma \geq \sigma_K, \\ 0 & \text{if } \sigma < \sigma_K. \end{cases}$$

The regularization parameter is the truncation level $K$; smaller $K$ means more regularization.

TSVD has infinite qualification: it achieves the minimax optimal convergence rate $O(\delta^{2\mu/(2\mu+1)})$ for any source condition order $\mu > 0$, provided $K = K(\delta)$ is chosen optimally. The optimal truncation level satisfies approximately $\sigma_K \approx \delta / E$, where $E$ is the source condition bound. Components with $\sigma_k < \delta/E$ are dominated by noise and should be discarded.
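A minimal TSVD sketch, assuming the matrix `A`, data `y_delta`, noise level `delta`, and source bound `E` are given; the cutoff rule implements the $\sigma_k \geq \delta/E$ heuristic above, and both helper names are ours.

```python
import numpy as np

def tsvd_reconstruct(A, y_delta, K):
    """Truncated SVD: invert only the K largest singular components."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U.T @ y_delta)[:K] / s[:K]   # (1/s_k) <y_delta, u_k> for k <= K
    return Vt[:K].T @ coeffs

def choose_K(s, delta, E):
    """Keep components with s_k >= delta/E; the rest are noise-dominated."""
    return max(1, int(np.sum(s >= delta / E)))
```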


Definition:

Tikhonov Regularization

The Tikhonov regularized solution is defined as the minimizer of the functional

$$x_\alpha^\delta = \arg\min_{x \in \mathcal{X}} \left\{ \|\mathcal{A}x - y^\delta\|^2 + \alpha\,\|x\|^2 \right\},$$

where $\alpha > 0$ is the regularization parameter.

The first term $\|\mathcal{A}x - y^\delta\|^2$ is the data fidelity (how well $x$ explains the data). The second term $\alpha\|x\|^2$ is the penalty (how large $x$ is). The parameter $\alpha$ controls the trade-off between fitting the data and keeping the solution small.

The Tikhonov filter function is

$$F_\alpha^{\mathrm{Tikh}}(\sigma^2) = \frac{1}{\sigma^2 + \alpha}.$$

More generally, one can replace $\|x\|^2$ with $\|Lx\|^2$ for a penalty operator $L$ (e.g., a derivative operator to penalize oscillations). This is generalized Tikhonov regularization. The non-quadratic generalization (replacing $\|x\|^2$ by $R(x)$) is treated in Section 2.6.


Theorem: Normal Equation for Tikhonov Regularization

The Tikhonov minimizer $x_\alpha^\delta$ satisfies the normal equation

$$(\mathcal{A}^*\mathcal{A} + \alpha I)\, x_\alpha^\delta = \mathcal{A}^* y^\delta.$$

The operator $\mathcal{A}^*\mathcal{A} + \alpha I$ is boundedly invertible for all $\alpha > 0$, so

$$x_\alpha^\delta = (\mathcal{A}^*\mathcal{A} + \alpha I)^{-1} \mathcal{A}^* y^\delta.$$
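In finite dimensions the normal equation is a single linear solve. Here is a sketch, together with an equivalent stacked least-squares form that is often numerically safer; both helper names are our own choices.

```python
import numpy as np

def tikhonov_normal_eq(A, y_delta, alpha):
    """Solve (A^T A + alpha I) x = A^T y_delta directly."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)

def tikhonov_augmented(A, y_delta, alpha):
    """Equivalent stacked least-squares: min || [A; sqrt(alpha) I] x - [y; 0] ||."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(alpha) * np.eye(n)])
    y_aug = np.concatenate([y_delta, np.zeros(n)])
    return np.linalg.lstsq(A_aug, y_aug, rcond=None)[0]
```

The augmented form avoids forming $\mathcal{A}^*\mathcal{A}$ explicitly, which squares the condition number; for well-conditioned small problems the direct solve is fine.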

Theorem: SVD Form of Tikhonov Regularization

In terms of the singular system $\{(\sigma_k, v_k, u_k)\}$ of $\mathcal{A}$, the Tikhonov solution is

$$x_\alpha^\delta = \sum_{k=1}^{\infty} \frac{\sigma_k}{\sigma_k^2 + \alpha}\, \langle y^\delta, u_k \rangle\, v_k.$$

The Tikhonov filter function is

$$F_\alpha^{\mathrm{Tikh}}(\sigma^2) = \frac{1}{\sigma^2 + \alpha} = \frac{1}{\sigma^2} \cdot \frac{\sigma^2}{\sigma^2 + \alpha}.$$

The factor $\sigma^2/(\sigma^2 + \alpha)$ acts as a smooth low-pass filter: it passes components with $\sigma_k^2 \gg \alpha$ and damps those with $\sigma_k^2 \ll \alpha$.
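The SVD form and the normal-equation form agree, which gives a quick sanity check. A sketch with arbitrary random dimensions and data, assumed purely for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 20))
y_delta = rng.standard_normal(30)
alpha = 0.1

# SVD-filter form: coefficients s_k / (s_k^2 + alpha)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((s / (s**2 + alpha)) * (U.T @ y_delta))

# Normal-equation form, for comparison
x_ne = np.linalg.solve(A.T @ A + alpha * np.eye(20), A.T @ y_delta)
assert np.allclose(x_svd, x_ne)
```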


Theorem: Convergence Rate for Tikhonov Regularization

Let $x^\dagger = (\mathcal{A}^*\mathcal{A})^{\mu/2} w$ with $\|w\| \leq E$ (source condition of order $\mu$), and let $\|y^\delta - y\| \leq \delta$. Then:

(a) For $0 < \mu \leq 2$ and $\alpha \sim (\delta/E)^{2/(2\mu+1)}$:

$$\|x_\alpha^\delta - x^\dagger\| = O\bigl(\delta^{2\mu/(2\mu+1)}\, E^{1/(2\mu+1)}\bigr).$$

(b) For $\mu > 2$, the rate saturates at $O(\delta^{4/5})$ regardless of how smooth $x^\dagger$ is.

Tikhonov regularization has qualification $\mu_0 = 2$.

Saturation for $\mu > 2$ occurs because the residual of the Tikhonov filter, $1 - \sigma^2/(\sigma^2 + \alpha) = \alpha/(\sigma^2 + \alpha)$, vanishes only to first order in $\alpha$. The filter therefore cannot fully exploit the rapid decay of $\langle x^\dagger, v_k\rangle$ for very smooth solutions. Higher-order iterated Tikhonov and TSVD overcome this limitation.

Theorem: Convergence Rate for Truncated SVD

Let $x^\dagger = (\mathcal{A}^*\mathcal{A})^{\mu/2} w$ with $\|w\| \leq E$ (source condition of order $\mu$). Choose the truncation level $K = K(\delta)$ such that $\sigma_K^{2\mu+1} \approx \delta/E$. Then

$$\|x_K^\delta - x^\dagger\| = O\bigl(\delta^{2\mu/(2\mu+1)}\, E^{1/(2\mu+1)}\bigr).$$

This rate is minimax optimal for all $\mu > 0$.

Definition:

Landweber Iteration as Spectral Regularization

The Landweber iteration for $\mathcal{A}x = y^\delta$ is

$$x_{n+1} = x_n + \omega\,\mathcal{A}^*(y^\delta - \mathcal{A}x_n), \qquad x_0 = 0,$$

where $\omega \in (0, 2/\|\mathcal{A}\|^2)$ is a step-size parameter. After $n$ iterations, the spectral filter function is

$$F_n(\sigma^2) = \frac{1}{\sigma^2}\bigl[1 - (1 - \omega\sigma^2)^n\bigr].$$

The iteration count $n$ plays the role of $1/\alpha$: early stopping provides regularization. As $n \to \infty$, $F_n(\sigma^2) \to 1/\sigma^2$ and we recover the pseudoinverse (which diverges for noisy data).

Landweber iteration is computationally attractive because it requires only applications of $\mathcal{A}$ and $\mathcal{A}^*$, with no matrix factorizations. This is crucial for large-scale imaging problems where $\mathcal{A}$ is available only as a matrix-free operator (e.g., a fast forward model). It has infinite qualification, matching TSVD in theoretical performance, but may converge slowly in practice. The first Landweber iterate is $x_1 = \omega\mathcal{A}^* y^\delta$, which is simply the backprojection image, the standard starting point in SAR and CT imaging.
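A matrix-free Landweber sketch follows; the explicit test matrix, step count, and helper name are illustrative assumptions, and any pair of forward/adjoint callbacks works in their place.

```python
import numpy as np

def landweber(apply_A, apply_At, y_delta, omega, n_iters):
    """x_{n+1} = x_n + omega * A^*(y_delta - A x_n), starting from x_0 = 0.
    Only the forward and adjoint operators are needed, never A itself."""
    x = np.zeros_like(apply_At(y_delta))
    for _ in range(n_iters):
        x = x + omega * apply_At(y_delta - apply_A(x))
    return x

# Illustrative use with an explicit matrix standing in for a matrix-free operator
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 25)) / 10
y_delta = A @ np.ones(25) + 0.01 * rng.standard_normal(40)
omega = 1.0 / np.linalg.norm(A, 2) ** 2   # safely inside (0, 2/||A||^2)
x_rec = landweber(lambda v: A @ v, lambda v: A.T @ v, y_delta, omega, n_iters=200)
```

Note that `n_iters` is the regularization parameter here: running the loop to convergence would reproduce the noise-amplifying pseudoinverse.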

Example: Bayesian Interpretation of Tikhonov Regularization

Show that the Tikhonov solution $x_\alpha^\delta$ is the maximum a posteriori (MAP) estimate under Gaussian assumptions:

  • Likelihood: $y^\delta \mid x \sim \mathcal{N}(\mathcal{A}x,\ \sigma_n^2 I)$
  • Prior: $x \sim \mathcal{N}(0,\ \sigma_p^2 I)$

Identify the relationship between $\alpha$ and the noise/prior variances.
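A sketch of the derivation, taken directly from the two Gaussian densities above:

```latex
% Negative log-posterior (Bayes' rule, dropping x-independent constants):
-\log p(x \mid y^\delta)
  = \frac{1}{2\sigma_n^2}\,\|\mathcal{A}x - y^\delta\|^2
  + \frac{1}{2\sigma_p^2}\,\|x\|^2 + \text{const}.
% Scaling by 2\sigma_n^2 does not move the minimizer, so
\hat{x}_{\mathrm{MAP}}
  = \arg\min_{x} \left\{ \|\mathcal{A}x - y^\delta\|^2
  + \frac{\sigma_n^2}{\sigma_p^2}\,\|x\|^2 \right\}
  = x_\alpha^\delta
  \quad\text{with}\quad \alpha = \frac{\sigma_n^2}{\sigma_p^2}.
```

So a strong prior (small $\sigma_p^2$) or heavy noise (large $\sigma_n^2$) both push $\alpha$ up, exactly as intuition suggests.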

Spectral Filter Functions Compared

This figure compares the spectral filter functions of three regularization methods applied to the same ill-posed problem.

Top-left: The filter $\sigma_k \cdot F_\alpha(\sigma_k^2)$ vs. singular value index $k$. The ideal (pseudoinverse) filter is $1/\sigma_k$ (shown dashed), which diverges. Each method approximates this curve for large $\sigma_k$ but deviates for small ones.

Top-right: SVD coefficients of the exact data, the noisy data, and the regularized reconstruction. The regularized coefficients should follow the exact data for small $k$ and suppress noise for large $k$.

Bottom: The reconstruction compared to the true solution.

Truncated SVD gives a sharp cutoff (all-or-nothing); Tikhonov gives a smooth roll-off; Landweber (with $n = 1/\alpha$ iterations) gives a polynomial filter that gradually engages higher frequencies.


Tikhonov Reconstruction — Bias–Variance Trade-Off

Sweep the regularization parameter $\alpha$ to observe the bias–variance trade-off directly.

Top panel: True solution (blue), reconstruction (red), noisy data (gray dots).

Bottom-left: Reconstruction error $\|x_\alpha^\delta - x^\dagger\|$ vs. $\alpha$ (log scale), showing the characteristic U-shape with a minimum at the optimal $\alpha$.

Bottom-right: The Tikhonov filter factor $\sigma_k^2/(\sigma_k^2 + \alpha)$ for each singular value component.

For very small $\alpha$ the reconstruction is noisy (variance-dominated); for very large $\alpha$ it is over-smoothed (bias-dominated). The optimal $\alpha$ sits at the bottom of the U-curve, as the sketch below illustrates.
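A minimal sketch of the sweep behind the U-curve; the test problem, noise level, and $\alpha$ grid are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.sin(np.linspace(0, np.pi, n))
y_delta = A @ x_true + 0.05 * rng.standard_normal(n)

# Tikhonov reconstruction error over a logarithmic grid of alpha values
U, s, Vt = np.linalg.svd(A)
errors = {}
for alpha in np.logspace(-10, 0, 50):
    x_alpha = Vt.T @ ((s / (s ** 2 + alpha)) * (U.T @ y_delta))
    errors[alpha] = np.linalg.norm(x_alpha - x_true)

alpha_opt = min(errors, key=errors.get)   # bottom of the U-curve
```

In practice $x^\dagger$ is unknown, so the sweep is only diagnostic; parameter choice rules such as the discrepancy principle replace it on real data.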


Spectral Regularization Methods Compared

| Property | Truncated SVD | Tikhonov | Landweber |
|---|---|---|---|
| Filter $\sigma^2 F_\alpha(\sigma^2)$ | Sharp cutoff at $\sigma_K$ | Smooth roll-off $\sigma^2/(\sigma^2+\alpha)$ | Polynomial $1-(1-\omega\sigma^2)^n$ |
| Qualification | Infinite | $\mu_0 = 2$ | Infinite |
| Parameter | Truncation level $K$ | $\alpha > 0$ | Iteration count $n$ |
| Computation | Full SVD, $O(mn^2)$ | Normal-equation solve | Matrix-free, $O(n)$ per step |
| Large-scale problems | Not suitable | Moderate | Ideal |
| Sensitivity to parameter | High (near gaps) | Smooth | Moderate (semi-convergence) |

Common Mistake: TSVD Is Sensitive to Singular Value Clustering

Mistake:

Choosing the truncation level $K$ for TSVD based solely on the noise level, without examining the singular value distribution.

Correction:

When singular values form clusters separated by gaps, truncating within a cluster can lead to artefacts. The optimal truncation should respect the natural gaps in the singular value spectrum.

For example, if $\sigma_{10} = 0.05$ and $\sigma_{11} = 0.049$ (a tight cluster), truncating at $K = 10$ discards a component almost as strong as those retained. A better strategy is to look for a gap in the singular value spectrum and truncate there, or to use Tikhonov regularization, which provides a smooth transition.

In imaging practice, the singular value spectrum often has a knee point where the decay steepens; this is typically the optimal truncation region. A simple gap-snapping heuristic is sketched below.
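This is a hypothetical heuristic sketching the advice above: it snaps a noise-based candidate $K$ to the nearest large relative gap. The helper name, window width, and tie-breaking are our choices, not a standard algorithm.

```python
import numpy as np

def truncate_at_gap(s, K_noise, window=5):
    """Snap a noise-based candidate K to the largest relative gap
    s_k / s_{k+1} within +/- window indices (s sorted descending)."""
    lo = max(1, K_noise - window)
    hi = min(len(s) - 1, K_noise + window)
    if hi <= lo:
        return K_noise
    ratios = s[lo:hi] / s[lo + 1:hi + 1]    # large ratio => gap after index k
    return lo + int(np.argmax(ratios)) + 1  # keep components 0..K-1
```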

Common Mistake: Tikhonov Cannot Exploit High-Smoothness Solutions

Mistake:

Applying standard Tikhonov regularization to an inverse problem where the true solution is very smooth ($\mu > 2$), expecting to achieve faster convergence rates than $O(\delta^{4/5})$.

Correction:

Standard Tikhonov has qualification $\mu_0 = 2$: for solutions satisfying a source condition of order $\mu > 2$, the rate saturates at $O(\delta^{4/5})$ regardless of how much smoothness is available.

For such problems, use iterated Tikhonov (apply the normal equation $m$ times, increasing the qualification to $m \cdot \mu_0$), TSVD, or Landweber iteration with early stopping; all of these have infinite qualification and can exploit arbitrary smoothness. A sketch of iterated Tikhonov follows.
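A minimal sketch of $m$-fold iterated Tikhonov under the definitions above; the helper name is ours. Each pass reuses the same system matrix, so the extra cost over standard Tikhonov is small.

```python
import numpy as np

def iterated_tikhonov(A, y_delta, alpha, m):
    """Iterated Tikhonov: (A^T A + alpha I) x_j = A^T y_delta + alpha x_{j-1},
    with x_0 = 0. Raises the qualification from 2 to 2*m in this text's convention."""
    n = A.shape[1]
    M = A.T @ A + alpha * np.eye(n)
    rhs = A.T @ y_delta
    x = np.zeros(n)
    for _ in range(m):
        x = np.linalg.solve(M, rhs + alpha * x)
    return x
```

With $m = 1$ this reduces exactly to the standard normal-equation solve.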

Quick Check

For Tikhonov regularization with parameter $\alpha = 0.04$, what fraction of the signal energy is passed for a component with $\sigma_k = 0.2$?

100%

50%

25%

0%

Key Takeaway

All spectral regularization methods modify the pseudoinverse by replacing $1/\sigma_k$ with a bounded quantity $\sigma_k\, F_\alpha(\sigma_k^2)$. Truncated SVD uses a sharp cutoff (infinite qualification; optimal but sensitive to the spectrum). Tikhonov uses a smooth roll-off at $\sigma \approx \sqrt{\alpha}$ (finite qualification $\mu_0 = 2$; closed-form solution via the normal equation). Landweber iteration uses a polynomial filter with the iteration count as regularization parameter (infinite qualification; only needs matrix-vector products, which makes it ideal for large-scale imaging). The Bayesian interpretation identifies Tikhonov as MAP estimation with a Gaussian prior: $\alpha = \sigma_n^2/\sigma_p^2$.