Parameter Choice Rules

The Central Practical Challenge: Choosing $\alpha$

The regularization theory of Sections 2.3–2.4 shows that good reconstructions are possible if $\alpha$ is chosen correctly. But how does one choose $\alpha$ in practice?

Two regimes exist. When the noise level $\delta$ is known (or reliably estimated), the Morozov discrepancy principle provides an order-optimal, principled choice. When $\delta$ is unknown, as in many experimental setups, the L-curve and generalized cross-validation (GCV) provide data-driven alternatives.

Stein's Unbiased Risk Estimator (SURE) provides yet another approach: it directly estimates the mean squared error without knowing the truth $x^\dagger$, using only the measured data.

Definition: Morozov's Discrepancy Principle

Morozov's discrepancy principle selects $\alpha > 0$ such that the residual matches the noise level:

$$\|\mathcal{A}x_\alpha^\delta - y^\delta\| = \tau\,\delta,$$

where $\tau > 1$ is a safety factor (typically $\tau \in [1, 2]$).

The rationale: we should fit the data only to the accuracy warranted by the noise; fitting more closely than $\delta$ means fitting noise.

Under source conditions, the discrepancy principle is provably order-optimal:

$$\|x_{\alpha(\delta)}^\delta - x^\dagger\| = O\bigl(\delta^{2\mu/(2\mu+1)}\bigr)$$

for $\mu \leq \mu_0$ (the method's qualification). In practice, $\delta$ may be estimated from the data (e.g., from a noise-only region of the measurement, or from repeated measurements).


Theorem: Existence and Monotonicity for the Discrepancy Principle

For Tikhonov regularization, the discrepancy function

$$\varphi(\alpha) = \|\mathcal{A}x_\alpha^\delta - y^\delta\|^2 = \sum_{k=1}^{n} \left(\frac{\alpha}{\sigma_k^2 + \alpha}\right)^2 |\langle \mathbf{y}^\delta, \mathbf{u}_k \rangle|^2$$

is monotonically increasing in $\alpha$, with $\varphi(0) = 0$ (if $y^\delta \in \mathcal{R}(\mathcal{A})$) and $\varphi(\infty) = \|y^\delta\|^2$.

Therefore the equation $\varphi(\alpha) = \tau^2\delta^2$ has a unique solution $\alpha^* > 0$ whenever $\tau\delta < \|y^\delta\|$, i.e., whenever the data contain signal above the noise level.
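
As a minimal sketch of how this root can be found in practice (assuming a thin SVD of the discrete forward matrix is available; the function name, defaults, and bracket $[10^{-12}, 10^{4}]$ are illustrative), bisection on $\log\alpha$ exploits the monotonicity of $\varphi$:

```python
import numpy as np

def discrepancy_alpha(U, s, y, delta, tau=1.5, lo=1e-12, hi=1e4, iters=100):
    """Solve ||A x_alpha - y||^2 = (tau*delta)^2 for Tikhonov regularization.

    U, s: left singular vectors and singular values of the forward matrix A.
    Uses bisection on log(alpha), valid because phi is monotonically increasing.
    """
    c = U.T @ y                                   # SVD coefficients <y, u_k>
    def phi(alpha):
        return np.sum((alpha / (s**2 + alpha))**2 * np.abs(c)**2)
    target = (tau * delta)**2
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                    # geometric midpoint in log scale
        if phi(mid) < target:
            lo = mid                              # residual too small: increase alpha
        else:
            hi = mid                              # residual too large: decrease alpha
    return np.sqrt(lo * hi)
```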


Definition: The L-Curve Method

The L-curve is the parametric plot of

$$\bigl(\log\|\mathcal{A}x_\alpha^\delta - y^\delta\|,\; \log\|x_\alpha^\delta\|\bigr)$$

as $\alpha$ varies over $(0, \infty)$. This curve typically has an "L" shape:

  • Vertical branch (small $\alpha$): The residual is small but the solution norm is large (overfitting, noise amplification).
  • Horizontal branch (large $\alpha$): The solution norm is small but the residual is large (oversmoothing).
  • Corner (optimal $\alpha$): The best trade-off between fidelity and regularity.

The L-curve criterion selects $\alpha$ at the point of maximum curvature of the L-curve.

The L-curve is a heuristic without rigorous convergence guarantees in full generality, but it is extremely popular in practice because it requires no knowledge of the noise level $\delta$. It provides an intuitive visual diagnostic and often gives good results for moderately ill-posed problems. For severely ill-posed problems or very small noise, the corner can be ill-defined.
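
A minimal sketch of the corner search, assuming the SVD forms of the Tikhonov residual and solution norms and estimating the curvature numerically on a logarithmic grid of $\alpha$ values (the grid and function name are illustrative):

```python
import numpy as np

def l_curve_corner(U, s, y, alphas):
    """Return the alpha of maximum curvature of the L-curve (heuristic corner)."""
    c = U.T @ y
    # Closed-form Tikhonov norms: residual rho(alpha) and solution norm eta(alpha)
    rho = np.sqrt([np.sum((a / (s**2 + a))**2 * np.abs(c)**2) for a in alphas])
    eta = np.sqrt([np.sum((s / (s**2 + a))**2 * np.abs(c)**2) for a in alphas])
    xh, yh = np.log(rho), np.log(eta)             # log-log coordinates of the curve
    t = np.log(alphas)
    # Numerical derivatives with respect to log(alpha)
    dx, dy = np.gradient(xh, t), np.gradient(yh, t)
    d2x, d2y = np.gradient(dx, t), np.gradient(dy, t)
    kappa = (dx * d2y - d2x * dy) / (dx**2 + dy**2)**1.5
    return alphas[int(np.argmax(kappa))]

# usage: alphas = np.logspace(-10, 2, 200); alpha_L = l_curve_corner(U, s, y, alphas)
```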


Definition: Generalized Cross-Validation (GCV)

Generalized cross-validation selects $\alpha$ by minimising the GCV functional

$$V(\alpha) = \frac{\|\mathcal{A}x_\alpha^\delta - y^\delta\|^2}{\bigl(\operatorname{tr}(I - \mathcal{A}(\mathcal{A}^*\mathcal{A} + \alpha I)^{-1}\mathcal{A}^*)\bigr)^2}.$$

For the Tikhonov solution in the finite-dimensional case with SVD $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$:

$$V(\alpha) = \frac{\sum_k (\alpha/(\sigma_k^2+\alpha))^2 |\langle y^\delta, u_k\rangle|^2}{\left(\sum_k \alpha/(\sigma_k^2+\alpha)\right)^2}.$$

GCV has a statistical interpretation: it estimates the expected prediction error when one data point is left out. Under standard conditions, minimising $V(\alpha)$ is asymptotically optimal in the same sense as the discrepancy principle, but without requiring knowledge of $\delta$. In practice, GCV can be sensitive to noise correlations and tends to underestimate $\alpha$ for severely ill-posed problems.
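
A minimal sketch of minimising the SVD form of $V(\alpha)$ on a logarithmic grid (a grid search is usually adequate because each evaluation is cheap; names and the grid are illustrative):

```python
import numpy as np

def gcv_alpha(U, s, y, alphas):
    """Return the alpha minimising the SVD form of the GCV functional V(alpha)."""
    c = U.T @ y
    V = []
    for a in alphas:
        w = a / (s**2 + a)                        # residual filter factors alpha/(sigma_k^2+alpha)
        V.append(np.sum(w**2 * np.abs(c)**2) / np.sum(w)**2)
    return alphas[int(np.argmin(V))]

# usage: alpha_gcv = gcv_alpha(U, s, y, np.logspace(-10, 2, 200))
```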


Definition: Stein's Unbiased Risk Estimator (SURE)

For a linear estimator $\hat{y} = H_\alpha y^\delta$ of the noise-free data $\mathcal{A}x^\dagger$, with noise $\eta \sim \mathcal{N}(0, \sigma_n^2 I)$, Stein's Unbiased Risk Estimator (SURE) provides an unbiased estimate of the predictive mean squared error $\mathbb{E}\|H_\alpha y^\delta - \mathcal{A}x^\dagger\|^2$:

$$\widehat{\mathrm{MSE}}(\alpha) = -m\sigma_n^2 + \|H_\alpha y^\delta - y^\delta\|^2 + 2\sigma_n^2\,\mathrm{tr}(H_\alpha),$$

where $m = \dim(\mathcal{Y})$. For Tikhonov regularization, $H_\alpha = \mathcal{A}(\mathcal{A}^*\mathcal{A}+\alpha I)^{-1}\mathcal{A}^*$ and $\mathrm{tr}(H_\alpha) = \sum_k \sigma_k^2/(\sigma_k^2+\alpha)$.

The SURE-optimal $\alpha$ minimises $\widehat{\mathrm{MSE}}(\alpha)$.

SURE is unbiased: $\mathbb{E}[\widehat{\mathrm{MSE}}(\alpha)] = \mathbb{E}\|H_\alpha y^\delta - \mathcal{A}x^\dagger\|^2$. This makes it particularly attractive for problems with known noise variance. It is the theoretically rigorous analogue of cross-validation for Gaussian noise, and it is closely related to GCV.
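
A minimal sketch of evaluating the SURE criterion for Tikhonov regularization via the SVD, assuming the noise variance $\sigma_n^2$ is known (names and the grid-search strategy are illustrative):

```python
import numpy as np

def sure_alpha(U, s, y, sigma_n2, alphas):
    """Return the alpha minimising Stein's unbiased risk estimate for Tikhonov."""
    m = y.size
    c = U.T @ y
    risks = []
    for a in alphas:
        h = s**2 / (s**2 + a)                     # diagonal of H_alpha in the SVD basis
        resid2 = np.sum(np.abs((1.0 - h) * c)**2) # ||H_alpha y - y||^2 on range(A);
                                                  # for m > n add ||(I - U U^T) y||^2 (constant in alpha)
        risks.append(-m * sigma_n2 + resid2 + 2.0 * sigma_n2 * np.sum(h))
    return alphas[int(np.argmin(risks))]
```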


Example: Computing the L-Curve for a Discrete Problem

For the discrete system $\mathbf{A}\mathbf{x} = \mathbf{y}^\delta$ with SVD $\mathbf{A} = \sum_{k=1}^n \sigma_k \mathbf{u}_k \mathbf{v}_k^T$, express the residual norm and solution norm as functions of $\alpha$ in closed form, and derive the curvature formula for the L-curve corner.
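
One way to set up the computation (a sketch, writing $c_k = \langle \mathbf{y}^\delta, \mathbf{u}_k\rangle$): the two norms have the closed forms

$$\rho(\alpha)^2 = \|\mathbf{A}\mathbf{x}_\alpha^\delta - \mathbf{y}^\delta\|^2 = \sum_{k=1}^{n}\left(\frac{\alpha}{\sigma_k^2+\alpha}\right)^{2} |c_k|^2, \qquad \eta(\alpha)^2 = \|\mathbf{x}_\alpha^\delta\|^2 = \sum_{k=1}^{n}\left(\frac{\sigma_k}{\sigma_k^2+\alpha}\right)^{2} |c_k|^2.$$

With $\hat\rho(\alpha) = \log\rho(\alpha)$ and $\hat\eta(\alpha) = \log\eta(\alpha)$, the curvature of the parametric curve $(\hat\rho, \hat\eta)$ is

$$\kappa(\alpha) = \frac{\hat\rho'\,\hat\eta'' - \hat\rho''\,\hat\eta'}{\bigl((\hat\rho')^2 + (\hat\eta')^2\bigr)^{3/2}},$$

where primes denote derivatives with respect to $\alpha$; the L-curve corner is the maximiser of $\kappa$.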


Parameter Choice Methods for Tikhonov Regularization

Compares Morozov's discrepancy principle, the L-curve, and GCV for selecting the Tikhonov parameter $\alpha$.

Left panel: The L-curve in log-log coordinates. The red dot marks the corner (maximum curvature point). The blue dot marks the discrepancy principle selection. The green dot marks the GCV minimum.

Right panel: The corresponding reconstruction at the selected α\alpha compared to the true solution.

Bottom panel: The GCV functional $V(\alpha)$ vs. $\alpha$ (log scale).

Try different noise levels. At low noise, all three methods agree. At high noise, the methods diverge: the discrepancy principle is most reliable when $\delta$ is known accurately; the L-curve is most interpretable visually; GCV has the best theoretical properties when the noise model is Gaussian.


Parameter Choice Methods Compared

| Method | Requires $\delta$? | Order-optimal? | Computationally cheap? | Best for |
|---|---|---|---|---|
| Discrepancy principle | Yes | Yes (provably) | Yes (one 1D root) | When $\delta$ is reliably known |
| L-curve | No | No (heuristic) | Moderate (needs curvature) | Visual diagnostics; moderate ill-posedness |
| GCV | No | Asymptotically | Yes (one 1D min) | Gaussian noise; independent observations |
| SURE | Requires $\sigma_n^2$ | Yes (under Gaussian model) | Yes | Gaussian noise with known variance |

Common Mistake: The L-Curve Can Fail for Severely Ill-Posed Problems

Mistake:

Relying solely on the L-curve to select $\alpha$ for severely ill-posed problems (e.g., Gaussian deblurring with $\sigma_k \sim e^{-ck^2}$) or at very small noise levels.

Correction:

For severely ill-posed problems, the L-curve often lacks a well-defined corner: the horizontal and vertical branches may blend into a smooth arc with no pronounced kink. In this regime:

  1. If $\delta$ is known, use the discrepancy principle.
  2. If $\delta$ is unknown, use GCV or SURE.
  3. Examine the L-curve visually across a broad range of $\alpha$ to check whether a corner exists before trusting the maximum-curvature selection.

The literature reports L-curve failures for severely ill-posed problems at noise levels below $10^{-5}$.

⚠️ Engineering Note

Estimating the Noise Level $\delta$ in Practice

The discrepancy principle requires knowledge of $\delta = \|\eta\|$. In RF imaging systems, the noise level can be estimated by:

  1. Noise-only measurements: Record $N_{\text{avg}}$ samples with the transmitter off (system noise) or pointing away from the scene, and estimate $\sigma_n^2 = \frac{1}{m}\|y_{\text{noise}}\|^2$ (see the sketch at the end of this note).

  2. Residual-based estimation: If an initial rough reconstruction $\hat{x}$ is available, estimate $\delta$ from $\|y - \mathcal{A}\hat{x}\|$.

  3. Cramér–Rao bound: For a calibrated radar with known transmit power and receiver noise figure, the noise variance is determined by the link budget equation.

A mismatch between the true $\delta$ and the estimated $\hat{\delta}$ shifts the discrepancy selection: overestimating $\delta$ leads to oversmoothing; underestimating leads to noise-contaminated reconstructions. The safety factor $\tau > 1$ provides robustness against underestimation.
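
A minimal sketch of option 1 above (noise-only measurements), converting an estimated per-sample variance into the $\delta$ used by the discrepancy principle; the variable names are illustrative, not tied to a specific system:

```python
import numpy as np

def estimate_delta(y_noise_samples, m):
    """Estimate delta = ||eta|| from noise-only recordings.

    y_noise_samples: array of shape (N_avg, m), recorded with the transmitter off
                     or pointed away from the scene.
    m: length of the actual measurement vector y^delta.
    """
    sigma_n2 = np.mean(np.abs(y_noise_samples)**2)   # per-sample noise variance
    return np.sqrt(m * sigma_n2)                     # since E||eta||^2 = m * sigma_n^2
```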

Key Takeaway

Four practical methods exist for choosing the regularization parameter: Morozov's discrepancy principle (order-optimal when $\delta$ is known), the L-curve (visual, no $\delta$ needed, heuristic), GCV (asymptotically optimal, no $\delta$ needed), and SURE (unbiased risk estimate under Gaussian noise). For the discrepancy principle, the residual equation $\varphi(\alpha) = \tau^2\delta^2$ has a unique solution because the Tikhonov residual is monotonically increasing in $\alpha$. In practical RF imaging, the discrepancy principle with $\tau \in [1.1, 1.5]$ is the standard choice when the noise level can be measured from calibration data.