Parameter Choice Rules
The Central Practical Challenge — Choosing $\alpha$
The regularization theory of Sections 2.3–2.4 shows that good reconstructions are possible if the regularization parameter $\alpha$ is chosen correctly. But how does one choose $\alpha$ in practice?
Two regimes exist. When the noise level $\delta$ is known (or reliably estimated), the Morozov discrepancy principle provides an order-optimal, principled choice. When $\delta$ is unknown — which is the case in many experimental setups — the L-curve and generalized cross-validation (GCV) provide data-driven alternatives.
Stein's Unbiased Risk Estimator (SURE) provides yet another approach: it directly estimates the mean squared error without knowing the truth $x^{\dagger}$, using only the measured data.
Definition: Morozov's Discrepancy Principle
Morozov's discrepancy principle selects $\alpha = \alpha(\delta)$ such that the residual matches the noise level:

$$\|A x_{\alpha}^{\delta} - y^{\delta}\| = \tau \delta,$$

where $\tau \geq 1$ is a safety factor (typically slightly larger than $1$).
The rationale: we should fit the data only to the accuracy warranted by the noise — fitting more closely than $\delta$ means fitting noise.
Under source conditions $x^{\dagger} \in \mathcal{R}\big((A^* A)^{\nu}\big)$, the discrepancy principle is provably order-optimal:

$$\|x_{\alpha(\delta)}^{\delta} - x^{\dagger}\| = O\!\left(\delta^{2\nu/(2\nu+1)}\right)$$

for $0 < \nu \leq 1/2$ (the method's qualification). In practice, $\delta$ may be estimated from the data (e.g., from a noise-only region of the measurement, or from repeated measurements).
Theorem: Existence and Monotonicity for the Discrepancy Principle
For Tikhonov regularization, the discrepancy function

$$d(\alpha) = \|A x_{\alpha}^{\delta} - y^{\delta}\|^2 = \sum_{i} \left(\frac{\alpha}{\sigma_i^2 + \alpha}\right)^2 \langle y^{\delta}, u_i \rangle^2$$

is monotonically increasing in $\alpha$, with $\lim_{\alpha \to 0} d(\alpha) = 0$ (if $y^{\delta} \in \overline{\mathcal{R}(A)}$) and $\lim_{\alpha \to \infty} d(\alpha) = \|y^{\delta}\|^2$.

Therefore the equation $d(\alpha) = \tau^2 \delta^2$ has a unique solution whenever $\lim_{\alpha \to 0} d(\alpha) < \tau^2 \delta^2 < \|y^{\delta}\|^2$, which holds if $\|y^{\delta}\| > \tau \delta$ (the data contains signal above the noise).
Derive the residual formula

From $x_{\alpha}^{\delta} = (A^* A + \alpha I)^{-1} A^* y^{\delta}$ and the singular system $\{(\sigma_i, u_i, v_i)\}$ of $A$, each residual component is

$$\langle A x_{\alpha}^{\delta} - y^{\delta}, u_i \rangle = \left(\frac{\sigma_i^2}{\sigma_i^2 + \alpha} - 1\right) \langle y^{\delta}, u_i \rangle = -\frac{\alpha}{\sigma_i^2 + \alpha} \langle y^{\delta}, u_i \rangle,$$

and summing the squares gives the stated formula for $d(\alpha)$.
Monotonicity

Each term has derivative

$$\frac{d}{d\alpha} \left(\frac{\alpha}{\sigma_i^2 + \alpha}\right)^2 = \frac{2\alpha \sigma_i^2}{(\sigma_i^2 + \alpha)^3} > 0$$

for $\alpha > 0$. Since $d(\alpha)$ is a sum of such terms, $d'(\alpha) > 0$ and the equation $d(\alpha) = \tau^2 \delta^2$ has at most one solution. By the intermediate value theorem and the boundary values, it has exactly one solution in $(0, \infty)$.
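Numerically, this unique root can be found with any bracketing 1D solver. Below is a minimal Python/NumPy sketch (the names `A`, `y`, `delta`, `tau`, and `discrepancy_alpha` are illustrative assumptions, not from the text), valid for a dense matrix small enough for a full SVD:

```python
import numpy as np
from scipy.optimize import brentq

def discrepancy_alpha(A, y, delta, tau=1.1):
    """Solve d(alpha) = (tau*delta)^2 for the Tikhonov residual by 1D root-finding."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    c = U.T @ y                      # data coefficients <u_i, y>
    n = s.size
    r_floor = np.sum(c[n:] ** 2)     # residual floor: part of y outside range(A)

    def d(alpha):                    # squared residual, monotone increasing in alpha
        return np.sum((alpha / (s ** 2 + alpha)) ** 2 * c[:n] ** 2) + r_floor

    target = (tau * delta) ** 2      # requires r_floor < target < ||y||^2
    return brentq(lambda a: d(a) - target, 1e-14, 1e14)
```

Note that `brentq` needs a sign change across the bracket, which is exactly the theorem's solvability condition $\tau\delta < \|y^{\delta}\|$.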
Definition: The L-Curve Method
The L-curve is the parametric plot of

$$\alpha \mapsto \left(\log \|A x_{\alpha}^{\delta} - y^{\delta}\|, \; \log \|x_{\alpha}^{\delta}\|\right)$$

as $\alpha$ varies over $(0, \infty)$. This curve typically has an "L" shape:
- Vertical branch (small $\alpha$): The residual is small but the solution norm is large (overfitting, noise amplification).
- Horizontal branch (large $\alpha$): The solution norm is small but the residual is large (oversmoothing).
- Corner (optimal $\alpha$): The best trade-off between fidelity and regularity.
The L-curve criterion selects $\alpha$ at the point of maximum curvature of the L-curve.
The L-curve is a heuristic without rigorous convergence guarantees in full generality, but it is extremely popular in practice because it requires no knowledge of the noise level $\delta$. It provides an intuitive visual diagnostic and often gives good results for moderately ill-posed problems. For severely ill-posed problems or very small noise, the corner can be ill-defined.
Definition: Generalized Cross-Validation (GCV)
Generalized cross-validation selects $\alpha$ by minimising the GCV functional

$$G(\alpha) = \frac{\|A x_{\alpha}^{\delta} - y^{\delta}\|^2}{\left[\operatorname{trace}\!\left(I - A (A^* A + \alpha I)^{-1} A^*\right)\right]^2}.$$

For the Tikhonov solution in the finite-dimensional case with SVD $\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top$:

$$G(\alpha) = \frac{\sum_{i=1}^{n} \left(\frac{\alpha}{\sigma_i^2 + \alpha}\right)^2 (\mathbf{u}_i^\top \mathbf{y})^2 + \sum_{i=n+1}^{m} (\mathbf{u}_i^\top \mathbf{y})^2}{\left(m - \sum_{i=1}^{n} \frac{\sigma_i^2}{\sigma_i^2 + \alpha}\right)^2}.$$
GCV has a statistical interpretation: it estimates the expected prediction error when one data point is left out. Under standard conditions, minimising $G(\alpha)$ is asymptotically optimal in the same sense as the discrepancy principle, but without requiring knowledge of $\delta$. In practice, GCV can be sensitive to noise correlations and tends to underestimate $\alpha$ for severely ill-posed problems.
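As a concrete illustration, here is a minimal Python/NumPy sketch of the SVD form of $G(\alpha)$, minimised over $\log \alpha$ (the name `gcv_alpha` and the bracket limits are illustrative assumptions, not from the text):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gcv_alpha(A, y):
    """Pick alpha minimising the SVD form of the GCV functional G(alpha)."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    c = U.T @ y
    m, n = A.shape
    r_floor = np.sum(c[n:] ** 2)                # residual component outside range(A)

    def G(log_alpha):
        alpha = np.exp(log_alpha)
        f = alpha / (s ** 2 + alpha)            # residual filter factors alpha/(s_i^2 + alpha)
        num = np.sum((f * c[:n]) ** 2) + r_floor             # ||A x_alpha - y||^2
        den = (m - np.sum(s ** 2 / (s ** 2 + alpha))) ** 2   # [trace(I - H_alpha)]^2
        return num / den

    res = minimize_scalar(G, bounds=(np.log(1e-12), np.log(1e6)), method="bounded")
    return np.exp(res.x)
```

Minimising in $\log \alpha$ rather than $\alpha$ is a standard numerical device: $G$ varies over many orders of magnitude of $\alpha$, and the log scale keeps the bounded search well conditioned.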
Definition: Stein's Unbiased Risk Estimator (SURE)
For a linear estimator $\hat{\mathbf{y}}_{\alpha} = \mathbf{H}_{\alpha} \mathbf{y}$ with noise $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$, Stein's Unbiased Risk Estimator (SURE) provides an unbiased estimate of the mean squared (prediction) error $\mathbb{E}\,\|\mathbf{H}_{\alpha} \mathbf{y} - \mathbf{A} \mathbf{x}^{\dagger}\|^2$:

$$\mathrm{SURE}(\alpha) = \|\mathbf{H}_{\alpha} \mathbf{y} - \mathbf{y}\|^2 - m \sigma^2 + 2 \sigma^2 \operatorname{trace}(\mathbf{H}_{\alpha}),$$

where $m$ is the number of data points and $\mathbf{H}_{\alpha}$ is the hat (influence) matrix. For Tikhonov regularization, $\mathbf{H}_{\alpha} = \mathbf{A} (\mathbf{A}^\top \mathbf{A} + \alpha \mathbf{I})^{-1} \mathbf{A}^\top$ and $\operatorname{trace}(\mathbf{H}_{\alpha}) = \sum_{i=1}^{n} \frac{\sigma_i^2}{\sigma_i^2 + \alpha}$.
The SURE-optimal $\alpha$ minimises $\mathrm{SURE}(\alpha)$ over $\alpha > 0$.
SURE is unbiased: $\mathbb{E}[\mathrm{SURE}(\alpha)] = \mathbb{E}\,\|\mathbf{H}_{\alpha} \mathbf{y} - \mathbf{A} \mathbf{x}^{\dagger}\|^2$. This makes it particularly attractive for problems with known noise variance. It is the theoretically rigorous analogue of cross-validation for Gaussian noise, and it is closely related to GCV.
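A minimal Python/NumPy sketch of this estimator, evaluated over a grid of candidate $\alpha$ values (the name `sure_risk` and the grid range are illustrative assumptions, under the Gaussian model with known variance `sigma2`):

```python
import numpy as np

def sure_risk(alpha, A, y, sigma2):
    """Unbiased estimate of the prediction risk E||H_a y - A x_true||^2."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    c = U.T @ y
    m, n = A.shape
    h = s ** 2 / (s ** 2 + alpha)               # eigenvalues of the hat matrix H_alpha
    resid2 = np.sum(((1 - h) * c[:n]) ** 2) + np.sum(c[n:] ** 2)   # ||H_a y - y||^2
    return resid2 - m * sigma2 + 2 * sigma2 * np.sum(h)

# Usage: scan a log-spaced grid and keep the minimiser.
# alphas = np.logspace(-10, 2, 200)
# alpha_sure = alphas[np.argmin([sure_risk(a, A, y, sigma2) for a in alphas])]
```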
Example: Computing the L-Curve for a Discrete Problem
For the discrete system $\mathbf{A} \mathbf{x} = \mathbf{y}$ with SVD $\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top$, express the residual norm $\rho(\alpha) = \|\mathbf{A} \mathbf{x}_{\alpha} - \mathbf{y}\|$ and solution norm $\eta(\alpha) = \|\mathbf{x}_{\alpha}\|$ as functions of $\alpha$ in closed form, and derive the curvature formula for the L-curve corner.
Derive the residual norm

$$\rho(\alpha)^2 = \|\mathbf{A} \mathbf{x}_{\alpha} - \mathbf{y}\|^2 = \sum_{i=1}^{n} \left(\frac{\alpha}{\sigma_i^2 + \alpha}\right)^2 (\mathbf{u}_i^\top \mathbf{y})^2 + \sum_{i=n+1}^{m} (\mathbf{u}_i^\top \mathbf{y})^2,$$

where for $m > n$ the second sum is the $\alpha$-independent component of $\mathbf{y}$ orthogonal to the range $\mathcal{R}(\mathbf{A})$.
Derive the solution norm

$$\eta(\alpha)^2 = \|\mathbf{x}_{\alpha}\|^2 = \sum_{i=1}^{n} \left(\frac{\sigma_i}{\sigma_i^2 + \alpha}\right)^2 (\mathbf{u}_i^\top \mathbf{y})^2.$$
L-curve parametrisation and curvature

The L-curve is the parametric curve $\left(\hat{\rho}(\alpha), \hat{\eta}(\alpha)\right) = \left(\log \rho(\alpha), \log \eta(\alpha)\right)$ for $\alpha \in (0, \infty)$.

The curvature is

$$\kappa(\alpha) = \frac{\hat{\rho}' \hat{\eta}'' - \hat{\rho}'' \hat{\eta}'}{\left((\hat{\rho}')^2 + (\hat{\eta}')^2\right)^{3/2}},$$

where primes denote derivatives with respect to $\alpha$. The L-curve criterion selects $\alpha_{\mathrm{L}} = \arg\max_{\alpha} \kappa(\alpha)$.
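Putting the closed forms together, here is a minimal Python/NumPy sketch of the corner search (the name `lcurve_corner`, the grid range, and the use of numerical differentiation in $\log \alpha$ — which leaves the parametrisation-invariant curvature unchanged — are illustrative assumptions):

```python
import numpy as np

def lcurve_corner(A, y, num=400):
    """Return the alpha maximising the curvature of (log rho, log eta)."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    c = U.T @ y
    n = s.size
    alphas = np.logspace(-12, 4, num)
    F = alphas[:, None] / (s ** 2 + alphas[:, None])     # alpha/(s_i^2 + alpha)
    rho = np.sqrt((F ** 2 * c[:n] ** 2).sum(axis=1) + np.sum(c[n:] ** 2))
    eta = np.sqrt((((1 - F) * c[:n] / s) ** 2).sum(axis=1))
    # curvature of (log rho, log eta), differentiating in t = log(alpha)
    t, xi, zeta = np.log(alphas), np.log(rho), np.log(eta)
    xi1, zeta1 = np.gradient(xi, t), np.gradient(zeta, t)
    xi2, zeta2 = np.gradient(xi1, t), np.gradient(zeta1, t)
    kappa = (xi1 * zeta2 - xi2 * zeta1) / (xi1 ** 2 + zeta1 ** 2) ** 1.5
    return alphas[np.argmax(kappa)]
```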
[Interactive figure: Parameter Choice Methods for Tikhonov Regularization — compares Morozov's discrepancy principle, the L-curve, and GCV for selecting the Tikhonov parameter $\alpha$. Left panel: the L-curve in log-log coordinates, with the corner (maximum-curvature point) in red, the discrepancy-principle selection in blue, and the GCV minimum in green. Right panel: the reconstruction at the selected $\alpha$ compared to the true solution. Bottom panel: the GCV functional $G(\alpha)$ vs. $\alpha$ (log scale).]

At low noise, all three methods agree. At high noise, the methods diverge — the discrepancy principle is most reliable when $\delta$ is known accurately; the L-curve is most interpretable visually; GCV has the best theoretical properties when the noise model is Gaussian.
Parameter Choice Methods Compared
| Method | Requires $\delta$? | Order-optimal? | Computationally cheap? | Best for |
|---|---|---|---|---|
| Discrepancy principle | Yes | Yes (provably) | Yes (one 1D root-find) | When $\delta$ is reliably known |
| L-curve | No | No (heuristic) | Moderate (needs curvature) | Visual diagnostics; moderate ill-posedness |
| GCV | No | Asymptotically | Yes (one 1D minimisation) | Gaussian noise; independent observations |
| SURE | $\sigma^2$ required | Yes (under Gaussian model) | Yes | Gaussian noise with known variance |
Common Mistake: The L-Curve Can Fail for Severely Ill-Posed Problems
Mistake:
Relying solely on the L-curve to select $\alpha$ for severely ill-posed problems (e.g., Gaussian deblurring, where the singular values decay exponentially) or at very small noise levels.
Correction:
For severely ill-posed problems, the L-curve often does not have a well-defined corner — the horizontal and vertical branches may meet in a smooth arc with no pronounced kink. In this regime:
- If $\delta$ is known, use the discrepancy principle.
- If $\delta$ is unknown, use GCV or SURE.
- Examine the L-curve visually across a broad range of $\alpha$ to check whether a corner exists before trusting the maximum-curvature selection.
References in the literature report L-curve failures for severely ill-posed problems at sufficiently small noise levels.
Estimating the Noise Level in Practice
The discrepancy principle requires knowledge of $\delta$. In RF imaging systems, noise level estimation can be performed by:
- Noise-only measurements: Record samples with the transmitter off (system noise) or pointed away from the scene, and estimate the noise variance $\hat{\sigma}^2$ from their sample statistics.
- Residual-based estimation: If an initial rough reconstruction $\hat{x}$ is available, estimate $\delta$ from the residual $\|A \hat{x} - y^{\delta}\|$.
- Cramér–Rao bound: For a calibrated radar with known transmit power and receiver noise figure, the noise variance is determined by the link-budget equation.
A mismatch between the true $\delta$ and the estimated $\hat{\delta}$ shifts the discrepancy selection: overestimating $\delta$ leads to oversmoothing; underestimating $\delta$ leads to noise-contaminated reconstructions. The safety factor $\tau > 1$ provides robustness against underestimation.
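As a concrete instance of the first option, a minimal sketch assuming i.i.d. Gaussian noise and an $m$-sample measurement (the names `noise_record` and `estimate_delta` and the $\sqrt{m}$ scaling convention are illustrative assumptions, not from the text):

```python
import numpy as np

def estimate_delta(noise_record, m, tau=1.1):
    """Estimate the noise level delta from a transmitter-off record."""
    sigma_hat = np.std(noise_record, ddof=1)   # sample standard deviation
    delta_hat = sigma_hat * np.sqrt(m)         # E||noise|| ~ sigma*sqrt(m) for m samples
    return tau * delta_hat                     # safety factor guards against underestimation
```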
Key Takeaway
Four practical methods exist for choosing the regularization parameter $\alpha$: Morozov's discrepancy principle (order-optimal when $\delta$ is known), the L-curve (visual, no $\delta$ needed, heuristic), GCV (asymptotically optimal, no $\delta$ needed), and SURE (unbiased risk estimate under Gaussian noise). For the discrepancy principle, the residual equation $\|A x_{\alpha}^{\delta} - y^{\delta}\| = \tau \delta$ has a unique solution because the Tikhonov residual is monotonically increasing in $\alpha$. In practical RF imaging, the discrepancy principle with $\tau$ slightly larger than $1$ is the standard choice when the noise level can be measured from calibration data.