Convergence Diagnostics in Practice
Knowing When to Stop
An iterative reconstruction algorithm without a principled stopping criterion is incomplete. Running too few iterations yields a poor reconstruction; running too many wastes computation and may even degrade quality through noise amplification (semi-convergence). This section develops practical diagnostics that:
- Certify solution quality, providing an upper bound on the distance to optimality.
- Detect pathological behavior: stagnation, oscillation, or divergence.
- Enable fair comparison across different algorithms and parameter settings.
These diagnostics apply to every reconstruction algorithm in Parts IV--VI of this book. The tools are algorithm-agnostic: we present each diagnostic once and explain which algorithms it applies to.
Definition: Primal Residual
For ADMM applied to the problem $\min_{x,z}\; f(x) + g(z)$ subject to $Ax + Bz = c$, the primal residual at iteration $k$ is
$$r^k = Ax^k + Bz^k - c.$$
It measures the violation of the equality constraint. At the solution, $r^\star = 0$.
For TV-regularized imaging, where $f$ is the data-fidelity term and $g(z) = \lambda\|z\|_1$, the constraint is $Dx = z$ (discrete gradient equals auxiliary variable), i.e. $A = D$, $B = -I$, $c = 0$, giving $r^k = Dx^k - z^k$. The primal residual measures how well $z^k$ agrees with the actual gradient of the current image.
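As a minimal sketch of this quantity, assuming a 1-D image and a forward-difference gradient (the helper names and test values are ours, not from the text):

```python
import numpy as np

def grad_1d(x):
    """Forward-difference discrete gradient D for a 1-D image (last entry 0)."""
    return np.append(np.diff(x), 0.0)

def primal_residual(x, z):
    """ADMM primal residual r = D x - z for the TV splitting D x = z."""
    return grad_1d(x) - z

# If z is exactly the gradient of x, the primal residual vanishes.
x = np.array([0.0, 1.0, 1.0, 0.0])
z = grad_1d(x)
r = primal_residual(x, z)
```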
Definition: Dual Residual
The dual residual at iteration $k$ is
$$s^k = \rho\, A^T B\,(z^k - z^{k-1}),$$
where $\rho > 0$ is the augmented Lagrangian penalty parameter. It measures the violation of the dual feasibility condition (stationarity of the Lagrangian with respect to $x$).
The standard ADMM stopping criterion terminates when both residuals are small:
$$\|r^k\|_2 \le \epsilon^{\mathrm{pri}} \quad \text{and} \quad \|s^k\|_2 \le \epsilon^{\mathrm{dual}},$$
with
$$\epsilon^{\mathrm{pri}} = \sqrt{p}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}} \max\{\|Ax^k\|_2, \|Bz^k\|_2, \|c\|_2\}, \qquad \epsilon^{\mathrm{dual}} = \sqrt{n}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\,\|A^T u^k\|_2,$$
where $u^k$ is the dual variable, and typical values $\epsilon^{\mathrm{abs}} = 10^{-4}$ and $\epsilon^{\mathrm{rel}} = 10^{-3}$.
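A hedged sketch of this test, following the Boyd et al. tolerance formulas (the function name and argument conventions are ours):

```python
import numpy as np

def admm_stop(r, s, Ax, Bz, c, ATu, eps_abs=1e-4, eps_rel=1e-3):
    """ADMM stopping test: True when both residuals fall below their
    adaptive tolerances (Boyd et al., 2011)."""
    p, n = c.size, ATu.size
    eps_pri = np.sqrt(p) * eps_abs + eps_rel * max(
        np.linalg.norm(Ax), np.linalg.norm(Bz), np.linalg.norm(c))
    eps_dual = np.sqrt(n) * eps_abs + eps_rel * np.linalg.norm(ATu)
    return np.linalg.norm(r) <= eps_pri and np.linalg.norm(s) <= eps_dual
```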
Definition: Fixed-Point Residual
For a proximal algorithm with update operator $T$ (i.e., $x^{k+1} = T(x^k)$), the fixed-point residual at iteration $k$ is
$$\|T(x^k) - x^k\|_2 = \|x^{k+1} - x^k\|_2.$$
This is a universal diagnostic: it applies to ISTA, FISTA, ADMM, Chambolle--Pock, Douglas--Rachford, and any other fixed-point iteration. If $T$ is nonexpansive, $\|x^{k+1} - x^k\|_2 \to 0$ is necessary (though not sufficient) for convergence to a fixed point.
The fixed-point residual is cheap to compute (one vector difference and norm) and requires no knowledge of the algorithm's internal structure. It is the first quantity to monitor for any new algorithm.
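A minimal monitoring wrapper illustrating this (the toy contraction is ours; any update operator `T` can be plugged in):

```python
import numpy as np

def run_with_fp_residuals(T, x0, iters=50):
    """Iterate x <- T(x), recording the fixed-point residual ||T(x) - x||."""
    x, res = x0, []
    for _ in range(iters):
        x_next = T(x)
        res.append(np.linalg.norm(x_next - x))
        x = x_next
    return x, res

# Toy contraction: T(x) = 0.5 x has the unique fixed point 0.
x_final, res = run_with_fp_residuals(lambda x: 0.5 * x, np.ones(4))
```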
Theorem: Morozov's Discrepancy Principle
For the noisy inverse problem $y = Ax^\star + n$ with $\|n\|_2 \approx \varepsilon$ (for i.i.d. Gaussian noise of standard deviation $\sigma$, $\varepsilon = \sigma\sqrt{m}$), Morozov's discrepancy principle selects the regularization parameter $\lambda$ (or the iteration count $K$) such that the data-fidelity residual matches the noise level:
$$\|y - A\hat{x}\|_2 \approx \varepsilon.$$
Specifically, one chooses the largest $\lambda$ (or the earliest $K$) satisfying
$$\|y - Ax_\lambda\|_2 \le \tau\,\varepsilon$$
for a safety factor $\tau \ge 1$ (typically between 1 and 1.5).
We should fit the data only as well as the noise allows. Fitting below the noise level means fitting the noise itself: overfitting in the statistical sense. The discrepancy principle provides a principled way to stop without requiring ground truth.
Lower bound on the residual
Since $y - Ax^\star = n$, any estimator $\hat{x}$ satisfies
$$\|y - A\hat{x}\|_2 = \|n + A(x^\star - \hat{x})\|_2 \ge \|n\|_2 - \|A(\hat{x} - x^\star)\|_2.$$
For an exact recovery $\hat{x} = x^\star$, the residual equals $\|n\|_2 \approx \varepsilon$.
Optimality of the discrepancy level
Under standard regularity conditions on $A$ and the true solution $x^\star$, choosing $\lambda$ by the discrepancy principle yields a regularized solution $x_\lambda$ that converges to $x^\star$ as $\varepsilon \to 0$ (Engl, Hanke, and Neubauer, 1996).
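A sketch of discrepancy-based early stopping, here applied to unregularized Landweber iteration on a synthetic problem (the problem sizes, seed, and safety factor are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 200, 50, 0.1
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = rng.standard_normal(n)
y = A @ x_true + sigma * rng.standard_normal(m)

eps = sigma * np.sqrt(m)                 # expected noise norm ||n||_2
tau = 1.1                                # safety factor
step = 1.0 / np.linalg.norm(A, 2) ** 2   # Landweber step size
x = np.zeros(n)
for k in range(5000):
    r = y - A @ x
    if np.linalg.norm(r) <= tau * eps:   # Morozov: stop at the noise level
        break
    x = x + step * (A.T @ r)
```

Note that without the `break`, the iteration would keep driving the residual below the noise level, which is exactly the overfitting the principle guards against.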
Definition: Warm-Starting from the Matched-Filter Image
The matched-filter image (backpropagation image) is
$$\hat{x}_{\mathrm{MF}} = D_A^{-1} A^H y, \qquad D_A = \operatorname{diag}(A^H A),$$
where $D_A$ is the diagonal of the normal matrix. This is a single adjoint application followed by a normalization and costs $O(mn)$.
Warm-starting uses $x^0 = \hat{x}_{\mathrm{MF}}$ as the initial iterate for any iterative algorithm, rather than the default $x^0 = 0$.
The matched-filter image is far from optimal (it suffers from sidelobes and lacks sparsity), but it is already correlated with the true scene. Starting from it rather than from zero typically saves 20--50% of the iterations needed to reach a given reconstruction quality.
In our RF imaging system, $\hat{x}_{\mathrm{MF}}$ is the standard baseline (Matched-Filter Imaging).
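A small synthetic sketch of the matched-filter warm start (the dimensions, sparsity pattern, and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 300, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]           # three point scatterers
y = A @ x_true + 0.01 * rng.standard_normal(m)

d = np.sum(A * A, axis=0)        # diag(A^H A) without forming A^H A
x_mf = (A.T @ y) / d             # matched filter: one adjoint + normalization
```

`x_mf` is dense and has sidelobes, but it already places most of its energy on the true scatterers, which is what makes it a useful initial iterate.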
Example: Stopping Criterion for ADMM in TV-Regularized Imaging
Consider ADMM applied to the TV-regularized imaging problem
$$\min_x\ \tfrac{1}{2}\|y - Ax\|_2^2 + \lambda\,\|Dx\|_1,$$
with $n$ voxels, noise standard deviation $\sigma$, and penalty parameter $\rho$. Design a stopping criterion using primal and dual residuals.
Set tolerance levels
Use the standard absolute and relative tolerances: $\epsilon^{\mathrm{abs}} = 10^{-4}$ and $\epsilon^{\mathrm{rel}} = 10^{-3}$. The absolute tolerance enters scaled by the problem dimension, through the $\sqrt{p}$ and $\sqrt{n}$ factors below, where $p$ is the dimension of the auxiliary variable $z$.
Compute residuals at each iteration
At iteration $k$:
$$r^k = Dx^k - z^k, \qquad s^k = \rho\, D^T (z^k - z^{k-1}).$$
Adaptive threshold
Primal threshold: $\epsilon^{\mathrm{pri}} = \sqrt{p}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}} \max\{\|Dx^k\|_2, \|z^k\|_2\}$.
Dual threshold: $\epsilon^{\mathrm{dual}} = \sqrt{n}\,\epsilon^{\mathrm{abs}} + \epsilon^{\mathrm{rel}}\, \rho\,\|D^T u^k\|_2$.
Stop when $\|r^k\|_2 \le \epsilon^{\mathrm{pri}}$ and $\|s^k\|_2 \le \epsilon^{\mathrm{dual}}$.
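The pieces above assemble into a runnable sketch. As a simplifying assumption we take the 1-D TV denoising case ($A = I$), so the $x$-update is a small linear solve; all parameter values and the test signal are illustrative:

```python
import numpy as np

def tv_admm_1d(y, lam=0.5, rho=1.0, eps_abs=1e-4, eps_rel=1e-3, max_iter=3000):
    """ADMM for min_x 0.5||y - x||^2 + lam*||D x||_1 with the
    primal/dual-residual stopping test of this section."""
    n = y.size
    D = (np.diag(-np.ones(n)) + np.diag(np.ones(n - 1), 1))[:-1]  # (n-1) x n
    x, z, u = np.zeros(n), np.zeros(n - 1), np.zeros(n - 1)
    Q = np.linalg.inv(np.eye(n) + rho * D.T @ D)   # fixed rho: precompute solve
    for k in range(max_iter):
        x = Q @ (y + rho * D.T @ (z - u))          # x-update
        z_old = z
        w = D @ x + u
        z = np.sign(w) * np.maximum(np.abs(w) - lam / rho, 0.0)  # soft-threshold
        u = u + D @ x - z                          # dual update
        r, s = D @ x - z, rho * D.T @ (z - z_old)  # primal / dual residuals
        eps_pri = np.sqrt(n - 1) * eps_abs + eps_rel * max(
            np.linalg.norm(D @ x), np.linalg.norm(z))
        eps_dual = np.sqrt(n) * eps_abs + eps_rel * rho * np.linalg.norm(D.T @ u)
        if np.linalg.norm(r) <= eps_pri and np.linalg.norm(s) <= eps_dual:
            break
    return x, k

rng = np.random.default_rng(4)
x_true = np.repeat([0.0, 2.0, -1.0], 20)           # piecewise-constant signal
y = x_true + 0.2 * rng.standard_normal(60)
x_hat, iters = tv_admm_1d(y, lam=0.5)
```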
Convergence Diagnostics Dashboard
Visualize the convergence behavior of iterative algorithms on a sparse recovery problem. Panel 1: objective value vs iteration. Panel 2: fixed-point residual vs iteration. Panel 3: data-fidelity residual with the discrepancy level marked.
Compare warm-started (from matched filter) vs cold-started (from zero) initialization.
Adaptive Penalty Parameter Selection in ADMM
The convergence speed of ADMM depends strongly on the penalty parameter $\rho$. A practical adaptive scheme (Boyd et al., 2011) adjusts $\rho$ at each iteration based on the ratio of primal and dual residual norms:
$$\rho^{k+1} = \begin{cases} \tau^{\mathrm{incr}}\,\rho^k & \text{if } \|r^k\|_2 > \mu\,\|s^k\|_2,\\ \rho^k / \tau^{\mathrm{decr}} & \text{if } \|s^k\|_2 > \mu\,\|r^k\|_2,\\ \rho^k & \text{otherwise,}\end{cases}$$
with typical values $\mu = 10$ and $\tau^{\mathrm{incr}} = \tau^{\mathrm{decr}} = 2$.
The intuition: if the primal residual is much larger than the dual residual, increase $\rho$ to penalize constraint violation more heavily; conversely, if the dual residual dominates, decrease $\rho$ to allow more flexibility in the primal update.
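A direct sketch of this update rule (function name is ours):

```python
def update_rho(rho, r_norm, s_norm, mu=10.0, tau_incr=2.0, tau_decr=2.0):
    """Residual-balancing penalty update (Boyd et al., 2011)."""
    if r_norm > mu * s_norm:
        return rho * tau_incr      # constraint violation dominates: penalize harder
    if s_norm > mu * r_norm:
        return rho / tau_decr      # dual residual dominates: relax the penalty
    return rho
```

When $\rho$ changes, the scaled dual variable must be rescaled accordingly ($u \leftarrow u\,\rho_{\mathrm{old}}/\rho_{\mathrm{new}}$), otherwise the iteration is no longer a valid ADMM step.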
Stopping Criteria for Imaging Algorithms
| Criterion | Applicable To | Requires Ground Truth? | Provides Certificate? |
|---|---|---|---|
| Fixed-point residual | All iterative algorithms | No | Necessary but not sufficient |
| Primal + dual residual | ADMM and variants | No | Yes (primal-dual optimality) |
| Primal-dual gap | Chambolle--Pock | No | Yes (absolute $\epsilon$-optimality) |
| Discrepancy principle | All (with known $\varepsilon$) | No (needs noise level) | Yes (regularization theory) |
| PSNR vs iteration | Simulation only | Yes | Detects semi-convergence |
| Objective decrease | All with known objective | No | Sanity check only |
Common Mistake: Semi-Convergence in Unregularized Iterations
Mistake:
Running an iterative algorithm (such as Landweber iteration or conjugate gradient on the normal equations) until the fixed-point residual is very small, without explicit regularization.
Correction:
For unregularized iterative methods, the number of iterations itself acts as a regularization parameter. PSNR initially improves as the algorithm fits the signal component, then decreases as it starts fitting the noise. This is semi-convergence.
Detection: plot PSNR vs. iteration (in simulation) or monitor the data residual. If $\|y - Ax^k\|_2$ drops significantly below the noise level $\varepsilon$, the algorithm is overfitting.
Prevention: use the discrepancy principle to stop, or add explicit regularization (TV, $\ell_1$, etc.) so that the regularized objective has a unique minimum and the algorithm converges to it monotonically.
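Semi-convergence can be reproduced in a few lines on a synthetic ill-posed problem; the geometrically decaying spectrum, seed, and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 0.9 ** np.arange(n)                        # decaying spectrum: ill-posed A
A = U @ (s[:, None] * V.T)                     # A = U diag(s) V^T
x_true = V[:, :5] @ rng.standard_normal(5)     # signal lives in well-posed modes
y = A @ x_true + 0.01 * rng.standard_normal(n)

x, errs = np.zeros(n), []
for k in range(3000):                          # unregularized Landweber
    x = x + A.T @ (y - A @ x)                  # step = 1 is safe since ||A|| = 1
    errs.append(np.linalg.norm(x - x_true))
```

The error trace `errs` first decreases (the signal modes are fitted), then increases as noise in the small-singular-value modes is amplified; its minimum marks the semi-convergence point.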
Quick Check
In Morozov's discrepancy principle with noise level $\varepsilon$ and safety factor $\tau$, the algorithm should stop when the data residual satisfies $\|y - Ax^k\|_2 \le \tau\,\varepsilon$.
The discrepancy principle stops when the residual reaches $\tau\,\varepsilon$, fitting the data only to the level justified by the noise.
Historical Note: Vladimir Morozov and the Discrepancy Principle
Vladimir Morozov proposed the discrepancy principle in 1966 while working on ill-posed problems at Moscow State University. The idea that one should trust the data only to the level of the noise was revolutionary in an era when many practitioners believed that more iterations (or smaller regularization) always meant better results.
Morozov's principle formalized the intuition that every inverse problem has an information limit set by the noise. Together with Tikhonov's regularization theory (developed at the same institution), it established the mathematical foundation for what we now call regularization theory, a foundation that every modern imaging algorithm builds upon.
Example: Warm-Starting FISTA for RF Image Reconstruction
Compare the convergence of FISTA for $\ell_1$-regularized RF image reconstruction when initialized with: (a) $x^0 = 0$ (cold start), and (b) $x^0 = \hat{x}_{\mathrm{MF}}$ (matched-filter warm start). The problem has $n$ voxels, $m$ measurements, and a specified measurement SNR in dB.
Cold-start behavior
Starting from zero, FISTA must first identify the support of the sparse scene (which voxels are nonzero). The early iterations produce large objective decreases as the major scatterers are identified, followed by slow refinement. Typical convergence: several hundred iterations to drive the relative change below the stopping tolerance.
Warm-start behavior
The matched-filter image already identifies the approximate support and amplitudes of the major scatterers. FISTA then refines the amplitudes and suppresses sidelobes. Typical convergence: markedly fewer iterations than the cold start, in line with the 20--50% savings typical of warm starts.
Cost analysis
The matched-filter image costs one adjoint evaluation ($O(mn)$), roughly the cost of a single FISTA iteration. The net savings, many iterations avoided at the cost of one extra adjoint application, is substantial.
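A runnable sketch of this comparison on a synthetic sparse problem (dimensions, seed, $\lambda$, and tolerance are illustrative assumptions, not values from the example):

```python
import numpy as np

def fista_l1(A, y, lam, x0, max_iter=10000, tol=1e-6):
    """FISTA for min_x 0.5||y - Ax||^2 + lam*||x||_1; returns (x, iterations)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x, v, t = x0.copy(), x0.copy(), 1.0
    for k in range(1, max_iter + 1):
        x_old = x
        g = v - A.T @ (A @ v - y) / L        # gradient step at extrapolated point
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # prox of lam||.||_1
        t_old, t = t, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        v = x + ((t_old - 1.0) / t) * (x - x_old)              # momentum
        if np.linalg.norm(x - x_old) <= tol * max(np.linalg.norm(x_old), 1.0):
            break
    return x, k

rng = np.random.default_rng(3)
m, n = 200, 400
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 10, replace=False)] = rng.standard_normal(10)
y = A @ x_true + 0.01 * rng.standard_normal(m)

x_cold, k_cold = fista_l1(A, y, lam=0.1, x0=np.zeros(n))
x_mf = (A.T @ y) / np.sum(A * A, axis=0)     # matched-filter warm start
x_warm, k_warm = fista_l1(A, y, lam=0.1, x0=x_mf)
```

In runs like this the warm start typically needs noticeably fewer iterations, though the exact saving depends on $\lambda$, the conditioning of $A$, and the tolerance; both starts converge to the same minimizer.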
Common Mistake: Relative Change Can Be Misleading
Mistake:
Using only the relative change as a stopping criterion, without checking the data residual.
Correction:
Relative change can be small for two very different reasons:
- The algorithm has converged to a good solution.
- The algorithm is stuck at a bad solution (e.g., a local minimum in nonconvex problems, or a very flat objective landscape).
Always combine relative change with at least one of:
- Data residual (compared to the noise level via the discrepancy principle).
- Objective value (should be decreasing).
- Algorithm-specific residuals (primal/dual for ADMM).
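A hedged sketch of such a combined check (the function name and default thresholds are ours):

```python
import numpy as np

def safe_to_stop(x_new, x_old, data_resid, eps_noise, tau=1.1, rel_tol=1e-4):
    """Combined test: the iterate change is small AND the data residual sits
    at (not far below) the noise level, per the discrepancy principle."""
    rel_change = np.linalg.norm(x_new - x_old) / max(np.linalg.norm(x_old), 1e-12)
    return rel_change <= rel_tol and data_resid <= tau * eps_noise
```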
Practical Monitoring Checklist
Always monitor (regardless of algorithm):
- Objective value: should decrease monotonically for descent methods (ISTA, ADMM). FISTA may oscillate slightly due to momentum.
- Fixed-point residual $\|x^{k+1} - x^k\|_2$: stagnation below the chosen tolerance usually indicates convergence.
- Data residual $\|y - Ax^k\|_2$: compare against the noise level $\varepsilon = \sigma\sqrt{m}$.
Warning signs:
- Objective increasing: step size too large, or a bug.
- Data residual dropping far below $\varepsilon$: overfitting (stop earlier or increase regularization).
- Primal residual stagnating (ADMM): $\rho$ too small.
- Dual residual stagnating (ADMM): $\rho$ too large.
Primal residual
In ADMM, the norm of the constraint violation $\|Ax^k + Bz^k - c\|_2$ (for the TV splitting, $\|Dx^k - z^k\|_2$). Measures how well the auxiliary variable $z$ agrees with the constraint.
Warm starting
Initializing an iterative algorithm from an approximate solution (e.g., the matched-filter image) rather than from zero. Reduces the number of iterations to convergence.
Key Takeaway
Convergence diagnostics are not optional: they are integral to every reconstruction pipeline. The fixed-point residual is universal and cheap. Primal and dual residuals provide ADMM-specific optimality certificates. The discrepancy principle sets the noise-appropriate stopping level without requiring ground truth. Warm-starting from the matched-filter image saves 20--50% of iterations. Always monitor the objective and data residual; any unexpected increase signals a step-size error or implementation bug.