Convergence Theory for PnP
The Convergence Challenge: When Is PnP Safe?
Standard ADMM and PGD convergence theory requires each proximal step to be the exact proximal operator of a convex function. Deep denoisers violate this assumption in two ways:
- They may not be the proximal operator of any function.
- Even if they are, the implicit function may not be convex.
Without convergence guarantees, PnP iterations can oscillate, diverge, or converge to artefact-laden images. This section develops three lines of theory with increasing structural requirements:
- Non-expansive denoisers (Ryu et al., 2019): weakest condition, convergence to a fixed point
- Gradient-step denoisers (Hurault et al., 2022): denoiser is a gradient, convergence to an objective minimum
- ICNN-based denoisers (Pesquet et al., 2021): strongest condition, global convergence guarantees
Definition: Lipschitz Condition for Denoisers
A denoiser $D_\sigma$ has Lipschitz constant $L$ if $\|D_\sigma(x) - D_\sigma(y)\| \le L\,\|x - y\|$ for all $x, y$.
Key regimes:
| Condition | Meaning | PnP implication |
|---|---|---|
| $L \le 1$ | Non-expansive | PnP-PGD convergence (fixed point) |
| $L < 1$ | Contractive | PnP-ADMM convergence + uniqueness |
| $L > 1$ | Expansive | PnP may diverge |
For a $K$-layer network: $L \le \prod_{k=1}^{K} \|W_k\|_2$.
Most trained DnCNN variants have Lipschitz constants well above 1 without spectral normalisation. This is why unconstrained PnP can diverge for small step sizes or weak penalty parameters.
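As a quick sanity check, one can lower-bound a trained denoiser's Lipschitz constant empirically by sampling pairs of nearby inputs and taking the largest output-to-input distance ratio. A minimal numpy sketch; the `denoise` callable and the Gaussian-blur stand-in are illustrative assumptions, not a specific trained network:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def empirical_lipschitz(denoise, shape=(64, 64), n_pairs=200, eps=1e-3, seed=0):
    """Lower-bound the Lipschitz constant of `denoise` by sampling pairs of
    nearby inputs and taking the largest output/input distance ratio."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(n_pairs):
        x = rng.standard_normal(shape)
        y = x + eps * rng.standard_normal(shape)   # small perturbation of x
        ratio = np.linalg.norm(denoise(x) - denoise(y)) / np.linalg.norm(x - y)
        best = max(best, ratio)
    return best  # any value > 1 already certifies that the denoiser is expansive

# Gaussian blur is a non-expansive "denoiser", so the estimate stays below 1
print(empirical_lipschitz(lambda x: gaussian_filter(x, sigma=1.0)))
```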
Theorem: PnP-PGD Convergence via Non-Expansiveness
If the denoiser $D_\sigma$ is non-expansive (Lipschitz constant $L \le 1$) and the gradient step $G_\gamma(x) = x - \gamma \nabla f(x)$ is non-expansive (step size $0 < \gamma \le 2/L_f$, where $L_f$ is the Lipschitz constant of $\nabla f$), then the PnP-PGD iterates converge to a fixed point.
PnP-PGD iterates the composition of two non-expansive maps. Because the gradient step is averaged (a relaxed form of contractivity), the composition is non-expansive and, after a relaxation step, averaged; the Krasnoselskii-Mann iteration theorem then guarantees convergence to a fixed point.
Averaged operator formulation
The gradient step $G_\gamma(x) = x - \gamma \nabla f(x)$ is $\alpha$-averaged with $\alpha = \gamma L_f / 2$ when $0 < \gamma < 2/L_f$.
An $\alpha$-averaged operator satisfies $T = (1-\alpha)\,\mathrm{Id} + \alpha R$ for some non-expansive $R$ and $\alpha \in (0, 1)$.
Composition is averaged
Since $D_\sigma$ is non-expansive and $G_\gamma$ is averaged, the composition $D_\sigma \circ G_\gamma$ is non-expansive. If $D_\sigma$ is itself averaged (e.g. firmly non-expansive), the composition is averaged by Bauschke–Combettes composition theory; otherwise the relaxed map $(1-\lambda)\,\mathrm{Id} + \lambda\, D_\sigma \circ G_\gamma$ is $\lambda$-averaged for $\lambda \in (0, 1)$.
Apply Krasnoselskii-Mann
The Krasnoselskii-Mann theorem states that iterates of an averaged operator converge weakly to a fixed point, provided the fixed-point set is non-empty.
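A minimal numpy sketch of PnP-PGD written as a relaxed Krasnoselskii-Mann iteration for the quadratic data fidelity $f(x) = \tfrac{1}{2}\|Ax - y\|^2$; the `denoise` callable is assumed to be (approximately) non-expansive, and the names, step size, and relaxation parameter are illustrative choices:

```python
import numpy as np

def pnp_pgd(A, y, denoise, gamma, n_iter=200, relax=0.5):
    """Relaxed PnP-PGD (Krasnoselskii-Mann form) for f(x) = 0.5*||Ax - y||^2.

    grad f(x) = A.T @ (A @ x - y); the gradient step is averaged for
    gamma < 2 / ||A||_2^2, and the relaxation keeps the overall map averaged
    even when the denoiser is only non-expansive.
    """
    x = A.T @ y                                    # simple back-projection initialisation
    for _ in range(n_iter):
        grad_step = x - gamma * (A.T @ (A @ x - y))
        t = denoise(grad_step)                     # plug-in denoiser replaces the prox of the prior
        x = (1 - relax) * x + relax * t            # Krasnoselskii-Mann relaxation
    return x
```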
Theorem: PnP-ADMM Convergence via Strong Monotonicity
PnP-ADMM converges to a unique fixed point if the denoiser satisfies:
- Boundedness: $\|D_\sigma(x) - x\| \le C\sigma$ for a constant $C$ independent of $x$
- Strong monotonicity of the residual: $\langle (\mathrm{Id} - D_\sigma)(x) - (\mathrm{Id} - D_\sigma)(y),\; x - y \rangle \ge m\,\|x - y\|^2$ for some $m > 0$
- Positive penalty: $\rho > 0$
Strong monotonicity of $\mathrm{Id} - D_\sigma$ is equivalent to $D_\sigma$ being contractive in a certain sense. It prevents the denoiser from moving iterates "too far", which suppresses oscillation.
Operator splitting interpretation
PnP-ADMM is Douglas–Rachford splitting applied to $\partial f$ (the subdifferential of the data fidelity) and to the implicit regularisation operator $T$ defined by the denoiser, with $D_\sigma$ playing the role of the resolvent $(\mathrm{Id} + T)^{-1}$.
Convergence via monotonicity
If one of the two operators is $m$-strongly monotone and the other is maximal monotone, then their sum is $m$-strongly monotone, guaranteeing a unique zero and convergence of the Douglas–Rachford iteration.
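For concreteness, a minimal numpy sketch of the PnP-ADMM iteration for the quadratic data fidelity, with a generic `denoise` callable standing in for the resolvent of the implicit regularisation operator; the dense linear solve and all parameter names are illustrative:

```python
import numpy as np

def pnp_admm(A, y, denoise, rho, n_iter=100):
    """PnP-ADMM for f(x) = 0.5*||Ax - y||^2 with a plug-in denoiser.

    The x-update is the exact prox of the data fidelity (a linear solve),
    the z-update replaces the prox of the regulariser with the denoiser,
    and u is the scaled dual variable.
    """
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    M = A.T @ A + rho * np.eye(n)                         # formed once: rho is fixed
    for _ in range(n_iter):
        x = np.linalg.solve(M, A.T @ y + rho * (z - u))   # prox of the data fidelity
        z = denoise(x + u)                                # plug-in denoising step
        u = u + x - z                                     # scaled dual ascent
    return x
```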
Definition: Gradient-Step Denoiser (Hurault et al., 2022)
A gradient-step denoiser is trained so that the denoising correction is the gradient of a learned scalar potential $g_\sigma$: $D_\sigma(x) = x - \nabla g_\sigma(x)$.
The potential $g_\sigma$ is parameterised by a neural network and trained to satisfy $\nabla g_\sigma(x) \approx x - D^{\mathrm{MMSE}}_\sigma(x)$ (the noise residual under the MMSE denoiser).
Consequence: PnP-PGD with a gradient-step denoiser provably minimises the explicit objective $F(x) = f(x) + \lambda\, g_\sigma(x)$.
Gradient-step denoisers close the gap between PnP and variational methods: they achieve near-DRUNet denoising quality while providing the convergence guarantees of a proper proximal algorithm. The key training cost is computing $\nabla g_\sigma$ via automatic differentiation.
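A minimal PyTorch sketch of the construction: a small hypothetical MLP stands in for the full CNN potential used in practice, and the denoiser output is computed as an exact gradient step via automatic differentiation:

```python
import torch

class GradientStepDenoiser(torch.nn.Module):
    """D(x) = x - grad g(x) for a learned scalar potential g.

    The potential network below is a hypothetical stand-in (a small MLP on
    flattened inputs), not the architecture used in the original papers.
    """

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.potential = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Softplus(),
            torch.nn.Linear(hidden, 1),
        )

    def forward(self, x):
        x = x.requires_grad_(True)
        g = self.potential(x).sum()        # sum over the batch: per-sample gradients are independent
        (grad_g,) = torch.autograd.grad(g, x, create_graph=self.training)
        return x - grad_g                  # the denoising correction is an exact gradient

# Usage sketch: denoiser = GradientStepDenoiser(dim=64); x_hat = denoiser(torch.randn(8, 64))
```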
Definition: ICNN-Based Denoiser
An Input-Convex Neural Network (ICNN) $\psi_\theta(x)$ is convex in its input $x$ by construction:
- Non-negative passthrough weights: the hidden-to-hidden weight matrices satisfy $W_k \ge 0$ elementwise
- Convex, non-decreasing activations: ReLU or softplus
- Arbitrary skip connections from input to each layer
The ICNN-based denoiser is the proximal operator of $\psi_\theta$: $D(x) = \mathrm{prox}_{\psi_\theta}(x) = \arg\min_z \tfrac{1}{2}\|z - x\|^2 + \psi_\theta(z)$.
By construction, $\psi_\theta$ is convex, so $D$ is the exact proximal operator of a convex function.
ICNN denoisers guarantee global convergence of PnP-ADMM and PnP-PGD to a unique minimiser of $f + \psi_\theta$. The price is an expressivity gap of approximately 1–1.5 dB PSNR compared to unconstrained networks.
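A minimal PyTorch sketch of an ICNN potential $\psi_\theta$: the layer widths, depth, and clamping used to enforce non-negative passthrough weights are illustrative choices, but they satisfy the three structural conditions above, so the resulting function is convex in its input:

```python
import torch

class ICNN(torch.nn.Module):
    """Minimal input-convex network psi(x): non-negative weights on the
    hidden-to-hidden path, convex non-decreasing activations (softplus),
    and skip connections from the input to every layer."""

    def __init__(self, dim, hidden=64, n_layers=3):
        super().__init__()
        self.skips = torch.nn.ModuleList(
            [torch.nn.Linear(dim, hidden) for _ in range(n_layers)])
        self.hidden = torch.nn.ModuleList(
            [torch.nn.Linear(hidden, hidden, bias=False) for _ in range(n_layers - 1)])
        self.out = torch.nn.Linear(hidden, 1, bias=False)
        self.act = torch.nn.Softplus()

    def forward(self, x):
        z = self.act(self.skips[0](x))
        for skip, hid in zip(self.skips[1:], self.hidden):
            # clamping enforces non-negative passthrough weights -> convexity in x
            z = self.act(skip(x) + torch.nn.functional.linear(z, hid.weight.clamp(min=0)))
        return torch.nn.functional.linear(z, self.out.weight.clamp(min=0))
```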
Theorem: Global Convergence of PnP with ICNN Denoisers
If $D = \mathrm{prox}_{\psi_\theta}$ where $\psi_\theta$ is an ICNN, then PnP-ADMM and PnP-PGD converge to the unique global minimiser of $F(x) = \tfrac{1}{2}\|Ax - y\|^2 + \psi_\theta(x)$.
The sum of two convex functions is convex. ADMM and PGD with exact proximal operators of convex functions converge to the global minimum by standard theory. Since $D$ is exactly the proximal operator of $\psi_\theta$, all standard guarantees apply.
Convexity of the composite objective
$f(x) = \tfrac{1}{2}\|Ax - y\|^2$ is convex (quadratic). $\psi_\theta$ is convex by the ICNN architecture. Their sum is convex, with a unique minimiser when $A$ has full column rank (so that $f$ is strictly convex).
Apply standard ADMM convergence
ADMM with exact proximal operators of convex $f$ and convex $\psi_\theta$ converges to a minimiser by the standard ADMM convergence theorem (Boyd et al., 2011).
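Because $\psi_\theta$ has no closed-form proximal operator, the denoising (z-)update is itself a small convex optimisation problem. A sketch of evaluating $\mathrm{prox}_{\psi_\theta}$ by inner gradient descent, assuming the `ICNN`-style potential above; in practice a faster inner solver such as L-BFGS would typically be used:

```python
import torch

def prox_icnn(psi, x, n_steps=200, lr=0.1):
    """Evaluate prox_psi(x) = argmin_z 0.5*||z - x||^2 + psi(z) by plain
    gradient descent; the objective is convex when psi is an ICNN, so the
    iterates approach the unique minimiser."""
    z = x.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = 0.5 * torch.sum((z - x) ** 2) + psi(z).sum()
        loss.backward()
        opt.step()
    return z.detach()

# With a trained ICNN psi, the PnP denoising step is: z = prox_icnn(psi, x_plus_u)
```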
Convergence Properties of PnP Variants
| Denoiser type | Convergence guarantee | Objective minimised | PSNR (relative) |
|---|---|---|---|
| BM3D / unconstrained DnCNN | None in general | None (fixed-point iteration) | Best |
| Non-expansive DnCNN ($L \le 1$) | Fixed point (PnP-PGD) | None explicit | Good (1 dB below BM3D) |
| Gradient-step denoiser | Convergence to objective minimum | $f + \lambda g_\sigma$ | Near-best |
| ICNN denoiser | Global minimum (unique) | $f + \psi_\theta$ | Moderate (1.5 dB below best) |
Example: Training a Lipschitz-Constrained DnCNN
Describe how to train a DnCNN with Lipschitz constant $L \le 1$ using spectral normalisation.
Spectral normalisation per layer
For each convolutional layer with weight matrix $W_k$, compute the spectral norm $\|W_k\|_2$ via power iteration and normalise: $W_k \leftarrow W_k / \|W_k\|_2$. This ensures $\|W_k\|_2 \le 1$.
Global Lipschitz bound
For a $K$-layer network with ReLU activations (1-Lipschitz): $L \le \prod_{k=1}^{K} \|W_k\|_2 \le 1$.
Residual learning caveat
DnCNN uses residual learning, $D(x) = x - R(x)$, where the network $R$ predicts the noise. The Lipschitz constant of $D = \mathrm{Id} - R$ is at most $1 + L_R$. To enforce $L \le 1$ through this bound one would need $L_R = 0$, which is too restrictive. In practice, constrain $L_R \le \epsilon$ for a small $\epsilon$ and accept the mild over-1 bound $L \le 1 + \epsilon$.
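A PyTorch sketch of this recipe using `torch.nn.utils.spectral_norm`, which normalises the flattened kernel matrix (a common practical surrogate for the true convolution operator norm); the depth, width, and residual scale $\epsilon$ are illustrative choices:

```python
import torch

def spectrally_normalised_residual_net(channels=64, depth=7):
    """DnCNN-style residual branch R with spectral normalisation on every
    convolution; ReLU is 1-Lipschitz, so the product bound from the text
    controls the Lipschitz constant of R."""
    layers, in_ch = [], 1
    for i in range(depth):
        out_ch = 1 if i == depth - 1 else channels
        conv = torch.nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        layers.append(torch.nn.utils.spectral_norm(conv))   # ||W_k||_2 <= 1 via power iteration
        if i != depth - 1:
            layers.append(torch.nn.ReLU())
        in_ch = out_ch
    return torch.nn.Sequential(*layers)

class ResidualDenoiser(torch.nn.Module):
    """D(x) = x - eps * R(x): scaling the constrained residual by a small eps
    gives the mild over-1 Lipschitz bound L <= 1 + eps discussed above."""

    def __init__(self, eps=0.1):
        super().__init__()
        self.residual = spectrally_normalised_residual_net()
        self.eps = eps

    def forward(self, x):
        return x - self.eps * self.residual(x)
```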
Common Mistake: Theoretical Convergence vs. Practical Performance
Mistake:
Strictly enforcing the Lipschitz constraint during training, leading to a weakened denoiser and poor reconstruction.
Correction:
Enforcing $L \le 1$ limits the denoiser's expressivity. In practice, the best PnP results use denoisers with a Lipschitz constant slightly above 1 (typically 1.0–1.5). These may violate the theoretical convergence conditions but converge empirically for well-chosen penalty parameters.
Practical approach: Train without strict Lipschitz constraints, then monitor the PnP primal residual $\|x^k - z^k\|$. If it diverges, increase the penalty $\rho$ (reducing the effective variance of the denoiser input) or apply early stopping.
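A tiny helper illustrating this monitoring rule; the function name, patience window, and example residuals are illustrative assumptions:

```python
def should_increase_penalty(residuals, patience=5):
    """Return True if the ADMM primal residual ||x^k - z^k|| has not decreased
    over the last `patience` iterations; the remedy from the text is then to
    increase rho (rescaling the scaled dual variable u by the same factor)."""
    if len(residuals) <= patience:
        return False
    return residuals[-1] >= min(residuals[-patience - 1:-1])

# Example: the residual has stalled, so the check recommends raising rho
print(should_increase_penalty([1.0, 0.8, 0.55, 0.52, 0.51, 0.50, 0.50, 0.51]))
```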
PnP-ADMM Convergence vs Noise Level and Penalty
Explore how the denoiser noise level $\sigma$ and the ADMM penalty $\rho$ affect PnP-ADMM convergence. The plot shows the primal residual $\|x^k - z^k\|$ versus iteration number.
In the under-regularised regime, the algorithm may oscillate. In the over-regularised regime, convergence is fast but the reconstruction is blurry.
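A self-contained sketch that reproduces this experiment on toy data, reusing the PnP-ADMM loop from the earlier sketch: a random measurement operator, a Gaussian-blur stand-in denoiser, and a sweep over the penalty $\rho$, plotting the primal residual per iteration (all specific values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
m, n = 64, 128
A = rng.standard_normal((m, n)) / np.sqrt(n)           # toy under-determined operator
x_true = gaussian_filter(rng.standard_normal(n), sigma=3.0)
y = A @ x_true + 0.01 * rng.standard_normal(m)

for rho in (0.01, 0.1, 1.0):                            # penalty sweep
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    M = A.T @ A + rho * np.eye(n)
    residuals = []
    for _ in range(100):
        x = np.linalg.solve(M, A.T @ y + rho * (z - u))
        z = gaussian_filter(x + u, sigma=1.0)           # stand-in non-expansive denoiser
        u = u + x - z
        residuals.append(np.linalg.norm(x - z))
    plt.semilogy(residuals, label=f"rho = {rho}")

plt.xlabel("iteration"); plt.ylabel("primal residual ||x - z||")
plt.legend(); plt.show()
```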
Key Takeaway
PnP convergence theory offers three levels: (1) non-expansive denoisers guarantee fixed-point convergence for PnP-PGD; (2) gradient-step denoisers guarantee minimisation of an explicit objective; (3) ICNN denoisers guarantee global minimum convergence at the cost of reduced expressivity. In practice, the best results often use unconstrained DRUNet with empirical convergence monitoring, accepting the gap between theory and performance.