Convergence Theory for PnP

The Convergence Challenge: When Is PnP Safe?

Standard ADMM and PGD convergence theory requires the proximal operator to be the exact proximal of a convex function. Deep denoisers violate this assumption in two ways:

  1. They may not be the proximal operator of any function.
  2. Even if they are, the implicit function may not be convex.

Without convergence guarantees, PnP iterations can oscillate, diverge, or converge to artefact-laden images. This section develops three lines of theory with increasing structural requirements:

  • Non-expansive denoisers (Ryu et al., 2019): weakest condition, convergence to a fixed point
  • Gradient-step denoisers (Hurault et al., 2022): denoiser is a gradient, convergence to an objective minimum
  • ICNN-based denoisers (Pesquet et al., 2021): strongest, global convergence guarantees

Definition:

Lipschitz Condition for Denoisers

A denoiser $\mathcal{D}_\sigma$ has Lipschitz constant $L$ if

$$\|\mathcal{D}_\sigma(\mathbf{a}) - \mathcal{D}_\sigma(\mathbf{b})\| \leq L\,\|\mathbf{a} - \mathbf{b}\| \qquad \forall\, \mathbf{a}, \mathbf{b}.$$

Key regimes:

| Condition | Meaning | PnP implication |
| --- | --- | --- |
| $L \leq 1$ | Non-expansive | PnP-PGD convergence (fixed point) |
| $L < 1$ | Contractive | PnP-ADMM convergence + uniqueness |
| $L > 1$ | Expansive | PnP may diverge |

For a $D$-layer network with 1-Lipschitz activations: $L \leq \prod_{\ell=1}^{D} \|\mathbf{W}_\ell\|$.

Most trained DnCNN variants have $L \approx 1.5$–$3$ without spectral normalisation. This is why unconstrained PnP can diverge even with conservative step sizes, particularly when the penalty parameter is weak.
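As a rough diagnostic, the local Lipschitz constant of a trained denoiser can be estimated by power iteration on its Jacobian using automatic differentiation. A minimal sketch, assuming a hypothetical PyTorch denoiser callable `denoiser` and an input image tensor `x` (both names are illustrative):

```python
import torch

def local_lipschitz_estimate(denoiser, x, n_iters=20):
    # Power iteration on J^T J, where J is the Jacobian of the denoiser at x;
    # the largest singular value of J is a local estimate of L.
    v = torch.randn_like(x)
    v = v / v.norm()
    for _ in range(n_iters):
        _, Jv = torch.autograd.functional.jvp(denoiser, x, v)     # J v
        _, JtJv = torch.autograd.functional.vjp(denoiser, x, Jv)  # J^T (J v)
        v = JtJv / JtJv.norm()
    _, Jv = torch.autograd.functional.jvp(denoiser, x, v)
    return Jv.norm().item()
```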

Theorem: PnP-PGD Convergence via Non-Expansiveness

If the denoiser $\mathcal{D}_\sigma$ is non-expansive (Lipschitz constant $L \leq 1$) and the gradient step $\mathcal{G}_\alpha(\mathbf{x}) = \mathbf{x} - \alpha\mathbf{A}^{H}(\mathbf{A}\mathbf{x} - \mathbf{y})$ is non-expansive ($\alpha \leq 2/\|\mathbf{A}\|^2$), then the PnP-PGD iterates $\mathbf{c}^{(k+1)} = \mathcal{D}_\sigma(\mathcal{G}_\alpha(\mathbf{c}^{(k)}))$ converge to a fixed point.

PnP-PGD is the composition of two non-expansive maps. When one of them is averaged (a strict convex combination of the identity with a non-expansive map, as the gradient step is for $\alpha < 2/\|\mathbf{A}\|^2$), the composition is also averaged, and the Krasnoselskii–Mann iteration theorem guarantees convergence to a fixed point, provided one exists.
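The composition is short to state in code. A minimal NumPy sketch of the PnP-PGD iteration, assuming a `denoiser` callable that maps a vectorised image to its denoised version (the names and the zero initialisation are illustrative):

```python
import numpy as np

def pnp_pgd(A, y, denoiser, n_iters=100, alpha=None):
    # Step size alpha <= 2 / ||A||^2 keeps the gradient step non-expansive;
    # alpha = 1 / ||A||^2 makes it an averaged operator.
    if alpha is None:
        alpha = 1.0 / np.linalg.norm(A, 2) ** 2
    c = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(n_iters):
        grad_step = c - alpha * A.conj().T @ (A @ c - y)   # G_alpha(c)
        c = denoiser(grad_step)                            # D_sigma(G_alpha(c))
    return c
```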


Theorem: PnP-ADMM Convergence via Strong Monotonicity

PnP-ADMM converges to a unique fixed point if the denoiser $\mathcal{D}_\sigma$ satisfies:

  1. Boundedness: $\|\mathcal{D}_\sigma(\mathbf{v})\| \leq C(1 + \|\mathbf{v}\|)$
  2. Strong monotonicity of the residual: $\langle (\mathbf{I} - \mathcal{D}_\sigma)(\mathbf{a}) - (\mathbf{I} - \mathcal{D}_\sigma)(\mathbf{b}),\ \mathbf{a} - \mathbf{b}\rangle \geq \gamma\|\mathbf{a} - \mathbf{b}\|^2$ for some $\gamma > 0$
  3. Positive penalty: $\rho > 0$

Strong monotonicity of $\mathbf{I} - \mathcal{D}_\sigma$ is equivalent to $\mathcal{D}_\sigma$ being contractive in a certain sense. It prevents the denoiser from moving iterates "too far," stopping oscillation.
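For reference, a minimal PnP-ADMM sketch in scaled dual form, with the z-update replaced by the denoiser. The `denoiser` callable and the direct linear solve in the x-update are illustrative assumptions, not a prescribed implementation:

```python
import numpy as np

def pnp_admm(A, y, denoiser, rho=1.0, n_iters=100):
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    AtA, Aty = A.conj().T @ A, A.conj().T @ y
    for _ in range(n_iters):
        # x-update: argmin_x 0.5*||y - A x||^2 + (rho/2)*||x - z + u||^2
        x = np.linalg.solve(AtA + rho * np.eye(n), Aty + rho * (z - u))
        # z-update: the denoiser stands in for the proximal step of the prior
        z = denoiser(x + u)
        # scaled dual update
        u = u + x - z
    return x
```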

Definition:

Gradient-Step Denoiser (Hurault et al., 2022)

A gradient-step denoiser is trained so that the denoising correction is the gradient of a learned scalar potential $g_\sigma$:

$$\mathcal{D}_\sigma(\mathbf{v}) = \mathbf{v} - \nabla g_\sigma(\mathbf{v}).$$

The potential $g_\sigma$ is parameterised by a neural network and trained to satisfy $\nabla g_\sigma(\mathbf{v}) \approx \mathbf{v} - \mathbb{E}[\mathbf{x} \mid \mathbf{v}]$ (the noise residual under the MMSE denoiser).

Consequence: PnP-PGD with a gradient-step denoiser provably minimises

$$\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + g_\sigma(\mathbf{x}).$$

Gradient-step denoisers close the gap between PnP and variational methods: they achieve near-DRUNet denoising quality while providing the convergence guarantees of a proper proximal algorithm. The key training cost is computing $\nabla g_\sigma$ via automatic differentiation.
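One way to realise this structure in code is to wrap a scalar potential network and obtain the correction by automatic differentiation. A minimal PyTorch sketch, assuming a hypothetical `potential` module that maps an image batch to per-image scalar values:

```python
import torch

class GradientStepDenoiser(torch.nn.Module):
    """D_sigma(v) = v - grad g_sigma(v), with the gradient taken by autograd."""
    def __init__(self, potential):
        super().__init__()
        self.potential = potential            # hypothetical network: image -> scalar

    def forward(self, v):
        with torch.enable_grad():
            v = v.clone().requires_grad_(True)
            g = self.potential(v).sum()       # g_sigma(v), summed over the batch
            (grad_g,) = torch.autograd.grad(g, v, create_graph=self.training)
        return v - grad_g                     # denoised estimate
```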

Definition:

ICNN-Based Denoiser

An Input-Convex Neural Network (ICNN) $\Phi_\theta \colon \mathbb{R}^N \to \mathbb{R}$ is convex in its input by construction:

  1. Non-negative passthrough weights: $\mathbf{W}_\ell^{(z)} \geq 0$
  2. Convex, non-decreasing activations: ReLU or softplus
  3. Arbitrary skip connections from input to each layer

The ICNN-based denoiser is the proximal operator of $\Phi_\theta$:

$$\mathcal{D}_\text{ICNN}(\mathbf{v}) = \operatorname{prox}_{\sigma^2\Phi_\theta}(\mathbf{v}) = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{x} - \mathbf{v}\|^2 + \sigma^2\Phi_\theta(\mathbf{x}).$$

By construction, $\Phi_\theta$ is convex, so $\mathcal{D}_\text{ICNN}$ is the exact proximal operator of a convex function.

ICNN denoisers guarantee global convergence of PnP-ADMM and PnP-PGD to a unique minimiser of $f + \sigma^2\Phi_\theta$. The price is an expressivity gap of approximately 1–1.5 dB PSNR compared to unconstrained networks.
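A minimal PyTorch sketch of such a construction: passthrough weights are clamped to be non-negative and softplus activations are convex and non-decreasing, so the output is convex in the input; the denoiser is then evaluated by numerically solving the strongly convex inner proximal problem. All layer sizes and the inner solver settings here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

class SimpleICNN(torch.nn.Module):
    """Phi_theta(x): convex in x because the z-to-z ("passthrough") weights are
    clamped non-negative and softplus is convex and non-decreasing."""
    def __init__(self, dim, hidden=128, n_layers=3):
        super().__init__()
        self.W_z = torch.nn.ParameterList(
            [torch.nn.Parameter(0.01 * torch.rand(hidden, hidden)) for _ in range(n_layers - 1)])
        self.W_x = torch.nn.ModuleList([torch.nn.Linear(dim, hidden) for _ in range(n_layers)])
        self.w_out = torch.nn.Parameter(0.01 * torch.rand(hidden))
        self.lin_out = torch.nn.Linear(dim, 1)              # affine-in-x output term

    def forward(self, x):
        z = F.softplus(self.W_x[0](x))                      # softplus of an affine map: convex
        for W_z, W_x in zip(self.W_z, self.W_x[1:]):
            z = F.softplus(z @ W_z.clamp(min=0).T + W_x(x))
        return z @ self.w_out.clamp(min=0) + self.lin_out(x).squeeze(-1)

def icnn_prox(phi, v, sigma2, n_steps=100, lr=0.05):
    """Evaluate D_ICNN(v) = prox_{sigma^2 Phi}(v) by gradient descent on the
    strongly convex inner problem 0.5*||x - v||^2 + sigma^2 * Phi(x)."""
    x = v.clone().requires_grad_(True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        loss = 0.5 * ((x - v) ** 2).sum() + sigma2 * phi(x).sum()
        loss.backward()
        opt.step()
    return x.detach()
```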

Theorem: Global Convergence of PnP with ICNN Denoisers

If $\mathcal{D}_\sigma = \operatorname{prox}_{\sigma^2\Phi_\theta}$ where $\Phi_\theta$ is an ICNN, then PnP-ADMM and PnP-PGD converge to the unique global minimiser of

$$\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \sigma^2\Phi_\theta(\mathbf{x}).$$

The sum of two convex functions is convex, and ADMM and PGD with exact proximal operators of convex functions converge to the global minimum by standard theory (uniqueness additionally requires the sum to be strictly convex, e.g. $\mathbf{A}$ injective or $\Phi_\theta$ strongly convex). Since $\mathcal{D}_\text{ICNN}$ is exactly the proximal operator of $\sigma^2\Phi_\theta$, all standard guarantees apply.


Convergence Properties of PnP Variants

| Denoiser type | Convergence guarantee | Objective minimised | PSNR (relative) |
| --- | --- | --- | --- |
| BM3D / unconstrained DnCNN | None in general | None (fixed-point iteration) | Best |
| Non-expansive DnCNN ($L \leq 1$) | Fixed point (PnP-PGD) | None explicit | Good (≈1 dB below BM3D) |
| Gradient-step denoiser | Convergence to objective minimum | $f(\mathbf{x}) + g_\sigma(\mathbf{x})$ | Near-best |
| ICNN denoiser | Global minimum (unique) | $f(\mathbf{x}) + \sigma^2\Phi_\theta(\mathbf{x})$ | Moderate (≈1.5 dB below best) |

Example: Training a Lipschitz-Constrained DnCNN

Describe how to train a DnCNN with Lipschitz constant $L \leq 1$ using spectral normalisation.
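A hedged sketch of one way to do this in PyTorch: wrap every convolution in `torch.nn.utils.spectral_norm`, which reparameterises each layer so the spectral norm of its (reshaped) weight matrix stays near 1; with 1-Lipschitz ReLU activations, the layer-product bound then keeps the cascade's $L$ near 1. Note that spectral normalisation of the reshaped kernel only approximates the true convolution operator norm, and a residual ("noise-predicting") head changes the overall constant, so the exact architecture matters. The depth and channel counts below are illustrative:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def lipschitz_constrained_dncnn(depth=7, channels=64):
    # Each conv layer's (reshaped) weight matrix is constrained to spectral
    # norm ~1 by the spectral_norm reparameterisation; ReLU is 1-Lipschitz,
    # so the product bound keeps L near 1 for the cascade.
    layers = [spectral_norm(nn.Conv2d(1, channels, 3, padding=1)), nn.ReLU(inplace=True)]
    for _ in range(depth - 2):
        layers += [spectral_norm(nn.Conv2d(channels, channels, 3, padding=1)),
                   nn.ReLU(inplace=True)]
    layers.append(spectral_norm(nn.Conv2d(channels, 1, 3, padding=1)))
    return nn.Sequential(*layers)
```

Training would then proceed with an ordinary denoising loss; the Lipschitz constraint is maintained by the reparameterisation at every forward pass rather than by an explicit penalty.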

Common Mistake: Theoretical Convergence vs. Practical Performance

Mistake:

Strictly enforcing the Lipschitz constraint $L \leq 1$ during training, leading to a weakened denoiser and poor reconstruction.

Correction:

Enforcing $L \leq 1$ limits the denoiser's expressivity. In practice, the best PnP results use denoisers with $L$ slightly above 1 (typically 1.0–1.5). These may violate the theoretical convergence conditions but converge empirically for well-chosen $\rho$.

Practical approach: Train without strict Lipschitz constraints, then monitor the PnP primal residual $\|\mathbf{c}^{(k+1)} - \mathbf{z}^{(k+1)}\|$. If it diverges, increase $\rho$ (reducing the effective denoiser input variance) or apply early stopping.
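As an illustration, the earlier PnP-ADMM sketch can be extended with this residual check. The growth threshold and the factor by which $\rho$ is increased are arbitrary heuristics for the sketch, not part of any convergence theory:

```python
import numpy as np

def pnp_admm_monitored(A, y, denoiser, rho=1.0, n_iters=100, growth=2.0, blowup=5.0):
    n = A.shape[1]
    c = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    Aty = A.conj().T @ y
    prev_res = np.inf
    for _ in range(n_iters):
        c = np.linalg.solve(A.conj().T @ A + rho * np.eye(n), Aty + rho * (z - u))
        z = denoiser(c + u)
        u = u + c - z
        res = np.linalg.norm(c - z)        # primal residual ||c - z||
        if res > blowup * prev_res:        # residual blowing up: damp the denoiser
            rho *= growth                  # larger rho -> smaller effective denoiser step
        prev_res = res
    return c
```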

PnP-ADMM Convergence vs Noise Level and Penalty

Explore how the denoiser noise level $\sigma$ and ADMM penalty $\rho$ affect PnP-ADMM convergence. The plot shows the primal residual $\|\mathbf{c}^{(k+1)} - \mathbf{z}^{(k+1)}\|$ vs iteration number.

For $\sigma\sqrt{\rho} < \lambda$ (under-regularised), the algorithm may oscillate. For $\sigma\sqrt{\rho} \gg \lambda$ (over-regularised), convergence is fast but the reconstruction is blurry.

Key Takeaway

PnP convergence theory offers three levels: (1) non-expansive denoisers guarantee fixed-point convergence for PnP-PGD; (2) gradient-step denoisers guarantee minimisation of an explicit objective; (3) ICNN denoisers guarantee global minimum convergence at the cost of reduced expressivity. In practice, the best results often use unconstrained DRUNet with empirical convergence monitoring, accepting the gap between theory and performance.