RED: Regularization by Denoising

RED: Closing the Gap Between PnP and Variational Methods

PnP algorithms are powerful but, in general, lack an explicit objective function: it is unclear what they are minimising. This makes convergence analysis difficult and prevents the use of standard optimisation tools.

Regularization by Denoising (RED) (Romano, Elad, Milanfar, 2017) bridges this gap by constructing an explicit regulariser directly from the denoiser. RED converts the denoiser into a concrete penalty term, enabling standard gradient-descent convergence theory to apply.

Definition:

Regularization by Denoising (RED)

The RED regulariser derived from a denoiser $\mathcal{D}_\sigma$ is:

$$R_\text{RED}(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T(\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})) = \frac{1}{2}\|\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathcal{D}_\sigma(\mathbf{x}).$$

Under the Jacobian symmetry and local homogeneity assumptions (stated precisely in the theorem below), its gradient simplifies to:

$$\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}).$$

RED solves the variational problem:

$$\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \lambda\, R_\text{RED}(\mathbf{x}).$$

Unlike PnP, RED defines an explicit objective. This makes it amenable to standard optimisation theory: fixed points are stationary points of a well-defined objective, and convergence can be analysed with standard tools.
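
As a minimal illustration (using placeholder names not from the original text), the sketch below evaluates this objective for a real-valued estimate, given a NumPy forward matrix `A` and a black-box denoiser `denoise(x, sigma)`:

```python
import numpy as np

def red_objective(x, y, A, denoise, sigma, lam):
    """Evaluate 0.5*||y - A x||^2 + lam * R_RED(x) for a real-valued estimate x.

    `denoise(x, sigma)` is a placeholder for any black-box denoiser;
    R_RED(x) = 0.5 * x^T (x - D_sigma(x)).
    """
    data_term = 0.5 * np.sum((y - A @ x) ** 2)
    red_term = 0.5 * np.dot(x, x - denoise(x, sigma))
    return data_term + lam * red_term
```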

Theorem: RED Gradient Under Jacobian Symmetry

If $\mathcal{D}_\sigma$ has a symmetric Jacobian ($\nabla_\mathbf{x}\mathcal{D}_\sigma(\mathbf{x}) = [\nabla_\mathbf{x}\mathcal{D}_\sigma(\mathbf{x})]^T$) and satisfies local homogeneity ($\nabla_\mathbf{x}\mathcal{D}_\sigma(\mathbf{x}) \cdot \mathbf{x} \approx \mathcal{D}_\sigma(\mathbf{x})$), then:

$$\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}).$$

The RED gradient descent update is:

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \alpha\bigl[ \mathbf{A}^{H}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y}) + \lambda(\mathbf{x}^{(k)} - \mathcal{D}_\sigma(\mathbf{x}^{(k)}))\bigr].$$

The RED update combines two gradient corrections:

  1. $-\alpha\mathbf{A}^{H}(\mathbf{A}\mathbf{x} - \mathbf{y})$: move toward data consistency
  2. $-\alpha\lambda(\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}))$: move toward the denoiser output

The denoiser residual $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ points away from the clean image manifold, so subtracting it pushes the iterate back toward the manifold.
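
A minimal sketch of this update loop, again with the placeholder names `A` and `denoise(x, sigma)` (any linear operator and black-box denoiser could be substituted):

```python
import numpy as np

def red_gradient_descent(y, A, denoise, sigma, lam, alpha, n_iters=100):
    """RED gradient descent: x <- x - alpha*[A^H (A x - y) + lam*(x - D_sigma(x))]."""
    x = A.conj().T @ y  # simple initialisation from the adjoint
    for _ in range(n_iters):
        data_grad = A.conj().T @ (A @ x - y)   # gradient of 0.5*||y - A x||^2
        red_grad = x - denoise(x, sigma)       # (approximate) gradient of R_RED
        x = x - alpha * (data_grad + lam * red_grad)
    return x
```

Each iteration costs one application of $\mathbf{A}$ and $\mathbf{A}^H$ plus a single denoiser evaluation, matching the per-iteration cost of PnP-PGD noted in the takeaway below.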

Definition:

Score-Based Interpretation of RED

The RED gradient $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ has a natural score-function interpretation via Tweedie's formula.

For a denoising model $\mathbf{v} = \mathbf{x} + \sigma\mathbf{n}$, the MMSE denoiser satisfies:

$$\mathcal{D}_\sigma(\mathbf{v}) = \mathbf{v} + \sigma^2\nabla_\mathbf{v}\log p_\sigma(\mathbf{v}).$$

Rearranged and evaluated at the current estimate $\mathbf{x}$:

$$\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}) \approx -\sigma^2\nabla_\mathbf{x}\log p_\sigma(\mathbf{x}).$$

The RED gradient is thus proportional to the negative score function of the (slightly blurred) image distribution, pointing away from high-probability regions of the prior. RED gradient descent pushes iterates toward higher probability under the implicit prior.
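
As a small sanity check of Tweedie's formula (an added illustration, not from the original text), the sketch below uses a 1-D Gaussian prior $x \sim \mathcal{N}(0, \tau^2)$, for which the MMSE denoiser and the smoothed score are both available in closed form:

```python
import numpy as np

# Gaussian prior x ~ N(0, tau^2); observation v = x + sigma*n with n ~ N(0, 1).
tau, sigma = 2.0, 0.5
v = np.linspace(-5.0, 5.0, 11)

# Closed-form MMSE denoiser (posterior mean) for this Gaussian model.
mmse = (tau**2 / (tau**2 + sigma**2)) * v

# Score of the smoothed density p_sigma = N(0, tau^2 + sigma^2).
score = -v / (tau**2 + sigma**2)

# Tweedie's formula: D_sigma(v) = v + sigma^2 * score(v).
tweedie = v + sigma**2 * score
assert np.allclose(mmse, tweedie)  # agrees to machine precision
```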


Historical Note: Impact and Limitations of RED

2017–present

Romano, Elad, and Milanfar introduced RED in SIAM Journal on Imaging Sciences (2017), framing it as a principled framework for converting any denoiser into a regulariser. The paper generated significant excitement for providing an explicit objective, something PnP lacked.

However, Reehorst and Schniter (2019) showed that the Jacobian symmetry assumption is rarely satisfied for deep denoisers, meaning the RED gradient formula is approximate. The exact gradient involves the Jacobian term $\mathbf{J}_\mathcal{D}^T\mathbf{x}$, which is expensive to compute. Despite this, RED continues to be used as an effective algorithm (with monitored convergence) even when the theoretical conditions are not met.


Example: RED vs. PnP-PGD Comparison

Compare the per-iteration computational cost, theoretical properties, and step-size conditions of RED gradient descent and PnP-PGD.
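
A sketch contrasting the two update rules, using the same placeholder names as above (`A`, `denoise`); this is an illustrative starting point for the comparison rather than a definitive implementation:

```python
import numpy as np

def pnp_pgd_step(x, y, A, denoise, sigma, alpha):
    """PnP-PGD: gradient step on the data term, then denoise (proximal replacement)."""
    z = x - alpha * (A.conj().T @ (A @ x - y))
    return denoise(z, sigma)

def red_gd_step(x, y, A, denoise, sigma, alpha, lam):
    """RED gradient descent: single explicit gradient step on the full objective."""
    grad = A.conj().T @ (A @ x - y) + lam * (x - denoise(x, sigma))
    return x - alpha * grad
```

Both steps cost one denoiser evaluation plus one application of $\mathbf{A}$ and $\mathbf{A}^H$; the difference is that the RED step is an explicit gradient step on a stated objective, whereas the PnP-PGD step substitutes the denoiser for a proximal operator without an explicit objective.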

Common Mistake: Deep Denoisers Rarely Have Symmetric Jacobians

Mistake:

Assuming that a DnCNN or DRUNet denoiser satisfies the Jacobian symmetry condition required for the RED gradient to be exact.

Correction:

Most deep denoisers (DnCNN, DRUNet, SwinIR) do not have symmetric Jacobians (a numerical symmetry check is sketched after the mitigations list below). This means:

  1. The RED "gradient" $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ may not be the gradient of any scalar function.
  2. The algorithm may not be minimising the RED objective.
  3. Convergence is not guaranteed by standard gradient descent theory.

Mitigations:

  • Use denoisers whose architecture enforces a symmetric Jacobian
  • Accept the approximation and monitor convergence empirically
  • Use gradient-step denoisers (Section 21.3) for a principled alternative
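
As a rough diagnostic (an added illustration, with `denoise(x, sigma)` again a placeholder), Jacobian symmetry can be checked numerically on a small input by finite differences:

```python
import numpy as np

def jacobian_asymmetry(denoise, x, sigma, eps=1e-4):
    """Finite-difference estimate of ||J - J^T|| / ||J|| for a denoiser at point x.

    Only practical for small 1-D inputs (the Jacobian is n x n); intended as a
    diagnostic, not for full-size images.
    """
    n = x.size
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (denoise(x + e, sigma) - denoise(x - e, sigma)) / (2 * eps)
    return np.linalg.norm(J - J.T) / np.linalg.norm(J)
```

A relative asymmetry far from zero signals that $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ is not the gradient of any scalar function, in which case convergence should be monitored empirically as suggested above.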

RED Gradient $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ Visualisation

Visualise the RED gradient $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ for a 1-D signal with varying noise level $\sigma$. The gradient field shows which direction the RED update pushes the signal at each point.

Observe that the gradient is small where the signal is smooth (the denoiser makes little change) and large where the signal has noise-like fluctuations (the denoiser makes large corrections).
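
A minimal offline sketch of this visualisation, using a Gaussian-smoothing filter as a stand-in denoiser (an assumption; any denoiser could be substituted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
import matplotlib.pyplot as plt

# Piecewise-smooth test signal with additive noise.
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 3 * t) + (t > 0.5)
x = clean + 0.1 * np.random.default_rng(0).standard_normal(t.size)

sigma = 2.0                                 # smoothing width of the stand-in denoiser
residual = x - gaussian_filter1d(x, sigma)  # RED gradient x - D_sigma(x)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, x)
ax1.set_title("Noisy signal x")
ax2.plot(t, residual)
ax2.set_title("RED gradient x - D_sigma(x)")
plt.show()
```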


Quick Check

Under what conditions does the RED gradient simplify to $\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$?

When the denoiser is a CNN

When the denoiser Jacobian is symmetric and locally homogeneous

When the denoiser is non-expansive

Always, regardless of denoiser type

Key Takeaway

RED defines the explicit regulariser $R_\text{RED}(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T(\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}))$ with gradient $\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ (under Jacobian symmetry and local homogeneity). RED has the same per-iteration cost as PnP-PGD but provides an explicit objective. In practice, the Jacobian symmetry assumption rarely holds exactly for deep denoisers, yet RED remains an effective algorithm, with convergence monitored empirically, even when the exact conditions fail.