RED: Regularization by Denoising

RED: Closing the Gap Between PnP and Variational Methods

PnP algorithms are powerful but, in general, lack an explicit objective function: it is unclear what they are minimising. This makes convergence analysis difficult and prevents the use of standard optimisation tools.

Regularization by Denoising (RED) (Romano, Elad, Milanfar, 2017) bridges this gap by constructing an explicit regulariser directly from the denoiser. RED converts the denoiser into a concrete penalty term, enabling standard gradient-descent convergence theory to apply.

Definition:

Regularization by Denoising (RED)

The RED regulariser derived from a denoiser $\mathcal{D}_\sigma$ is:

$$R_\text{RED}(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T(\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})) = \frac{1}{2}\|\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathcal{D}_\sigma(\mathbf{x}).$$

Under the Jacobian symmetry and local homogeneity assumptions (stated precisely in the theorem below), its gradient simplifies to:

$$\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}).$$

RED solves the variational problem:

$$\min_{\mathbf{x}} \; \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \lambda\, R_\text{RED}(\mathbf{x}).$$

Unlike PnP, RED defines an explicit objective. This makes it amenable to standard optimisation theory: fixed points are stationary points of a well-defined objective, and convergence can be analysed with standard tools.
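
As a minimal illustration (using placeholder names not from the original text), the sketch below evaluates this objective for a real-valued estimate, given a NumPy forward matrix `A` and a black-box denoiser `denoise(x, sigma)`:

```python
import numpy as np

def red_objective(x, y, A, denoise, sigma, lam):
    """Evaluate 0.5*||y - A x||^2 + lam * R_RED(x) for a real-valued estimate x.

    `denoise(x, sigma)` is a placeholder for any black-box denoiser;
    R_RED(x) = 0.5 * x^T (x - D_sigma(x)).
    """
    data_term = 0.5 * np.sum((y - A @ x) ** 2)
    red_term = 0.5 * np.dot(x, x - denoise(x, sigma))
    return data_term + lam * red_term
```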

Theorem: RED Gradient Under Jacobian Symmetry

If $\mathcal{D}_\sigma$ has a symmetric Jacobian ($\nabla_\mathbf{x}\mathcal{D}_\sigma(\mathbf{x}) = [\nabla_\mathbf{x}\mathcal{D}_\sigma(\mathbf{x})]^T$) and satisfies local homogeneity ($\nabla_\mathbf{x}\mathcal{D}_\sigma(\mathbf{x}) \cdot \mathbf{x} \approx \mathcal{D}_\sigma(\mathbf{x})$), then:

$$\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}).$$

The RED gradient descent update is:

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \alpha\bigl[ \mathbf{A}^{H}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y}) + \lambda(\mathbf{x}^{(k)} - \mathcal{D}_\sigma(\mathbf{x}^{(k)}))\bigr].$$

The RED update combines two gradient corrections:

  1. $-\alpha\mathbf{A}^{H}(\mathbf{A}\mathbf{x} - \mathbf{y})$: move toward data consistency
  2. $-\alpha\lambda(\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}))$: move toward the denoiser output

The denoiser residual $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ points away from the clean image manifold, so subtracting it pushes the iterate back toward the manifold.
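
A minimal sketch of this update loop, again with the placeholder names `A` and `denoise(x, sigma)` (any linear operator and black-box denoiser could be substituted):

```python
import numpy as np

def red_gradient_descent(y, A, denoise, sigma, lam, alpha, n_iters=100):
    """RED gradient descent: x <- x - alpha*[A^H (A x - y) + lam*(x - D_sigma(x))]."""
    x = A.conj().T @ y  # simple initialisation from the adjoint
    for _ in range(n_iters):
        data_grad = A.conj().T @ (A @ x - y)   # gradient of 0.5*||y - A x||^2
        red_grad = x - denoise(x, sigma)       # (approximate) gradient of R_RED
        x = x - alpha * (data_grad + lam * red_grad)
    return x
```

Each iteration costs one application of $\mathbf{A}$ and $\mathbf{A}^H$ plus a single denoiser evaluation, matching the per-iteration cost of PnP-PGD noted in the takeaway below.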

Definition:

Score-Based Interpretation of RED

The RED gradient $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ has a natural score-function interpretation via Tweedie's formula.

For a denoising model $\mathbf{v} = \mathbf{x} + \sigma\mathbf{n}$, the MMSE denoiser satisfies:

$$\mathcal{D}_\sigma(\mathbf{v}) = \mathbf{v} + \sigma^2\nabla_\mathbf{v}\log p_\sigma(\mathbf{v}).$$

Rearranged and evaluated at the current estimate $\mathbf{x}$:

$$\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}) \approx -\sigma^2\nabla_\mathbf{x}\log p_\sigma(\mathbf{x}).$$

The RED gradient is thus proportional to the negative score function of the (slightly blurred) image distribution, pointing away from high-probability regions of the prior. RED gradient descent pushes iterates toward higher probability under the implicit prior.
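
As a small sanity check of Tweedie's formula (an added illustration, not from the original text), the sketch below uses a 1-D Gaussian prior $x \sim \mathcal{N}(0, \tau^2)$, for which the MMSE denoiser and the smoothed score are both available in closed form:

```python
import numpy as np

# Gaussian prior x ~ N(0, tau^2); observation v = x + sigma*n with n ~ N(0, 1).
tau, sigma = 2.0, 0.5
v = np.linspace(-5.0, 5.0, 11)

# Closed-form MMSE denoiser (posterior mean) for this Gaussian model.
mmse = (tau**2 / (tau**2 + sigma**2)) * v

# Score of the smoothed density p_sigma = N(0, tau^2 + sigma^2).
score = -v / (tau**2 + sigma**2)

# Tweedie's formula: D_sigma(v) = v + sigma^2 * score(v).
tweedie = v + sigma**2 * score
assert np.allclose(mmse, tweedie)  # agrees to machine precision
```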


Historical Note: Impact and Limitations of RED

2017–present

Romano, Elad, and Milanfar introduced RED in SIAM Journal on Imaging Sciences (2017), framing it as a principled framework for converting any denoiser into a regulariser. The paper generated significant excitement for providing an explicit objective, something PnP lacked.

However, Reehorst and Schniter (2019) showed that the Jacobian symmetry assumption is rarely satisfied for deep denoisers, meaning the RED gradient formula is approximate. The exact gradient involves the Jacobian term $\mathbf{J}_\mathcal{D}^T\mathbf{x}$, which is expensive to compute. Despite this, RED continues to be used as an effective algorithm (with monitored convergence) even when the theoretical conditions are not met.


Example: RED vs. PnP-PGD Comparison

Compare the per-iteration computational cost, theoretical properties, and step-size conditions of RED gradient descent and PnP-PGD.
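
A sketch contrasting the two update rules, using the same placeholder names as above (`A`, `denoise`); this is an illustrative starting point for the comparison rather than a definitive implementation:

```python
import numpy as np

def pnp_pgd_step(x, y, A, denoise, sigma, alpha):
    """PnP-PGD: gradient step on the data term, then denoise (proximal replacement)."""
    z = x - alpha * (A.conj().T @ (A @ x - y))
    return denoise(z, sigma)

def red_gd_step(x, y, A, denoise, sigma, alpha, lam):
    """RED gradient descent: single explicit gradient step on the full objective."""
    grad = A.conj().T @ (A @ x - y) + lam * (x - denoise(x, sigma))
    return x - alpha * grad
```

Both steps cost one denoiser evaluation plus one application of $\mathbf{A}$ and $\mathbf{A}^H$; the difference is that the RED step is an explicit gradient step on a stated objective, whereas the PnP-PGD step substitutes the denoiser for a proximal operator without an explicit objective.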

Common Mistake: Deep Denoisers Rarely Have Symmetric Jacobians

Mistake:

Assuming that a DnCNN or DRUNet denoiser satisfies the Jacobian symmetry condition required for the RED gradient to be exact.

Correction:

Most deep denoisers (DnCNN, DRUNet, SwinIR) do not have symmetric Jacobians (a numerical symmetry check is sketched after the mitigations list below). This means:

  1. The RED "gradient" $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ may not be the gradient of any scalar function.
  2. The algorithm may not be minimising the RED objective.
  3. Convergence is not guaranteed by standard gradient descent theory.

Mitigations:

  • Use denoisers whose architecture enforces a symmetric Jacobian
  • Accept the approximation and monitor convergence empirically
  • Use gradient-step denoisers (Section 21.3) for a principled alternative
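
As a rough diagnostic (an added illustration, with `denoise(x, sigma)` again a placeholder), Jacobian symmetry can be checked numerically on a small input by finite differences:

```python
import numpy as np

def jacobian_asymmetry(denoise, x, sigma, eps=1e-4):
    """Finite-difference estimate of ||J - J^T|| / ||J|| for a denoiser at point x.

    Only practical for small 1-D inputs (the Jacobian is n x n); intended as a
    diagnostic, not for full-size images.
    """
    n = x.size
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (denoise(x + e, sigma) - denoise(x - e, sigma)) / (2 * eps)
    return np.linalg.norm(J - J.T) / np.linalg.norm(J)
```

A relative asymmetry far from zero signals that $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ is not the gradient of any scalar function, in which case convergence should be monitored empirically as suggested above.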

RED Gradient $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ Visualisation

Visualise the RED gradient $\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ for a 1-D signal with varying noise level $\sigma$. The gradient field shows which direction the RED update pushes the signal at each point.

Observe that the gradient is small where the signal is smooth (the denoiser makes little change) and large where the signal has noise-like fluctuations (the denoiser makes large corrections).
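
A minimal offline sketch of this visualisation, using a Gaussian-smoothing filter as a stand-in denoiser (an assumption; any denoiser could be substituted):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
import matplotlib.pyplot as plt

# Piecewise-smooth test signal with additive noise.
t = np.linspace(0, 1, 256)
clean = np.sin(2 * np.pi * 3 * t) + (t > 0.5)
x = clean + 0.1 * np.random.default_rng(0).standard_normal(t.size)

sigma = 2.0                                 # smoothing width of the stand-in denoiser
residual = x - gaussian_filter1d(x, sigma)  # RED gradient x - D_sigma(x)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t, x)
ax1.set_title("Noisy signal x")
ax2.plot(t, residual)
ax2.set_title("RED gradient x - D_sigma(x)")
plt.show()
```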


Quick Check

Under what conditions does the RED gradient simplify to $\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$?

When the denoiser is a CNN

When the denoiser Jacobian is symmetric and locally homogeneous

When the denoiser is non-expansive

Always, regardless of denoiser type

Key Takeaway

RED defines the explicit regulariser $R_\text{RED}(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T(\mathbf{x} - \mathcal{D}_\sigma(\mathbf{x}))$ with gradient $\nabla R_\text{RED}(\mathbf{x}) = \mathbf{x} - \mathcal{D}_\sigma(\mathbf{x})$ (under Jacobian symmetry and local homogeneity). RED has the same per-iteration cost as PnP-PGD but provides an explicit objective. In practice, the Jacobian symmetry assumption rarely holds exactly for deep denoisers, yet RED remains an effective algorithm, with convergence monitored empirically, even when the exact conditions fail.