The Plug-and-Play Principle

The Plug-and-Play Idea: Denoisers as Proximal Operators

Iterative algorithms like ADMM and proximal gradient descent split the reconstruction problem into a data-fidelity step (enforcing measurement consistency) and a proximal/denoising step (imposing the prior). The proximal step $\operatorname{prox}_{\lambda R}$ is equivalent to Gaussian denoising:

$$\operatorname{prox}_{\lambda R}(\mathbf{v}) = \arg\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{x} - \mathbf{v}\|^2 + \lambda R(\mathbf{x}).$$

The Plug-and-Play (PnP) principle exploits this equivalence: replace the proximal operator with any off-the-shelf denoiser $\mathcal{D}_\sigma$, even one that is not the proximal operator of any explicit function. This decouples algorithm design from prior design, allowing state-of-the-art denoisers (BM3D, DnCNN, DRUNet) to be "plugged in" without modification.
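To make the prox-equals-denoiser equivalence concrete: for $R(\mathbf{x}) = \|\mathbf{x}\|_1$ the proximal operator has a closed form, soft thresholding, which is itself a simple (sparsity-based) denoiser. A minimal NumPy sketch, with the illustrative name `prox_l1`:

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of lam * ||x||_1: elementwise soft thresholding.
    Shrinks every entry of v toward zero by lam, i.e. it denoises v
    under a sparsity prior."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([3.0, -0.5, 1.2])
denoised = prox_l1(v, 1.0)   # -> [2.0, 0.0, 0.2]
```

PnP runs this logic in reverse: instead of deriving the denoiser from an explicit $R$, it starts from a good denoiser and lets the corresponding prior remain implicit.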

Historical Note: Origins of Plug-and-Play Priors

2013–present

The PnP framework was introduced by Venkatakrishnan, Bouman, and Wohlberg at GlobalSIP 2013. Their key observation was that ADMM's variable-splitting structure isolates the prior into a single subproblem — the proximal step — which could be solved by any existing denoiser without changing the rest of the algorithm. Within a decade PnP had spread to MRI, CT, microscopy, and RF imaging, with hundreds of follow-up works establishing convergence theory, deep denoiser variants, and domain-specific adaptations.

Theorem: Proximal Operators Are MAP Gaussian Denoisers

If $R \colon \mathbb{R}^N \to \mathbb{R} \cup \{+\infty\}$ is a proper, lower semicontinuous, convex function, then $\operatorname{prox}_{\lambda R}$ is the MAP denoiser for the model $\mathbf{v} = \mathbf{x} + \mathbf{n}$ with prior $p(\mathbf{x}) \propto e^{-R(\mathbf{x})/\sigma^2}$ and $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$, where $\sigma^2 = \lambda$.

The MAP estimate under Gaussian noise and a log-concave prior is $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \tfrac{1}{2\sigma^2}\|\mathbf{v} - \mathbf{x}\|^2 + R(\mathbf{x})$, which is exactly the proximal operator with $\lambda = \sigma^2$. So the proximal operator is Gaussian denoising with a specific prior.


Definition:

The Plug-and-Play Framework

The Plug-and-Play (PnP) framework replaces the proximal operator in an iterative algorithm with a denoiser $\mathcal{D}_\sigma$:

Component | Standard algorithm | PnP variant
Proximal step | $\operatorname{prox}_{\lambda R}(\mathbf{v})$ | $\mathcal{D}_\sigma(\mathbf{v})$
Regulariser | Explicit $R(\mathbf{x})$ | Implicit (defined by $\mathcal{D}_\sigma$)
Convergence | Guaranteed (convex $R$) | Requires analysis

The denoiser implicitly defines a regulariser $R$ when it is a valid proximal operator; otherwise it acts as a more general operator with no explicit $R$. The correspondence between the denoiser noise level and the ADMM penalty is $\sigma = \sqrt{\lambda/\rho}$.

The PnP framework is modular: the data-fidelity component and the denoiser are developed independently. A better denoiser immediately improves reconstruction, without retraining or redesigning the algorithm. This modularity is PnP's greatest practical strength.

Definition:

PnP-ADMM Algorithm

PnP-ADMM replaces the $\mathbf{z}$-update (proximal step) in ADMM with a denoiser $\mathcal{D}_\sigma$. For the model $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$:

$$\mathbf{c}^{(k+1)} = \bigl(\mathbf{A}^{H}\mathbf{A} + \rho\mathbf{I}\bigr)^{-1} \bigl(\mathbf{A}^{H}\mathbf{y} + \rho(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})\bigr)$$

$$\mathbf{z}^{(k+1)} = \mathcal{D}_\sigma\!\bigl(\mathbf{c}^{(k+1)} + \mathbf{u}^{(k)}\bigr)$$

$$\mathbf{u}^{(k+1)} = \mathbf{u}^{(k)} + \mathbf{c}^{(k+1)} - \mathbf{z}^{(k+1)}$$

The noise level $\sigma$ relates to the ADMM penalty as $\sigma = \sqrt{\lambda/\rho}$.

The c\mathbf{c}-update is unchanged from standard ADMM — it enforces data consistency via a linear solve. Only the proximal step is replaced. This means PnP-ADMM reuses any existing efficient linear solver for the data-fidelity subproblem (e.g., FFT-based inversion for Fourier operators).
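The three updates above can be sketched directly in NumPy. This is an illustrative implementation, not a reference one: the function name `pnp_admm` is ours, the denoiser is passed in as an arbitrary callable, and the data-consistency solve is done densely for clarity (a real system would pre-factor the matrix or use a structured solver such as an FFT).

```python
import numpy as np

def pnp_admm(y, A, denoise, rho=1.0, K=50, eps=1e-6):
    """PnP-ADMM sketch: `denoise` may be any callable v -> D_sigma(v)."""
    AHy = A.conj().T @ y
    M = A.conj().T @ A + rho * np.eye(A.shape[1])  # fixed system matrix
    c = AHy.copy()
    z = c.copy()
    u = np.zeros_like(c)
    for _ in range(K):
        # Data-consistency step (dense solve here for clarity).
        c = np.linalg.solve(M, AHy + rho * (z - u))
        z = denoise(c + u)        # "plug in" the denoiser
        u = u + c - z             # dual (running residual) update
        if np.linalg.norm(c - z) < eps:
            break
    return c

# Toy usage: with A = I, PnP-ADMM reduces to plain denoising of y.
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.5, 0.0)
c_hat = pnp_admm(np.array([2.0, 0.1, -1.5]), np.eye(3), soft)
# c_hat ~ [1.5, 0.0, -1.0]
```

Swapping `soft` for BM3D, DnCNN, or DRUNet changes nothing else in the loop, which is the modularity the text describes.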

Definition:

PnP-PGD Algorithm

PnP-PGD replaces the proximal step in proximal gradient descent:

$$\mathbf{c}^{(k+1)} = \mathcal{D}_\sigma\!\bigl(\mathbf{c}^{(k)} - \alpha\,\mathbf{A}^{H}(\mathbf{A}\mathbf{c}^{(k)} - \mathbf{y})\bigr)$$

where $\alpha$ is the step size (typically $\alpha = 1/\|\mathbf{A}\|^2$). Each iteration alternates:

  1. Gradient step: $\tilde{\mathbf{c}}^{(k)} = \mathbf{c}^{(k)} - \alpha\,\nabla f(\mathbf{c}^{(k)})$, where $f(\mathbf{c}) = \tfrac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{c}\|^2$
  2. Denoising step: $\mathbf{c}^{(k+1)} = \mathcal{D}_\sigma(\tilde{\mathbf{c}}^{(k)})$

PnP-PGD is simpler than PnP-ADMM (no splitting variable, no dual update) but generally converges more slowly per iteration. The choice depends on the computational cost of the linear solve versus simplicity of implementation.
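The two alternating steps fit in a few lines of NumPy. As before, this is a sketch under our own naming (`pnp_pgd`), with the denoiser passed in as a callable and the step size set to $1/\|\mathbf{A}\|^2$ via the spectral norm:

```python
import numpy as np

def pnp_pgd(y, A, denoise, K=100):
    """PnP proximal gradient descent sketch."""
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/||A||^2
    c = A.conj().T @ y                        # simple initialisation
    for _ in range(K):
        grad = A.conj().T @ (A @ c - y)       # gradient of (1/2)||y - Ac||^2
        c = denoise(c - alpha * grad)         # gradient step, then denoise
    return c

# Toy usage: with A = I the iteration collapses to c = denoise(y).
soft = lambda v: np.sign(v) * np.maximum(np.abs(v) - 0.5, 0.0)
c_hat = pnp_pgd(np.array([2.0, 0.1, -1.5]), np.eye(3), soft, K=10)
```

Note the structural contrast with PnP-ADMM: no splitting variable $\mathbf{z}$, no dual variable $\mathbf{u}$, and no linear solve, only matrix-vector products.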

PnP-ADMM

Complexity: $O(K \cdot [C_\text{solve} + C_\mathcal{D}])$ per reconstruction
Input: $\mathbf{y}$, $\mathbf{A}$, denoiser $\mathcal{D}_\sigma$, penalty $\rho > 0$, tolerance $\epsilon$, max iterations $K$
Output: Reconstruction $\hat{\mathbf{c}}$
1. Initialise $\mathbf{c}^{(0)} \leftarrow \mathbf{A}^{H}\mathbf{y}$, $\mathbf{z}^{(0)} \leftarrow \mathbf{c}^{(0)}$, $\mathbf{u}^{(0)} \leftarrow \mathbf{0}$
2. for $k = 0, 1, \ldots, K-1$ do
3. $\quad \mathbf{c}^{(k+1)} \leftarrow (\mathbf{A}^{H}\mathbf{A} + \rho\mathbf{I})^{-1}(\mathbf{A}^{H}\mathbf{y} + \rho(\mathbf{z}^{(k)} - \mathbf{u}^{(k)}))$
4. $\quad \mathbf{z}^{(k+1)} \leftarrow \mathcal{D}_\sigma(\mathbf{c}^{(k+1)} + \mathbf{u}^{(k)})$ (plug denoiser in here)
5. $\quad \mathbf{u}^{(k+1)} \leftarrow \mathbf{u}^{(k)} + \mathbf{c}^{(k+1)} - \mathbf{z}^{(k+1)}$
6. $\quad$ if $\|\mathbf{c}^{(k+1)} - \mathbf{z}^{(k+1)}\| < \epsilon$: break
7. end for
8. return $\hat{\mathbf{c}} \leftarrow \mathbf{c}^{(K)}$

Line 3 is the data-consistency step (unchanged from standard ADMM). Line 4 is the denoising step (the "plug-and-play" substitution). For Fourier sensing operators, line 3 costs $O(N \log N)$ via FFT.

Example: Efficient PnP-ADMM c\mathbf{c}-Update for Fourier Sensing

Derive the efficient $\mathbf{c}$-update for a partial Fourier sensing matrix $\mathbf{A} = \mathbf{P}_\Omega\mathbf{F}$, where $\mathbf{F}$ is the unitary DFT and $\mathbf{P}_\Omega$ selects measurements in $\Omega$. Show the update costs two FFTs.
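One possible solution sketch, assuming a 1-D signal and NumPy's unitary FFT (`norm="ortho"`): since $\mathbf{A}^{H}\mathbf{A} = \mathbf{F}^{H}\mathbf{M}\mathbf{F}$ with $\mathbf{M}$ the diagonal 0/1 sampling mask, the system $(\mathbf{A}^{H}\mathbf{A} + \rho\mathbf{I})$ diagonalises in the Fourier domain, so the update needs only one forward and one inverse FFT. The function name `fourier_c_update` is illustrative:

```python
import numpy as np

def fourier_c_update(y_hat, z, u, mask, rho):
    """c-update for A = P_Omega F (unitary 1-D DFT), costing two FFTs.

    y_hat: zero-filled measurements P_Omega^H y (Fourier domain)
    mask:  0/1 sampling indicator, the diagonal of P_Omega^H P_Omega
    """
    # In the Fourier domain, (A^H A + rho I) is the diagonal mask + rho.
    rhs_hat = y_hat + rho * np.fft.fft(z - u, norm="ortho")   # FFT no. 1
    c_hat = rhs_hat / (mask + rho)               # elementwise division
    return np.fft.ifft(c_hat, norm="ortho")      # FFT no. 2, back to signal
```

Per iteration this replaces the generic $O(N^3)$ linear solve with two $O(N \log N)$ FFTs and an elementwise division.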

Example: Common PnP Denoisers and Their Properties

Describe BM3D, DnCNN, and DRUNet as PnP denoisers. For each, identify their strengths, weaknesses, and key PnP-relevant properties.


Quick Check

What is the key insight that justifies the Plug-and-Play framework?

  1. Any denoiser can be used as a neural network layer.
  2. The proximal step in iterative algorithms is equivalent to Gaussian denoising, so any denoiser can replace it.
  3. Denoisers always converge faster than proximal operators.
  4. PnP eliminates the need for a forward model.

Common Mistake: Mismatched Denoiser Noise Level

Mistake:

Using a denoiser trained for noise level $\sigma_\text{train}$ in PnP with effective noise level $\sigma_\text{eff} \neq \sigma_\text{train}$.

Correction:

In PnP-ADMM the effective noise level is $\sigma_\text{eff} = \sqrt{\lambda/\rho}$. If this does not match the denoiser's training noise level, the denoiser under- or over-denoises, leading to poor reconstruction or divergence.

Solutions:

  1. Use a denoiser such as DRUNet that accepts $\sigma$ as an input and so handles a range of noise levels.
  2. Adapt $\rho$ so that $\sqrt{\lambda/\rho} \approx \sigma_\text{train}$.
  3. Apply a noise-level schedule that decreases $\sigma$ across iterations.
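The third solution is often implemented as a geometric decay from a large starting level to a small final one. A minimal sketch (the name `sigma_schedule` and the endpoint values are illustrative tuning choices, not a standard):

```python
import numpy as np

def sigma_schedule(sigma0, sigma_min, K):
    """Geometric noise-level schedule: sigma0 -> sigma_min over K
    iterations (assumes K >= 2). Iteration k uses sigmas[k]."""
    return sigma0 * (sigma_min / sigma0) ** (np.arange(K) / (K - 1))

sigmas = sigma_schedule(0.5, 0.01, 30)
# pass sigmas[k] to the denoiser at iteration k
```

Early iterations then denoise aggressively (large $\sigma$, fast cleanup of gross artefacts) while late iterations denoise gently (small $\sigma$, preserving fine detail).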

Key Takeaway

The proximal operator is MAP Gaussian denoising with a specific prior, making any denoiser a valid (if theoretically informal) proximal replacement. PnP-ADMM and PnP-PGD swap the proximal step for an off-the-shelf denoiser while keeping data-consistency steps unchanged, yielding a modular algorithm where denoiser quality directly determines reconstruction quality.