Exercises
ex21-01-prox-denoiser
Easy. Show that $\operatorname{prox}_{\lambda\|\cdot\|_1}$ is soft-thresholding, and interpret it as a MAP denoiser for the model $\mathbf{v} = \mathbf{x} + \mathbf{n}$ where $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ and $\mathbf{x}$ has a Laplace prior.
The problem separates into component-wise scalar subproblems.
For each component: $\min_{x_i} \frac{1}{2}(x_i - v_i)^2 + \lambda|x_i|$. Take subgradients.
Component-wise optimisation
For each component: $\min_{x_i} \frac{1}{2}(x_i - v_i)^2 + \lambda|x_i|$. Setting the subgradient to zero: $x_i - v_i + \lambda\,\partial|x_i| \ni 0$. Solution: $x_i^\star = \operatorname{sign}(v_i)\max(|v_i| - \lambda,\, 0)$.
MAP interpretation
Laplace prior: $p(x_i) \propto e^{-|x_i|/b}$, so $-\log p(\mathbf{x}) = \|\mathbf{x}\|_1/b + \text{const}$. MAP estimator: $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2\sigma^2}\|\mathbf{v} - \mathbf{x}\|_2^2 + \frac{1}{b}\|\mathbf{x}\|_1$. With $\lambda = \sigma^2/b$, this is soft-thresholding.
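A quick numerical check, as a NumPy sketch (the brute-force grid search is purely illustrative), that the closed form matches the scalar prox problem:

```python
import numpy as np

def soft_threshold(v, lam):
    """prox of lam*||.||_1: component-wise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Compare against brute-force minimisation of 0.5*(x - v)^2 + lam*|x|
rng = np.random.default_rng(0)
lam = 0.7
grid = np.linspace(-10, 10, 400001)
for v in rng.normal(scale=2.0, size=5):
    brute = grid[np.argmin(0.5 * (grid - v) ** 2 + lam * np.abs(grid))]
    assert abs(soft_threshold(v, lam) - brute) < 1e-3
```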
ex21-02-pnp-pgd-iter
Easy. Write out the PnP-PGD iteration for the problem $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \lambda R(\mathbf{x})$ with step size $\eta$ and denoiser $D_\sigma$. What is the relationship between $\sigma$ and $\eta$?
PnP-PGD replaces $\operatorname{prox}_{\eta\lambda R}$ with $D_\sigma$.
PnP-PGD iteration
$$\mathbf{x}^{(k+1)} = D_\sigma\!\left(\mathbf{x}^{(k)} - \eta\,\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y})\right)$$
Noise level correspondence
The proximal operator at step size $\eta$ solves $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{x} - \mathbf{v}\|_2^2 + \eta\lambda R(\mathbf{x})$, which is MAP denoising at variance $\sigma^2 = \eta\lambda$. Hence $\sigma^2 = \eta\lambda$, or $\sigma = \sqrt{\eta\lambda}$.
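A minimal PnP-PGD loop, as a sketch (the `denoiser(v, sigma)` signature and the initialisation are assumptions, not a fixed API):

```python
import numpy as np

def pnp_pgd(y, A, denoiser, lam, eta, n_iter=100):
    """PnP-PGD: gradient step on 0.5*||y - A x||^2, then a learned
    denoiser in place of prox_{eta*lam*R}, applied at sigma = sqrt(eta*lam)."""
    x = A.conj().T @ y                      # simple initialisation
    sigma = np.sqrt(eta * lam)              # noise level matched to the step
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - y)     # data-fidelity gradient
        x = denoiser(x - eta * grad, sigma)
    return x
```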
ex21-03-nonexpansive
Easy. Verify that soft-thresholding is non-expansive: $\|S_\lambda(\mathbf{u}) - S_\lambda(\mathbf{v})\|_2 \le \|\mathbf{u} - \mathbf{v}\|_2$ for all $\mathbf{u}, \mathbf{v}$.
It suffices to show the scalar case: $|S_\lambda(u) - S_\lambda(v)| \le |u - v|$.
Consider cases: both above threshold, both below, one above and one below.
Scalar case
Case 1: $|u|, |v| > \lambda$, same sign. $|S_\lambda(u) - S_\lambda(v)| = |(u - \lambda\operatorname{sign}(u)) - (v - \lambda\operatorname{sign}(v))| = |u - v|$. ✓
Case 1b: opposite signs ($u > \lambda$, $v < -\lambda$). $|S_\lambda(u) - S_\lambda(v)| = |(u - \lambda) - (v + \lambda)| = (u - v) - 2\lambda \le |u - v|$. ✓
Case 2: $|u| > \lambda \ge |v|$. Then $S_\lambda(v) = 0$, so $|S_\lambda(u) - S_\lambda(v)| = |u| - \lambda \le |u| - |v| \le |u - v|$ (since $|v| \le \lambda$). The remaining case $|u|, |v| \le \lambda$ is trivial: both map to zero. ✓
Vector case
$\|S_\lambda(\mathbf{u}) - S_\lambda(\mathbf{v})\|_2^2 = \sum_i |S_\lambda(u_i) - S_\lambda(v_i)|^2 \le \sum_i |u_i - v_i|^2 = \|\mathbf{u} - \mathbf{v}\|_2^2$. $\square$
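The inequality is easy to probe empirically; a NumPy sketch over random pairs:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.5
S = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

for _ in range(1000):
    u, v = rng.normal(size=8), rng.normal(size=8)
    assert np.linalg.norm(S(u) - S(v)) <= np.linalg.norm(u - v) + 1e-12
```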
ex21-04-pnp-fourier
Easy. Derive the efficient PnP-ADMM $\mathbf{x}$-update for a partial Fourier sensing matrix $\mathbf{A} = \mathbf{S}\mathbf{F}$ ($\mathbf{F}$ is the unitary DFT, $\mathbf{S}$ selects $M$ rows in $\Omega$). Show the update requires two FFTs.
The $\mathbf{x}$-update solves $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \frac{\rho}{2}\|\mathbf{x} - \mathbf{z}^{(k)} + \mathbf{u}^{(k)}\|_2^2$ with $\mathbf{A} = \mathbf{S}\mathbf{F}$.
Diagonalise $\mathbf{A}^{\mathsf{H}}\mathbf{A}$
$\mathbf{A}^{\mathsf{H}}\mathbf{A} = \mathbf{F}^{\mathsf{H}}\mathbf{S}^{\mathsf{H}}\mathbf{S}\mathbf{F} = \mathbf{F}^{\mathsf{H}}\operatorname{diag}(\mathbf{m})\mathbf{F}$, where $m_i = 1$ if $i \in \Omega$ and $0$ otherwise.
Closed-form update
The normal equations $(\mathbf{A}^{\mathsf{H}}\mathbf{A} + \rho\mathbf{I})\mathbf{x} = \mathbf{A}^{\mathsf{H}}\mathbf{y} + \rho(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})$ diagonalise in the Fourier domain:
$$\mathbf{x}^{(k+1)} = \mathbf{F}^{\mathsf{H}}\left[\frac{\mathbf{S}^{\mathsf{H}}\mathbf{y} + \rho\,\mathbf{F}(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})}{\mathbf{m} + \rho}\right]$$
(element-wise division). One forward FFT for $\mathbf{F}(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})$ and one inverse FFT: $2 \times O(N\log N)$. $\square$
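A sketch of the update, assuming the zero-filled Fourier data $\mathbf{S}^{\mathsf{H}}\mathbf{y}$ is precomputed and `mask` is the 0/1 indicator of $\Omega$:

```python
import numpy as np

def x_update_partial_fourier(y_zf, mask, z, u, rho):
    """PnP-ADMM x-update for A = S F. `y_zf` = S^H y (zero-filled Fourier
    data), `mask` = 1 on sampled bins. Costs exactly two length-N FFTs."""
    rhs = mask * y_zf + rho * np.fft.fft(z - u, norm="ortho")  # FFT no. 1
    return np.fft.ifft(rhs / (mask + rho), norm="ortho")       # FFT no. 2
```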
ex21-05-drunet-conditioning
Easy. Explain why the noise-level input $\sigma$ in DRUNet is critical for PnP applications, and describe two failure modes when $\sigma$ is misspecified.
The denoiser strength (degree of smoothing) is controlled by $\sigma$.
In PnP-ADMM, the effective noise level is $\sigma = \sqrt{\lambda/\rho}$.
Role of noise-level input
DRUNet concatenates $\sigma$ as a constant input channel, allowing one network to denoise at any noise level. For PnP, this maps directly to the ADMM penalty parameter: $\rho = \lambda/\sigma^2$.
Failure mode 1 – over-denoising
If $\sigma$ is too large (stronger denoising than needed), the denoiser over-smooths and destroys fine details. The reconstruction converges to an excessively blurry image.
Failure mode 2 – under-denoising
If $\sigma$ is too small, the denoiser is too weak to suppress artefacts. The iterates may oscillate or produce a noisy reconstruction with residual measurement artefacts.
ex21-06-red-gradient
Medium. For the RED regulariser $\rho_{\text{RED}}(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top\big(\mathbf{x} - D(\mathbf{x})\big)$, compute $\nabla\rho_{\text{RED}}(\mathbf{x})$ without assuming Jacobian symmetry. Show that the result involves the Jacobian $\mathbf{J}_D(\mathbf{x})$.
Use the product rule: $\nabla\big[\mathbf{x}^\top D(\mathbf{x})\big] = D(\mathbf{x}) + \mathbf{J}_D(\mathbf{x})^\top\mathbf{x}$.
Full gradient
$$\nabla\rho_{\text{RED}}(\mathbf{x}) = \mathbf{x} - \frac{1}{2}\left(D(\mathbf{x}) + \mathbf{J}_D(\mathbf{x})^\top\mathbf{x}\right)$$
Jacobian symmetry case
If $\mathbf{J}_D(\mathbf{x}) = \mathbf{J}_D(\mathbf{x})^\top$ and local homogeneity ($\mathbf{J}_D(\mathbf{x})\,\mathbf{x} = D(\mathbf{x})$) holds: $\nabla\rho_{\text{RED}}(\mathbf{x}) = \mathbf{x} - D(\mathbf{x})$.
Without symmetry
The exact gradient involves the vector–Jacobian product $\mathbf{J}_D(\mathbf{x})^\top\mathbf{x}$, which is costly to obtain via autodiff (an extra backward pass through the denoiser). Using the approximation $\nabla\rho_{\text{RED}}(\mathbf{x}) \approx \mathbf{x} - D(\mathbf{x})$ introduces error $\frac{1}{2}\big(\mathbf{J}_D(\mathbf{x})^\top\mathbf{x} - D(\mathbf{x})\big)$.
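The symmetry error is easy to see for a linear, non-symmetric denoiser $D(\mathbf{x}) = \mathbf{W}\mathbf{x}$, where the Jacobian is exactly $\mathbf{W}$; a NumPy sketch with a finite-difference check of the exact gradient:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
W = 0.1 * rng.normal(size=(n, n))        # non-symmetric linear denoiser
x = rng.normal(size=n)

D = lambda x: W @ x                      # Jacobian J_D = W (constant)
rho = lambda x: 0.5 * x @ (x - D(x))     # RED regulariser

exact = x - 0.5 * (D(x) + W.T @ x)       # x - (D(x) + J_D^T x)/2
approx = x - D(x)                        # residual approximation

# Central finite differences confirm the exact gradient
eps, g_fd = 1e-6, np.zeros(n)
for i in range(n):
    e = np.zeros(n); e[i] = eps
    g_fd[i] = (rho(x + e) - rho(x - e)) / (2 * eps)
print(np.allclose(exact, g_fd, atol=1e-5))   # True
print(np.linalg.norm(approx - exact))        # nonzero: the symmetry error
```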
ex21-07-noise-schedule
Medium. For PnP-ADMM with $K$ iterations, design a noise schedule starting at $\sigma_0$ (i.e., $\rho_0 = \lambda/\sigma_0^2$) and ending at $\sigma_K$ (i.e., $\rho_K = \lambda/\sigma_K^2$). Compare geometric and cosine schedules.
Geometric: $\sigma_k = \sigma_0 r^k$ with $r = (\sigma_K/\sigma_0)^{1/K}$.
Cosine: $\sigma_k = \sigma_K + \frac{1}{2}(\sigma_0 - \sigma_K)\left(1 + \cos\frac{\pi k}{K}\right)$.
Geometric schedule
$\sigma_k = \sigma_0 r^k$, with $r = (\sigma_K/\sigma_0)^{1/K} < 1$. The noise level drops by a constant factor per iteration: rapid absolute decrease early; slow refinement at the end.
Cosine schedule
$\sigma_k = \sigma_K + \frac{1}{2}(\sigma_0 - \sigma_K)\left(1 + \cos\frac{\pi k}{K}\right)$. Decreases slowly at both endpoints, rapidly in the middle.
Comparison
For RF imaging with structured artefacts, the cosine schedule is often preferred: it spends more iterations at intermediate-to-high noise levels, where the denoiser is most effective at suppressing artefact-like perturbations. The geometric schedule reaches low noise levels sooner but passes through the mid-range transition in fewer iterations.
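Both schedules in a few lines of NumPy (the endpoint values below are arbitrary, for illustration only):

```python
import numpy as np

def geometric_schedule(sigma0, sigmaK, K):
    """sigma_k = sigma0 * r^k with r = (sigmaK/sigma0)^(1/K)."""
    r = (sigmaK / sigma0) ** (1.0 / K)
    return sigma0 * r ** np.arange(K + 1)

def cosine_schedule(sigma0, sigmaK, K):
    """sigma_k = sigmaK + 0.5*(sigma0 - sigmaK)*(1 + cos(pi*k/K))."""
    k = np.arange(K + 1)
    return sigmaK + 0.5 * (sigma0 - sigmaK) * (1 + np.cos(np.pi * k / K))

g = geometric_schedule(0.5, 0.01, 40)
c = cosine_schedule(0.5, 0.01, 40)
print(g[:3], g[-3:])   # fast early decay, slow tail
print(c[:3], c[-3:])   # slow at both endpoints, fast in the middle
```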
ex21-08-firmly-nonexpansive
Medium. Prove that the proximal operator of any proper, lower semicontinuous, convex function $g$ is firmly non-expansive: $\|\operatorname{prox}_g(\mathbf{u}) - \operatorname{prox}_g(\mathbf{v})\|_2^2 \le \langle \operatorname{prox}_g(\mathbf{u}) - \operatorname{prox}_g(\mathbf{v}),\, \mathbf{u} - \mathbf{v}\rangle$.
Let $\mathbf{p} = \operatorname{prox}_g(\mathbf{u})$ and $\mathbf{q} = \operatorname{prox}_g(\mathbf{v})$. The optimality conditions give $\mathbf{u} - \mathbf{p} \in \partial g(\mathbf{p})$ and $\mathbf{v} - \mathbf{q} \in \partial g(\mathbf{q})$.
Use monotonicity of $\partial g$.
Optimality conditions
$\mathbf{p} = \operatorname{prox}_g(\mathbf{u})$ satisfies $\mathbf{u} - \mathbf{p} \in \partial g(\mathbf{p})$. $\mathbf{q} = \operatorname{prox}_g(\mathbf{v})$ satisfies $\mathbf{v} - \mathbf{q} \in \partial g(\mathbf{q})$.
Monotonicity
Since $g$ is convex, $\partial g$ is monotone: $\langle (\mathbf{u} - \mathbf{p}) - (\mathbf{v} - \mathbf{q}),\, \mathbf{p} - \mathbf{q}\rangle \ge 0$.
Rearrange
Let $\mathbf{p} = \operatorname{prox}_g(\mathbf{u})$, $\mathbf{q} = \operatorname{prox}_g(\mathbf{v})$. Monotonicity: $\langle (\mathbf{u} - \mathbf{v}) - (\mathbf{p} - \mathbf{q}),\, \mathbf{p} - \mathbf{q}\rangle \ge 0$. Expanding, $\langle \mathbf{u} - \mathbf{v},\, \mathbf{p} - \mathbf{q}\rangle \ge \|\mathbf{p} - \mathbf{q}\|_2^2$, which is the desired inequality. $\square$
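A numerical sanity check of firm non-expansiveness, using the $\ell_1$ prox (soft-thresholding) as $\operatorname{prox}_g$:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.8
prox = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)  # prox of lam*||.||_1

for _ in range(1000):
    u, v = rng.normal(size=6), rng.normal(size=6)
    p, q = prox(u), prox(v)
    # firm non-expansiveness: ||p - q||^2 <= <p - q, u - v>
    assert (p - q) @ (p - q) <= (p - q) @ (u - v) + 1e-12
```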
ex21-09-icnn-architecture
Medium. Prove that the following ICNN is convex in its input $\mathbf{x}$: $\mathbf{z}^{(1)} = \mathrm{ReLU}(\mathbf{W}^{(0)}\mathbf{x} + \mathbf{b}^{(0)})$, $\mathbf{z}^{(\ell+1)} = \mathrm{ReLU}(\mathbf{U}^{(\ell)}\mathbf{z}^{(\ell)} + \mathbf{W}^{(\ell)}\mathbf{x} + \mathbf{b}^{(\ell)})$, $f(\mathbf{x}) = \mathbf{w}^\top\mathbf{z}^{(L)} + c$, under the constraints $\mathbf{U}^{(\ell)} \ge 0$ (element-wise) and $\mathbf{w} \ge 0$.
Prove by induction that each $z^{(\ell)}_i$ is convex in $\mathbf{x}$.
Use: if $f$ is convex and non-decreasing, and $g$ is convex, then $f \circ g$ is convex.
Base case ($\ell = 1$)
$\mathbf{z}^{(1)} = \mathrm{ReLU}(\mathbf{W}^{(0)}\mathbf{x} + \mathbf{b}^{(0)})$. The affine argument is convex in $\mathbf{x}$. ReLU is convex and non-decreasing. Composition preserves convexity.
Inductive step
Assume each $z^{(\ell)}_i$ is convex in $\mathbf{x}$. Since $\mathbf{U}^{(\ell)} \ge 0$, the sum $\mathbf{U}^{(\ell)}\mathbf{z}^{(\ell)}$ is a non-negative weighted sum of convex functions, hence convex. Adding the affine term $\mathbf{W}^{(\ell)}\mathbf{x} + \mathbf{b}^{(\ell)}$ preserves convexity. Applying ReLU (convex, non-decreasing) preserves convexity.
Output layer
$f(\mathbf{x}) = \mathbf{w}^\top\mathbf{z}^{(L)} + c$ with $\mathbf{w} \ge 0$ is a non-negative weighted sum of convex functions plus a constant: convex. $\square$
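A forward-pass sketch of this ICNN (clipping the constrained weights to the non-negative orthant is one common way to enforce $\mathbf{U}^{(\ell)} \ge 0$ and $\mathbf{w} \ge 0$; the names are illustrative):

```python
import numpy as np

def icnn_forward(x, Ws, Us, bs, w_out, c):
    """ICNN forward pass: z1 = ReLU(Ws[0] x + bs[0]),
    z_{l+1} = ReLU(U_l z_l + W_l x + b_l), f = w_out . z_L + c.
    Convexity in x needs U_l >= 0 and w_out >= 0; clipping enforces it."""
    relu = lambda t: np.maximum(t, 0.0)
    z = relu(Ws[0] @ x + bs[0])
    for U, W, b in zip(Us, Ws[1:], bs[1:]):
        z = relu(np.clip(U, 0.0, None) @ z + W @ x + b)
    return np.clip(w_out, 0.0, None) @ z + c
```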
ex21-10-red-fixed-point
Medium. Show that the RED-GD fixed point $\mathbf{x}^\star$ satisfies the optimality condition of the RED objective: $\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^\star - \mathbf{y}) + \lambda\big(\mathbf{x}^\star - D(\mathbf{x}^\star)\big) = \mathbf{0}$.
At a fixed point, $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} = \mathbf{x}^\star$, so the update gives zero change.
Fixed-point condition
At $\mathbf{x}^\star$: $\mathbf{x}^\star = \mathbf{x}^\star - \eta\left[\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^\star - \mathbf{y}) + \lambda\big(\mathbf{x}^\star - D(\mathbf{x}^\star)\big)\right]$. Subtracting $\mathbf{x}^\star$ from both sides and dividing by $-\eta$: $\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^\star - \mathbf{y}) + \lambda\big(\mathbf{x}^\star - D(\mathbf{x}^\star)\big) = \mathbf{0}$. $\square$
ex21-11-pnp-complex
Hard. Extend PnP-ADMM to complex-valued RF imaging. For $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with $\mathbf{y} \in \mathbb{C}^M$, $\mathbf{A} \in \mathbb{C}^{M \times N}$, $\mathbf{x} \in \mathbb{C}^N$:
- Derive the complex-valued ADMM updates.
- Describe how to apply a real DRUNet to the complex intermediate image.
- Show that non-expansiveness of the real denoiser implies non-expansiveness on the complex representation.
Split $\mathbf{x}$ into real and imaginary parts and stack as a 2-channel real vector.
Use $\|\mathbf{x}\|_2^2 = \|\Re(\mathbf{x})\|_2^2 + \|\Im(\mathbf{x})\|_2^2$.
Complex ADMM updates
The $\mathbf{x}$-update uses complex arithmetic: $\mathbf{x}^{(k+1)} = (\mathbf{A}^{\mathsf{H}}\mathbf{A} + \rho\mathbf{I})^{-1}\big(\mathbf{A}^{\mathsf{H}}\mathbf{y} + \rho(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})\big)$, which is a complex linear solve (same structure as real ADMM).
Real denoiser on complex images
Stack: $\tilde{\mathbf{v}} = [\Re(\mathbf{v});\, \Im(\mathbf{v})] \in \mathbb{R}^{2N}$. Apply DRUNet as a 2-channel denoiser $\tilde{D}$. Reconstruct: $D(\mathbf{v}) = \tilde{D}(\tilde{\mathbf{v}})_1 + j\,\tilde{D}(\tilde{\mathbf{v}})_2$.
Convergence
The stacking map is a linear isometry: $\|\tilde{\mathbf{u}} - \tilde{\mathbf{v}}\|_2 = \|\mathbf{u} - \mathbf{v}\|_2$. So non-expansiveness of $\tilde{D}$ on $\mathbb{R}^{2N}$ implies non-expansiveness of $D$ on $\mathbb{C}^N$.
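The stacking construction as a small wrapper (the `denoiser_2ch` callable stands in for an assumed 2-channel real denoiser such as DRUNet):

```python
import numpy as np

def complex_denoise(v, denoiser_2ch, sigma):
    """Apply a real 2-channel denoiser to a complex image by stacking
    real and imaginary parts; the stacking map is an isometry."""
    stacked = np.stack([v.real, v.imag])   # shape (2, ...)
    out = denoiser_2ch(stacked, sigma)     # real 2-channel denoising
    return out[0] + 1j * out[1]
```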
ex21-12-spectral-norm
Hard. A 2-layer network $D(\mathbf{x}) = \mathbf{W}_2\,\mathrm{ReLU}(\mathbf{W}_1\mathbf{x})$ has weight matrices $\mathbf{W}_1$ and $\mathbf{W}_2$ and ReLU activations.
- Show the Lipschitz constant satisfies $L \le \|\mathbf{W}_1\|_2\,\|\mathbf{W}_2\|_2$.
- Describe a per-iteration power method to estimate $\|\mathbf{W}\|_2$.
- Explain why spectral normalisation preserves denoising quality better than weight clipping.
Use the chain rule for Lipschitz constants: $\mathrm{Lip}(f \circ g) \le \mathrm{Lip}(f)\,\mathrm{Lip}(g)$.
Lipschitz bound
Layer 1: $\mathrm{Lip}(\mathbf{x} \mapsto \mathbf{W}_1\mathbf{x}) = \|\mathbf{W}_1\|_2$. ReLU is 1-Lipschitz. By composition: $L \le \|\mathbf{W}_2\|_2 \cdot 1 \cdot \|\mathbf{W}_1\|_2$.
Power method for spectral norm
Initialise $\mathbf{v}$ randomly. Each training step: $\mathbf{u} \leftarrow \mathbf{W}\mathbf{v}/\|\mathbf{W}\mathbf{v}\|_2$, then $\mathbf{v} \leftarrow \mathbf{W}^\top\mathbf{u}/\|\mathbf{W}^\top\mathbf{u}\|_2$. Spectral norm estimate: $\hat{\sigma} = \mathbf{u}^\top\mathbf{W}\mathbf{v}$. One step per training iteration suffices because $\mathbf{W}$ changes slowly between steps.
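A sketch of the amortised power iteration (carrying `v` across calls mirrors how spectral normalisation reuses the vector between training steps):

```python
import numpy as np

def spectral_norm_estimate(W, v=None, n_steps=1, rng=None):
    """Few-step power iteration for ||W||_2. Pass the returned `v` back in
    on the next call to amortise the cost across training iterations."""
    rng = rng or np.random.default_rng()
    if v is None:
        v = rng.normal(size=W.shape[1])
    for _ in range(n_steps):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return u @ (W @ v), v   # estimate of ||W||_2, carried-over vector
```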
Spectral normalisation vs weight clipping
Weight clipping ($|W_{ij}| \le c$) constrains individual entries, often destroying gradient signal. Spectral normalisation divides by the largest singular value, preserving the weight matrix's directional structure and allowing large entries in directions that do not expand the output norm.
ex21-13-red-convergence
Hard. Show that RED gradient descent converges to a global minimum when $\rho_{\text{RED}}$ is convex and the total objective $F$ is $\mu$-strongly convex. What step size is required?
For $L$-smooth, $\mu$-strongly convex $F$, gradient descent with step $\eta = 1/L$ converges at rate $(1 - \mu/L)^k$.
The gradient $\nabla F$ is $(\|\mathbf{A}\|_2^2 + 2\lambda)$-Lipschitz when $D$ is 1-Lipschitz.
Smoothness of $\nabla F$
$\nabla F(\mathbf{x}) = \mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x} - \mathbf{y}) + \lambda\big(\mathbf{x} - D(\mathbf{x})\big)$. In general: $L = \|\mathbf{A}\|_2^2 + \lambda(1 + L_D)$, where $L_D$ is the Lipschitz constant of $D$. Assuming $D$ is 1-Lipschitz, $L \le \|\mathbf{A}\|_2^2 + 2\lambda$.
Step size and convergence rate
Required step size: $\eta \le 1/L$. Convergence rate with $\mu$-strong convexity at $\eta = 1/L$: $\|\mathbf{x}^{(k)} - \mathbf{x}^\star\|_2 \le (1 - \mu/L)^k\,\|\mathbf{x}^{(0)} - \mathbf{x}^\star\|_2$.
ex21-14-pnp-divergence
Hard. Construct an explicit 2D example where PnP-PGD diverges. Use a linear denoiser $D(\mathbf{x}) = \mathbf{D}\mathbf{x}$ with $\|\mathbf{D}\|_2 > 1$ and show that the iterates grow without bound for a specific step size $\eta$.
Let $\mathbf{A} = \mathbf{I}$, $\mathbf{D} = \operatorname{diag}(2,\, 0.5)$ (one valid choice).
The PnP-PGD map is $\mathbf{x} \mapsto \mathbf{D}\big(\mathbf{x} - \eta\,\mathbf{A}^\top(\mathbf{A}\mathbf{x} - \mathbf{y})\big)$.
Setup
Let $\mathbf{A} = \mathbf{I}$, $\mathbf{y} = \mathbf{0}$, $\mathbf{D} = \operatorname{diag}(2,\, 0.5)$, $\eta = 0.1$.
PnP-PGD iteration
$\mathbf{x}^{(k+1)} = (1 - \eta)\mathbf{D}\mathbf{x}^{(k)} = \operatorname{diag}(1.8,\, 0.45)\,\mathbf{x}^{(k)}$. The spectral radius is $1.8 > 1$, so $\|\mathbf{x}^{(k)}\|_2 \to \infty$ for any $\mathbf{x}^{(0)}$ with a nonzero first component. $\square$
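The blow-up can be reproduced directly; a NumPy sketch with the numbers above (any $\mathbf{D}$ with $|1 - \eta|\,\|\mathbf{D}\|_2 > 1$ works):

```python
import numpy as np

eta = 0.1
D = np.diag([2.0, 0.5])          # linear "denoiser", ||D||_2 = 2 > 1
M = (1 - eta) * D                # PnP-PGD iteration matrix (A = I, y = 0)

x = np.array([1.0, 1.0])
for _ in range(50):
    x = M @ x
print(np.linalg.norm(x))         # ~ 1.8**50: the iterates blow up
```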
ex21-15-pnp-vs-oamp-theory
Challenge. In the large-system limit $M, N \to \infty$ at fixed ratio $\delta = M/N$, OAMP with the exact BG prior achieves the MMSE estimator for Bernoulli–Gaussian signals. Explain why PnP cannot match this guarantee in general, and identify a condition under which PnP-ADMM would achieve the MMSE.
OAMP's state evolution tracks the exact distribution of messages in the large-system limit.
PnP denoiser: can it compute the conditional mean $\mathbb{E}[\mathbf{x} \mid \mathbf{v}]$ under a BG prior?
OAMP achieves MMSE
OAMP's state evolution shows that the OAMP denoiser input is asymptotically Gaussian with a known variance. The BG MMSE denoiser computes $\mathbb{E}[x_i \mid v_i]$ exactly for this effective Gaussian channel, giving the minimum MSE component-wise. Convergence of state evolution to a fixed point corresponds to convergence of OAMP to the MMSE estimate.
Why PnP cannot match this in general
A general deep denoiser (e.g., DRUNet) is trained on natural images and does not compute the BG MMSE denoiser. Even a denoiser that is optimal for natural images handles BG signals sub-optimally, because the BG distribution differs from the natural-image prior.
Condition for PnP to achieve MMSE
If the PnP denoiser is specifically the BG MMSE denoiser at the correct per-iteration variance $\sigma_k^2$, then PnP-ADMM converges to the same fixed point as OAMP and achieves the MMSE under state evolution. This requires a matched denoiser – but then PnP effectively reduces to OAMP with state-evolution tracking.
ex21-16-icnn-expressivity
Challenge. Design an ICNN that approximates the TV regulariser $\mathrm{TV}(\mathbf{x}) = \sum_i |x_{i+1} - x_i|$ for $\mathbf{x} \in \mathbb{R}^N$. Describe the architecture, constraints, and prove it is convex. Quantify the approximation error.
Use a max-pooling or softplus approximation of absolute value.
$|t| \approx \frac{1}{\beta}\left[\operatorname{softplus}(\beta t) + \operatorname{softplus}(-\beta t)\right]$.
Architecture
$f(\mathbf{x}) = \frac{1}{\beta}\sum_i\left[\operatorname{softplus}\!\big(\beta(\mathbf{D}\mathbf{x})_i\big) + \operatorname{softplus}\!\big(-\beta(\mathbf{D}\mathbf{x})_i\big)\right]$, where $\mathbf{D}$ is the finite-difference matrix $(\mathbf{D}\mathbf{x})_i = x_{i+1} - x_i$ and $\operatorname{softplus}(t) = \log(1 + e^t)$. This gives $f(\mathbf{x}) \to \mathrm{TV}(\mathbf{x})$ as $\beta \to \infty$.
Convexity
$\operatorname{softplus}$ is convex. Each $(\mathbf{D}\mathbf{x})_i$ is affine (convex) in $\mathbf{x}$. The composition of a convex function with an affine function is convex. The sum of convex functions is convex.
Approximation error
As $\beta \to \infty$: $f(\mathbf{x}) \to \mathrm{TV}(\mathbf{x})$ (exact). For each difference term: $0 \le \frac{1}{\beta}\left[\operatorname{softplus}(\beta t) + \operatorname{softplus}(-\beta t)\right] - |t| \le \frac{2\log 2}{\beta}$. Maximum error: $\frac{2\log 2}{\beta}$ per term, achieved at $t = 0$.
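A sketch verifying the per-term error bound numerically (using `np.logaddexp` for a stable softplus):

```python
import numpy as np

def softplus(t):
    return np.logaddexp(0.0, t)   # log(1 + e^t), numerically stable

def tv_icnn(x, beta):
    d = np.diff(x)                # (Dx)_i = x_{i+1} - x_i
    return (softplus(beta * d) + softplus(-beta * d)).sum() / beta

x = np.random.default_rng(4).normal(size=64)
tv = np.abs(np.diff(x)).sum()
for beta in [1.0, 10.0, 100.0]:
    err = tv_icnn(x, beta) - tv
    bound = 2 * np.log(2) / beta * (x.size - 1)
    print(beta, err, bound)       # 0 <= err <= bound, shrinking as 1/beta
```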
ex21-17-score-red-connection
Challenge. Using Tweedie's formula $D^\star_\sigma(\mathbf{v}) = \mathbf{v} + \sigma^2\nabla_{\mathbf{v}}\log p_\sigma(\mathbf{v})$, show that the RED gradient descent step at $\mathbf{x}^{(k)}$ is equivalent to an approximate score-function gradient ascent step in the image distribution $p_\sigma$. Under what approximation does this hold?
Apply Tweedie's formula at $\mathbf{v} = \mathbf{x}^{(k)}$ to get $\mathbf{x}^{(k)} - D_\sigma(\mathbf{x}^{(k)}) = -\sigma^2\nabla\log p_\sigma(\mathbf{x}^{(k)})$.
The score function is $\nabla\log p(\mathbf{x})$; the smoothed score is $\nabla\log p_\sigma(\mathbf{x})$, where $p_\sigma = p * \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$.
Apply Tweedie at $\mathbf{v} = \mathbf{x}$
$$\mathbf{x} - D_\sigma(\mathbf{x}) = -\sigma^2\nabla_{\mathbf{x}}\log p_\sigma(\mathbf{x})$$
RED gradient descent as score ascent
The RED update: $$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \eta\left[\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y}) + \lambda\big(\mathbf{x}^{(k)} - D_\sigma(\mathbf{x}^{(k)})\big)\right].$$ Substituting: $$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \eta\,\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y}) + \eta\lambda\sigma^2\nabla\log p_\sigma(\mathbf{x}^{(k)}).$$ This is a data-fidelity gradient step plus a score-function ascent step in $\log p_\sigma$ (the smoothed image log-density).
Approximation
The equivalence uses $D_\sigma \approx D^\star_\sigma$ (the learned denoiser approximating the MMSE denoiser) and reads $\nabla\log p_\sigma \approx \nabla\log p$ (the image distribution evaluated at a clean point). This is accurate when $\sigma$ is small relative to the image structure, i.e., in late iterations when the estimate is close to the truth.
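Tweedie's formula can be checked in closed form for a scalar Gaussian prior, where both the MMSE denoiser and the smoothed score are explicit; a one-line NumPy check:

```python
import numpy as np

# For x ~ N(0, tau^2) and v = x + sigma*n: the MMSE denoiser is linear,
# D*(v) = tau^2/(tau^2 + sigma^2) * v, and the smoothed score is
# grad log p_sigma(v) = -v / (tau^2 + sigma^2). Tweedie holds exactly.
tau, sigma, v = 1.3, 0.4, 0.9
mmse = tau**2 / (tau**2 + sigma**2) * v
score = -v / (tau**2 + sigma**2)
assert np.isclose(mmse, v + sigma**2 * score)   # D*(v) = v + sigma^2 * score
```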
ex21-18-pnp-rf-imaging-experiment
Medium. An RF imaging system has $M$ complex measurements of an $N$-voxel scene, modelled as $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with $\mathbf{n}$ drawn i.i.d. Gaussian at a level giving SNR = 20 dB.
For PnP-ADMM with $K = 40$ iterations, regularisation weight $\lambda$, and a geometric noise schedule from $\sigma_0$ to $\sigma_K$:
- Compute the ADMM penalty parameter $\rho_k$ that makes the implicit prox noise level consistent with $\sigma_k$ for each iteration $k$.
- Estimate the per-iteration cost in flops (assume $O(N\log N)$ for the $\mathbf{x}$-solve via FFT where the structure of $\mathbf{A}$ permits it).
- Compare with LASSO (FISTA, 200 iterations, same sensing matrix).
From $\sigma_k^2 = \lambda/\rho_k$: $\rho_k = \lambda/\sigma_k^2$.
Per ADMM iteration: 1 FFT-based solve + 1 DRUNet forward pass.
ADMM penalty
$\rho_k = \lambda/\sigma_k^2$. At this $\rho_k$, the implicit prox $\operatorname{prox}_{(\lambda/\rho_k)R}$ is MAP denoising at variance $\sigma_k^2$, matching the denoiser setting. ✓
Per-iteration cost
- ADMM linear solve: $2 \times O(N\log N)$ under the FFT assumption; with a dense random $\mathbf{A}$, each matrix–vector apply costs $O(MN)$ ops instead
- DRUNet forward pass (4-scale U-Net): $O(N)$ per convolution with a large channel-dependent constant; this dominates in practice
Total per iteration: dominated by the DRUNet forward pass. Over the 40 iterations, the denoiser accounts for nearly all of the flop budget.
Comparison with LASSO
FISTA per iteration: one matrix–vector multiply each with $\mathbf{A}$ and $\mathbf{A}^{\mathsf{H}}$ ($O(MN)$) + an $O(N)$ soft-threshold. 200 iterations: $O(200\,MN)$ flops (no DRUNet).
PnP costs more per run, but typically achieves a better MSE (often by several dB) for structured scenes. The tradeoff depends on the application's tolerance for computation time.
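A small helper, as a sketch, for the penalty schedule and a rough matrix–vector flop count (the 2MN-flops-per-dense-apply convention is an assumption):

```python
import numpy as np

def admm_penalties(lam, sigma0, sigmaK, K=40):
    """rho_k = lam / sigma_k^2 along a geometric noise schedule."""
    r = (sigmaK / sigma0) ** (1.0 / K)
    sigmas = sigma0 * r ** np.arange(K + 1)
    return lam / sigmas**2

def fista_matvec_flops(M, N, n_iter=200):
    """Rough FISTA cost: one apply of A and one of A^H per iteration,
    ~2*M*N flops each; the O(N) soft-threshold is negligible."""
    return n_iter * 2 * (2 * M * N)
```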