Exercises
ex21-01-prox-denoiser
Easy. Show that $\operatorname{prox}_{\lambda\|\cdot\|_1}$ is soft-thresholding, and interpret it as a MAP denoiser for the model $\mathbf{v} = \mathbf{x} + \mathbf{n}$ where $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ and $\mathbf{x}$ has a Laplace prior.
The problem separates into component-wise scalar subproblems.
For each component: $\min_{x_i} \frac{1}{2}(x_i - v_i)^2 + \lambda|x_i|$. Take subgradients.
Component-wise optimisation
For each component: $\min_{x_i} \frac{1}{2}(x_i - v_i)^2 + \lambda|x_i|$. Setting the subgradient to zero: $x_i - v_i + \lambda\,\partial|x_i| \ni 0$. Solution: $x_i^\star = \operatorname{sign}(v_i)\max(|v_i| - \lambda,\, 0)$.
MAP interpretation
Laplace prior: $p(x_i) \propto e^{-|x_i|/b}$, so $-\log p(\mathbf{x}) = \|\mathbf{x}\|_1/b + \text{const}$. MAP estimator: $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2\sigma^2}\|\mathbf{v} - \mathbf{x}\|_2^2 + \frac{1}{b}\|\mathbf{x}\|_1$. With $\lambda = \sigma^2/b$, this is soft-thresholding.
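A quick numerical check, as a NumPy sketch (the brute-force grid search is purely illustrative), that the closed form matches the scalar prox problem:

```python
import numpy as np

def soft_threshold(v, lam):
    """prox of lam*||.||_1: component-wise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Compare against brute-force minimisation of 0.5*(x - v)^2 + lam*|x|
rng = np.random.default_rng(0)
lam = 0.7
grid = np.linspace(-10, 10, 400001)
for v in rng.normal(scale=2.0, size=5):
    brute = grid[np.argmin(0.5 * (grid - v) ** 2 + lam * np.abs(grid))]
    assert abs(soft_threshold(v, lam) - brute) < 1e-3
```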
ex21-02-pnp-pgd-iter
Easy. Write out the PnP-PGD iteration for the problem $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \lambda R(\mathbf{x})$ with step size $\eta$ and denoiser $D_\sigma$. What is the relationship between $\sigma$ and $\eta$?
PnP-PGD replaces $\operatorname{prox}_{\eta\lambda R}$ with $D_\sigma$.
PnP-PGD iteration
$$\mathbf{x}^{(k+1)} = D_\sigma\!\left(\mathbf{x}^{(k)} - \eta\,\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y})\right)$$
Noise level correspondence
The proximal operator at step size $\eta$ solves $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{x} - \mathbf{v}\|_2^2 + \eta\lambda R(\mathbf{x})$, which is MAP denoising at variance $\sigma^2 = \eta\lambda$. Hence $\sigma^2 = \eta\lambda$, or $\sigma = \sqrt{\eta\lambda}$.
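A minimal PnP-PGD loop, as a sketch (the `denoiser(v, sigma)` signature and the initialisation are assumptions, not a fixed API):

```python
import numpy as np

def pnp_pgd(y, A, denoiser, lam, eta, n_iter=100):
    """PnP-PGD: gradient step on 0.5*||y - A x||^2, then a learned
    denoiser in place of prox_{eta*lam*R}, applied at sigma = sqrt(eta*lam)."""
    x = A.conj().T @ y                      # simple initialisation
    sigma = np.sqrt(eta * lam)              # noise level matched to the step
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ x - y)     # data-fidelity gradient
        x = denoiser(x - eta * grad, sigma)
    return x
```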
ex21-03-nonexpansive
Easy. Verify that soft-thresholding is non-expansive: $\|S_\lambda(\mathbf{u}) - S_\lambda(\mathbf{v})\|_2 \le \|\mathbf{u} - \mathbf{v}\|_2$ for all $\mathbf{u}, \mathbf{v}$.
It suffices to show the scalar case: $|S_\lambda(u) - S_\lambda(v)| \le |u - v|$.
Consider cases: both above threshold, both below, one above and one below.
Scalar case
Case 1: $|u|, |v| > \lambda$, same sign. $|S_\lambda(u) - S_\lambda(v)| = |(u - \lambda\operatorname{sign}(u)) - (v - \lambda\operatorname{sign}(v))| = |u - v|$. ✓
Case 1b: opposite signs ($u > \lambda$, $v < -\lambda$). $|S_\lambda(u) - S_\lambda(v)| = |(u - \lambda) - (v + \lambda)| = (u - v) - 2\lambda \le |u - v|$. ✓
Case 2: $|u| > \lambda \ge |v|$. Then $S_\lambda(v) = 0$, so $|S_\lambda(u) - S_\lambda(v)| = |u| - \lambda \le |u| - |v| \le |u - v|$ (since $|v| \le \lambda$). The remaining case $|u|, |v| \le \lambda$ is trivial: both map to zero. ✓
Vector case
$\|S_\lambda(\mathbf{u}) - S_\lambda(\mathbf{v})\|_2^2 = \sum_i |S_\lambda(u_i) - S_\lambda(v_i)|^2 \le \sum_i |u_i - v_i|^2 = \|\mathbf{u} - \mathbf{v}\|_2^2$. $\square$
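The inequality is easy to probe empirically; a NumPy sketch over random pairs:

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.5
S = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

for _ in range(1000):
    u, v = rng.normal(size=8), rng.normal(size=8)
    assert np.linalg.norm(S(u) - S(v)) <= np.linalg.norm(u - v) + 1e-12
```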
ex21-04-pnp-fourier
Easy. Derive the efficient PnP-ADMM $\mathbf{x}$-update for a partial Fourier sensing matrix $\mathbf{A} = \mathbf{S}\mathbf{F}$ ($\mathbf{F}$ is the unitary DFT, $\mathbf{S}$ selects $M$ rows in $\Omega$). Show the update requires two FFTs.
The $\mathbf{x}$-update solves $\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + \frac{\rho}{2}\|\mathbf{x} - \mathbf{z}^{(k)} + \mathbf{u}^{(k)}\|_2^2$ with $\mathbf{A} = \mathbf{S}\mathbf{F}$.
Diagonalise $\mathbf{A}^{\mathsf{H}}\mathbf{A}$
$\mathbf{A}^{\mathsf{H}}\mathbf{A} = \mathbf{F}^{\mathsf{H}}\mathbf{S}^{\mathsf{H}}\mathbf{S}\mathbf{F} = \mathbf{F}^{\mathsf{H}}\operatorname{diag}(\mathbf{m})\mathbf{F}$, where $m_i = 1$ if $i \in \Omega$ and $0$ otherwise.
Closed-form update
The normal equations $(\mathbf{A}^{\mathsf{H}}\mathbf{A} + \rho\mathbf{I})\mathbf{x} = \mathbf{A}^{\mathsf{H}}\mathbf{y} + \rho(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})$ diagonalise in the Fourier domain:
$$\mathbf{x}^{(k+1)} = \mathbf{F}^{\mathsf{H}}\left[\frac{\mathbf{S}^{\mathsf{H}}\mathbf{y} + \rho\,\mathbf{F}(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})}{\mathbf{m} + \rho}\right]$$
(element-wise division). One forward FFT for $\mathbf{F}(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})$ and one inverse FFT: $2 \times O(N\log N)$. $\square$
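A sketch of the update, assuming the zero-filled Fourier data $\mathbf{S}^{\mathsf{H}}\mathbf{y}$ is precomputed and `mask` is the 0/1 indicator of $\Omega$:

```python
import numpy as np

def x_update_partial_fourier(y_zf, mask, z, u, rho):
    """PnP-ADMM x-update for A = S F. `y_zf` = S^H y (zero-filled Fourier
    data), `mask` = 1 on sampled bins. Costs exactly two length-N FFTs."""
    rhs = mask * y_zf + rho * np.fft.fft(z - u, norm="ortho")  # FFT no. 1
    return np.fft.ifft(rhs / (mask + rho), norm="ortho")       # FFT no. 2
```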
ex21-05-drunet-conditioning
Easy. Explain why the noise-level input $\sigma$ in DRUNet is critical for PnP applications, and describe two failure modes when $\sigma$ is misspecified.
The denoiser strength (degree of smoothing) is controlled by $\sigma$.
In PnP-ADMM, the effective noise level is $\sigma = \sqrt{\lambda/\rho}$.
Role of noise-level input
DRUNet concatenates $\sigma$ as a constant input channel, allowing one network to denoise at any noise level. For PnP, this maps directly to the ADMM penalty parameter: $\rho = \lambda/\sigma^2$.
Failure mode 1 – over-denoising
If $\sigma$ is too large (stronger denoising than needed), the denoiser over-smooths and destroys fine details. The reconstruction converges to an excessively blurry image.
Failure mode 2 – under-denoising
If $\sigma$ is too small, the denoiser is too weak to suppress artefacts. The iterates may oscillate or produce a noisy reconstruction with residual measurement artefacts.
ex21-06-red-gradient
Medium. For the RED regulariser $\rho_{\text{RED}}(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top\big(\mathbf{x} - D(\mathbf{x})\big)$, compute $\nabla\rho_{\text{RED}}(\mathbf{x})$ without assuming Jacobian symmetry. Show that the result involves the Jacobian $\mathbf{J}_D(\mathbf{x})$.
Use the product rule: $\nabla\big[\mathbf{x}^\top D(\mathbf{x})\big] = D(\mathbf{x}) + \mathbf{J}_D(\mathbf{x})^\top\mathbf{x}$.
Full gradient
$$\nabla\rho_{\text{RED}}(\mathbf{x}) = \mathbf{x} - \frac{1}{2}\left(D(\mathbf{x}) + \mathbf{J}_D(\mathbf{x})^\top\mathbf{x}\right)$$
Jacobian symmetry case
If $\mathbf{J}_D(\mathbf{x}) = \mathbf{J}_D(\mathbf{x})^\top$ and local homogeneity ($\mathbf{J}_D(\mathbf{x})\,\mathbf{x} = D(\mathbf{x})$) holds: $\nabla\rho_{\text{RED}}(\mathbf{x}) = \mathbf{x} - D(\mathbf{x})$.
Without symmetry
The exact gradient involves the vector–Jacobian product $\mathbf{J}_D(\mathbf{x})^\top\mathbf{x}$, which is costly to obtain via autodiff (an extra backward pass through the denoiser). Using the approximation $\nabla\rho_{\text{RED}}(\mathbf{x}) \approx \mathbf{x} - D(\mathbf{x})$ introduces error $\frac{1}{2}\big(\mathbf{J}_D(\mathbf{x})^\top\mathbf{x} - D(\mathbf{x})\big)$.
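The symmetry error is easy to see for a linear, non-symmetric denoiser $D(\mathbf{x}) = \mathbf{W}\mathbf{x}$, where the Jacobian is exactly $\mathbf{W}$; a NumPy sketch with a finite-difference check of the exact gradient:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
W = 0.1 * rng.normal(size=(n, n))        # non-symmetric linear denoiser
x = rng.normal(size=n)

D = lambda x: W @ x                      # Jacobian J_D = W (constant)
rho = lambda x: 0.5 * x @ (x - D(x))     # RED regulariser

exact = x - 0.5 * (D(x) + W.T @ x)       # x - (D(x) + J_D^T x)/2
approx = x - D(x)                        # residual approximation

# Central finite differences confirm the exact gradient
eps, g_fd = 1e-6, np.zeros(n)
for i in range(n):
    e = np.zeros(n); e[i] = eps
    g_fd[i] = (rho(x + e) - rho(x - e)) / (2 * eps)
print(np.allclose(exact, g_fd, atol=1e-5))   # True
print(np.linalg.norm(approx - exact))        # nonzero: the symmetry error
```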
ex21-07-noise-schedule
Medium. For PnP-ADMM with $K$ iterations, design a noise schedule starting at $\sigma_0$ (i.e., $\rho_0 = \lambda/\sigma_0^2$) and ending at $\sigma_K$ (i.e., $\rho_K = \lambda/\sigma_K^2$). Compare geometric and cosine schedules.
Geometric: $\sigma_k = \sigma_0 r^k$ with $r = (\sigma_K/\sigma_0)^{1/K}$.
Cosine: $\sigma_k = \sigma_K + \frac{1}{2}(\sigma_0 - \sigma_K)\left(1 + \cos\frac{\pi k}{K}\right)$.
Geometric schedule
$\sigma_k = \sigma_0 r^k$, with $r = (\sigma_K/\sigma_0)^{1/K} < 1$. The noise level drops by a constant factor per iteration: rapid absolute decrease early; slow refinement at the end.
Cosine schedule
$\sigma_k = \sigma_K + \frac{1}{2}(\sigma_0 - \sigma_K)\left(1 + \cos\frac{\pi k}{K}\right)$. Decreases slowly at both endpoints, rapidly in the middle.
Comparison
For RF imaging with structured artefacts, the cosine schedule is often preferred: it spends more iterations at intermediate-to-high noise levels, where the denoiser is most effective at suppressing artefact-like perturbations. The geometric schedule reaches low noise levels sooner but passes through the mid-range transition in fewer iterations.
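Both schedules in a few lines of NumPy (the endpoint values below are arbitrary, for illustration only):

```python
import numpy as np

def geometric_schedule(sigma0, sigmaK, K):
    """sigma_k = sigma0 * r^k with r = (sigmaK/sigma0)^(1/K)."""
    r = (sigmaK / sigma0) ** (1.0 / K)
    return sigma0 * r ** np.arange(K + 1)

def cosine_schedule(sigma0, sigmaK, K):
    """sigma_k = sigmaK + 0.5*(sigma0 - sigmaK)*(1 + cos(pi*k/K))."""
    k = np.arange(K + 1)
    return sigmaK + 0.5 * (sigma0 - sigmaK) * (1 + np.cos(np.pi * k / K))

g = geometric_schedule(0.5, 0.01, 40)
c = cosine_schedule(0.5, 0.01, 40)
print(g[:3], g[-3:])   # fast early decay, slow tail
print(c[:3], c[-3:])   # slow at both endpoints, fast in the middle
```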
ex21-08-firmly-nonexpansive
Medium. Prove that the proximal operator of any proper, lower semicontinuous, convex function $g$ is firmly non-expansive: $\|\operatorname{prox}_g(\mathbf{u}) - \operatorname{prox}_g(\mathbf{v})\|_2^2 \le \langle \operatorname{prox}_g(\mathbf{u}) - \operatorname{prox}_g(\mathbf{v}),\, \mathbf{u} - \mathbf{v}\rangle$.
Let $\mathbf{p} = \operatorname{prox}_g(\mathbf{u})$ and $\mathbf{q} = \operatorname{prox}_g(\mathbf{v})$. The optimality conditions give $\mathbf{u} - \mathbf{p} \in \partial g(\mathbf{p})$ and $\mathbf{v} - \mathbf{q} \in \partial g(\mathbf{q})$.
Use monotonicity of $\partial g$.
Optimality conditions
$\mathbf{p} = \operatorname{prox}_g(\mathbf{u})$ satisfies $\mathbf{u} - \mathbf{p} \in \partial g(\mathbf{p})$. $\mathbf{q} = \operatorname{prox}_g(\mathbf{v})$ satisfies $\mathbf{v} - \mathbf{q} \in \partial g(\mathbf{q})$.
Monotonicity
Since $g$ is convex, $\partial g$ is monotone: $\langle (\mathbf{u} - \mathbf{p}) - (\mathbf{v} - \mathbf{q}),\, \mathbf{p} - \mathbf{q}\rangle \ge 0$.
Rearrange
Let $\mathbf{p} = \operatorname{prox}_g(\mathbf{u})$, $\mathbf{q} = \operatorname{prox}_g(\mathbf{v})$. Monotonicity: $\langle (\mathbf{u} - \mathbf{v}) - (\mathbf{p} - \mathbf{q}),\, \mathbf{p} - \mathbf{q}\rangle \ge 0$. Expanding, $\langle \mathbf{u} - \mathbf{v},\, \mathbf{p} - \mathbf{q}\rangle \ge \|\mathbf{p} - \mathbf{q}\|_2^2$, which is the desired inequality. $\square$
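A numerical sanity check of firm non-expansiveness, using the $\ell_1$ prox (soft-thresholding) as $\operatorname{prox}_g$:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.8
prox = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)  # prox of lam*||.||_1

for _ in range(1000):
    u, v = rng.normal(size=6), rng.normal(size=6)
    p, q = prox(u), prox(v)
    # firm non-expansiveness: ||p - q||^2 <= <p - q, u - v>
    assert (p - q) @ (p - q) <= (p - q) @ (u - v) + 1e-12
```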
ex21-09-icnn-architecture
Medium. Prove that the following ICNN is convex in its input $\mathbf{x}$: $\mathbf{z}^{(1)} = \mathrm{ReLU}(\mathbf{W}^{(0)}\mathbf{x} + \mathbf{b}^{(0)})$, $\mathbf{z}^{(\ell+1)} = \mathrm{ReLU}(\mathbf{U}^{(\ell)}\mathbf{z}^{(\ell)} + \mathbf{W}^{(\ell)}\mathbf{x} + \mathbf{b}^{(\ell)})$, $f(\mathbf{x}) = \mathbf{w}^\top\mathbf{z}^{(L)} + c$, under the constraints $\mathbf{U}^{(\ell)} \ge 0$ (element-wise) and $\mathbf{w} \ge 0$.
Prove by induction that each $z^{(\ell)}_i$ is convex in $\mathbf{x}$.
Use: if $f$ is convex and non-decreasing, and $g$ is convex, then $f \circ g$ is convex.
Base case ($\ell = 1$)
$\mathbf{z}^{(1)} = \mathrm{ReLU}(\mathbf{W}^{(0)}\mathbf{x} + \mathbf{b}^{(0)})$. The affine argument is convex in $\mathbf{x}$. ReLU is convex and non-decreasing. Composition preserves convexity.
Inductive step
Assume each $z^{(\ell)}_i$ is convex in $\mathbf{x}$. Since $\mathbf{U}^{(\ell)} \ge 0$, the sum $\mathbf{U}^{(\ell)}\mathbf{z}^{(\ell)}$ is a non-negative weighted sum of convex functions, hence convex. Adding the affine term $\mathbf{W}^{(\ell)}\mathbf{x} + \mathbf{b}^{(\ell)}$ preserves convexity. Applying ReLU (convex, non-decreasing) preserves convexity.
Output layer
$f(\mathbf{x}) = \mathbf{w}^\top\mathbf{z}^{(L)} + c$ with $\mathbf{w} \ge 0$ is a non-negative weighted sum of convex functions plus a constant: convex. $\square$
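A forward-pass sketch of this ICNN (clipping the constrained weights to the non-negative orthant is one common way to enforce $\mathbf{U}^{(\ell)} \ge 0$ and $\mathbf{w} \ge 0$; the names are illustrative):

```python
import numpy as np

def icnn_forward(x, Ws, Us, bs, w_out, c):
    """ICNN forward pass: z1 = ReLU(Ws[0] x + bs[0]),
    z_{l+1} = ReLU(U_l z_l + W_l x + b_l), f = w_out . z_L + c.
    Convexity in x needs U_l >= 0 and w_out >= 0; clipping enforces it."""
    relu = lambda t: np.maximum(t, 0.0)
    z = relu(Ws[0] @ x + bs[0])
    for U, W, b in zip(Us, Ws[1:], bs[1:]):
        z = relu(np.clip(U, 0.0, None) @ z + W @ x + b)
    return np.clip(w_out, 0.0, None) @ z + c
```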
ex21-10-red-fixed-point
Medium. Show that the RED-GD fixed point $\mathbf{x}^\star$ satisfies the optimality condition of the RED objective: $\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^\star - \mathbf{y}) + \lambda\big(\mathbf{x}^\star - D(\mathbf{x}^\star)\big) = \mathbf{0}$.
At a fixed point, $\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} = \mathbf{x}^\star$, so the update gives zero change.
Fixed-point condition
At $\mathbf{x}^\star$: $\mathbf{x}^\star = \mathbf{x}^\star - \eta\left[\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^\star - \mathbf{y}) + \lambda\big(\mathbf{x}^\star - D(\mathbf{x}^\star)\big)\right]$. Subtracting $\mathbf{x}^\star$ from both sides and dividing by $-\eta$: $\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^\star - \mathbf{y}) + \lambda\big(\mathbf{x}^\star - D(\mathbf{x}^\star)\big) = \mathbf{0}$. $\square$
ex21-11-pnp-complex
Hard. Extend PnP-ADMM to complex-valued RF imaging. For $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with $\mathbf{y} \in \mathbb{C}^M$, $\mathbf{A} \in \mathbb{C}^{M \times N}$, $\mathbf{x} \in \mathbb{C}^N$:
- Derive the complex-valued ADMM updates.
- Describe how to apply a real DRUNet to the complex intermediate image.
- Show that non-expansiveness of the real denoiser implies non-expansiveness on the complex representation.
Split $\mathbf{x}$ into real and imaginary parts and stack as a 2-channel real vector.
Use $\|\mathbf{x}\|_2^2 = \|\Re(\mathbf{x})\|_2^2 + \|\Im(\mathbf{x})\|_2^2$.
Complex ADMM updates
The $\mathbf{x}$-update uses complex arithmetic: $\mathbf{x}^{(k+1)} = (\mathbf{A}^{\mathsf{H}}\mathbf{A} + \rho\mathbf{I})^{-1}\big(\mathbf{A}^{\mathsf{H}}\mathbf{y} + \rho(\mathbf{z}^{(k)} - \mathbf{u}^{(k)})\big)$, which is a complex linear solve (same structure as real ADMM).
Real denoiser on complex images
Stack: $\tilde{\mathbf{v}} = [\Re(\mathbf{v});\, \Im(\mathbf{v})] \in \mathbb{R}^{2N}$. Apply DRUNet as a 2-channel denoiser $\tilde{D}$. Reconstruct: $D(\mathbf{v}) = \tilde{D}(\tilde{\mathbf{v}})_1 + j\,\tilde{D}(\tilde{\mathbf{v}})_2$.
Convergence
The stacking map is a linear isometry: $\|\tilde{\mathbf{u}} - \tilde{\mathbf{v}}\|_2 = \|\mathbf{u} - \mathbf{v}\|_2$. So non-expansiveness of $\tilde{D}$ on $\mathbb{R}^{2N}$ implies non-expansiveness of $D$ on $\mathbb{C}^N$.
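The stacking construction as a small wrapper (the `denoiser_2ch` callable stands in for an assumed 2-channel real denoiser such as DRUNet):

```python
import numpy as np

def complex_denoise(v, denoiser_2ch, sigma):
    """Apply a real 2-channel denoiser to a complex image by stacking
    real and imaginary parts; the stacking map is an isometry."""
    stacked = np.stack([v.real, v.imag])   # shape (2, ...)
    out = denoiser_2ch(stacked, sigma)     # real 2-channel denoising
    return out[0] + 1j * out[1]
```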
ex21-12-spectral-norm
Hard. A 2-layer network $D(\mathbf{x}) = \mathbf{W}_2\,\mathrm{ReLU}(\mathbf{W}_1\mathbf{x})$ has weight matrices $\mathbf{W}_1$ and $\mathbf{W}_2$ and ReLU activations.
- Show the Lipschitz constant satisfies $L \le \|\mathbf{W}_1\|_2\,\|\mathbf{W}_2\|_2$.
- Describe a per-iteration power method to estimate $\|\mathbf{W}\|_2$.
- Explain why spectral normalisation preserves denoising quality better than weight clipping.
Use the chain rule for Lipschitz constants: $\mathrm{Lip}(f \circ g) \le \mathrm{Lip}(f)\,\mathrm{Lip}(g)$.
Lipschitz bound
Layer 1: $\mathrm{Lip}(\mathbf{x} \mapsto \mathbf{W}_1\mathbf{x}) = \|\mathbf{W}_1\|_2$. ReLU is 1-Lipschitz. By composition: $L \le \|\mathbf{W}_2\|_2 \cdot 1 \cdot \|\mathbf{W}_1\|_2$.
Power method for spectral norm
Initialise $\mathbf{v}$ randomly. Each training step: $\mathbf{u} \leftarrow \mathbf{W}\mathbf{v}/\|\mathbf{W}\mathbf{v}\|_2$, then $\mathbf{v} \leftarrow \mathbf{W}^\top\mathbf{u}/\|\mathbf{W}^\top\mathbf{u}\|_2$. Spectral norm estimate: $\hat{\sigma} = \mathbf{u}^\top\mathbf{W}\mathbf{v}$. One step per training iteration suffices because $\mathbf{W}$ changes slowly between steps.
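A sketch of the amortised power iteration (carrying `v` across calls mirrors how spectral normalisation reuses the vector between training steps):

```python
import numpy as np

def spectral_norm_estimate(W, v=None, n_steps=1, rng=None):
    """Few-step power iteration for ||W||_2. Pass the returned `v` back in
    on the next call to amortise the cost across training iterations."""
    rng = rng or np.random.default_rng()
    if v is None:
        v = rng.normal(size=W.shape[1])
    for _ in range(n_steps):
        u = W @ v
        u /= np.linalg.norm(u)
        v = W.T @ u
        v /= np.linalg.norm(v)
    return u @ (W @ v), v   # estimate of ||W||_2, carried-over vector
```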
Spectral normalisation vs weight clipping
Weight clipping ($|W_{ij}| \le c$) constrains individual entries, often destroying gradient signal. Spectral normalisation divides by the largest singular value, preserving the weight matrix's directional structure and allowing large entries in directions that do not expand the output norm.
ex21-13-red-convergence
Hard. Show that RED gradient descent converges to a global minimum when $\rho_{\text{RED}}$ is convex and the total objective $F$ is $\mu$-strongly convex. What step size is required?
For $L$-smooth, $\mu$-strongly convex $F$, gradient descent with step $\eta = 1/L$ converges at rate $(1 - \mu/L)^k$.
The gradient $\nabla F$ is $(\|\mathbf{A}\|_2^2 + 2\lambda)$-Lipschitz when $D$ is 1-Lipschitz.
Smoothness of $\nabla F$
$\nabla F(\mathbf{x}) = \mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x} - \mathbf{y}) + \lambda\big(\mathbf{x} - D(\mathbf{x})\big)$. In general: $L = \|\mathbf{A}\|_2^2 + \lambda(1 + L_D)$, where $L_D$ is the Lipschitz constant of $D$. Assuming $D$ is 1-Lipschitz, $L \le \|\mathbf{A}\|_2^2 + 2\lambda$.
Step size and convergence rate
Required step size: $\eta \le 1/L$. Convergence rate with $\mu$-strong convexity at $\eta = 1/L$: $\|\mathbf{x}^{(k)} - \mathbf{x}^\star\|_2 \le (1 - \mu/L)^k\,\|\mathbf{x}^{(0)} - \mathbf{x}^\star\|_2$.
ex21-14-pnp-divergence
Hard. Construct an explicit 2D example where PnP-PGD diverges. Use a linear denoiser $D(\mathbf{x}) = \mathbf{D}\mathbf{x}$ with $\|\mathbf{D}\|_2 > 1$ and show that the iterates grow without bound for a specific step size $\eta$.
Let $\mathbf{A} = \mathbf{I}$, $\mathbf{D} = \operatorname{diag}(2,\, 0.5)$ (one valid choice).
The PnP-PGD map is $\mathbf{x} \mapsto \mathbf{D}\big(\mathbf{x} - \eta\,\mathbf{A}^\top(\mathbf{A}\mathbf{x} - \mathbf{y})\big)$.
Setup
Let $\mathbf{A} = \mathbf{I}$, $\mathbf{y} = \mathbf{0}$, $\mathbf{D} = \operatorname{diag}(2,\, 0.5)$, $\eta = 0.1$.
PnP-PGD iteration
$\mathbf{x}^{(k+1)} = (1 - \eta)\mathbf{D}\mathbf{x}^{(k)} = \operatorname{diag}(1.8,\, 0.45)\,\mathbf{x}^{(k)}$. The spectral radius is $1.8 > 1$, so $\|\mathbf{x}^{(k)}\|_2 \to \infty$ for any $\mathbf{x}^{(0)}$ with a nonzero first component. $\square$
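The blow-up can be reproduced directly; a NumPy sketch with the numbers above (any $\mathbf{D}$ with $|1 - \eta|\,\|\mathbf{D}\|_2 > 1$ works):

```python
import numpy as np

eta = 0.1
D = np.diag([2.0, 0.5])          # linear "denoiser", ||D||_2 = 2 > 1
M = (1 - eta) * D                # PnP-PGD iteration matrix (A = I, y = 0)

x = np.array([1.0, 1.0])
for _ in range(50):
    x = M @ x
print(np.linalg.norm(x))         # ~ 1.8**50: the iterates blow up
```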
ex21-15-pnp-vs-oamp-theory
Challenge. In the large-system limit $M, N \to \infty$ at fixed ratio $\delta = M/N$, OAMP with the exact BG prior achieves the MMSE estimator for Bernoulli–Gaussian signals. Explain why PnP cannot match this guarantee in general, and identify a condition under which PnP-ADMM would achieve the MMSE.
OAMP's state evolution tracks the exact distribution of messages in the large-system limit.
PnP denoiser: can it compute the conditional mean $\mathbb{E}[\mathbf{x} \mid \mathbf{v}]$ under a BG prior?
OAMP achieves MMSE
OAMP's state evolution shows that the OAMP denoiser input is asymptotically Gaussian with a known variance. The BG MMSE denoiser computes $\mathbb{E}[x_i \mid v_i]$ exactly for this effective Gaussian channel, giving the minimum MSE component-wise. Convergence of state evolution to a fixed point corresponds to convergence of OAMP to the MMSE estimate.
Why PnP cannot match this in general
A general deep denoiser (e.g., DRUNet) is trained on natural images and does not compute the BG MMSE denoiser. Even a denoiser that is optimal for natural images handles BG signals sub-optimally, because the BG distribution differs from the natural-image prior.
Condition for PnP to achieve MMSE
If the PnP denoiser is specifically the BG MMSE denoiser at the correct per-iteration variance $\sigma_k^2$, then PnP-ADMM converges to the same fixed point as OAMP and achieves the MMSE under state evolution. This requires a matched denoiser – but then PnP effectively reduces to OAMP with state-evolution tracking.
ex21-16-icnn-expressivity
Challenge. Design an ICNN that approximates the TV regulariser $\mathrm{TV}(\mathbf{x}) = \sum_i |x_{i+1} - x_i|$ for $\mathbf{x} \in \mathbb{R}^N$. Describe the architecture, constraints, and prove it is convex. Quantify the approximation error.
Use a max-pooling or softplus approximation of absolute value.
$|t| \approx \frac{1}{\beta}\left[\operatorname{softplus}(\beta t) + \operatorname{softplus}(-\beta t)\right]$.
Architecture
$f(\mathbf{x}) = \frac{1}{\beta}\sum_i\left[\operatorname{softplus}\!\big(\beta(\mathbf{D}\mathbf{x})_i\big) + \operatorname{softplus}\!\big(-\beta(\mathbf{D}\mathbf{x})_i\big)\right]$, where $\mathbf{D}$ is the finite-difference matrix $(\mathbf{D}\mathbf{x})_i = x_{i+1} - x_i$ and $\operatorname{softplus}(t) = \log(1 + e^t)$. This gives $f(\mathbf{x}) \to \mathrm{TV}(\mathbf{x})$ as $\beta \to \infty$.
Convexity
$\operatorname{softplus}$ is convex. Each $(\mathbf{D}\mathbf{x})_i$ is affine (convex) in $\mathbf{x}$. The composition of a convex function with an affine function is convex. The sum of convex functions is convex.
Approximation error
As $\beta \to \infty$: $f(\mathbf{x}) \to \mathrm{TV}(\mathbf{x})$ (exact). For each difference term: $0 \le \frac{1}{\beta}\left[\operatorname{softplus}(\beta t) + \operatorname{softplus}(-\beta t)\right] - |t| \le \frac{2\log 2}{\beta}$. Maximum error: $\frac{2\log 2}{\beta}$ per term, achieved at $t = 0$.
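A sketch verifying the per-term error bound numerically (using `np.logaddexp` for a stable softplus):

```python
import numpy as np

def softplus(t):
    return np.logaddexp(0.0, t)   # log(1 + e^t), numerically stable

def tv_icnn(x, beta):
    d = np.diff(x)                # (Dx)_i = x_{i+1} - x_i
    return (softplus(beta * d) + softplus(-beta * d)).sum() / beta

x = np.random.default_rng(4).normal(size=64)
tv = np.abs(np.diff(x)).sum()
for beta in [1.0, 10.0, 100.0]:
    err = tv_icnn(x, beta) - tv
    bound = 2 * np.log(2) / beta * (x.size - 1)
    print(beta, err, bound)       # 0 <= err <= bound, shrinking as 1/beta
```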
ex21-17-score-red-connection
Challenge. Using Tweedie's formula $D^\star_\sigma(\mathbf{v}) = \mathbf{v} + \sigma^2\nabla_{\mathbf{v}}\log p_\sigma(\mathbf{v})$, show that the RED gradient descent step at $\mathbf{x}^{(k)}$ is equivalent to an approximate score-function gradient ascent step in the image distribution $p_\sigma$. Under what approximation does this hold?
Apply Tweedie's formula at $\mathbf{v} = \mathbf{x}^{(k)}$ to get $\mathbf{x}^{(k)} - D_\sigma(\mathbf{x}^{(k)}) = -\sigma^2\nabla\log p_\sigma(\mathbf{x}^{(k)})$.
The score function is $\nabla\log p(\mathbf{x})$; the smoothed score is $\nabla\log p_\sigma(\mathbf{x})$, where $p_\sigma = p * \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$.
Apply Tweedie at $\mathbf{v} = \mathbf{x}$
$$\mathbf{x} - D_\sigma(\mathbf{x}) = -\sigma^2\nabla_{\mathbf{x}}\log p_\sigma(\mathbf{x})$$
RED gradient descent as score ascent
The RED update: $$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \eta\left[\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y}) + \lambda\big(\mathbf{x}^{(k)} - D_\sigma(\mathbf{x}^{(k)})\big)\right].$$ Substituting: $$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \eta\,\mathbf{A}^{\mathsf{H}}(\mathbf{A}\mathbf{x}^{(k)} - \mathbf{y}) + \eta\lambda\sigma^2\nabla\log p_\sigma(\mathbf{x}^{(k)}).$$ This is a data-fidelity gradient step plus a score-function ascent step in $\log p_\sigma$ (the smoothed image log-density).
Approximation
The equivalence uses $D_\sigma \approx D^\star_\sigma$ (the learned denoiser approximating the MMSE denoiser) and reads $\nabla\log p_\sigma \approx \nabla\log p$ (the image distribution evaluated at a clean point). This is accurate when $\sigma$ is small relative to the image structure, i.e., in late iterations when the estimate is close to the truth.
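Tweedie's formula can be checked in closed form for a scalar Gaussian prior, where both the MMSE denoiser and the smoothed score are explicit; a one-line NumPy check:

```python
import numpy as np

# For x ~ N(0, tau^2) and v = x + sigma*n: the MMSE denoiser is linear,
# D*(v) = tau^2/(tau^2 + sigma^2) * v, and the smoothed score is
# grad log p_sigma(v) = -v / (tau^2 + sigma^2). Tweedie holds exactly.
tau, sigma, v = 1.3, 0.4, 0.9
mmse = tau**2 / (tau**2 + sigma**2) * v
score = -v / (tau**2 + sigma**2)
assert np.isclose(mmse, v + sigma**2 * score)   # D*(v) = v + sigma^2 * score
```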
ex21-18-pnp-rf-imaging-experiment
Medium. An RF imaging system has $M$ complex measurements of an $N$-voxel scene, modelled as $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with $\mathbf{n}$ drawn i.i.d. Gaussian at a level giving SNR = 20 dB.
For PnP-ADMM with $K = 40$ iterations, regularisation weight $\lambda$, and a geometric noise schedule from $\sigma_0$ to $\sigma_K$:
- Compute the ADMM penalty parameter $\rho_k$ that makes the implicit prox noise level consistent with $\sigma_k$ for each iteration $k$.
- Estimate the per-iteration cost in flops (assume $O(N\log N)$ for the $\mathbf{x}$-solve via FFT where the structure of $\mathbf{A}$ permits it).
- Compare with LASSO (FISTA, 200 iterations, same sensing matrix).
From $\sigma_k^2 = \lambda/\rho_k$: $\rho_k = \lambda/\sigma_k^2$.
Per ADMM iteration: 1 FFT-based solve + 1 DRUNet forward pass.
ADMM penalty
$\rho_k = \lambda/\sigma_k^2$. At this $\rho_k$, the implicit prox $\operatorname{prox}_{(\lambda/\rho_k)R}$ is MAP denoising at variance $\sigma_k^2$, matching the denoiser setting. ✓
Per-iteration cost
- ADMM linear solve: $2 \times O(N\log N)$ under the FFT assumption; with a dense random $\mathbf{A}$, each matrix–vector apply costs $O(MN)$ ops instead
- DRUNet forward pass (4-scale U-Net): $O(N)$ per convolution with a large channel-dependent constant; this dominates in practice
Total per iteration: dominated by the DRUNet forward pass. Over the 40 iterations, the denoiser accounts for nearly all of the flop budget.
Comparison with LASSO
FISTA per iteration: one matrix–vector multiply each with $\mathbf{A}$ and $\mathbf{A}^{\mathsf{H}}$ ($O(MN)$) + an $O(N)$ soft-threshold. 200 iterations: $O(200\,MN)$ flops (no DRUNet).
PnP costs more per run, but typically achieves a better MSE (often by several dB) for structured scenes. The tradeoff depends on the application's tolerance for computation time.
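A small helper, as a sketch, for the penalty schedule and a rough matrix–vector flop count (the 2MN-flops-per-dense-apply convention is an assumption):

```python
import numpy as np

def admm_penalties(lam, sigma0, sigmaK, K=40):
    """rho_k = lam / sigma_k^2 along a geometric noise schedule."""
    r = (sigmaK / sigma0) ** (1.0 / K)
    sigmas = sigma0 * r ** np.arange(K + 1)
    return lam / sigmas**2

def fista_matvec_flops(M, N, n_iter=200):
    """Rough FISTA cost: one apply of A and one of A^H per iteration,
    ~2*M*N flops each; the O(N) soft-threshold is negligible."""
    return n_iter * 2 * (2 * M * N)
```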