Exercises
ex22-01
Easy. Let $p(\mathbf{x}) = \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$. Compute the score function $\nabla_{\mathbf{x}} \log p(\mathbf{x})$.
Write out $\log p(\mathbf{x})$ and differentiate.
Log-density
$\log p(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) - \tfrac{1}{2}\log\left[(2\pi)^d|\boldsymbol{\Sigma}|\right]$
Differentiate
$\nabla_{\mathbf{x}} \log p(\mathbf{x}) = -\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$
The score points from $\mathbf{x}$ back toward the mean $\boldsymbol{\mu}$, scaled by the inverse covariance.
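As a quick numerical sanity check (a minimal sketch with an illustrative 2-D $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ chosen here, not given in the exercise), the analytic score can be compared against a central finite difference of the log-density:

```python
import numpy as np

# Illustrative 2-D Gaussian: compare the analytic score -Sigma^{-1}(x - mu)
# with a central finite difference of log p(x).
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def log_p(x):
    d = x - mu
    return -0.5 * d @ Sigma_inv @ d  # constant terms drop out of the gradient

x = rng.normal(size=2)
analytic = -Sigma_inv @ (x - mu)

eps = 1e-6
numeric = np.array([
    (log_p(x + eps * e) - log_p(x - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print(np.max(np.abs(analytic - numeric)))  # agreement to finite-difference accuracy
```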
ex22-02
Easy. In the DDPM forward process, compute the signal-to-noise ratio $\mathrm{SNR}(t) = \bar{\alpha}_t/(1-\bar{\alpha}_t)$ for the linear schedule with $\beta_1 = 10^{-4}$, $\beta_T = 0.02$, $T = 1000$. At which step does $\mathrm{SNR}(t) = 1$ (0 dB)?
Compute $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$ numerically.
$\mathrm{SNR}(t) = 1$ when $\bar{\alpha}_t = 1/2$.
Numerical computation
With the linear schedule, $\bar{\alpha}_t$ decreases from $\approx 1$ to $\approx 4\times 10^{-5}$. Evaluating numerically: $\mathrm{SNR}(t) = 1$ occurs at $t \approx 260$.
Before this point, the signal dominates; after, noise dominates. This is the crossover point of the diffusion process.
Interpretation
The early portion of the diffusion process ($t \lesssim 260$) operates in the high-SNR regime where the score network sees mostly signal; the remainder ($t \gtrsim 260$) operates in the low-SNR regime where the score must extrapolate from noise. The cosine schedule shifts this crossover to later steps, spending more time in the informative regime.
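The crossover step can be verified directly with the schedule values given in the exercise:

```python
import numpy as np

# Linear DDPM schedule: beta_1 = 1e-4 ... beta_T = 0.02, T = 1000.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
snr = alpha_bar / (1.0 - alpha_bar)

# First step (1-indexed) where the SNR drops below 1 (0 dB).
t_cross = int(np.argmax(snr < 1.0)) + 1
print(t_cross)
```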
ex22-03
Easy. Verify Tweedie's formula for the case $\mathbf{x}_0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$ (non-standard Gaussian). Show that $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$ matches the Tweedie prediction.
Compute the joint distribution of $(\mathbf{x}_0, \mathbf{x}_t)$.
Use Gaussian conditioning to find $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$.
Joint distribution
$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, so $\mathbb{E}[\mathbf{x}_t] = \sqrt{\bar{\alpha}_t}\,\mu_0$, $\mathrm{Var}(\mathbf{x}_t) = \bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t$, $\mathrm{Cov}(\mathbf{x}_0, \mathbf{x}_t) = \sqrt{\bar{\alpha}_t}\,\sigma_0^2$.
Gaussian conditioning
$\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \mu_0 + \frac{\sqrt{\bar{\alpha}_t}\,\sigma_0^2}{\bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t}\left(\mathbf{x}_t - \sqrt{\bar{\alpha}_t}\,\mu_0\right)$
Verify via Tweedie
The marginal is $p(\mathbf{x}_t) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\,\mu_0,\; \bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t\right)$, so $\nabla\log p(\mathbf{x}_t) = -\frac{\mathbf{x}_t - \sqrt{\bar{\alpha}_t}\,\mu_0}{\bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t}$. Substituting into Tweedie's formula $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \left(\mathbf{x}_t + (1-\bar{\alpha}_t)\nabla\log p(\mathbf{x}_t)\right)/\sqrt{\bar{\alpha}_t}$ yields the same expression.
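The algebraic identity can be checked numerically in 1-D (the parameter values below are illustrative assumptions, not from the exercise):

```python
import numpy as np

# 1-D check: posterior mean from Gaussian conditioning vs. Tweedie's formula,
# for a prior x0 ~ N(mu0, s02) and x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps.
mu0, s02, ab = 1.5, 0.7, 0.4          # illustrative values
xt = np.linspace(-3, 3, 7)

var_t = ab * s02 + (1 - ab)           # marginal variance of x_t
# Gaussian conditioning: E[x0|xt] = mu0 + Cov(x0,xt)/Var(xt) * (xt - E[xt])
cond = mu0 + np.sqrt(ab) * s02 / var_t * (xt - np.sqrt(ab) * mu0)
# Tweedie: E[x0|xt] = (xt + (1-ab) * score(xt)) / sqrt(ab)
score = -(xt - np.sqrt(ab) * mu0) / var_t
tweedie = (xt + (1 - ab) * score) / np.sqrt(ab)

print(np.max(np.abs(cond - tweedie)))  # the two expressions agree exactly
```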
ex22-04
Easy. For DPS with guidance scale $\zeta$ and measurement noise variance $\sigma_y^2$, what is the effective regularisation parameter in terms of the measurement residual? Compare with the proximal operator in PnP-ADMM (Chapter 21).
The DPS gradient is $\zeta\,\nabla_{\mathbf{x}_t}\frac{1}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|^2$.
Effective step size
The DPS gradient step has effective step size $\zeta/\sigma_y^2$ multiplying the residual term. This plays the same role as $1/\rho$ in PnP-ADMM, where $\rho$ is the penalty parameter.
Connection to PnP
In PnP-ADMM: $\mathbf{v} \leftarrow D_\sigma(\mathbf{x} + \mathbf{u})$ (denoiser step), $\mathbf{x} \leftarrow \arg\min_{\mathbf{x}}\frac{1}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \frac{\rho}{2}\|\mathbf{x} - \mathbf{v} + \mathbf{u}\|^2$ (data step).
In DPS: the score network provides the denoiser, and the guidance gradient provides the data step. The ratio $\zeta/\sigma_y^2$ corresponds to $1/\rho$: both control the balance between prior and data fidelity.
ex22-05
Easy. Name three advantages and three disadvantages of diffusion-based reconstruction compared to PnP methods for RF imaging.
Consider quality, speed, uncertainty, training, and generality.
Advantages
- Higher reconstruction quality (typically a gain of several dB PSNR)
- Posterior sampling enables uncertainty quantification
- Stronger prior captures complex scene statistics
Disadvantages
- Higher computational cost (many more network function evaluations per reconstruction)
- Requires more training data for the score network
- Approximate posterior: no convergence guarantees
ex22-06
Medium. Derive the DPS guidance gradient for the nonlinear forward model $\mathbf{y} = f(\mathbf{x}) + \mathbf{n}$, where $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma_y^2\mathbf{I})$ and $f$ is a differentiable function.
The log-likelihood is $\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = -\frac{1}{2\sigma_y^2}\|\mathbf{y} - f(\hat{\mathbf{x}}_0)\|^2 + \text{const}$.
Apply the chain rule through $f$ and the Tweedie estimate.
Log-likelihood
$\nabla_{\hat{\mathbf{x}}_0}\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = \frac{1}{\sigma_y^2}\,\mathbf{J}_f(\hat{\mathbf{x}}_0)^\top\left(\mathbf{y} - f(\hat{\mathbf{x}}_0)\right)$
Chain rule
$\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t) \approx \frac{1}{\sigma_y^2}\left(\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}\right)^{\!\top}\mathbf{J}_f(\hat{\mathbf{x}}_0)^\top\left(\mathbf{y} - f(\hat{\mathbf{x}}_0)\right)$, where $\mathbf{J}_f(\hat{\mathbf{x}}_0) = \partial f/\partial\mathbf{x}\big|_{\hat{\mathbf{x}}_0}$ is the Jacobian of $f$ evaluated at the Tweedie estimate $\hat{\mathbf{x}}_0$.
Comparison with linear case
For $f(\mathbf{x}) = \mathbf{A}\mathbf{x}$, the Jacobian is $\mathbf{J}_f = \mathbf{A}$, recovering the linear DPS gradient.
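The Jacobian-transpose form of the gradient can be checked by finite differences on a toy nonlinear model (the elementwise-squared map below is an assumed example, not the exercise's $f$); the code differentiates the negative log-likelihood $\frac{1}{2\sigma_y^2}\|\mathbf{y} - f(\mathbf{x})\|^2$:

```python
import numpy as np

# Toy nonlinear forward model f(x) = (A x)^2 elementwise (illustrative).
# Check: grad of 0.5/sig2 * ||y - f(x)||^2 equals -J_f^T (y - f(x)) / sig2.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = rng.normal(size=3)
sig2 = 0.5

f = lambda v: (A @ v) ** 2
J = 2 * np.diag(A @ x) @ A                # Jacobian of f at x
analytic = -J.T @ (y - f(x)) / sig2

loss = lambda v: 0.5 * np.sum((y - f(v)) ** 2) / sig2
eps = 1e-6
numeric = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.max(np.abs(analytic - numeric)))
```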
ex22-07
Medium. Consider a compressed sensing problem with $\mathbf{y} = \mathbf{A}\mathbf{x}$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$, $m = n/4$ (75% undersampling). The SVD gives $m$ nonzero singular values. What fraction of the reconstruction is determined by the measurements, and what fraction must be filled by the diffusion prior?
Apply the null-space preservation theorem.
The range space has dimension $m$ and the null space has dimension $n - m$.
Dimensions
$r = m = n/4$ (range space dimension), $n - m = 3n/4$ (null space dimension).
Interpretation
Only 25% of the reconstruction content is determined by the measurements. The remaining 75% must be inferred by the diffusion prior. This makes the quality of the prior critical: a weak prior produces artefacts in the null-space components, while a strong prior fills in plausible content.
For RF imaging at 75% undersampling, the prior must capture scene-specific statistics (point scatterers, clutter); a natural-image prior would hallucinate inappropriate textures.
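The dimension counting can be illustrated with a small random sensing matrix (sizes below are assumed for illustration):

```python
import numpy as np

# 75% undersampling: m = n/4 rows. The measured fraction equals rank(A)/n.
rng = np.random.default_rng(4)
n = 64
m = n // 4
A = rng.normal(size=(m, n))        # a generic Gaussian A has full row rank
rank = np.linalg.matrix_rank(A)
measured_fraction = rank / n
print(rank, measured_fraction)     # 16 of 64 dimensions, i.e. 25%
```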
ex22-08
Medium. Derive the DDNM correction formula:
$\hat{\mathbf{x}}_0' = \hat{\mathbf{x}}_0 + \mathbf{A}^\dagger(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$
and show that it satisfies $\mathbf{A}\hat{\mathbf{x}}_0' = \mathbf{y}$ when $\mathbf{A}$ has full row rank.
Use $\mathbf{A}\mathbf{A}^\dagger = \mathbf{I}$ for full row rank.
Apply the forward model
$\mathbf{A}\hat{\mathbf{x}}_0' = \mathbf{A}\hat{\mathbf{x}}_0 + \mathbf{A}\mathbf{A}^\dagger(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$
Use pseudoinverse identity
For full row rank: $\mathbf{A}\mathbf{A}^\dagger = \mathbf{I}$, so $\mathbf{A}\hat{\mathbf{x}}_0' = \mathbf{A}\hat{\mathbf{x}}_0 + \mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}$.
Null-space preservation
The correction $\mathbf{A}^\dagger(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$ lies in the row space of $\mathbf{A}$. Therefore, the null-space component of $\hat{\mathbf{x}}_0$ is preserved: $(\mathbf{I} - \mathbf{A}^\dagger\mathbf{A})\hat{\mathbf{x}}_0' = (\mathbf{I} - \mathbf{A}^\dagger\mathbf{A})\hat{\mathbf{x}}_0$.
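Both properties (measurement consistency and null-space preservation) can be verified numerically for a random full-row-rank $\mathbf{A}$ (a sketch with assumed sizes):

```python
import numpy as np

# DDNM correction with a full-row-rank A: check A x' = y and
# null-space preservation (I - A^+ A) x' = (I - A^+ A) x.
rng = np.random.default_rng(5)
m, n = 4, 10
A = rng.normal(size=(m, n))
A_pinv = np.linalg.pinv(A)
y = rng.normal(size=m)
x0_hat = rng.normal(size=n)

x_corr = x0_hat + A_pinv @ (y - A @ x0_hat)

P_null = np.eye(n) - A_pinv @ A
consistency = np.max(np.abs(A @ x_corr - y))
null_pres = np.max(np.abs(P_null @ x_corr - P_null @ x0_hat))
print(consistency, null_pres)   # both are zero up to floating-point error
```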
ex22-09
Medium. Compare the per-step computational cost of DPS (with backpropagation) and DDRM (with SVD-based projection) for a forward model $\mathbf{A} \in \mathbb{R}^{m \times n}$. Under what conditions is DDRM more efficient per step?
DPS per step: one network forward pass, one backward pass, plus matrix-vector products.
DDRM per step: one network forward pass plus an SVD-domain projection (no backprop).
DDRM requires a one-time SVD: $O(mn\min(m,n))$.
Per-step cost
- DPS: $C_{\text{fwd}} + C_{\text{bwd}} + O(mn)$ (forward, backward, matrix-vector)
- DDRM: $C_{\text{fwd}} + O(rn)$ (forward, projection via precomputed SVD, where $r = \mathrm{rank}(\mathbf{A})$)
Since the backward pass roughly doubles the network cost, DDRM is faster per step.
One-time SVD cost
The SVD of $\mathbf{A}$ costs $O(mn\min(m,n))$. This amortises over all $N$ sampling steps, adding $O(mn\min(m,n)/N)$ per step.
DDRM is overall more efficient when the amortised SVD cost is small compared to the per-step savings from skipping backpropagation. For typical imaging problems with moderate $n$ and many sampling steps, DDRM wins. For large-scale RF imaging (very large $m$ and $n$), the SVD is prohibitive and DPS is preferred.
ex22-10
Medium. In DiffPIR, the proximal data step has the closed-form solution (for $\mathbf{A}^\top\mathbf{A}$ invertible):
$\hat{\mathbf{x}} = \left(\mathbf{A}^\top\mathbf{A} + \rho\mathbf{I}\right)^{-1}\left(\mathbf{A}^\top\mathbf{y} + \rho\,\hat{\mathbf{x}}_0\right)$
Show that as $\rho \to 0$, this reduces to the pseudoinverse solution $\mathbf{A}^\dagger\mathbf{y}$, and as $\rho \to \infty$, it reduces to $\hat{\mathbf{x}}_0$ (pure prior).
Use the Woodbury identity or direct limit analysis.
Limit $\rho \to 0$
$\hat{\mathbf{x}} \to \left(\mathbf{A}^\top\mathbf{A}\right)^{-1}\mathbf{A}^\top\mathbf{y} = \mathbf{A}^\dagger\mathbf{y}$ (pseudoinverse, pure data fidelity, no prior).
Limit $\rho \to \infty$
$\hat{\mathbf{x}} = \left(\mathbf{A}^\top\mathbf{A}/\rho + \mathbf{I}\right)^{-1}\left(\mathbf{A}^\top\mathbf{y}/\rho + \hat{\mathbf{x}}_0\right) \to \hat{\mathbf{x}}_0$ (pure prior, no data fidelity).
Interpretation
The parameter $\rho$ interpolates between data fidelity and prior. In DiffPIR, $\rho$ is scheduled to decrease over iterations: early iterations rely on the prior (high $\rho$, high noise level), later iterations enforce data fidelity (low $\rho$, low noise level).
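The two limits can be verified numerically with a small tall matrix so that $\mathbf{A}^\top\mathbf{A}$ is invertible (sizes and values below are assumptions for the sketch):

```python
import numpy as np

# DiffPIR data step x_hat(rho) = (A^T A + rho I)^{-1} (A^T y + rho x0_hat).
# With a tall A (full column rank), rho -> 0 gives the pseudoinverse solution
# and rho -> infinity gives x0_hat.
rng = np.random.default_rng(6)
m, n = 8, 4
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
x0_hat = rng.normal(size=n)

def data_step(rho):
    return np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ y + rho * x0_hat)

err_small = np.max(np.abs(data_step(1e-10) - np.linalg.pinv(A) @ y))
err_large = np.max(np.abs(data_step(1e10) - x0_hat))
print(err_small, err_large)
```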
ex22-11
Medium. A DDIM sampler with $K$ uniformly spaced steps uses the time subsequence $t_k = kT/K$ for $k = 1, \dots, K$. For the linear schedule $\beta_t$, show that each DDIM step covers approximately the same change in log-SNR $\lambda(t) = \log\left[\bar{\alpha}_t/(1-\bar{\alpha}_t)\right]$.
Compute $\lambda(t)$ along the schedule.
For the linear schedule, $\lambda(t)$ is approximately linear in $t$.
Log-SNR
$\lambda(t) = \log\bar{\alpha}_t - \log(1-\bar{\alpha}_t)$. For the linear schedule, $\bar{\alpha}_t$ decreases roughly exponentially, making $\lambda(t)$ approximately linear in $t$ over the interior of the schedule (the approximation degrades near $t = 0$ and $t = T$).
Uniform steps in $t$
If $\lambda(t)$ is linear in $t$, then uniform steps $\Delta t$ give uniform steps $\Delta\lambda$.
Implication
Each DDIM step covers the same "amount of denoising" in the log-SNR sense. This is why uniform time subsampling works well for the linear schedule. For non-linear schedules (cosine), non-uniform subsampling may be preferred.
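The log-SNR profile along a uniform $K$-step subsequence can be inspected directly (schedule as in ex22-02; $K = 20$ is an assumed example):

```python
import numpy as np

# Log-SNR lambda(t) for the linear schedule, sampled on K = 20 uniform steps.
T, K = 1000, 20
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
lam = np.log(alpha_bar) - np.log1p(-alpha_bar)   # log-SNR

idx = np.arange(1, K + 1) * (T // K) - 1          # uniform subsequence t_k = kT/K
lam_k = lam[idx]
dlam = np.diff(lam_k)
# Per-step log-SNR changes: all negative, roughly comparable mid-schedule,
# larger in magnitude near the endpoints.
print(dlam)
```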
ex22-12
Medium. Show that the DPS guidance gradient for the noiseless case ($\sigma_y \to 0$) becomes a hard constraint:
the gradient $\frac{1}{\sigma_y^2}\mathbf{A}^\top(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$ diverges unless $\mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}$ exactly. What does this imply for the choice of $\zeta$ at different noise levels?
The gradient magnitude scales as $1/\sigma_y^2$.
For a fixed residual, the gradient diverges as $\sigma_y \to 0$; it stays finite only if the residual is zero.
Divergence analysis
The guidance gradient has magnitude $O\!\left(\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|/\sigma_y^2\right)$. As $\sigma_y \to 0$, this diverges unless $\mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}$.
Implication
In the noiseless limit, DPS requires the Tweedie estimate to be exactly measurement-consistent at every step, which is unrealistic. In practice, $\zeta$ must be decreased as $\sigma_y$ decreases, keeping $\zeta/\sigma_y^2$ bounded. A common heuristic: $\zeta_t = \zeta'/\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|$, normalising by the residual so that the effective step size stays bounded independent of the noise level.
ex22-13
Hard. Derive the ΠGDM likelihood approximation. Starting from the Gaussian approximation $p(\mathbf{x}_0 \mid \mathbf{x}_t) \approx \mathcal{N}(\hat{\mathbf{x}}_0, r_t^2\mathbf{I})$, show that the marginal likelihood is:
$p(\mathbf{y} \mid \mathbf{x}_t) \approx \mathcal{N}\!\left(\mathbf{A}\hat{\mathbf{x}}_0,\; r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)$
and derive the corresponding guidance gradient.
Marginalise: $p(\mathbf{y} \mid \mathbf{x}_t) = \int p(\mathbf{y} \mid \mathbf{x}_0)\,p(\mathbf{x}_0 \mid \mathbf{x}_t)\,d\mathbf{x}_0$.
Both factors are Gaussian; the integral is a convolution of Gaussians.
Marginalisation
$p(\mathbf{y} \mid \mathbf{x}_0) = \mathcal{N}(\mathbf{A}\mathbf{x}_0, \sigma_y^2\mathbf{I})$, $p(\mathbf{x}_0 \mid \mathbf{x}_t) \approx \mathcal{N}(\hat{\mathbf{x}}_0, r_t^2\mathbf{I})$. By the convolution property: $p(\mathbf{y} \mid \mathbf{x}_t) \approx \mathcal{N}\!\left(\mathbf{A}\hat{\mathbf{x}}_0,\; r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)$.
Log-likelihood gradient
$\nabla_{\hat{\mathbf{x}}_0}\log p(\mathbf{y} \mid \mathbf{x}_t) = \mathbf{A}^\top\left(r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)^{-1}\left(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\right)$
Guidance gradient
Apply the chain rule through the Tweedie estimate:
$\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t) \approx \left(\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}\right)^{\!\top}\mathbf{A}^\top\left(r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)^{-1}\left(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\right)$
At large $t$: $r_t$ is large, the covariance is dominated by $r_t^2\mathbf{A}\mathbf{A}^\top$, and guidance is weak (prior dominates). At small $t$: $r_t \to 0$, recovering the DPS gradient.
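The marginalisation step can be checked by Monte Carlo (a small sketch with assumed dimensions; sampling $\mathbf{x}_0$ from the Gaussian approximation and pushing it through the noisy forward model should reproduce the predicted mean and covariance of $\mathbf{y}$):

```python
import numpy as np

# Monte Carlo check: if x0 | xt ~ N(x0_hat, r^2 I) and y = A x0 + sigma_y * n,
# then y | xt ~ N(A x0_hat, r^2 A A^T + sigma_y^2 I).
rng = np.random.default_rng(0)
m, n, N = 2, 3, 200_000
A = rng.normal(size=(m, n))
x0_hat = rng.normal(size=n)
r, sig_y = 0.8, 0.3

x0 = x0_hat + r * rng.normal(size=(N, n))
y = x0 @ A.T + sig_y * rng.normal(size=(N, m))

emp_mean, emp_cov = y.mean(axis=0), np.cov(y.T)
pred_mean = A @ x0_hat
pred_cov = r**2 * A @ A.T + sig_y**2 * np.eye(m)
print(np.max(np.abs(emp_mean - pred_mean)), np.max(np.abs(emp_cov - pred_cov)))
```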
ex22-14
Hard. Prove that the DDRM reconstruction is measurement-consistent: $\mathbf{A}\hat{\mathbf{x}} = \mathbf{y}$ for the noiseless case. Use the SVD decomposition $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$.
Write $\hat{\mathbf{x}} = \mathbf{V}\bar{\mathbf{x}}$ in the spectral basis, with $\bar{x}_i = \bar{y}_i/s_i$ on the measured components ($s_i > 0$), where $\bar{\mathbf{y}} = \mathbf{U}^\top\mathbf{y}$.
Apply $\mathbf{A}$ and use $\mathbf{V}^\top\mathbf{V} = \mathbf{I}$.
Apply the forward model
$\mathbf{A}\hat{\mathbf{x}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top\mathbf{V}\bar{\mathbf{x}} = \mathbf{U}\boldsymbol{\Sigma}\bar{\mathbf{x}}$
Simplify
Using $\mathbf{V}^\top\mathbf{V} = \mathbf{I}$ and $\bar{x}_i = \bar{y}_i/s_i$ for $s_i > 0$: $(\boldsymbol{\Sigma}\bar{\mathbf{x}})_i = s_i\,\bar{y}_i/s_i = \bar{y}_i$ on the measured components.
Since $\mathbf{y}$ lies in the range of $\mathbf{A}$ in the noiseless case, $\bar{\mathbf{y}}$ is supported on the components with $s_i > 0$, and therefore $\mathbf{A}\hat{\mathbf{x}} = \mathbf{U}\bar{\mathbf{y}} = \mathbf{y}$.
ex22-15
Hard. A radar system produces measurements $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$. A DPS reconstruction with a full DDIM step schedule and a U-Net with 100M parameters takes 60 seconds. Design an acceleration strategy to bring this to under 5 seconds while maintaining measurement consistency.
Consider DPM-Solver++ with 10--20 steps.
Consider reducing the network size or using latent diffusion.
Step reduction
DPM-Solver++ with ~20 steps: wall-clock time scales with the number of network evaluations, so the runtime drops roughly in proportion to the step count. With DPS guidance, however, each step still requires a backward pass through the network, and the result can remain above the 5 s budget.
Combined acceleration
DPM-Solver++ (~20 steps) + gradient checkpointing: this saves memory but costs more compute per step, so the runtime remains dominated by the backward pass.
Alternatively: DPM-Solver++ (10--20 steps) without backpropagation (replace DPS guidance with a DDNM-style projection): this roughly halves the per-step cost. It sacrifices the gradient-based guidance but is sufficient for a structured $\mathbf{A}$.
Recommendation
For $\mathbf{A}$ with a known SVD: DDNM + DPM-Solver++ (10--20 steps) meets the 5 s budget with exact measurement consistency. For general $\mathbf{A}$: DPS + DPM-Solver++ (~20 steps) approaches the budget with approximate consistency.
ex22-16
Hard. Derive the relationship between the DPS guidance gradient and the MAP estimator. Show that in the limit of infinitely many diffusion steps (continuous time), DPS with deterministic (DDIM) sampling converges to a gradient descent on the MAP objective $-\log p(\mathbf{x}) + \frac{1}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$.
In the continuous-time limit, the DDIM trajectory follows the probability flow ODE.
The score converges to $\nabla\log p_0(\mathbf{x})$ as $t \to 0$.
Probability flow ODE
The DDIM trajectory in continuous time follows: $\frac{d\mathbf{x}_t}{dt} = -\frac{1}{2}\beta(t)\left[\mathbf{x}_t + \nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t)\right]$.
Add DPS guidance
With guidance: $\frac{d\mathbf{x}_t}{dt} = -\frac{1}{2}\beta(t)\left[\mathbf{x}_t + \nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t) + \zeta\,\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t)\right]$.
Limiting behaviour
As $t \to 0$: $p_t \to p_0$ (the data distribution) and $\hat{\mathbf{x}}_0 \to \mathbf{x}_t$ (the Tweedie estimate converges to the current iterate). The ODE becomes a gradient flow on $-\log p_0(\mathbf{x}) + \frac{\zeta}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$, which is the MAP objective with data-fidelity weight $\zeta$.
ex22-17
Hard. For a complex-valued SAR scene $\mathbf{x} \in \mathbb{C}^n$, a diffusion model is trained on the 2-channel representation $[\Re(\mathbf{x}); \Im(\mathbf{x})]$. Show that applying DPS with the linear forward model $\mathbf{y} = \mathbf{A}\mathbf{x}$ (complex-valued) is equivalent to DPS on a real-valued system of twice the dimension.
Write $\mathbf{A} = \mathbf{A}_r + i\mathbf{A}_i$ and expand the measurement equation into real and imaginary parts.
Real-valued embedding
Define $\tilde{\mathbf{x}} = [\Re(\mathbf{x}); \Im(\mathbf{x})] \in \mathbb{R}^{2n}$, $\tilde{\mathbf{y}} = [\Re(\mathbf{y}); \Im(\mathbf{y})] \in \mathbb{R}^{2m}$, and the real-valued sensing matrix:
$\tilde{\mathbf{A}} = \begin{bmatrix} \mathbf{A}_r & -\mathbf{A}_i \\ \mathbf{A}_i & \mathbf{A}_r \end{bmatrix}$
Equivalence
The complex measurement $\mathbf{y} = \mathbf{A}\mathbf{x}$ is equivalent to $\tilde{\mathbf{y}} = \tilde{\mathbf{A}}\tilde{\mathbf{x}}$.
DPS on the 2-channel representation with $\tilde{\mathbf{A}}$ is mathematically identical to complex-valued DPS. The score network operates on $\tilde{\mathbf{x}}$ and the guidance gradient is computed via the real-valued chain rule.
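The block-matrix embedding can be verified numerically for a random complex system (dimensions below are assumed for the sketch):

```python
import numpy as np

# Real-valued embedding of a complex linear model: y = A x  <=>  y~ = A~ x~.
rng = np.random.default_rng(7)
m, n = 3, 5
A = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = A @ x

Ar, Ai = A.real, A.imag
A_tilde = np.block([[Ar, -Ai], [Ai, Ar]])       # 2m x 2n real sensing matrix
x_tilde = np.concatenate([x.real, x.imag])
y_tilde = A_tilde @ x_tilde

err = np.max(np.abs(y_tilde - np.concatenate([y.real, y.imag])))
print(err)   # zero up to floating-point error
```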
ex22-18
Challenge. Design a physics-constrained diffusion training scheme for the RF imaging forward model $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$. The training objective should combine the DSM loss with a measurement consistency loss. Derive the gradient of the combined objective with respect to the network parameters $\theta$.
Use $\mathcal{L} = \mathcal{L}_{\text{DSM}} + \lambda\,\mathcal{L}_{\text{phys}}$.
The physics loss involves the Tweedie estimate, which depends on $\theta$.
Combined objective
$\mathcal{L}(\theta) = \mathbb{E}\left[\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\|^2\right] + \lambda\,\mathbb{E}\left[\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0(\mathbf{x}_t; \theta)\|^2\right]$, where $\hat{\mathbf{x}}_0(\mathbf{x}_t; \theta) = \left(\mathbf{x}_t - \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right)/\sqrt{\bar{\alpha}_t}$.
Gradient of physics loss
$\nabla_\theta\mathcal{L}_{\text{phys}} = \mathbb{E}\left[\frac{2\sqrt{1-\bar{\alpha}_t}}{\sqrt{\bar{\alpha}_t}}\left(\frac{\partial\boldsymbol{\epsilon}_\theta}{\partial\theta}\right)^{\!\top}\mathbf{A}^\top\left(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\right)\right]$
This requires backpropagation through the network (same as DPS guidance, but during training rather than inference).
Training procedure
- Sample a scene $\mathbf{x}_0$ and its measurement $\mathbf{y}$ from the simulation database
- Sample $t$ and $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$; compute $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$
- Forward pass: $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$
- Compute $\hat{\mathbf{x}}_0$ via Tweedie
- Compute $\mathcal{L} = \|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\|^2 + \lambda\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|^2$
- Backpropagate and update $\theta$
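The sign and scaling of the physics-loss gradient can be checked on a toy parametrisation where $\boldsymbol{\epsilon}_\theta$ is just a constant vector $\theta$ (so $\partial\boldsymbol{\epsilon}_\theta/\partial\theta = \mathbf{I}$); this stand-in for the network output is an assumption of the sketch, not the exercise's architecture:

```python
import numpy as np

# Gradient check for L_phys = ||y - A x0_hat||^2 with x0_hat = (xt - s*theta)/c,
# where s = sqrt(1 - alpha_bar), c = sqrt(alpha_bar) and eps_theta := theta.
rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
xt = rng.normal(size=n)
theta = rng.normal(size=n)
ab = 0.6
s, c = np.sqrt(1 - ab), np.sqrt(ab)

def loss(th):
    x0_hat = (xt - s * th) / c               # Tweedie estimate
    return np.sum((y - A @ x0_hat) ** 2)     # measurement-consistency loss

x0_hat = (xt - s * theta) / c
analytic = (2 * s / c) * A.T @ (y - A @ x0_hat)

eps = 1e-6
numeric = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.max(np.abs(analytic - numeric)))
```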
ex22-19
Challenge. Prove that the DDIM sampler is a first-order exponential integrator for the probability flow ODE. Start from the ODE:
$\frac{d\mathbf{x}_t}{dt} = f(t)\,\mathbf{x}_t + \frac{g^2(t)}{2\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$
and show that the DDIM update is the exact solution of this ODE with a piecewise-constant approximation of $\boldsymbol{\epsilon}_\theta$.
The linear part $f(t)\,\mathbf{x}_t$ can be solved exactly via the integrating factor.
Treat $\boldsymbol{\epsilon}_\theta$ as constant over each step $[t_{i-1}, t_i]$.
Integrating factor
Define $\Phi(t) = \exp\left(-\int_0^t f(s)\,ds\right)$. Then $\frac{d}{dt}\left[\Phi(t)\,\mathbf{x}_t\right] = \Phi(t)\,\frac{g^2(t)}{2\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$.
Piecewise-constant approximation
Approximate $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \approx \boldsymbol{\epsilon}_\theta(\mathbf{x}_{t_i}, t_i)$ for $t \in [t_{i-1}, t_i]$. Integrating:
$\Phi(t_{i-1})\,\mathbf{x}_{t_{i-1}} = \Phi(t_i)\,\mathbf{x}_{t_i} + \left(\int_{t_i}^{t_{i-1}}\Phi(t)\,\frac{g^2(t)}{2\sqrt{1-\bar{\alpha}_t}}\,dt\right)\boldsymbol{\epsilon}_\theta(\mathbf{x}_{t_i}, t_i)$
Connection to DDIM
For the VP-SDE with $f(t) = -\frac{1}{2}\beta(t)$ and $g^2(t) = \beta(t)$, evaluating the integral and transforming back to $\mathbf{x}$ coordinates recovers the DDIM update formula. The local approximation error is $O(h^2)$ per step, i.e. first-order overall, which explains why DDIM needs 50--100 steps for good quality.
ex22-20
Challenge. Consider using DPS for a non-Gaussian measurement model: $y_i \sim \mathrm{Poisson}\!\left([\mathbf{A}\mathbf{x}]_i\right)$ (photon-counting model, relevant for low-dose imaging). Derive the DPS guidance gradient and discuss the challenges compared to the Gaussian case.
The Poisson log-likelihood is $\log p(\mathbf{y} \mid \mathbf{x}) = \sum_i \left[y_i\log[\mathbf{A}\mathbf{x}]_i - [\mathbf{A}\mathbf{x}]_i - \log y_i!\right]$.
The gradient involves the element-wise ratio $\mathbf{y}/(\mathbf{A}\hat{\mathbf{x}}_0)$, which diverges when the estimate is near zero.
Poisson log-likelihood
$\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = \sum_i \left[y_i\log[\mathbf{A}\hat{\mathbf{x}}_0]_i - [\mathbf{A}\hat{\mathbf{x}}_0]_i - \log y_i!\right]$
Gradient with respect to $\hat{\mathbf{x}}_0$
$\nabla_{\hat{\mathbf{x}}_0}\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = \mathbf{A}^\top\left(\frac{\mathbf{y}}{\mathbf{A}\hat{\mathbf{x}}_0} - \mathbf{1}\right)$
where the division is element-wise.
DPS guidance gradient
$\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t) \approx \left(\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}\right)^{\!\top}\mathbf{A}^\top\left(\frac{\mathbf{y}}{\mathbf{A}\hat{\mathbf{x}}_0} - \mathbf{1}\right)$
Challenges
- Numerical instability: When $[\mathbf{A}\hat{\mathbf{x}}_0]_i \to 0$ with $y_i > 0$, the gradient diverges. Requires clipping or smoothing.
- Non-negativity: $\mathbf{A}\hat{\mathbf{x}}_0$ must be non-negative for the Poisson model to be valid. The diffusion model may generate negative values, requiring projection.
- No closed-form proximal: Unlike the Gaussian case, there is no closed-form solution for the Poisson data-fidelity step in DiffPIR-style methods.
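The Poisson gradient formula can be checked by finite differences on a strictly positive toy problem (positive entries are an assumption of the sketch, chosen to keep $\mathbf{A}\mathbf{x} > 0$ and avoid the instability discussed above):

```python
import numpy as np

# Finite-difference check of the Poisson log-likelihood gradient
# grad = A^T (y / (A x) - 1), on a strictly positive toy setup.
rng = np.random.default_rng(3)
m, n = 3, 4
A = rng.uniform(0.5, 1.5, size=(m, n))    # positive entries keep A x > 0
x = rng.uniform(0.5, 1.5, size=n)
y = rng.poisson(A @ x).astype(float)

def loglik(v):
    lam = A @ v
    return np.sum(y * np.log(lam) - lam)  # dropping the constant -log(y!)

analytic = A.T @ (y / (A @ x) - 1.0)
eps = 1e-6
numeric = np.array([(loglik(x + eps * e) - loglik(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.max(np.abs(analytic - numeric)))
```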