Exercises
ex22-01
Easy. Let $p(\mathbf{x}) = \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma})$. Compute the score function $\nabla_{\mathbf{x}} \log p(\mathbf{x})$.
Write out $\log p(\mathbf{x})$ and differentiate.
Log-density
$\log p(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) - \tfrac{1}{2}\log\left[(2\pi)^d|\boldsymbol{\Sigma}|\right]$
Differentiate
$\nabla_{\mathbf{x}} \log p(\mathbf{x}) = -\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})$
The score points from $\mathbf{x}$ back toward the mean $\boldsymbol{\mu}$, scaled by the inverse covariance.
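As a quick numerical sanity check (a minimal sketch with an illustrative 2-D $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ chosen here, not given in the exercise), the analytic score can be compared against a central finite difference of the log-density:

```python
import numpy as np

# Illustrative 2-D Gaussian: compare the analytic score -Sigma^{-1}(x - mu)
# with a central finite difference of log p(x).
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def log_p(x):
    d = x - mu
    return -0.5 * d @ Sigma_inv @ d  # constant terms drop out of the gradient

x = rng.normal(size=2)
analytic = -Sigma_inv @ (x - mu)

eps = 1e-6
numeric = np.array([
    (log_p(x + eps * e) - log_p(x - eps * e)) / (2 * eps)
    for e in np.eye(2)
])
print(np.max(np.abs(analytic - numeric)))  # agreement to finite-difference accuracy
```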
ex22-02
Easy. In the DDPM forward process, compute the signal-to-noise ratio $\mathrm{SNR}(t) = \bar{\alpha}_t/(1-\bar{\alpha}_t)$ for the linear schedule with $\beta_1 = 10^{-4}$, $\beta_T = 0.02$, $T = 1000$. At which step does $\mathrm{SNR}(t) = 1$ (0 dB)?
Compute $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$ numerically.
$\mathrm{SNR}(t) = 1$ when $\bar{\alpha}_t = 1/2$.
Numerical computation
With the linear schedule, $\bar{\alpha}_t$ decreases from $\approx 1$ to $\approx 4\times 10^{-5}$. Evaluating numerically: $\mathrm{SNR}(t) = 1$ occurs at $t \approx 260$.
Before this point, the signal dominates; after, noise dominates. This is the crossover point of the diffusion process.
Interpretation
The early portion of the diffusion process ($t \lesssim 260$) operates in the high-SNR regime where the score network sees mostly signal; the remainder ($t \gtrsim 260$) operates in the low-SNR regime where the score must extrapolate from noise. The cosine schedule shifts this crossover to later steps, spending more time in the informative regime.
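The crossover step can be verified directly with the schedule values given in the exercise:

```python
import numpy as np

# Linear DDPM schedule: beta_1 = 1e-4 ... beta_T = 0.02, T = 1000.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
snr = alpha_bar / (1.0 - alpha_bar)

# First step (1-indexed) where the SNR drops below 1 (0 dB).
t_cross = int(np.argmax(snr < 1.0)) + 1
print(t_cross)
```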
ex22-03
Easy. Verify Tweedie's formula for the case $\mathbf{x}_0 \sim \mathcal{N}(\mu_0, \sigma_0^2)$ (non-standard Gaussian). Show that $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$ matches the Tweedie prediction.
Compute the joint distribution of $(\mathbf{x}_0, \mathbf{x}_t)$.
Use Gaussian conditioning to find $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$.
Joint distribution
$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, so $\mathbb{E}[\mathbf{x}_t] = \sqrt{\bar{\alpha}_t}\,\mu_0$, $\mathrm{Var}(\mathbf{x}_t) = \bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t$, $\mathrm{Cov}(\mathbf{x}_0, \mathbf{x}_t) = \sqrt{\bar{\alpha}_t}\,\sigma_0^2$.
Gaussian conditioning
$\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \mu_0 + \frac{\sqrt{\bar{\alpha}_t}\,\sigma_0^2}{\bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t}\left(\mathbf{x}_t - \sqrt{\bar{\alpha}_t}\,\mu_0\right)$
Verify via Tweedie
The marginal is $p(\mathbf{x}_t) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\,\mu_0,\; \bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t\right)$, so $\nabla\log p(\mathbf{x}_t) = -\frac{\mathbf{x}_t - \sqrt{\bar{\alpha}_t}\,\mu_0}{\bar{\alpha}_t\sigma_0^2 + 1 - \bar{\alpha}_t}$. Substituting into Tweedie's formula $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \left(\mathbf{x}_t + (1-\bar{\alpha}_t)\nabla\log p(\mathbf{x}_t)\right)/\sqrt{\bar{\alpha}_t}$ yields the same expression.
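The algebraic identity can be checked numerically in 1-D (the parameter values below are illustrative assumptions, not from the exercise):

```python
import numpy as np

# 1-D check: posterior mean from Gaussian conditioning vs. Tweedie's formula,
# for a prior x0 ~ N(mu0, s02) and x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps.
mu0, s02, ab = 1.5, 0.7, 0.4          # illustrative values
xt = np.linspace(-3, 3, 7)

var_t = ab * s02 + (1 - ab)           # marginal variance of x_t
# Gaussian conditioning: E[x0|xt] = mu0 + Cov(x0,xt)/Var(xt) * (xt - E[xt])
cond = mu0 + np.sqrt(ab) * s02 / var_t * (xt - np.sqrt(ab) * mu0)
# Tweedie: E[x0|xt] = (xt + (1-ab) * score(xt)) / sqrt(ab)
score = -(xt - np.sqrt(ab) * mu0) / var_t
tweedie = (xt + (1 - ab) * score) / np.sqrt(ab)

print(np.max(np.abs(cond - tweedie)))  # the two expressions agree exactly
```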
ex22-04
Easy. For DPS with guidance scale $\zeta$ and measurement noise variance $\sigma_y^2$, what is the effective regularisation parameter in terms of the measurement residual? Compare with the proximal operator in PnP-ADMM (Chapter 21).
The DPS gradient is $\zeta\,\nabla_{\mathbf{x}_t}\frac{1}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|^2$.
Effective step size
The DPS gradient step has effective step size $\zeta/\sigma_y^2$ multiplying the residual term. This plays the same role as $1/\rho$ in PnP-ADMM, where $\rho$ is the penalty parameter.
Connection to PnP
In PnP-ADMM: $\mathbf{v} \leftarrow D_\sigma(\mathbf{x} + \mathbf{u})$ (denoiser step), $\mathbf{x} \leftarrow \arg\min_{\mathbf{x}}\frac{1}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \frac{\rho}{2}\|\mathbf{x} - \mathbf{v} + \mathbf{u}\|^2$ (data step).
In DPS: the score network provides the denoiser, and the guidance gradient provides the data step. The ratio $\zeta/\sigma_y^2$ corresponds to $1/\rho$: both control the balance between prior and data fidelity.
ex22-05
Easy. Name three advantages and three disadvantages of diffusion-based reconstruction compared to PnP methods for RF imaging.
Consider quality, speed, uncertainty, training, and generality.
Advantages
- Higher reconstruction quality (typically a gain of several dB PSNR)
- Posterior sampling enables uncertainty quantification
- Stronger prior captures complex scene statistics
Disadvantages
- Higher computational cost (many more network function evaluations per reconstruction)
- Requires more training data for the score network
- Approximate posterior: no convergence guarantees
ex22-06
Medium. Derive the DPS guidance gradient for the nonlinear forward model $\mathbf{y} = f(\mathbf{x}) + \mathbf{n}$, where $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma_y^2\mathbf{I})$ and $f$ is a differentiable function.
The log-likelihood is $\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = -\frac{1}{2\sigma_y^2}\|\mathbf{y} - f(\hat{\mathbf{x}}_0)\|^2 + \text{const}$.
Apply the chain rule through $f$ and the Tweedie estimate.
Log-likelihood
$\nabla_{\hat{\mathbf{x}}_0}\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = \frac{1}{\sigma_y^2}\,\mathbf{J}_f(\hat{\mathbf{x}}_0)^\top\left(\mathbf{y} - f(\hat{\mathbf{x}}_0)\right)$
Chain rule
$\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t) \approx \frac{1}{\sigma_y^2}\left(\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}\right)^{\!\top}\mathbf{J}_f(\hat{\mathbf{x}}_0)^\top\left(\mathbf{y} - f(\hat{\mathbf{x}}_0)\right)$, where $\mathbf{J}_f(\hat{\mathbf{x}}_0) = \partial f/\partial\mathbf{x}\big|_{\hat{\mathbf{x}}_0}$ is the Jacobian of $f$ evaluated at the Tweedie estimate $\hat{\mathbf{x}}_0$.
Comparison with linear case
For $f(\mathbf{x}) = \mathbf{A}\mathbf{x}$, the Jacobian is $\mathbf{J}_f = \mathbf{A}$, recovering the linear DPS gradient.
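The Jacobian-transpose form of the gradient can be checked by finite differences on a toy nonlinear model (the elementwise-squared map below is an assumed example, not the exercise's $f$); the code differentiates the negative log-likelihood $\frac{1}{2\sigma_y^2}\|\mathbf{y} - f(\mathbf{x})\|^2$:

```python
import numpy as np

# Toy nonlinear forward model f(x) = (A x)^2 elementwise (illustrative).
# Check: grad of 0.5/sig2 * ||y - f(x)||^2 equals -J_f^T (y - f(x)) / sig2.
rng = np.random.default_rng(1)
A = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = rng.normal(size=3)
sig2 = 0.5

f = lambda v: (A @ v) ** 2
J = 2 * np.diag(A @ x) @ A                # Jacobian of f at x
analytic = -J.T @ (y - f(x)) / sig2

loss = lambda v: 0.5 * np.sum((y - f(v)) ** 2) / sig2
eps = 1e-6
numeric = np.array([(loss(x + eps * e) - loss(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.max(np.abs(analytic - numeric)))
```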
ex22-07
Medium. Consider a compressed sensing problem with $\mathbf{y} = \mathbf{A}\mathbf{x}$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$, $m = n/4$ (75% undersampling). The SVD gives $m$ nonzero singular values. What fraction of the reconstruction is determined by the measurements, and what fraction must be filled by the diffusion prior?
Apply the null-space preservation theorem.
The range space has dimension $m$ and the null space has dimension $n - m$.
Dimensions
$r = m = n/4$ (range space dimension), $n - m = 3n/4$ (null space dimension).
Interpretation
Only 25% of the reconstruction content is determined by the measurements. The remaining 75% must be inferred by the diffusion prior. This makes the quality of the prior critical: a weak prior produces artefacts in the null-space components, while a strong prior fills in plausible content.
For RF imaging at 75% undersampling, the prior must capture scene-specific statistics (point scatterers, clutter); a natural-image prior would hallucinate inappropriate textures.
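The dimension counting can be illustrated with a small random sensing matrix (sizes below are assumed for illustration):

```python
import numpy as np

# 75% undersampling: m = n/4 rows. The measured fraction equals rank(A)/n.
rng = np.random.default_rng(4)
n = 64
m = n // 4
A = rng.normal(size=(m, n))        # a generic Gaussian A has full row rank
rank = np.linalg.matrix_rank(A)
measured_fraction = rank / n
print(rank, measured_fraction)     # 16 of 64 dimensions, i.e. 25%
```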
ex22-08
Medium. Derive the DDNM correction formula:
$\hat{\mathbf{x}}_0' = \hat{\mathbf{x}}_0 + \mathbf{A}^\dagger(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$
and show that it satisfies $\mathbf{A}\hat{\mathbf{x}}_0' = \mathbf{y}$ when $\mathbf{A}$ has full row rank.
Use $\mathbf{A}\mathbf{A}^\dagger = \mathbf{I}$ for full row rank.
Apply the forward model
$\mathbf{A}\hat{\mathbf{x}}_0' = \mathbf{A}\hat{\mathbf{x}}_0 + \mathbf{A}\mathbf{A}^\dagger(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$
Use pseudoinverse identity
For full row rank: $\mathbf{A}\mathbf{A}^\dagger = \mathbf{I}$, so $\mathbf{A}\hat{\mathbf{x}}_0' = \mathbf{A}\hat{\mathbf{x}}_0 + \mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}$.
Null-space preservation
The correction $\mathbf{A}^\dagger(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$ lies in the row space of $\mathbf{A}$. Therefore, the null-space component of $\hat{\mathbf{x}}_0$ is preserved: $(\mathbf{I} - \mathbf{A}^\dagger\mathbf{A})\hat{\mathbf{x}}_0' = (\mathbf{I} - \mathbf{A}^\dagger\mathbf{A})\hat{\mathbf{x}}_0$.
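Both properties (measurement consistency and null-space preservation) can be verified numerically for a random full-row-rank $\mathbf{A}$ (a sketch with assumed sizes):

```python
import numpy as np

# DDNM correction with a full-row-rank A: check A x' = y and
# null-space preservation (I - A^+ A) x' = (I - A^+ A) x.
rng = np.random.default_rng(5)
m, n = 4, 10
A = rng.normal(size=(m, n))
A_pinv = np.linalg.pinv(A)
y = rng.normal(size=m)
x0_hat = rng.normal(size=n)

x_corr = x0_hat + A_pinv @ (y - A @ x0_hat)

P_null = np.eye(n) - A_pinv @ A
consistency = np.max(np.abs(A @ x_corr - y))
null_pres = np.max(np.abs(P_null @ x_corr - P_null @ x0_hat))
print(consistency, null_pres)   # both are zero up to floating-point error
```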
ex22-09
Medium. Compare the per-step computational cost of DPS (with backpropagation) and DDRM (with SVD-based projection) for a forward model $\mathbf{A} \in \mathbb{R}^{m \times n}$. Under what conditions is DDRM more efficient per step?
DPS per step: one network forward pass, one backward pass, plus matrix-vector products.
DDRM per step: one network forward pass plus an SVD-domain projection (no backprop).
DDRM requires a one-time SVD: $O(mn\min(m,n))$.
Per-step cost
- DPS: $C_{\text{fwd}} + C_{\text{bwd}} + O(mn)$ (forward, backward, matrix-vector)
- DDRM: $C_{\text{fwd}} + O(rn)$ (forward, projection via precomputed SVD, where $r = \mathrm{rank}(\mathbf{A})$)
Since the backward pass roughly doubles the network cost, DDRM is faster per step.
One-time SVD cost
The SVD of $\mathbf{A}$ costs $O(mn\min(m,n))$. This amortises over all $N$ sampling steps, adding $O(mn\min(m,n)/N)$ per step.
DDRM is overall more efficient when the amortised SVD cost is small compared to the per-step savings from skipping backpropagation. For typical imaging problems with moderate $n$ and many sampling steps, DDRM wins. For large-scale RF imaging (very large $m$ and $n$), the SVD is prohibitive and DPS is preferred.
ex22-10
Medium. In DiffPIR, the proximal data step has the closed-form solution (for $\mathbf{A}^\top\mathbf{A}$ invertible):
$\hat{\mathbf{x}} = \left(\mathbf{A}^\top\mathbf{A} + \rho\mathbf{I}\right)^{-1}\left(\mathbf{A}^\top\mathbf{y} + \rho\,\hat{\mathbf{x}}_0\right)$
Show that as $\rho \to 0$, this reduces to the pseudoinverse solution $\mathbf{A}^\dagger\mathbf{y}$, and as $\rho \to \infty$, it reduces to $\hat{\mathbf{x}}_0$ (pure prior).
Use the Woodbury identity or direct limit analysis.
Limit $\rho \to 0$
$\hat{\mathbf{x}} \to \left(\mathbf{A}^\top\mathbf{A}\right)^{-1}\mathbf{A}^\top\mathbf{y} = \mathbf{A}^\dagger\mathbf{y}$ (pseudoinverse, pure data fidelity, no prior).
Limit $\rho \to \infty$
$\hat{\mathbf{x}} = \left(\mathbf{A}^\top\mathbf{A}/\rho + \mathbf{I}\right)^{-1}\left(\mathbf{A}^\top\mathbf{y}/\rho + \hat{\mathbf{x}}_0\right) \to \hat{\mathbf{x}}_0$ (pure prior, no data fidelity).
Interpretation
The parameter $\rho$ interpolates between data fidelity and prior. In DiffPIR, $\rho$ is scheduled to decrease over iterations: early iterations rely on the prior (high $\rho$, high noise level), later iterations enforce data fidelity (low $\rho$, low noise level).
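The two limits can be verified numerically with a small tall matrix so that $\mathbf{A}^\top\mathbf{A}$ is invertible (sizes and values below are assumptions for the sketch):

```python
import numpy as np

# DiffPIR data step x_hat(rho) = (A^T A + rho I)^{-1} (A^T y + rho x0_hat).
# With a tall A (full column rank), rho -> 0 gives the pseudoinverse solution
# and rho -> infinity gives x0_hat.
rng = np.random.default_rng(6)
m, n = 8, 4
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
x0_hat = rng.normal(size=n)

def data_step(rho):
    return np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ y + rho * x0_hat)

err_small = np.max(np.abs(data_step(1e-10) - np.linalg.pinv(A) @ y))
err_large = np.max(np.abs(data_step(1e10) - x0_hat))
print(err_small, err_large)
```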
ex22-11
Medium. A DDIM sampler with $K$ uniformly spaced steps uses the time subsequence $t_k = kT/K$ for $k = 1, \dots, K$. For the linear schedule $\beta_t$, show that each DDIM step covers approximately the same change in log-SNR $\lambda(t) = \log\left[\bar{\alpha}_t/(1-\bar{\alpha}_t)\right]$.
Compute $\lambda(t)$ along the schedule.
For the linear schedule, $\lambda(t)$ is approximately linear in $t$.
Log-SNR
$\lambda(t) = \log\bar{\alpha}_t - \log(1-\bar{\alpha}_t)$. For the linear schedule, $\bar{\alpha}_t$ decreases roughly exponentially, making $\lambda(t)$ approximately linear in $t$ over the interior of the schedule (the approximation degrades near $t = 0$ and $t = T$).
Uniform steps in $t$
If $\lambda(t)$ is linear in $t$, then uniform steps $\Delta t$ give uniform steps $\Delta\lambda$.
Implication
Each DDIM step covers the same "amount of denoising" in the log-SNR sense. This is why uniform time subsampling works well for the linear schedule. For non-linear schedules (cosine), non-uniform subsampling may be preferred.
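The log-SNR profile along a uniform $K$-step subsequence can be inspected directly (schedule as in ex22-02; $K = 20$ is an assumed example):

```python
import numpy as np

# Log-SNR lambda(t) for the linear schedule, sampled on K = 20 uniform steps.
T, K = 1000, 20
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
lam = np.log(alpha_bar) - np.log1p(-alpha_bar)   # log-SNR

idx = np.arange(1, K + 1) * (T // K) - 1          # uniform subsequence t_k = kT/K
lam_k = lam[idx]
dlam = np.diff(lam_k)
# Per-step log-SNR changes: all negative, roughly comparable mid-schedule,
# larger in magnitude near the endpoints.
print(dlam)
```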
ex22-12
Medium. Show that the DPS guidance gradient for the noiseless case ($\sigma_y \to 0$) becomes a hard constraint:
the gradient $\frac{1}{\sigma_y^2}\mathbf{A}^\top(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0)$ diverges unless $\mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}$ exactly. What does this imply for the choice of $\zeta$ at different noise levels?
The gradient magnitude scales as $1/\sigma_y^2$.
For a fixed residual, the gradient diverges as $\sigma_y \to 0$; it stays finite only if the residual is zero.
Divergence analysis
The guidance gradient has magnitude $O\!\left(\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|/\sigma_y^2\right)$. As $\sigma_y \to 0$, this diverges unless $\mathbf{A}\hat{\mathbf{x}}_0 = \mathbf{y}$.
Implication
In the noiseless limit, DPS requires the Tweedie estimate to be exactly measurement-consistent at every step, which is unrealistic. In practice, $\zeta$ must be decreased as $\sigma_y$ decreases, keeping $\zeta/\sigma_y^2$ bounded. A common heuristic: $\zeta_t = \zeta'/\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|$, normalising by the residual so that the effective step size stays bounded independent of the noise level.
ex22-13
Hard. Derive the ΠGDM likelihood approximation. Starting from the Gaussian approximation $p(\mathbf{x}_0 \mid \mathbf{x}_t) \approx \mathcal{N}(\hat{\mathbf{x}}_0, r_t^2\mathbf{I})$, show that the marginal likelihood is:
$p(\mathbf{y} \mid \mathbf{x}_t) \approx \mathcal{N}\!\left(\mathbf{A}\hat{\mathbf{x}}_0,\; r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)$
and derive the corresponding guidance gradient.
Marginalise: $p(\mathbf{y} \mid \mathbf{x}_t) = \int p(\mathbf{y} \mid \mathbf{x}_0)\,p(\mathbf{x}_0 \mid \mathbf{x}_t)\,d\mathbf{x}_0$.
Both factors are Gaussian; the integral is a convolution of Gaussians.
Marginalisation
$p(\mathbf{y} \mid \mathbf{x}_0) = \mathcal{N}(\mathbf{A}\mathbf{x}_0, \sigma_y^2\mathbf{I})$, $p(\mathbf{x}_0 \mid \mathbf{x}_t) \approx \mathcal{N}(\hat{\mathbf{x}}_0, r_t^2\mathbf{I})$. By the convolution property: $p(\mathbf{y} \mid \mathbf{x}_t) \approx \mathcal{N}\!\left(\mathbf{A}\hat{\mathbf{x}}_0,\; r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)$.
Log-likelihood gradient
$\nabla_{\hat{\mathbf{x}}_0}\log p(\mathbf{y} \mid \mathbf{x}_t) = \mathbf{A}^\top\left(r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)^{-1}\left(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\right)$
Guidance gradient
Apply the chain rule through the Tweedie estimate:
$\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t) \approx \left(\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}\right)^{\!\top}\mathbf{A}^\top\left(r_t^2\mathbf{A}\mathbf{A}^\top + \sigma_y^2\mathbf{I}\right)^{-1}\left(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\right)$
At large $t$: $r_t$ is large, the covariance is dominated by $r_t^2\mathbf{A}\mathbf{A}^\top$, and guidance is weak (prior dominates). At small $t$: $r_t \to 0$, recovering the DPS gradient.
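The marginalisation step can be checked by Monte Carlo (a small sketch with assumed dimensions; sampling $\mathbf{x}_0$ from the Gaussian approximation and pushing it through the noisy forward model should reproduce the predicted mean and covariance of $\mathbf{y}$):

```python
import numpy as np

# Monte Carlo check: if x0 | xt ~ N(x0_hat, r^2 I) and y = A x0 + sigma_y * n,
# then y | xt ~ N(A x0_hat, r^2 A A^T + sigma_y^2 I).
rng = np.random.default_rng(0)
m, n, N = 2, 3, 200_000
A = rng.normal(size=(m, n))
x0_hat = rng.normal(size=n)
r, sig_y = 0.8, 0.3

x0 = x0_hat + r * rng.normal(size=(N, n))
y = x0 @ A.T + sig_y * rng.normal(size=(N, m))

emp_mean, emp_cov = y.mean(axis=0), np.cov(y.T)
pred_mean = A @ x0_hat
pred_cov = r**2 * A @ A.T + sig_y**2 * np.eye(m)
print(np.max(np.abs(emp_mean - pred_mean)), np.max(np.abs(emp_cov - pred_cov)))
```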
ex22-14
Hard. Prove that the DDRM reconstruction is measurement-consistent: $\mathbf{A}\hat{\mathbf{x}} = \mathbf{y}$ for the noiseless case. Use the SVD decomposition $\mathbf{A} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$.
Write $\hat{\mathbf{x}} = \mathbf{V}\bar{\mathbf{x}}$ in the spectral basis, with $\bar{x}_i = \bar{y}_i/s_i$ on the measured components ($s_i > 0$), where $\bar{\mathbf{y}} = \mathbf{U}^\top\mathbf{y}$.
Apply $\mathbf{A}$ and use $\mathbf{V}^\top\mathbf{V} = \mathbf{I}$.
Apply the forward model
$\mathbf{A}\hat{\mathbf{x}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top\mathbf{V}\bar{\mathbf{x}} = \mathbf{U}\boldsymbol{\Sigma}\bar{\mathbf{x}}$
Simplify
Using $\mathbf{V}^\top\mathbf{V} = \mathbf{I}$ and $\bar{x}_i = \bar{y}_i/s_i$ for $s_i > 0$: $(\boldsymbol{\Sigma}\bar{\mathbf{x}})_i = s_i\,\bar{y}_i/s_i = \bar{y}_i$ on the measured components.
Since $\mathbf{y}$ lies in the range of $\mathbf{A}$ in the noiseless case, $\bar{\mathbf{y}}$ is supported on the components with $s_i > 0$, and therefore $\mathbf{A}\hat{\mathbf{x}} = \mathbf{U}\bar{\mathbf{y}} = \mathbf{y}$.
ex22-15
Hard. A radar system produces measurements $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$. A DPS reconstruction with a full DDIM step schedule and a U-Net with 100M parameters takes 60 seconds. Design an acceleration strategy to bring this to under 5 seconds while maintaining measurement consistency.
Consider DPM-Solver++ with 10--20 steps.
Consider reducing the network size or using latent diffusion.
Step reduction
DPM-Solver++ with ~20 steps: wall-clock time scales with the number of network evaluations, so the runtime drops roughly in proportion to the step count. With DPS guidance, however, each step still requires a backward pass through the network, and the result can remain above the 5 s budget.
Combined acceleration
DPM-Solver++ (~20 steps) + gradient checkpointing: this saves memory but costs more compute per step, so the runtime remains dominated by the backward pass.
Alternatively: DPM-Solver++ (10--20 steps) without backpropagation (replace DPS guidance with a DDNM-style projection): this roughly halves the per-step cost. It sacrifices the gradient-based guidance but is sufficient for a structured $\mathbf{A}$.
Recommendation
For $\mathbf{A}$ with a known SVD: DDNM + DPM-Solver++ (10--20 steps) meets the 5 s budget with exact measurement consistency. For general $\mathbf{A}$: DPS + DPM-Solver++ (~20 steps) approaches the budget with approximate consistency.
ex22-16
Hard. Derive the relationship between the DPS guidance gradient and the MAP estimator. Show that in the limit of infinitely many diffusion steps (continuous time), DPS with deterministic (DDIM) sampling converges to a gradient descent on the MAP objective $-\log p(\mathbf{x}) + \frac{1}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$.
In the continuous-time limit, the DDIM trajectory follows the probability flow ODE.
The score converges to $\nabla\log p_0(\mathbf{x})$ as $t \to 0$.
Probability flow ODE
The DDIM trajectory in continuous time follows: $\frac{d\mathbf{x}_t}{dt} = -\frac{1}{2}\beta(t)\left[\mathbf{x}_t + \nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t)\right]$.
Add DPS guidance
With guidance: $\frac{d\mathbf{x}_t}{dt} = -\frac{1}{2}\beta(t)\left[\mathbf{x}_t + \nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t) + \zeta\,\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t)\right]$.
Limiting behaviour
As $t \to 0$: $p_t \to p_0$ (the data distribution) and $\hat{\mathbf{x}}_0 \to \mathbf{x}_t$ (the Tweedie estimate converges to the current iterate). The ODE becomes a gradient flow on $-\log p_0(\mathbf{x}) + \frac{\zeta}{2\sigma_y^2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$, which is the MAP objective with data-fidelity weight $\zeta$.
ex22-17
Hard. For a complex-valued SAR scene $\mathbf{x} \in \mathbb{C}^n$, a diffusion model is trained on the 2-channel representation $[\Re(\mathbf{x}); \Im(\mathbf{x})]$. Show that applying DPS with the linear forward model $\mathbf{y} = \mathbf{A}\mathbf{x}$ (complex-valued) is equivalent to DPS on a real-valued system of twice the dimension.
Write $\mathbf{A} = \mathbf{A}_r + i\mathbf{A}_i$ and expand the measurement equation into real and imaginary parts.
Real-valued embedding
Define $\tilde{\mathbf{x}} = [\Re(\mathbf{x}); \Im(\mathbf{x})] \in \mathbb{R}^{2n}$, $\tilde{\mathbf{y}} = [\Re(\mathbf{y}); \Im(\mathbf{y})] \in \mathbb{R}^{2m}$, and the real-valued sensing matrix:
$\tilde{\mathbf{A}} = \begin{bmatrix} \mathbf{A}_r & -\mathbf{A}_i \\ \mathbf{A}_i & \mathbf{A}_r \end{bmatrix}$
Equivalence
The complex measurement $\mathbf{y} = \mathbf{A}\mathbf{x}$ is equivalent to $\tilde{\mathbf{y}} = \tilde{\mathbf{A}}\tilde{\mathbf{x}}$.
DPS on the 2-channel representation with $\tilde{\mathbf{A}}$ is mathematically identical to complex-valued DPS. The score network operates on $\tilde{\mathbf{x}}$ and the guidance gradient is computed via the real-valued chain rule.
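The block-matrix embedding can be verified numerically for a random complex system (dimensions below are assumed for the sketch):

```python
import numpy as np

# Real-valued embedding of a complex linear model: y = A x  <=>  y~ = A~ x~.
rng = np.random.default_rng(7)
m, n = 3, 5
A = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = A @ x

Ar, Ai = A.real, A.imag
A_tilde = np.block([[Ar, -Ai], [Ai, Ar]])       # 2m x 2n real sensing matrix
x_tilde = np.concatenate([x.real, x.imag])
y_tilde = A_tilde @ x_tilde

err = np.max(np.abs(y_tilde - np.concatenate([y.real, y.imag])))
print(err)   # zero up to floating-point error
```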
ex22-18
Challenge. Design a physics-constrained diffusion training scheme for the RF imaging forward model $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$. The training objective should combine the DSM loss with a measurement consistency loss. Derive the gradient of the combined objective with respect to the network parameters $\theta$.
Use $\mathcal{L} = \mathcal{L}_{\text{DSM}} + \lambda\,\mathcal{L}_{\text{phys}}$.
The physics loss involves the Tweedie estimate, which depends on $\theta$.
Combined objective
$\mathcal{L}(\theta) = \mathbb{E}\left[\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\|^2\right] + \lambda\,\mathbb{E}\left[\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0(\mathbf{x}_t; \theta)\|^2\right]$, where $\hat{\mathbf{x}}_0(\mathbf{x}_t; \theta) = \left(\mathbf{x}_t - \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right)/\sqrt{\bar{\alpha}_t}$.
Gradient of physics loss
$\nabla_\theta\mathcal{L}_{\text{phys}} = \mathbb{E}\left[\frac{2\sqrt{1-\bar{\alpha}_t}}{\sqrt{\bar{\alpha}_t}}\left(\frac{\partial\boldsymbol{\epsilon}_\theta}{\partial\theta}\right)^{\!\top}\mathbf{A}^\top\left(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\right)\right]$
This requires backpropagation through the network (same as DPS guidance, but during training rather than inference).
Training procedure
- Sample a scene $\mathbf{x}_0$ and its measurement $\mathbf{y}$ from the simulation database
- Sample $t$ and $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$; compute $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$
- Forward pass: $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$
- Compute $\hat{\mathbf{x}}_0$ via Tweedie
- Compute $\mathcal{L} = \|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\|^2 + \lambda\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|^2$
- Backpropagate and update $\theta$
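The sign and scaling of the physics-loss gradient can be checked on a toy parametrisation where $\boldsymbol{\epsilon}_\theta$ is just a constant vector $\theta$ (so $\partial\boldsymbol{\epsilon}_\theta/\partial\theta = \mathbf{I}$); this stand-in for the network output is an assumption of the sketch, not the exercise's architecture:

```python
import numpy as np

# Gradient check for L_phys = ||y - A x0_hat||^2 with x0_hat = (xt - s*theta)/c,
# where s = sqrt(1 - alpha_bar), c = sqrt(alpha_bar) and eps_theta := theta.
rng = np.random.default_rng(2)
m, n = 3, 4
A = rng.normal(size=(m, n))
y = rng.normal(size=m)
xt = rng.normal(size=n)
theta = rng.normal(size=n)
ab = 0.6
s, c = np.sqrt(1 - ab), np.sqrt(ab)

def loss(th):
    x0_hat = (xt - s * th) / c               # Tweedie estimate
    return np.sum((y - A @ x0_hat) ** 2)     # measurement-consistency loss

x0_hat = (xt - s * theta) / c
analytic = (2 * s / c) * A.T @ (y - A @ x0_hat)

eps = 1e-6
numeric = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.max(np.abs(analytic - numeric)))
```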
ex22-19
Challenge. Prove that the DDIM sampler is a first-order exponential integrator for the probability flow ODE. Start from the ODE:
$\frac{d\mathbf{x}_t}{dt} = f(t)\,\mathbf{x}_t + \frac{g^2(t)}{2\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$
and show that the DDIM update is the exact solution of this ODE with a piecewise-constant approximation of $\boldsymbol{\epsilon}_\theta$.
The linear part $f(t)\,\mathbf{x}_t$ can be solved exactly via the integrating factor.
Treat $\boldsymbol{\epsilon}_\theta$ as constant over each step $[t_{i-1}, t_i]$.
Integrating factor
Define $\Phi(t) = \exp\left(-\int_0^t f(s)\,ds\right)$. Then $\frac{d}{dt}\left[\Phi(t)\,\mathbf{x}_t\right] = \Phi(t)\,\frac{g^2(t)}{2\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$.
Piecewise-constant approximation
Approximate $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \approx \boldsymbol{\epsilon}_\theta(\mathbf{x}_{t_i}, t_i)$ for $t \in [t_{i-1}, t_i]$. Integrating:
$\Phi(t_{i-1})\,\mathbf{x}_{t_{i-1}} = \Phi(t_i)\,\mathbf{x}_{t_i} + \left(\int_{t_i}^{t_{i-1}}\Phi(t)\,\frac{g^2(t)}{2\sqrt{1-\bar{\alpha}_t}}\,dt\right)\boldsymbol{\epsilon}_\theta(\mathbf{x}_{t_i}, t_i)$
Connection to DDIM
For the VP-SDE with $f(t) = -\frac{1}{2}\beta(t)$ and $g^2(t) = \beta(t)$, evaluating the integral and transforming back to $\mathbf{x}$ coordinates recovers the DDIM update formula. The local approximation error is $O(h^2)$ per step, i.e. first-order overall, which explains why DDIM needs 50--100 steps for good quality.
ex22-20
Challenge. Consider using DPS for a non-Gaussian measurement model: $y_i \sim \mathrm{Poisson}\!\left([\mathbf{A}\mathbf{x}]_i\right)$ (photon-counting model, relevant for low-dose imaging). Derive the DPS guidance gradient and discuss the challenges compared to the Gaussian case.
The Poisson log-likelihood is $\log p(\mathbf{y} \mid \mathbf{x}) = \sum_i \left[y_i\log[\mathbf{A}\mathbf{x}]_i - [\mathbf{A}\mathbf{x}]_i - \log y_i!\right]$.
The gradient involves the element-wise ratio $\mathbf{y}/(\mathbf{A}\hat{\mathbf{x}}_0)$, which diverges when the estimate is near zero.
Poisson log-likelihood
$\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = \sum_i \left[y_i\log[\mathbf{A}\hat{\mathbf{x}}_0]_i - [\mathbf{A}\hat{\mathbf{x}}_0]_i - \log y_i!\right]$
Gradient with respect to $\hat{\mathbf{x}}_0$
$\nabla_{\hat{\mathbf{x}}_0}\log p(\mathbf{y} \mid \hat{\mathbf{x}}_0) = \mathbf{A}^\top\left(\frac{\mathbf{y}}{\mathbf{A}\hat{\mathbf{x}}_0} - \mathbf{1}\right)$
where the division is element-wise.
DPS guidance gradient
$\nabla_{\mathbf{x}_t}\log p(\mathbf{y} \mid \mathbf{x}_t) \approx \left(\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}\right)^{\!\top}\mathbf{A}^\top\left(\frac{\mathbf{y}}{\mathbf{A}\hat{\mathbf{x}}_0} - \mathbf{1}\right)$
Challenges
- Numerical instability: When $[\mathbf{A}\hat{\mathbf{x}}_0]_i \to 0$ with $y_i > 0$, the gradient diverges. Requires clipping or smoothing.
- Non-negativity: $\mathbf{A}\hat{\mathbf{x}}_0$ must be non-negative for the Poisson model to be valid. The diffusion model may generate negative values, requiring projection.
- No closed-form proximal: Unlike the Gaussian case, there is no closed-form solution for the Poisson data-fidelity step in DiffPIR-style methods.
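The Poisson gradient formula can be checked by finite differences on a strictly positive toy problem (positive entries are an assumption of the sketch, chosen to keep $\mathbf{A}\mathbf{x} > 0$ and avoid the instability discussed above):

```python
import numpy as np

# Finite-difference check of the Poisson log-likelihood gradient
# grad = A^T (y / (A x) - 1), on a strictly positive toy setup.
rng = np.random.default_rng(3)
m, n = 3, 4
A = rng.uniform(0.5, 1.5, size=(m, n))    # positive entries keep A x > 0
x = rng.uniform(0.5, 1.5, size=n)
y = rng.poisson(A @ x).astype(float)

def loglik(v):
    lam = A @ v
    return np.sum(y * np.log(lam) - lam)  # dropping the constant -log(y!)

analytic = A.T @ (y / (A @ x) - 1.0)
eps = 1e-6
numeric = np.array([(loglik(x + eps * e) - loglik(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.max(np.abs(analytic - numeric)))
```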