Computational Cost and Acceleration

The Computational Bottleneck of Diffusion-Based Reconstruction

The main practical limitation of diffusion-based inverse problem solvers is computational cost. Standard DPS with $T = 1000$ steps requires $\sim 2000$ NFEs per reconstruction, orders of magnitude more than PnP methods ($\sim 50$–$200$ iterations) or unrolled networks ($\sim 5$–$15$ layers). This section surveys acceleration techniques and provides a quantitative cost comparison to guide method selection.

Definition: DDIM Acceleration

Denoising Diffusion Implicit Models (DDIM) replace the stochastic DDPM reverse process with a deterministic one, enabling larger step sizes. The DDIM update is:

$$\mathbf{x}_{t-\Delta t} = \sqrt{\bar{\alpha}_{t-\Delta t}}\,\hat{\mathbf{x}}_0(\mathbf{x}_t) + \sqrt{1 - \bar{\alpha}_{t-\Delta t}}\,\frac{\mathbf{x}_t - \sqrt{\bar{\alpha}_t}\,\hat{\mathbf{x}}_0(\mathbf{x}_t)}{\sqrt{1 - \bar{\alpha}_t}},$$

where $\hat{\mathbf{x}}_0(\mathbf{x}_t)$ is the Tweedie estimate.

DDIM with $S = 50$–$200$ steps (uniformly subsampled from the original $T = 1000$ schedule) achieves quality comparable to DDPM with $T$ steps.
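To make the update concrete, here is a minimal NumPy sketch of one deterministic DDIM step; the schedule array `alpha_bar` and the callable `x0_hat` (returning the Tweedie estimate) are hypothetical stand-ins for whatever score network and noise schedule are in use.

```python
import numpy as np

def ddim_step(x_t, t, t_prev, alpha_bar, x0_hat):
    """One deterministic DDIM update from step t to step t_prev < t.

    alpha_bar : 1-D array of cumulative products \bar{alpha}, indexed by step.
    x0_hat    : callable (x_t, t) -> Tweedie estimate of the clean image.
    """
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    x0 = x0_hat(x_t, t)
    # Implied noise direction, recovered from x_t and the clean estimate.
    eps = (x_t - np.sqrt(ab_t) * x0) / np.sqrt(1.0 - ab_t)
    # Re-noise the clean estimate down to the lower noise level t_prev.
    return np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev) * eps

# Uniformly subsample S steps from the original T = 1000 schedule.
T, S = 1000, 200
step_schedule = np.linspace(T - 1, 0, S + 1).astype(int)
```

Iterating `ddim_step` over consecutive pairs in `step_schedule` gives the full sampler; with $S = 200$ this costs 200 NFEs, plus whatever the measurement guidance adds.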

For inverse problems, DDIM is preferred for two reasons: (1) fewer steps mean proportionally fewer NFEs, and (2) the deterministic process produces consistent reconstructions: the same initialisation always gives the same output, which is essential for reproducibility in scientific imaging.

Definition: DPM-Solver Acceleration

DPM-Solver treats the probability-flow ODE as a semi-linear ODE and applies exponential integrators. The key observation is that the diffusion ODE has the form:

$$\frac{d\mathbf{x}}{dt} = f(t)\,\mathbf{x} + g(t)\,\boldsymbol{\epsilon}_\theta(\mathbf{x}, t),$$

where $f(t)\,\mathbf{x}$ is a linear term (exactly solvable) and $g(t)\,\boldsymbol{\epsilon}_\theta$ is a nonlinear correction. DPM-Solver applies a high-order Taylor expansion to the nonlinear part, achieving high-quality samples in $10$–$25$ steps.

DPM-Solver++ extends this to guided diffusion, making it directly applicable to DPS-style inverse problem solvers.
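As an illustration of the exponential-integrator idea, below is a hedged sketch of the first-order DPM-Solver update under the VP parameterisation ($\alpha_t = \sqrt{\bar{\alpha}_t}$, $\sigma_t = \sqrt{1 - \bar{\alpha}_t}$); `eps_theta` is a hypothetical noise-prediction network, and higher-order variants add correction terms on top of this step.

```python
import numpy as np

def dpm_solver1_step(x_s, s, t, alpha_bar, eps_theta):
    """First-order DPM-Solver step from time index s to a less noisy index t.

    The linear f(t)x part of the ODE is integrated exactly (the a_t / a_s
    factor); only the eps_theta correction is approximated.
    """
    a_s, a_t = np.sqrt(alpha_bar[s]), np.sqrt(alpha_bar[t])
    sig_s, sig_t = np.sqrt(1 - alpha_bar[s]), np.sqrt(1 - alpha_bar[t])
    lam_s, lam_t = np.log(a_s / sig_s), np.log(a_t / sig_t)  # half log-SNR
    h = lam_t - lam_s
    return (a_t / a_s) * x_s - sig_t * np.expm1(h) * eps_theta(x_s, s)
```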

Definition: Consistency Models

Consistency models learn a direct mapping from any noisy $\mathbf{x}_t$ to the clean estimate $\hat{\mathbf{x}}_0$:

$$f_\theta(\mathbf{x}_t, t) \approx \mathbf{x}_0 \qquad \text{for all } t \in [0, T].$$

The model is trained to be self-consistent: for any two points on the same diffusion trajectory, $f_\theta(\mathbf{x}_s, s) = f_\theta(\mathbf{x}_t, t)$.

For inverse problems, a consistency model produces a reconstruction in $1$–$4$ NFEs, with measurement guidance applied at each step. The quality-speed tradeoff is more aggressive than DDIM: faster, but with a $1$–$3$ dB PSNR loss on imaging benchmarks.
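One possible few-step loop is sketched below, assuming an EDM-style consistency function `f_theta(x, sigma)`, a linear forward operator `A`, and a simple gradient-descent guidance step of size `lr`; these names and the specific guidance rule are illustrative assumptions rather than a fixed recipe.

```python
import numpy as np

def consistency_reconstruct(y, A, f_theta, sigmas, lr=1.0, seed=0):
    """Few-step reconstruction with a consistency model (len(sigmas) NFEs).

    sigmas  : decreasing noise levels, e.g. [80.0, 10.0, 1.0, 0.1] for 4 NFEs.
    f_theta : callable (x, sigma) -> direct clean estimate x0_hat.
    A       : measurement matrix, y = A @ x_true + noise.
    """
    rng = np.random.default_rng(seed)
    x = sigmas[0] * rng.standard_normal(A.shape[1])   # start from pure noise
    for i, sigma in enumerate(sigmas):
        x0 = f_theta(x, sigma)                         # one NFE
        x0 = x0 - lr * A.T @ (A @ x0 - y)              # measurement guidance
        if i + 1 < len(sigmas):
            # Re-noise the guided estimate to the next (lower) noise level.
            x = x0 + sigmas[i + 1] * rng.standard_normal(x0.shape)
        else:
            x = x0
    return x
```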

Theorem: NFE Scaling for Diffusion-Based Reconstruction

For a diffusion-based inverse problem solver with $S$ reverse steps, the total computational cost is:

$$C_{\text{total}} = S \cdot \bigl(\underbrace{C_{\text{net}}}_{\text{score evaluation}} + \underbrace{C_{\text{back}}}_{\text{backpropagation (if DPS)}} + \underbrace{C_{\mathbf{A}}}_{\text{forward model}}\bigr),$$

where:

  • $C_{\text{net}} \propto D \cdot W^2$ for a U-Net with depth $D$ and width $W$
  • $C_{\text{back}} \approx 2\,C_{\text{net}}$ (backpropagation costs $\sim 2\times$ a forward pass)
  • $C_{\mathbf{A}}$ is the cost of one forward/adjoint model evaluation

For methods without backpropagation (DDRM, DDNM, DiffPIR), the $C_{\text{back}}$ term is absent, reducing the per-step cost by $\sim 3\times$.

The cost scales linearly with the number of steps $S$ and the network size. Reducing $S$ (via DDIM, DPM-Solver, or consistency models) is the primary lever for acceleration. Reducing the network size (via smaller architectures or latent diffusion) is the secondary lever.
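The scaling in the theorem can be turned into a rough cost calculator; the concrete numbers below (one cost unit per network evaluation, a cheap forward model) are illustrative assumptions only.

```python
def diffusion_cost(S, C_net=1.0, C_A=0.05, uses_backprop=False, backprop_factor=2.0):
    """Total cost per reconstruction under the linear scaling model above."""
    C_back = backprop_factor * C_net if uses_backprop else 0.0
    return S * (C_net + C_back + C_A)

# DPS-style guidance pays for backpropagation at every step; DDRM/DDNM/DiffPIR do not.
cost_dps  = diffusion_cost(S=200, uses_backprop=True)    # 200 * 3.05 = 610 units
cost_ddrm = diffusion_cost(S=200, uses_backprop=False)   # 200 * 1.05 = 210 units
```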

Computational Cost Comparison

Comparing the computational cost (in NFEs) and reconstruction quality (PSNR) of the different paradigms places each method in an NFE-vs-quality plane and reveals the Pareto frontier: diffusion methods dominate at the high-quality end but at high NFE cost, PnP methods occupy the middle ground, and unrolled networks provide the fastest reconstruction at the cost of task-specific training.


Reconstruction Paradigm Comparison

| Method | NFEs | Training Required | Posterior Samples | Typical PSNR Gap |
| --- | --- | --- | --- | --- |
| PnP-DRUNet (Ch. 21) | 50–200 | Denoiser only | No | Baseline |
| Unrolled OAMP (Ch. 18) | 5–15 | End-to-end | No | +0.5 to +1.5 dB |
| DPS (DDIM, $S = 200$) | 400 | Score network | Yes | +1 to +3 dB |
| DDRM ($S = 200$) | 200 | Score network | Yes | +0.5 to +2 dB |
| DiffPIR ($K = 15$, $S = 5$) | 100–150 | Score network | Partial | +0.5 to +1.5 dB |
| Consistency DPS ($S = 4$) | 8 | Distilled model | No | −0.5 to +0.5 dB |

Example: Accelerating DPS for a $256 \times 256$ Reconstruction

A DPS reconstruction with $T = 1000$ takes 5 minutes on an A100 GPU. Compare acceleration strategies to bring this to under 30 seconds.
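Under the linear scaling model above, and assuming the runtime is dominated by the per-step cost, a back-of-envelope estimate is: DDIM with $S = 200$ gives $5\,\text{min} \times 200/1000 = 60$ s; $S = 100$ gives $30$ s; switching to a backprop-free solver such as DDRM at $S = 200$ additionally removes the $C_{\text{back}}$ term, cutting the per-step cost by roughly $3\times$ and bringing the runtime to roughly $20$ s. These figures are illustrative estimates, not measurements.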

⚠️ Engineering Note

GPU Memory Requirements for DPS

DPS requires storing the full computational graph for backpropagation through the score network at each step. For a U-Net with $\sim 100$M parameters processing a $256 \times 256$ image, the peak GPU memory is:

  • Forward pass: $\sim 2$ GB
  • Activations for backprop: $\sim 6$–$10$ GB
  • Total: $\sim 8$–$12$ GB per sample

For batch processing (multiple posterior samples in parallel), memory scales linearly. Gradient checkpointing reduces memory to $\sim 4$–$6$ GB at the cost of $\sim 30\%$ additional computation. For $512 \times 512$ images, an A100 (80 GB) is typically required.
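A minimal PyTorch sketch of how checkpointing might be wired into a DPS guidance step is shown below; `score_net` (a noise-prediction U-Net), the callable forward model `A`, and the scalar tensor `alpha_bar_t` are hypothetical stand-ins, not a specific library API.

```python
import torch
from torch.utils.checkpoint import checkpoint

def dps_guidance_grad(x_t, t, y, A, score_net, alpha_bar_t):
    """Gradient of the DPS data-fidelity term with a checkpointed score network.

    Checkpointing recomputes the U-Net activations during the backward pass
    instead of storing them, trading ~30% extra compute for lower peak memory.
    alpha_bar_t must be a scalar tensor holding \bar{alpha}_t for this step.
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = checkpoint(score_net, x_t, t, use_reentrant=False)
    # Tweedie estimate of the clean image from the predicted noise.
    x0_hat = (x_t - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)
    loss = torch.sum((y - A(x0_hat)) ** 2)
    return torch.autograd.grad(loss, x_t)[0]
```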

Practical Constraints

  • RTX 3090 (24 GB): a single $256 \times 256$ sample with gradient checkpointing
  • A100 (80 GB): a batch of 4–8 samples at $256 \times 256$, or a single $512 \times 512$ sample
  • Latent diffusion reduces memory by $4$–$16\times$ via compression

Common Mistake: DDIM Produces Point Estimates, Not Posterior Samples

Mistake:

Using DDIM for uncertainty quantification by running multiple reconstructions with different random seeds.

Correction:

DDIM is a deterministic sampler: given the same initial noise $\mathbf{x}_T$, it always produces the same reconstruction. Different $\mathbf{x}_T$ seeds produce different outputs, but these are not valid posterior samples; they are different MAP-like estimates. For posterior sampling (and hence uncertainty quantification), the stochastic DDPM reverse process must be used, or a stochastic variant of DDIM with nonzero noise injection.
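A sketch of the usual $\eta$-parameterised interpolation between the two samplers is given below; `alpha_bar` and `x0_hat` are the same hypothetical schedule and Tweedie-estimate callable as before, $\eta = 0$ recovers the deterministic DDIM step, and $\eta = 1$ restores DDPM-level noise injection (and hence stochastic sampling).

```python
import numpy as np

def ddim_step_eta(x_t, t, t_prev, alpha_bar, x0_hat, eta=1.0, rng=None):
    """DDIM-family update with noise injection controlled by eta in [0, 1]."""
    rng = rng or np.random.default_rng()
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    x0 = x0_hat(x_t, t)
    eps = (x_t - np.sqrt(ab_t) * x0) / np.sqrt(1.0 - ab_t)
    # Per-step noise scale: zero for eta = 0 (deterministic), DDPM-level for eta = 1.
    sigma = eta * np.sqrt((1 - ab_prev) / (1 - ab_t)) * np.sqrt(1 - ab_t / ab_prev)
    mean = np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev - sigma**2) * eps
    return mean + sigma * rng.standard_normal(x_t.shape)
```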

Quick Check

Which acceleration technique preserves the ability to generate diverse posterior samples?

  • DDIM (deterministic)
  • DPM-Solver (deterministic ODE)
  • Stochastic DDPM with fewer steps
  • Consistency models

DDIM (Denoising Diffusion Implicit Models)

A deterministic variant of the DDPM reverse process that enables accelerated sampling by using larger step sizes. DDIM produces the same output for the same initial noise seed, enabling reproducible reconstructions.

Related: DDPM Forward Process, Acceleration

Consistency Model

A model trained to map any noisy point $\mathbf{x}_t$ directly to the clean image $\hat{\mathbf{x}}_0$ in one or a few steps. Achieves extreme acceleration ($1$–$4$ NFEs) at the cost of some quality degradation.

Related: DDIM Acceleration, Distillation

Key Takeaway

The computational cost of diffusion-based reconstruction scales linearly with the number of steps $S$ and the network size. DDIM reduces $S$ from $1000$ to $50$–$200$ with minimal quality loss; DPM-Solver++ achieves $10$–$25$ steps; consistency models reach $1$–$4$ steps with more aggressive quality tradeoffs. For RF imaging, the choice is between quality (more steps, posterior sampling) and speed (fewer steps, deterministic). The Pareto frontier shows that diffusion methods dominate PnP methods in quality but at $2$–$10\times$ higher computational cost.