Diffusion Posterior Sampling (DPS)

From Priors to Posteriors: Guiding Diffusion with Measurements

Diffusion models learn the prior $p(\mathbf{x})$. For inverse problems we need the posterior $p(\mathbf{x} \mid \mathbf{y})$. By Bayes' rule:

$$p(\mathbf{x} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{x})\,p(\mathbf{x}).$$

Diffusion Posterior Sampling (DPS) modifies the reverse diffusion process to incorporate the likelihood $p(\mathbf{y} \mid \mathbf{x})$, steering the generative trajectory toward images that explain the measurements. The result is (approximate) posterior sampling: each run of DPS with a different noise seed produces a different plausible reconstruction, enabling uncertainty quantification.

Definition: Posterior Score Decomposition

At diffusion time $t$, the posterior score decomposes as:

$$\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t \mid \mathbf{y}) = \underbrace{\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t)}_{\text{prior score}} + \underbrace{\nabla_{\mathbf{x}_t}\log p_t(\mathbf{y} \mid \mathbf{x}_t)}_{\text{likelihood score}}.$$

The prior score is provided by the pretrained score network $\mathbf{s}_\theta(\mathbf{x}_t, t) \approx \nabla\log p_t(\mathbf{x}_t)$.

The likelihood score is intractable because $p_t(\mathbf{y} \mid \mathbf{x}_t) = \int p(\mathbf{y} \mid \mathbf{x}_0)\,p(\mathbf{x}_0 \mid \mathbf{x}_t)\,d\mathbf{x}_0$ involves marginalising over the unknown $\mathbf{x}_0$. Different methods (DPS, DDRM, MCG) differ in how they approximate this intractable term.
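To see the decomposition in action, here is a minimal numerical check in a scalar toy model where every density is Gaussian, so all three scores are available in closed form. The setup (unit-variance prior, identity forward model, the specific numbers) is purely illustrative:

```python
import numpy as np

# Numerical sanity check of the posterior score decomposition in a scalar
# toy model where every density is Gaussian, hence analytic (no networks).
# Illustrative assumptions: prior x0 ~ N(0, 1), measurement y = x0 + n with
# n ~ N(0, sn2), VP-style corruption x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps.
abar, sn2 = 0.4, 0.1
a = np.sqrt(abar)

var_xt = abar + (1.0 - abar)   # Var(x_t) = 1 for a unit-variance prior
var_y = 1.0 + sn2              # Var(y)
cov = a                        # Cov(x_t, y) = sqrt(abar) * Var(x0)

xt, y = 0.7, -0.3              # arbitrary evaluation point

# Prior score: ∇ log p_t(x_t) for p_t = N(0, var_xt).
prior_score = -xt / var_xt
# Likelihood score: y | x_t is N((cov/var_xt) x_t, var_y - cov^2/var_xt).
w = var_y - cov**2 / var_xt
lik_score = (y - (cov / var_xt) * xt) * (cov / var_xt) / w
# Posterior score: x_t | y is N((cov/var_y) y, var_xt - cov^2/var_y).
v = var_xt - cov**2 / var_y
post_score = -(xt - (cov / var_y) * y) / v

# ∇ log p_t(x_t | y) = ∇ log p_t(x_t) + ∇ log p_t(y | x_t) holds exactly.
assert np.isclose(prior_score + lik_score, post_score)
```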

Definition: Diffusion Posterior Sampling (DPS)

DPS approximates the likelihood score using the Tweedie estimate $\hat{\mathbf{x}}_0(\mathbf{x}_t)$ as a plug-in for $\mathbf{x}_0$. The modified reverse step is:

$$\mathbf{x}_{t-1} = \text{DDPM\_step}(\mathbf{x}_t, \mathbf{s}_\theta) - \zeta\,\nabla_{\mathbf{x}_t}\!\left[\frac{1}{2\sigma^2_{n}}\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0(\mathbf{x}_t)\|^2\right],$$

where:

  • $\hat{\mathbf{x}}_0(\mathbf{x}_t) = (\mathbf{x}_t + (1-\bar{\alpha}_t)\mathbf{s}_\theta(\mathbf{x}_t, t))/\sqrt{\bar{\alpha}_t}$ is the Tweedie estimate
  • $\mathbf{A}$ is the forward model (sensing matrix)
  • $\sigma^2_{n}$ is the measurement noise variance
  • $\zeta > 0$ is the guidance scale
  • The gradient $\nabla_{\mathbf{x}_t}$ is computed via automatic differentiation through the Tweedie estimate

The DPS approximation replaces the intractable marginal likelihood $p_t(\mathbf{y} \mid \mathbf{x}_t)$ with the point-estimate likelihood $p(\mathbf{y} \mid \hat{\mathbf{x}}_0(\mathbf{x}_t))$. This is exact only when the posterior $p(\mathbf{x}_0 \mid \mathbf{x}_t)$ is concentrated (low noise), and becomes increasingly approximate at high noise levels.
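As a concrete sketch, the Tweedie plug-in can be written in a few lines of PyTorch. Here `score_net` and `alpha_bar` are assumed stand-ins for a pretrained score network and its cumulative noise schedule, not a specific library API:

```python
import torch

def tweedie_estimate(x_t, t, score_net, alpha_bar):
    """Tweedie plug-in for x0: (x_t + (1 - abar_t) * s) / sqrt(abar_t).

    `score_net` (a pretrained s_theta(x_t, t)) and `alpha_bar` (a 1-D tensor
    of cumulative schedule values indexed by t) are assumed stand-ins.
    """
    s = score_net(x_t, t)          # s_theta(x_t, t) ≈ ∇ log p_t(x_t)
    abar_t = alpha_bar[t]
    return (x_t + (1.0 - abar_t) * s) / abar_t.sqrt()
```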

Theorem: DPS Likelihood Guidance Gradient

For the linear Gaussian model $\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2_{n}\mathbf{I})$, the DPS guidance gradient is:

xt ⁣[12σn2yAx^02]=1σn2x^0xt ⁣AH(Ax^0y).\nabla_{\mathbf{x}_t}\!\left[\frac{1}{2\sigma^2_{n}}\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|^2\right] = \frac{1}{\sigma^2_{n}}\,\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}^{\!\top}\mathbf{A}^{H}(\mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}).

The Jacobian $\partial\hat{\mathbf{x}}_0/\partial\mathbf{x}_t$ involves the Jacobian of the score network; in practice the product above is never formed explicitly but evaluated as a vector-Jacobian product via backpropagation.

The gradient pushes $\mathbf{x}_t$ in the direction that reduces the measurement residual $\|\mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}\|^2$. The chain rule through the Tweedie estimate ensures the correction is applied at the appropriate noise level.
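A minimal PyTorch sketch of this computation, treating the forward model as an arbitrary differentiable callable `A` (all names are illustrative assumptions): calling `torch.autograd.grad` on the data-fit loss performs exactly the vector-Jacobian product in the theorem.

```python
import torch

def dps_guidance_gradient(x_t, t, y, score_net, alpha_bar, A, sigma_n2):
    """∇_{x_t} [ ||y - A(x0_hat)||^2 / (2 sigma_n^2) ] via backprop through
    the Tweedie estimate. `A` is any differentiable forward operator
    (a callable); `score_net` and `alpha_bar` follow the sketch above.
    """
    x_t = x_t.detach().requires_grad_(True)
    s = score_net(x_t, t)
    x0_hat = (x_t + (1.0 - alpha_bar[t]) * s) / alpha_bar[t].sqrt()
    loss = (A(x0_hat) - y).pow(2).sum() / (2.0 * sigma_n2)
    (grad,) = torch.autograd.grad(loss, x_t)   # pulls back through s_theta
    return grad
```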

DPS Algorithm for Linear Inverse Problems

Complexity: $O(T \cdot (C_{\text{net}} + C_{\mathbf{A}}))$, where $C_{\text{net}}$ includes the backpropagation cost.
Input: measurements $\mathbf{y}$, forward model $\mathbf{A}$, score network $\mathbf{s}_\theta$, noise schedule $\{\bar{\alpha}_t\}_{t=T}^{0}$, guidance scale $\zeta$, measurement noise variance $\sigma^2_{n}$.
1. Sample $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
2. for $t = T, T-1, \ldots, 1$ do
3.   Compute score: $\mathbf{s} = \mathbf{s}_\theta(\mathbf{x}_t, t)$
4.   Tweedie estimate: $\hat{\mathbf{x}}_0 = (\mathbf{x}_t + (1-\bar{\alpha}_t)\mathbf{s})/\sqrt{\bar{\alpha}_t}$
5.   Measurement residual: $\mathbf{r} = \mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}$
6.   Guidance gradient: $\mathbf{g} = \nabla_{\mathbf{x}_t}\|\mathbf{r}\|^2$ (via backprop)
7.   DDPM update: $\tilde{\mathbf{x}}_{t-1} = \text{DDPM\_step}(\mathbf{x}_t, \mathbf{s})$
8.   Guided step: $\mathbf{x}_{t-1} = \tilde{\mathbf{x}}_{t-1} - \frac{\zeta}{2\sigma^2_{n}}\,\mathbf{g}$
9. end for
10. return $\hat{\mathbf{x}}_0(\mathbf{x}_1)$

Each iteration requires one score network evaluation (line 3) plus one backpropagation through the network (line 6), giving $\sim 2T$ total NFEs. For $T = 1000$, this is the dominant computational cost.
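Putting the pieces together, a compact sketch of the whole loop might look as follows. The ancestral DDPM step, the 0-indexed noise schedule, and the `score_net(x_t, t)` calling convention are assumptions for illustration, not the reference implementation:

```python
import torch

def dps_sample(y, A, score_net, alpha_bar, zeta, sigma_n2, shape, device="cpu"):
    """Compact sketch of the DPS algorithm above.

    A         -- differentiable forward operator, a callable (assumption)
    alpha_bar -- 1-D tensor of cumulative products abar_t, t = 0..T-1
    """
    alpha_bar = alpha_bar.to(device)
    T = alpha_bar.shape[0]
    alphas = torch.empty(T, device=device)     # recover per-step alpha_t
    alphas[0] = alpha_bar[0]
    alphas[1:] = alpha_bar[1:] / alpha_bar[:-1]

    x_t = torch.randn(shape, device=device)                            # step 1
    for t in range(T - 1, 0, -1):                                      # step 2
        x_t = x_t.detach().requires_grad_(True)
        s = score_net(x_t, t)                                          # step 3
        x0_hat = (x_t + (1 - alpha_bar[t]) * s) / alpha_bar[t].sqrt()  # step 4
        r = A(x0_hat) - y                                              # step 5
        (g,) = torch.autograd.grad(r.pow(2).sum(), x_t)                # step 6
        with torch.no_grad():
            beta_t = 1 - alphas[t]
            mean = (x_t + beta_t * s) / alphas[t].sqrt()               # step 7: DDPM mean
            noise = torch.randn_like(x_t) if t > 1 else torch.zeros_like(x_t)
            x_t = mean + beta_t.sqrt() * noise - zeta / (2 * sigma_n2) * g  # step 8
    return x0_hat.detach()     # step 10: x0_hat from the t = 1 iteration
```

Each pass through the loop performs one forward evaluation of `score_net` (step 3) and one backward pass (step 6), which is exactly where the $\sim 2T$ NFE count comes from.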

DPS Reconstruction Trajectory

Visualise the DPS reconstruction as a function of the diffusion step. The plot shows the evolving Tweedie estimate $\hat{\mathbf{x}}_0$ at several intermediate times, from pure noise ($t = T$) to the final reconstruction ($t = 0$). Adjust the guidance scale $\zeta$: too small yields measurement-inconsistent samples; too large introduces artefacts from over-fitting to the measurements.


Example: Effect of the Guidance Scale

Consider a 1D deblurring problem with Gaussian blur kernel of width $\sigma_b = 3$ pixels and measurement noise $\sigma^2_{n} = 0.01$. Describe the effect of the guidance scale $\zeta$ on the DPS reconstruction.
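One way to probe this empirically is to sweep $\zeta$ and track the data fit. The sketch below assumes the `dps_sample` function from the sketch above is in scope, along with `y` (shape `(1, 1, N)`), `score_net`, and `alpha_bar`; the blur operator and sweep values are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical 1-D Gaussian blur (sigma_b = 3 pixels) as the forward model A.
k = torch.arange(-8.0, 9.0)
kernel = torch.exp(-k**2 / (2 * 3.0**2))
kernel = (kernel / kernel.sum()).view(1, 1, -1)   # conv1d weight: (1, 1, 17)

def A(x):
    # x has shape (batch, 1, N); 'same' padding keeps the signal length.
    return F.conv1d(x, kernel, padding=kernel.shape[-1] // 2)

# Sweep the guidance scale and report the data fit (sigma_n2 = 0.01 as in
# the example; y, score_net, alpha_bar are assumed in scope).
for zeta in (0.01, 0.1, 1.0, 10.0):
    x_hat = dps_sample(y, A, score_net, alpha_bar, zeta, 0.01, y.shape)
    residual = (A(x_hat) - y).pow(2).mean().item()
    print(f"zeta={zeta:5.2f}  residual={residual:.4f}")
```

Qualitatively, the residual should shrink as $\zeta$ grows, while very large $\zeta$ starts fitting the $\sigma^2_{n} = 0.01$ measurement noise itself and reintroduces artefacts.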

Common Mistake: DPS Is Approximate Posterior Sampling

Mistake:

Claiming that DPS produces exact samples from the posterior $p(\mathbf{x}_0 \mid \mathbf{y})$.

Correction:

DPS makes two approximations:

  1. The score network $\mathbf{s}_\theta \approx \nabla\log p_t$ is only approximate (finite training).
  2. The likelihood gradient uses the point estimate $\hat{\mathbf{x}}_0$ rather than marginalising over $p(\mathbf{x}_0 \mid \mathbf{x}_t)$.

These approximations mean DPS samples are from an approximate posterior. The guidance scale $\zeta$ compensates: larger $\zeta$ enforces stronger measurement consistency at the expense of prior fidelity. There is no general theoretical guarantee on the quality of this approximation.

Quick Check

For DPS with $T = 1000$ diffusion steps, approximately how many score network evaluations (NFEs) are required per reconstruction?

500

1000

2000

100

Diffusion Posterior Sampling (DPS)

A method for solving inverse problems with pretrained diffusion models by adding a likelihood guidance gradient to the reverse diffusion process. The guidance gradient is computed via the Tweedie estimate and backpropagation through the score network.

Related: Tweedie Formula, Posterior Sampling

Guidance Scale

A hyperparameter $\zeta > 0$ that controls the strength of the measurement consistency term in DPS. Larger $\zeta$ produces more measurement-consistent but potentially less natural reconstructions.

Related: Diffusion Posterior Sampling (DPS), Measurement Consistency

Key Takeaway

DPS modifies the reverse diffusion process with a likelihood guidance term computed via the Tweedie estimate. The guidance scale $\zeta$ controls the tradeoff between prior fidelity and measurement consistency. The main limitation is computational cost: $\sim 2T$ NFEs per sample, with $T$ typically in the hundreds. The main advantage is that multiple runs produce diverse posterior samples, enabling uncertainty quantification, a capability deterministic methods lack.
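As a closing illustration of the uncertainty-quantification point, a sketch that reuses the hypothetical `dps_sample` function from above: repeated runs with different seeds give a pixelwise spread that flags poorly constrained regions.

```python
import torch

# Pixelwise uncertainty from repeated DPS runs (assumes dps_sample, y, A,
# score_net, alpha_bar from the sketches above; 8 seeds is arbitrary).
samples = []
for seed in range(8):
    torch.manual_seed(seed)   # a different seed gives a different sample
    samples.append(dps_sample(y, A, score_net, alpha_bar,
                              zeta=1.0, sigma_n2=0.01, shape=y.shape))
samples = torch.stack(samples)
posterior_mean = samples.mean(dim=0)   # MMSE-style point estimate
posterior_std = samples.std(dim=0)     # high values flag uncertain regions
```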