Diffusion Posterior Sampling (DPS)

From Priors to Posteriors: Guiding Diffusion with Measurements

Diffusion models learn the prior $p(\mathbf{x})$. For inverse problems we need the posterior $p(\mathbf{x} \mid \mathbf{y})$. By Bayes' rule:

$$p(\mathbf{x} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{x})\,p(\mathbf{x}).$$

Diffusion Posterior Sampling (DPS) modifies the reverse diffusion process to incorporate the likelihood $p(\mathbf{y} \mid \mathbf{x})$, steering the generative trajectory toward images that explain the measurements. The result is (approximate) posterior sampling: each run of DPS with a different noise seed produces a different plausible reconstruction, enabling uncertainty quantification.

Definition: Posterior Score Decomposition

At diffusion time $t$, the posterior score decomposes as:

$$\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t \mid \mathbf{y}) = \underbrace{\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t)}_{\text{prior score}} + \underbrace{\nabla_{\mathbf{x}_t}\log p_t(\mathbf{y} \mid \mathbf{x}_t)}_{\text{likelihood score}}.$$

The prior score is provided by the pretrained score network $\mathbf{s}_\theta(\mathbf{x}_t, t) \approx \nabla\log p_t(\mathbf{x}_t)$.

The likelihood score is intractable because $p_t(\mathbf{y} \mid \mathbf{x}_t) = \int p(\mathbf{y} \mid \mathbf{x}_0)\,p(\mathbf{x}_0 \mid \mathbf{x}_t)\,d\mathbf{x}_0$ involves marginalising over the unknown $\mathbf{x}_0$. Different methods (DPS, DDRM, MCG) differ in how they approximate this intractable term.
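To see the decomposition in action, here is a minimal numerical check in a scalar toy model where every density is Gaussian, so all three scores are available in closed form. The setup (unit-variance prior, identity forward model, the specific numbers) is purely illustrative:

```python
import numpy as np

# Numerical sanity check of the posterior score decomposition in a scalar
# toy model where every density is Gaussian, hence analytic (no networks).
# Illustrative assumptions: prior x0 ~ N(0, 1), measurement y = x0 + n with
# n ~ N(0, sn2), VP-style corruption x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps.
abar, sn2 = 0.4, 0.1
a = np.sqrt(abar)

var_xt = abar + (1.0 - abar)   # Var(x_t) = 1 for a unit-variance prior
var_y = 1.0 + sn2              # Var(y)
cov = a                        # Cov(x_t, y) = sqrt(abar) * Var(x0)

xt, y = 0.7, -0.3              # arbitrary evaluation point

# Prior score: ∇ log p_t(x_t) for p_t = N(0, var_xt).
prior_score = -xt / var_xt
# Likelihood score: y | x_t is N((cov/var_xt) x_t, var_y - cov^2/var_xt).
w = var_y - cov**2 / var_xt
lik_score = (y - (cov / var_xt) * xt) * (cov / var_xt) / w
# Posterior score: x_t | y is N((cov/var_y) y, var_xt - cov^2/var_y).
v = var_xt - cov**2 / var_y
post_score = -(xt - (cov / var_y) * y) / v

# ∇ log p_t(x_t | y) = ∇ log p_t(x_t) + ∇ log p_t(y | x_t) holds exactly.
assert np.isclose(prior_score + lik_score, post_score)
```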

Definition: Diffusion Posterior Sampling (DPS)

DPS approximates the likelihood score using the Tweedie estimate $\hat{\mathbf{x}}_0(\mathbf{x}_t)$ as a plug-in for $\mathbf{x}_0$. The modified reverse step is:

$$\mathbf{x}_{t-1} = \text{DDPM\_step}(\mathbf{x}_t, \mathbf{s}_\theta) - \zeta\,\nabla_{\mathbf{x}_t}\!\left[\frac{1}{2\sigma^2_{n}}\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0(\mathbf{x}_t)\|^2\right],$$

where:

  • $\hat{\mathbf{x}}_0(\mathbf{x}_t) = (\mathbf{x}_t + (1-\bar{\alpha}_t)\mathbf{s}_\theta(\mathbf{x}_t, t))/\sqrt{\bar{\alpha}_t}$ is the Tweedie estimate
  • $\mathbf{A}$ is the forward model (sensing matrix)
  • $\sigma^2_{n}$ is the measurement noise variance
  • $\zeta > 0$ is the guidance scale
  • The gradient $\nabla_{\mathbf{x}_t}$ is computed via automatic differentiation through the Tweedie estimate

The DPS approximation replaces the intractable marginal likelihood $p_t(\mathbf{y} \mid \mathbf{x}_t)$ with the point-estimate likelihood $p(\mathbf{y} \mid \hat{\mathbf{x}}_0(\mathbf{x}_t))$. This is exact only when the posterior $p(\mathbf{x}_0 \mid \mathbf{x}_t)$ is concentrated (low noise), and becomes increasingly approximate at high noise levels.
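As a concrete sketch, the Tweedie plug-in can be written in a few lines of PyTorch. Here `score_net` and `alpha_bar` are assumed stand-ins for a pretrained score network and its cumulative noise schedule, not a specific library API:

```python
import torch

def tweedie_estimate(x_t, t, score_net, alpha_bar):
    """Tweedie plug-in for x0: (x_t + (1 - abar_t) * s) / sqrt(abar_t).

    `score_net` (a pretrained s_theta(x_t, t)) and `alpha_bar` (a 1-D tensor
    of cumulative schedule values indexed by t) are assumed stand-ins.
    """
    s = score_net(x_t, t)          # s_theta(x_t, t) ≈ ∇ log p_t(x_t)
    abar_t = alpha_bar[t]
    return (x_t + (1.0 - abar_t) * s) / abar_t.sqrt()
```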

Theorem: DPS Likelihood Guidance Gradient

For the linear Gaussian model $\mathbf{y} = \mathbf{A}\mathbf{x}_0 + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2_{n}\mathbf{I})$, the DPS guidance gradient is:

xt ⁣[12σn2yAx^02]=1σn2x^0xt ⁣AH(Ax^0y).\nabla_{\mathbf{x}_t}\!\left[\frac{1}{2\sigma^2_{n}}\|\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}_0\|^2\right] = \frac{1}{\sigma^2_{n}}\,\frac{\partial\hat{\mathbf{x}}_0}{\partial\mathbf{x}_t}^{\!\top}\mathbf{A}^{H}(\mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}).

The Jacobian $\partial\hat{\mathbf{x}}_0/\partial\mathbf{x}_t$ involves the Jacobian of the score network; in practice the product above is never formed explicitly but evaluated as a vector-Jacobian product via backpropagation.

The gradient pushes $\mathbf{x}_t$ in the direction that reduces the measurement residual $\|\mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}\|^2$. The chain rule through the Tweedie estimate ensures the correction is applied at the appropriate noise level.
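A minimal PyTorch sketch of this computation, treating the forward model as an arbitrary differentiable callable `A` (all names are illustrative assumptions): calling `torch.autograd.grad` on the data-fit loss performs exactly the vector-Jacobian product in the theorem.

```python
import torch

def dps_guidance_gradient(x_t, t, y, score_net, alpha_bar, A, sigma_n2):
    """∇_{x_t} [ ||y - A(x0_hat)||^2 / (2 sigma_n^2) ] via backprop through
    the Tweedie estimate. `A` is any differentiable forward operator
    (a callable); `score_net` and `alpha_bar` follow the sketch above.
    """
    x_t = x_t.detach().requires_grad_(True)
    s = score_net(x_t, t)
    x0_hat = (x_t + (1.0 - alpha_bar[t]) * s) / alpha_bar[t].sqrt()
    loss = (A(x0_hat) - y).pow(2).sum() / (2.0 * sigma_n2)
    (grad,) = torch.autograd.grad(loss, x_t)   # pulls back through s_theta
    return grad
```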

DPS Algorithm for Linear Inverse Problems

Complexity: $O(T \cdot (C_{\text{net}} + C_{\mathbf{A}}))$, where $C_{\text{net}}$ includes the backpropagation cost.
Input: measurements $\mathbf{y}$, forward model $\mathbf{A}$, score network $\mathbf{s}_\theta$, noise schedule $\{\bar{\alpha}_t\}_{t=T}^{0}$, guidance scale $\zeta$, measurement noise variance $\sigma^2_{n}$.
1. Sample $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
2. for $t = T, T-1, \ldots, 1$ do
3.   Compute score: $\mathbf{s} = \mathbf{s}_\theta(\mathbf{x}_t, t)$
4.   Tweedie estimate: $\hat{\mathbf{x}}_0 = (\mathbf{x}_t + (1-\bar{\alpha}_t)\mathbf{s})/\sqrt{\bar{\alpha}_t}$
5.   Measurement residual: $\mathbf{r} = \mathbf{A}\hat{\mathbf{x}}_0 - \mathbf{y}$
6.   Guidance gradient: $\mathbf{g} = \nabla_{\mathbf{x}_t}\|\mathbf{r}\|^2$ (via backprop)
7.   DDPM update: $\tilde{\mathbf{x}}_{t-1} = \text{DDPM\_step}(\mathbf{x}_t, \mathbf{s})$
8.   Guided step: $\mathbf{x}_{t-1} = \tilde{\mathbf{x}}_{t-1} - \frac{\zeta}{2\sigma^2_{n}}\,\mathbf{g}$
9. end for
10. return $\hat{\mathbf{x}}_0(\mathbf{x}_1)$

Each iteration requires one score network evaluation (line 3) plus one backpropagation through the network (line 6), giving $\sim 2T$ total NFEs. For $T = 1000$, this is the dominant computational cost.
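Putting the pieces together, a compact sketch of the whole loop might look as follows. The ancestral DDPM step, the 0-indexed noise schedule, and the `score_net(x_t, t)` calling convention are assumptions for illustration, not the reference implementation:

```python
import torch

def dps_sample(y, A, score_net, alpha_bar, zeta, sigma_n2, shape, device="cpu"):
    """Compact sketch of the DPS algorithm above.

    A         -- differentiable forward operator, a callable (assumption)
    alpha_bar -- 1-D tensor of cumulative products abar_t, t = 0..T-1
    """
    alpha_bar = alpha_bar.to(device)
    T = alpha_bar.shape[0]
    alphas = torch.empty(T, device=device)     # recover per-step alpha_t
    alphas[0] = alpha_bar[0]
    alphas[1:] = alpha_bar[1:] / alpha_bar[:-1]

    x_t = torch.randn(shape, device=device)                            # step 1
    for t in range(T - 1, 0, -1):                                      # step 2
        x_t = x_t.detach().requires_grad_(True)
        s = score_net(x_t, t)                                          # step 3
        x0_hat = (x_t + (1 - alpha_bar[t]) * s) / alpha_bar[t].sqrt()  # step 4
        r = A(x0_hat) - y                                              # step 5
        (g,) = torch.autograd.grad(r.pow(2).sum(), x_t)                # step 6
        with torch.no_grad():
            beta_t = 1 - alphas[t]
            mean = (x_t + beta_t * s) / alphas[t].sqrt()               # step 7: DDPM mean
            noise = torch.randn_like(x_t) if t > 1 else torch.zeros_like(x_t)
            x_t = mean + beta_t.sqrt() * noise - zeta / (2 * sigma_n2) * g  # step 8
    return x0_hat.detach()     # step 10: x0_hat from the t = 1 iteration
```

Each pass through the loop performs one forward evaluation of `score_net` (step 3) and one backward pass (step 6), which is exactly where the $\sim 2T$ NFE count comes from.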

DPS Reconstruction Trajectory

Visualise the DPS reconstruction as a function of the diffusion step. The plot shows the evolving Tweedie estimate $\hat{\mathbf{x}}_0$ at several intermediate times, from pure noise ($t = T$) to the final reconstruction ($t = 0$). Adjust the guidance scale $\zeta$: too small yields measurement-inconsistent samples; too large introduces artefacts from over-fitting to the measurements.


Example: Effect of the Guidance Scale

Consider a 1D deblurring problem with Gaussian blur kernel of width $\sigma_b = 3$ pixels and measurement noise $\sigma^2_{n} = 0.01$. Describe the effect of the guidance scale $\zeta$ on the DPS reconstruction.
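One way to probe this empirically is to sweep $\zeta$ and track the data fit. The sketch below assumes the `dps_sample` function from the sketch above is in scope, along with `y` (shape `(1, 1, N)`), `score_net`, and `alpha_bar`; the blur operator and sweep values are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical 1-D Gaussian blur (sigma_b = 3 pixels) as the forward model A.
k = torch.arange(-8.0, 9.0)
kernel = torch.exp(-k**2 / (2 * 3.0**2))
kernel = (kernel / kernel.sum()).view(1, 1, -1)   # conv1d weight: (1, 1, 17)

def A(x):
    # x has shape (batch, 1, N); 'same' padding keeps the signal length.
    return F.conv1d(x, kernel, padding=kernel.shape[-1] // 2)

# Sweep the guidance scale and report the data fit (sigma_n2 = 0.01 as in
# the example; y, score_net, alpha_bar are assumed in scope).
for zeta in (0.01, 0.1, 1.0, 10.0):
    x_hat = dps_sample(y, A, score_net, alpha_bar, zeta, 0.01, y.shape)
    residual = (A(x_hat) - y).pow(2).mean().item()
    print(f"zeta={zeta:5.2f}  residual={residual:.4f}")
```

Qualitatively, the residual should shrink as $\zeta$ grows, while very large $\zeta$ starts fitting the $\sigma^2_{n} = 0.01$ measurement noise itself and reintroduces artefacts.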

Common Mistake: DPS Is Approximate Posterior Sampling

Mistake:

Claiming that DPS produces exact samples from the posterior $p(\mathbf{x}_0 \mid \mathbf{y})$.

Correction:

DPS makes two approximations:

  1. The score network $\mathbf{s}_\theta \approx \nabla\log p_t$ is only approximate (finite training).
  2. The likelihood gradient uses the point estimate $\hat{\mathbf{x}}_0$ rather than marginalising over $p(\mathbf{x}_0 \mid \mathbf{x}_t)$.

These approximations mean DPS samples are from an approximate posterior. The guidance scale $\zeta$ compensates: larger $\zeta$ enforces stronger measurement consistency at the expense of prior fidelity. There is no general theoretical guarantee on the quality of this approximation.

Quick Check

For DPS with $T = 1000$ diffusion steps, approximately how many score network evaluations (NFEs) are required per reconstruction?

500

1000

2000

100

Diffusion Posterior Sampling (DPS)

A method for solving inverse problems with pretrained diffusion models by adding a likelihood guidance gradient to the reverse diffusion process. The guidance gradient is computed via the Tweedie estimate and backpropagation through the score network.

Related: Tweedie Formula, Posterior Sampling

Guidance Scale

A hyperparameter $\zeta > 0$ that controls the strength of the measurement consistency term in DPS. Larger $\zeta$ produces more measurement-consistent but potentially less natural reconstructions.

Related: Diffusion Posterior Sampling (DPS), Measurement Consistency

Key Takeaway

DPS modifies the reverse diffusion process with a likelihood guidance term computed via the Tweedie estimate. The guidance scale $\zeta$ controls the tradeoff between prior fidelity and measurement consistency. The main limitation is computational cost: $\sim 2T$ NFEs per sample, with $T$ typically in the hundreds. The main advantage is that multiple runs produce diverse posterior samples, enabling uncertainty quantification, a capability deterministic methods lack.
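As a closing illustration of the uncertainty-quantification point, a sketch that reuses the hypothetical `dps_sample` function from above: repeated runs with different seeds give a pixelwise spread that flags poorly constrained regions.

```python
import torch

# Pixelwise uncertainty from repeated DPS runs (assumes dps_sample, y, A,
# score_net, alpha_bar from the sketches above; 8 seeds is arbitrary).
samples = []
for seed in range(8):
    torch.manual_seed(seed)   # a different seed gives a different sample
    samples.append(dps_sample(y, A, score_net, alpha_bar,
                              zeta=1.0, sigma_n2=0.01, shape=y.shape))
samples = torch.stack(samples)
posterior_mean = samples.mean(dim=0)   # MMSE-style point estimate
posterior_std = samples.std(dim=0)     # high values flag uncertain regions
```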