Score-Based Diffusion Models Recap
From Denoisers to Generative Priors: The Diffusion Revolution
Chapter 21 established the equivalence between denoising and score estimation: training a denoiser at noise level $\sigma$ is equivalent to learning the score function $\nabla_{\mathbf{x}} \log p_\sigma(\mathbf{x})$. Diffusion models exploit this connection to build powerful generative models by training a single noise-conditioned score network $\mathbf{s}_\theta(\mathbf{x}, \sigma)$ across all noise levels. In this chapter, we use these pretrained diffusion models as priors for solving inverse problems, the central theme of RF imaging.
The golden thread is: diffusion models provide the strongest learned priors available today, enabling state-of-the-art reconstruction quality, but at high computational cost. Understanding the tradeoffs between quality and speed is essential for determining when diffusion-based reconstruction is appropriate for RF applications.
Definition: The Score Function
The Score Function
The score function of a probability distribution $p(\mathbf{x})$ is the gradient of its log-density:

$$\mathbf{s}(\mathbf{x}) = \nabla_{\mathbf{x}} \log p(\mathbf{x}).$$
The score points in the direction of steepest ascent of the log-density. Unlike the density itself, the score does not require computing the normalisation constant: if $p(\mathbf{x}) = \tilde{p}(\mathbf{x})/Z$, then $\nabla_{\mathbf{x}} \log p(\mathbf{x}) = \nabla_{\mathbf{x}} \log \tilde{p}(\mathbf{x})$ since $\nabla_{\mathbf{x}} \log Z = \mathbf{0}$.
This normalisation-free property is what makes score-based methods tractable in high dimensions, where computing $Z$ is intractable.
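The normalisation-free property can be checked numerically: differentiating the log of an unnormalised density gives the same score as differentiating the normalised one. A minimal sketch, assuming a toy 1D two-mode mixture (the density and constants below are illustrative, not from the chapter):

```python
import numpy as np

def log_p_tilde(x):
    # Unnormalised two-component Gaussian mixture, log p~(x)
    return np.log(np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2))

def numerical_score(log_density, x, h=1e-5):
    # Central finite difference for d/dx log p(x)
    return (log_density(x + h) - log_density(x - h)) / (2.0 * h)

x = 1.3
Z = 7.0  # any positive constant: grad log Z = 0
score_unnorm = numerical_score(log_p_tilde, x)
score_norm = numerical_score(lambda t: log_p_tilde(t) - np.log(Z), x)
print(score_unnorm, score_norm)  # identical: the score ignores Z
```

The subtraction of $\log Z$ shifts the log-density by a constant, which the finite difference cancels exactly.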
Definition: Denoising Score Matching (DSM)
Denoising Score Matching (DSM)
Denoising score matching trains a network $\mathbf{s}_\theta(\mathbf{x}, \sigma)$ to approximate the score by minimising

$$\mathcal{L}_{\text{DSM}} = \mathbb{E}_{\sigma}\,\mathbb{E}_{\mathbf{x}_0 \sim p_{\text{data}}}\,\mathbb{E}_{\tilde{\mathbf{x}} \sim \mathcal{N}(\mathbf{x}_0,\, \sigma^2 \mathbf{I})}\left[\left\|\mathbf{s}_\theta(\tilde{\mathbf{x}}, \sigma) + \frac{\tilde{\mathbf{x}} - \mathbf{x}_0}{\sigma^2}\right\|^2\right],$$

where $-(\tilde{\mathbf{x}} - \mathbf{x}_0)/\sigma^2 = \nabla_{\tilde{\mathbf{x}}} \log q_\sigma(\tilde{\mathbf{x}} \mid \mathbf{x}_0)$ is the score of the Gaussian perturbation kernel.
The optimal score network satisfies $\mathbf{s}_{\theta^*}(\mathbf{x}, \sigma) = \nabla_{\mathbf{x}} \log p_\sigma(\mathbf{x})$, where $p_\sigma = p_{\text{data}} * \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$ is the noise-convolved density. This confirms that score estimation is equivalent to denoising (Chapter 21).
The expectation over $\sigma$ is essential: we need the score at all noise levels, not just one. In practice, $\sigma$ is sampled from a geometric schedule or a continuous distribution over noise levels.
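The DSM objective can be verified in closed form for a toy case: with 1D data $x_0 \sim \mathcal{N}(0,1)$ at a single noise level $\sigma$, the best linear score model fitted to the DSM target recovers the analytic score slope $-1/(1+\sigma^2)$ of the noisy marginal. A Monte Carlo sketch (noise level and sample count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
n = 200_000

# Clean data x0 ~ N(0,1); noisy samples x = x0 + sigma * eps
x0 = rng.standard_normal(n)
eps = rng.standard_normal(n)
x = x0 + sigma * eps

# DSM regression target: grad_x log q(x | x0) = -(x - x0) / sigma^2
target = -(x - x0) / sigma**2

# Fit a linear score model s(x) = theta * x by least squares (closed form)
theta = np.sum(x * target) / np.sum(x * x)

# The noisy marginal is N(0, 1 + sigma^2), whose score is -x / (1 + sigma^2)
print(theta, -1.0 / (1.0 + sigma**2))
```

The fitted slope matches the marginal score, even though the regression target used only the perturbation kernel, never the marginal itself.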
Definition: DDPM Forward Process
DDPM Forward Process
The forward process of a Denoising Diffusion Probabilistic Model (DDPM) gradually adds Gaussian noise over $T$ steps:

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right), \qquad t = 1, \dots, T,$$

where $\{\beta_t\}_{t=1}^{T}$ is the noise schedule with $0 < \beta_1 < \cdots < \beta_T < 1$. The marginal at time $t$ has a closed form:

$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\mathbf{I}\right),$$

where $\alpha_t = 1 - \beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$. Equivalently:

$$\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).$$

At $t = T$ (with $\bar\alpha_T \approx 0$), $\mathbf{x}_T$ is approximately pure Gaussian noise.
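The closed-form marginal can be checked against the step-by-step chain: running the forward recursion for $T$ steps and sampling directly from $q(\mathbf{x}_T \mid \mathbf{x}_0)$ give statistically identical results. A sketch on toy 1D data (the linear schedule and sample count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
betas = np.linspace(1e-4, 0.02, T)   # illustrative linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

n = 100_000
x0 = rng.standard_normal(n)          # toy 1D data: x0 ~ N(0, 1)

# Step-by-step chain: x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps_t
x = x0.copy()
for t in range(T):
    x = np.sqrt(1.0 - betas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(n)

# Closed-form marginal: x_T = sqrt(abar_T) x0 + sqrt(1 - abar_T) eps
x_closed = np.sqrt(alpha_bars[-1]) * x0 \
    + np.sqrt(1.0 - alpha_bars[-1]) * rng.standard_normal(n)

# Both routes give the same correlation with x0, namely sqrt(abar_T)
corr_chain = np.corrcoef(x, x0)[0, 1]
corr_closed = np.corrcoef(x_closed, x0)[0, 1]
print(corr_chain, corr_closed, np.sqrt(alpha_bars[-1]))
```

The single-shot marginal is what makes training efficient: any noise level can be sampled without simulating the chain.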
Definition: DDPM Reverse Process
DDPM Reverse Process
The reverse process starts from $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and iteratively denoises:

$$p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\!\left(\mathbf{x}_{t-1};\ \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\ \sigma_t^2 \mathbf{I}\right),$$

where the mean is parameterised via the noise prediction network $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$:

$$\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right).$$

The training objective is:

$$\mathcal{L}_{\text{simple}} = \mathbb{E}_{t,\,\mathbf{x}_0,\,\boldsymbol{\epsilon}}\left[\left\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\!\left(\sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon},\ t\right)\right\|^2\right].$$

The connection to score matching: the noise prediction and score are related by

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = -\frac{\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)}{\sqrt{1-\bar\alpha_t}}.$$
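The reverse process and the epsilon-to-score relation can be exercised end to end on a toy 1D Gaussian, where the exact score of each noisy marginal is available in closed form and stands in for a trained network. A sketch (the data distribution $\mathcal{N}(2,\,0.5^2)$, schedule, and step count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
betas = np.linspace(1e-4, 0.05, T)   # illustrative schedule
alphas = 1.0 - betas
abars = np.cumprod(alphas)

mu, s = 2.0, 0.5  # toy data distribution: x0 ~ N(mu, s^2)

def score(x, t):
    # Exact score of the noisy marginal N(sqrt(abar) mu, abar s^2 + 1 - abar),
    # standing in for a trained score network s_theta(x_t, t)
    m = np.sqrt(abars[t]) * mu
    v = abars[t] * s**2 + 1.0 - abars[t]
    return -(x - m) / v

n = 50_000
x = rng.standard_normal(n)  # start from x_T ~ N(0, 1)
for t in range(T - 1, -1, -1):
    eps_hat = -np.sqrt(1.0 - abars[t]) * score(x, t)  # eps_theta from the score
    mean = (x - betas[t] / np.sqrt(1.0 - abars[t]) * eps_hat) / np.sqrt(alphas[t])
    z = rng.standard_normal(n) if t > 0 else 0.0      # no noise on the last step
    x = mean + np.sqrt(betas[t]) * z

print(x.mean(), x.std())  # approximately mu and s
```

With the exact score, ancestral sampling recovers the data distribution up to discretisation error, which is what the trained network approximates in practice.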
Theorem: Tweedie's Formula
Let $\mathbf{x}_t = \sqrt{\bar\alpha_t}\,\mathbf{x}_0 + \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Then the posterior mean of $\mathbf{x}_0$ given $\mathbf{x}_t$ is:

$$\hat{\mathbf{x}}_0 = \mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \frac{\mathbf{x}_t + (1-\bar\alpha_t)\,\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)}{\sqrt{\bar\alpha_t}}.$$
This connects the score function, the denoiser, and the posterior mean in a single identity.
Tweedie's formula says: to estimate $\mathbf{x}_0$ from a noisy observation $\mathbf{x}_t$, correct the observation using the score (which points toward high-density regions), and rescale by $1/\sqrt{\bar\alpha_t}$. This is exactly what the denoiser does.
Proof sketch: write $\mathbf{x}_0$ in terms of $\mathbf{x}_t$ and $\boldsymbol{\epsilon}$, take the conditional expectation, and use the fact that $\mathbb{E}[\boldsymbol{\epsilon} \mid \mathbf{x}_t] = -\sqrt{1-\bar\alpha_t}\,\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)$.
Express $\mathbf{x}_0$ in terms of $\mathbf{x}_t$ and $\boldsymbol{\epsilon}$
From the forward process: $\mathbf{x}_0 = \dfrac{\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\boldsymbol{\epsilon}}{\sqrt{\bar\alpha_t}}$.
Take conditional expectation
$$\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \frac{\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\mathbb{E}[\boldsymbol{\epsilon} \mid \mathbf{x}_t]}{\sqrt{\bar\alpha_t}}.$$
Relate conditional noise to score
By the score matching identity for the Gaussian perturbation kernel: $\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t) = -\dfrac{\mathbb{E}[\boldsymbol{\epsilon} \mid \mathbf{x}_t]}{\sqrt{1-\bar\alpha_t}}$, hence $\mathbb{E}[\boldsymbol{\epsilon} \mid \mathbf{x}_t] = -\sqrt{1-\bar\alpha_t}\,\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)$.
Substitute and simplify
$$\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \frac{\mathbf{x}_t + (1-\bar\alpha_t)\,\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)}{\sqrt{\bar\alpha_t}}.$$
Score Field Visualisation
Visualise the score field $\nabla_{\mathbf{x}} \log p_\sigma(\mathbf{x})$ for a 2D Gaussian mixture at different noise levels. At low $\sigma$, the score arrows point sharply toward the nearest mode. At high $\sigma$, the modes merge and the field becomes smooth. Langevin dynamics following these arrows would generate samples from $p_\sigma(\mathbf{x})$.
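The score field described above can be computed in closed form, since smoothing each mixture component with noise level $\sigma$ yields another Gaussian. A sketch (the mode locations at $(\pm 2, 0)$, unit component variance, and query points are illustrative assumptions):

```python
import numpy as np

# Score field of a two-mode 2D Gaussian mixture smoothed at noise level sigma
modes = np.array([[-2.0, 0.0], [2.0, 0.0]])

def score_field(x, sigma):
    # Each smoothed component is N(mode, (1 + sigma^2) I); the mixture score
    # is a responsibility-weighted pull toward each mode, divided by the variance
    v = 1.0 + sigma**2
    diffs = modes[None, :, :] - x[:, None, :]     # (n points, 2 modes, 2 dims)
    logw = -0.5 * np.sum(diffs**2, axis=-1) / v   # unnormalised log responsibilities
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("nk,nkd->nd", w, diffs) / v

pts = np.array([[1.5, 0.5], [-1.5, -0.5]])
low = score_field(pts, sigma=0.1)   # sharp pull toward the nearest mode
high = score_field(pts, sigma=5.0)  # smooth, nearly linear field
print(low)
print(high)
```

Feeding a grid of points into `score_field` and plotting the vectors with a quiver plot reproduces the visualisation: sharp arrows at low $\sigma$, a gentle merged field at high $\sigma$.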
Example: Tweedie's Formula for a 1D Gaussian
Let $x_0 \sim \mathcal{N}(0, 1)$ and $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$. Verify Tweedie's formula by computing $\mathbb{E}[x_0 \mid x_t]$ directly.
Joint distribution
Since $x_0$ and $\epsilon$ are independent standard Gaussians, $(x_0, x_t)$ is jointly Gaussian with:

$$\mathbb{E}[x_t] = 0, \qquad \mathrm{Var}(x_t) = \bar\alpha_t + (1-\bar\alpha_t) = 1, \qquad \mathrm{Cov}(x_0, x_t) = \sqrt{\bar\alpha_t}.$$
Conditional expectation
By Gaussian conditioning: $\mathbb{E}[x_0 \mid x_t] = \dfrac{\mathrm{Cov}(x_0, x_t)}{\mathrm{Var}(x_t)}\,x_t = \sqrt{\bar\alpha_t}\,x_t$.
Verify via Tweedie
Since $x_t \sim \mathcal{N}(0, 1)$, the score is $\nabla_{x_t} \log p_t(x_t) = -x_t$. Tweedie's formula gives:

$$\hat{x}_0 = \frac{x_t + (1-\bar\alpha_t)(-x_t)}{\sqrt{\bar\alpha_t}} = \frac{\bar\alpha_t\,x_t}{\sqrt{\bar\alpha_t}} = \sqrt{\bar\alpha_t}\,x_t,$$

which matches the direct computation.
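The example can also be verified by Monte Carlo: estimate $\mathbb{E}[x_0 \mid x_t]$ near a query point from samples and compare it with Tweedie's prediction. A sketch (the noise level $\bar\alpha_t = 0.6$, query point, and window width are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
abar = 0.6  # illustrative noise level alpha-bar_t

n = 400_000
x0 = rng.standard_normal(n)
xt = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * rng.standard_normal(n)

# Monte Carlo estimate of E[x0 | xt] near a query point q
q = 0.8
mask = np.abs(xt - q) < 0.05
mc_mean = x0[mask].mean()

# Tweedie: xt ~ N(0,1), so score(xt) = -xt, and
# E[x0 | xt] = (xt + (1 - abar) * (-xt)) / sqrt(abar) = sqrt(abar) * xt
tweedie = (q + (1.0 - abar) * (-q)) / np.sqrt(abar)
print(mc_mean, tweedie, np.sqrt(abar) * q)
```

All three numbers agree up to Monte Carlo error, confirming both the Gaussian-conditioning route and Tweedie's formula.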
Forward Diffusion Process
Observe how a 1D signal is progressively corrupted as the diffusion time $t$ increases. The plot shows the signal at several intermediate steps, illustrating the transition from structured data to pure noise. Compare the linear and cosine noise schedules: the cosine schedule preserves more signal structure at intermediate times.
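The schedule comparison can be quantified directly through $\bar\alpha_t$, the fraction of signal variance surviving at step $t$. A sketch comparing a linear $\beta$ schedule with the cosine schedule of Nichol and Dhariwal, defined via $\bar\alpha_t$ (the specific endpoints and offset $s = 0.008$ are conventional choices, shown here as assumptions):

```python
import numpy as np

T = 1000

# Linear beta schedule: beta_t from 1e-4 to 0.02 (a common choice)
betas_lin = np.linspace(1e-4, 0.02, T)
abar_lin = np.cumprod(1.0 - betas_lin)

# Cosine schedule: define abar_t directly via a squared-cosine ramp
s = 0.008
steps = np.arange(T + 1) / T
f = np.cos((steps + s) / (1 + s) * np.pi / 2) ** 2
abar_cos = f[1:] / f[0]

# Compare surviving signal fraction at the midpoint of the trajectory
t_mid = T // 2
print(abar_lin[t_mid], abar_cos[t_mid])
```

At the midpoint the cosine schedule retains roughly half the signal variance while the linear schedule has destroyed most of it, which is why the cosine schedule preserves more structure at intermediate times.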
Historical Note: From Thermodynamics to Image Generation
2015--2021: The idea of using a diffusion process for generative modelling was introduced by Sohl-Dickstein et al. (2015), inspired by non-equilibrium statistical mechanics. The approach remained largely dormant until Ho et al. (2020) demonstrated that a simple noise-prediction objective produced image quality rivalling GANs. Independently, Song and Ermon (2019) developed score-based generative models via Langevin dynamics. The unification of these perspectives through stochastic differential equations by Song et al. (2021) established the modern framework used throughout this chapter.
Score Function
The gradient of the log-density of a probability distribution: $\mathbf{s}(\mathbf{x}) = \nabla_{\mathbf{x}} \log p(\mathbf{x})$. The score encodes the data distribution without requiring the normalisation constant.
Related: Denoising Score Matching, Tweedie Formula
Tweedie's Formula
An identity relating the posterior mean $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$ to the score function of the noisy distribution $p_t(\mathbf{x}_t)$. In the DDPM setting: $\hat{\mathbf{x}}_0 = \big(\mathbf{x}_t + (1-\bar\alpha_t)\,\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\big)/\sqrt{\bar\alpha_t}$.
Related: Score Function, DDPM Forward Process
Network Function Evaluation (NFE)
A single forward pass through the score network $\mathbf{s}_\theta$ (equivalently, the noise network $\boldsymbol{\epsilon}_\theta$). The total number of NFEs determines the computational cost of diffusion-based reconstruction. Standard DDPM requires $T$ NFEs (e.g. $T = 1000$); DPS roughly doubles the per-step cost due to the additional backpropagation step.
Related: DDPM Forward Process, Diffusion Posterior Sampling (DPS)
Common Mistake: Confusing Noise Prediction and Score Prediction
Mistake:
Treating $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ and $\mathbf{s}_\theta(\mathbf{x}_t, t)$ as the same quantity.
Correction:
They are related by a sign and a scaling factor: $\mathbf{s}_\theta(\mathbf{x}_t, t) = -\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)/\sqrt{1-\bar\alpha_t}$. The noise predictor outputs the noise that was added; the score points toward the data manifold. Mixing up the sign or forgetting the scaling factor produces reconstructions that diverge from the data.
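The sign and scaling can be checked numerically on a toy 1D Gaussian, where both the optimal noise predictor and the true score are available in closed form. A sketch (the noise level $\bar\alpha_t = 0.7$ and sample count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
abar = 0.7
n = 300_000

# Toy data x0 ~ N(0,1); forward marginal xt = sqrt(abar) x0 + sqrt(1-abar) eps
x0 = rng.standard_normal(n)
eps = rng.standard_normal(n)
xt = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

# Optimal linear noise predictor eps_hat(xt) = c * xt via least squares
c = np.sum(xt * eps) / np.sum(xt * xt)

# Convert to a score estimate: s(xt) = -eps_hat(xt) / sqrt(1 - abar)
score_slope = -c / np.sqrt(1.0 - abar)

# Analytic check: xt ~ N(0,1), so the true score is -xt (slope -1)
print(c, score_slope)
```

Note that $c \approx \sqrt{1-\bar\alpha_t} > 0$ while the score slope is $-1$: dropping the minus sign or the $\sqrt{1-\bar\alpha_t}$ factor gives a field that pushes samples away from the data instead of toward it.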
Quick Check
Tweedie's formula gives $\hat{\mathbf{x}}_0 = \big(\mathbf{x}_t + (1-\bar\alpha_t)\,\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\big)/\sqrt{\bar\alpha_t}$. At $t = 0$ (no noise, $\bar\alpha_0 = 1$), what does the formula reduce to?
At $t = 0$: $\bar\alpha_0 = 1$ and $1 - \bar\alpha_0 = 0$, so $\hat{\mathbf{x}}_0 = \mathbf{x}_0$. The denoised estimate is the observation itself when there is no noise.
Key Takeaway
Diffusion models combine three ingredients: (1) a forward process that gradually adds noise, (2) a noise-conditioned score network trained via denoising score matching, and (3) a reverse process that uses the learned score to denoise step by step. Tweedie's formula provides the bridge from the score at any noise level to a clean-image estimate, which is the key tool for incorporating measurements into the reverse process (Section 22.2).