Score-Based Diffusion Models Recap

From Denoisers to Generative Priors: The Diffusion Revolution

Chapter 21 established the equivalence between denoising and score estimation: training a denoiser at noise level $\sigma$ is equivalent to learning the score function $\nabla_\mathbf{x}\log p_\sigma(\mathbf{x})$. Diffusion models exploit this connection to build powerful generative models by training a single noise-conditioned score network across all noise levels. In this chapter, we use these pretrained diffusion models as priors for solving inverse problems β€” the central theme of RF imaging.

The golden thread is: diffusion models provide the strongest learned priors available today, enabling state-of-the-art reconstruction quality, but at high computational cost. Understanding the tradeoffs between quality and speed is essential for determining when diffusion-based reconstruction is appropriate for RF applications.

Definition: The Score Function

The score function of a probability distribution $p(\mathbf{x})$ is the gradient of its log-density:

$$\mathbf{s}(\mathbf{x}) \triangleq \nabla_\mathbf{x} \log p(\mathbf{x}).$$

The score points in the direction of steepest ascent of the log-density. Unlike the density $p(\mathbf{x})$ itself, the score does not require computing the normalisation constant: if $p(\mathbf{x}) = \tilde{p}(\mathbf{x})/Z$, then $\nabla \log p = \nabla \log \tilde{p}$ since $\nabla \log Z = 0$.

This normalisation-free property is what makes score-based methods tractable in high dimensions, where computing $Z = \int \tilde{p}(\mathbf{x})\,d\mathbf{x}$ is intractable.
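To make the normalisation-free property concrete, here is a small numerical check in Python (NumPy); the quartic toy density below is an illustrative choice, not one used elsewhere in the chapter:

```python
import numpy as np

# Toy 1D illustration: the score of an unnormalised density equals the score of the
# normalised one, because grad log Z = 0.
def log_p_tilde(x):
    return -x**4 / 4 - x**2 / 2                     # unnormalised log-density, Z unknown

# Estimate Z numerically only so that we can also form the normalised log-density.
grid = np.linspace(-6.0, 6.0, 20001)
Z = np.exp(log_p_tilde(grid)).sum() * (grid[1] - grid[0])

def log_p(x):
    return log_p_tilde(x) - np.log(Z)               # normalised log-density

# Central finite differences of both versions agree at any point.
x, h = 0.7, 1e-5
score_tilde = (log_p_tilde(x + h) - log_p_tilde(x - h)) / (2 * h)
score_norm = (log_p(x + h) - log_p(x - h)) / (2 * h)
print(score_tilde, score_norm)                      # identical up to floating-point error
print(-x**3 - x)                                    # analytic score for comparison
```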

Definition: Denoising Score Matching (DSM)

Denoising score matching trains a network $\mathbf{s}_\theta(\mathbf{x}, \sigma)$ to approximate the score by minimising

$$\mathcal{L}_{\text{DSM}} = \mathbb{E}_{\sigma}\,\mathbb{E}_{\mathbf{x}_0 \sim p}\,\mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})}\!\left[\left\|\mathbf{s}_\theta(\mathbf{x}_0 + \boldsymbol{\epsilon}, \sigma) + \frac{\boldsymbol{\epsilon}}{\sigma^2}\right\|^2\right].$$

The optimal score network satisfies $\mathbf{s}_\theta^*(\mathbf{x}, \sigma) = \nabla_\mathbf{x}\log p_\sigma(\mathbf{x})$, where $p_\sigma = p * \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$ is the noise-convolved density. This confirms that score estimation is equivalent to denoising (Chapter 21).

The expectation over $\sigma$ is essential: we need the score at all noise levels, not just one. In practice, $\sigma$ is sampled from a geometric schedule or from a continuous distribution over noise levels.
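As a sanity check on the objective, the following NumPy sketch estimates the DSM loss by Monte Carlo for standard-normal data at a single fixed noise level (a simplification of the expectation over $\sigma$). In this toy case the score of $p_\sigma$ is known in closed form, and it attains a lower loss than a candidate that ignores the noise broadening:

```python
import numpy as np

rng = np.random.default_rng(0)

def dsm_loss(score_fn, x0, eps, sigma):
    """Monte-Carlo estimate of the DSM objective at one noise level."""
    return np.mean((score_fn(x0 + eps, sigma) + eps / sigma**2) ** 2)

sigma = 0.5
x0 = rng.standard_normal(500_000)             # data x0 ~ N(0, 1)
eps = sigma * rng.standard_normal(500_000)    # noise ~ N(0, sigma^2)

# For N(0,1) data, p_sigma = N(0, 1 + sigma^2), so the true score is -x / (1 + sigma^2).
true_score = lambda x, s: -x / (1 + s**2)
wrong_score = lambda x, s: -x                 # score of the clean density, not of p_sigma

print(dsm_loss(true_score, x0, eps, sigma))   # lower: DSM is minimised by the score of p_sigma
print(dsm_loss(wrong_score, x0, eps, sigma))  # higher, by roughly E||s_wrong - s_true||^2
```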


Definition: DDPM Forward Process

The forward process of a Denoising Diffusion Probabilistic Model (DDPM) gradually adds Gaussian noise over $T$ steps:

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\bigl(\mathbf{x}_t;\, \sqrt{1 - \beta_t}\,\mathbf{x}_{t-1},\, \beta_t\mathbf{I}\bigr),$$

where $\{\beta_t\}_{t=1}^T$ is the noise schedule with $0 < \beta_t < 1$. The marginal at time $t$ has a closed form:

$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\bigl(\mathbf{x}_t;\, \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\, (1 - \bar{\alpha}_t)\mathbf{I}\bigr),$$

where $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^t \alpha_s$. Equivalently:

$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).$$

At $t = T$ (with $\bar{\alpha}_T \approx 0$), $\mathbf{x}_T$ is approximately pure Gaussian noise.
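The closed-form marginal can be checked numerically: simulating the chain step by step and sampling $q(\mathbf{x}_T \mid \mathbf{x}_0)$ in one shot give matching statistics. The linear $\beta_t$ schedule below is a common choice, assumed here for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

T = 1000
beta = np.linspace(1e-4, 0.02, T)        # a common linear schedule (an assumption, not fixed by the text)
alpha_bar = np.cumprod(1.0 - beta)

x0 = np.ones(10_000)                     # a deterministic "signal" to diffuse

# (a) Step-by-step simulation of q(x_t | x_{t-1}) for t = 1, ..., T.
x = x0.copy()
for t in range(T):
    x = np.sqrt(1.0 - beta[t]) * x + np.sqrt(beta[t]) * rng.standard_normal(x.shape)

# (b) One-shot sample from the closed-form marginal q(x_T | x_0).
x_direct = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1.0 - alpha_bar[-1]) * rng.standard_normal(x0.shape)

print(x.mean(), x.var())                 # both approximately 0 and 1
print(x_direct.mean(), x_direct.var())
print(alpha_bar[-1])                     # ~4e-5: x_T is essentially pure Gaussian noise
```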

Definition: DDPM Reverse Process

The reverse process starts from $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and iteratively denoises:

$$p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\bigl(\mathbf{x}_{t-1};\, \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\, \sigma_t^2\mathbf{I}\bigr),$$

where the mean is parameterised via the noise-prediction network $\boldsymbol{\epsilon}_\theta$:

$$\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right).$$

The training objective is:

$$\mathcal{L}_{\text{DDPM}} = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}}\!\left[\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\|^2\right].$$

The connection to score matching: the noise prediction and the score are related by

$$\mathbf{s}_\theta(\mathbf{x}_t, t) = -\frac{\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)}{\sqrt{1 - \bar{\alpha}_t}}.$$
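A minimal NumPy sketch of one reverse (ancestral sampling) step; `eps_theta` is a placeholder for a trained noise-prediction network, and $\sigma_t^2 = \beta_t$ is one common variance choice:

```python
import numpy as np

def ddpm_reverse_step(x_t, t, eps_theta, beta, alpha_bar, rng):
    """One reverse step x_t -> x_{t-1}; eps_theta(x_t, t) stands in for a trained network."""
    alpha_t = 1.0 - beta[t]
    eps_hat = eps_theta(x_t, t)

    # Score of p_t recovered from the noise prediction: s = -eps / sqrt(1 - alpha_bar_t).
    score = -eps_hat / np.sqrt(1.0 - alpha_bar[t])

    # Posterior mean; algebraically identical to
    # mu = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t).
    mu = (x_t + beta[t] * score) / np.sqrt(alpha_t)

    sigma_t = np.sqrt(beta[t])                       # common choice sigma_t^2 = beta_t
    noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
    return mu + sigma_t * noise

# Sampling: draw x_T ~ N(0, I), then apply this step for t = T-1, T-2, ..., 0.
```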

Theorem: Tweedie's Formula

Let $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$ with $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Then the posterior mean of $\mathbf{x}_0$ given $\mathbf{x}_t$ is:

$$\hat{\mathbf{x}}_0(\mathbf{x}_t) \triangleq \mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t] = \frac{1}{\sqrt{\bar{\alpha}_t}}\bigl(\mathbf{x}_t + (1 - \bar{\alpha}_t)\,\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t)\bigr).$$

This connects the score function, the denoiser, and the posterior mean in a single identity.

Tweedie's formula says: to estimate $\mathbf{x}_0$ from a noisy observation $\mathbf{x}_t$, take the noisy observation, correct it using the score (which points toward high-density regions), and rescale. This is exactly what the denoiser does.
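In practice the score is accessed through the noise predictor. Substituting $\mathbf{s}_\theta = -\boldsymbol{\epsilon}_\theta/\sqrt{1-\bar{\alpha}_t}$ into Tweedie's formula gives $\hat{\mathbf{x}}_0 = \bigl(\mathbf{x}_t - \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\bigr)/\sqrt{\bar{\alpha}_t}$, which the sketch below implements (again with `eps_theta` standing in for a trained network):

```python
import numpy as np

def tweedie_x0_hat(x_t, t, eps_theta, alpha_bar):
    """Posterior-mean estimate of x_0 from x_t via Tweedie's formula.

    Substituting score = -eps_theta / sqrt(1 - alpha_bar_t) gives
    x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_theta(x_t, t)) / sqrt(alpha_bar_t).
    """
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_theta(x_t, t)) / np.sqrt(alpha_bar[t])
```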


Score Field Visualisation

Visualise the score field $\nabla\log p_\sigma(\mathbf{x})$ for a 2D Gaussian mixture at different noise levels. At low $\sigma$, the score arrows point sharply toward the nearest mode. At high $\sigma$, the modes merge and the field becomes smooth. Langevin dynamics following these arrows would generate samples from $p_\sigma$.
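For a Gaussian mixture the smoothed score is available in closed form, which is what a visualisation of this kind would evaluate. A NumPy sketch, with an illustrative two-component mixture (the means, weights, and unit component covariance are assumptions):

```python
import numpy as np

means = np.array([[-2.0, 0.0], [2.0, 0.0]])
weights = np.array([0.5, 0.5])

def score_field(x, sigma):
    """Exact score of p_sigma = mixture * N(0, sigma^2 I), at points x of shape (n, 2)."""
    var = 1.0 + sigma**2                                   # unit component covariance + added noise
    diffs = x[:, None, :] - means[None, :, :]              # (n, k, 2)
    log_comp = -0.5 * np.sum(diffs**2, axis=-1) / var + np.log(weights)
    resp = np.exp(log_comp - log_comp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)                # component responsibilities under p_sigma
    # grad log p_sigma(x) = sum_k resp_k(x) * (mu_k - x) / var
    return np.sum(resp[..., None] * -diffs, axis=1) / var

pts = np.array([[0.5, 0.0], [-0.5, 1.0], [3.0, -1.0]])
print(score_field(pts, sigma=0.3))   # arrows point sharply toward the nearest mode
print(score_field(pts, sigma=3.0))   # field is much weaker and smoother
```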


Example: Tweedie's Formula for a 1D Gaussian

Let $x_0 \sim \mathcal{N}(0, 1)$ and $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$. Verify Tweedie's formula by computing $\mathbb{E}[x_0 \mid x_t]$ directly.
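One way to carry out the verification: since $x_0$ and $\epsilon$ are independent Gaussians, $(x_0, x_t)$ is jointly Gaussian with $\operatorname{Var}(x_t) = \bar{\alpha}_t + (1-\bar{\alpha}_t) = 1$ and $\operatorname{Cov}(x_0, x_t) = \sqrt{\bar{\alpha}_t}$, so the conditional-Gaussian formula gives

$$\mathbb{E}[x_0 \mid x_t] = \frac{\operatorname{Cov}(x_0, x_t)}{\operatorname{Var}(x_t)}\, x_t = \sqrt{\bar{\alpha}_t}\, x_t.$$

Tweedie's formula yields the same answer: $p_t = \mathcal{N}(0, 1)$, so $\nabla_{x_t}\log p_t(x_t) = -x_t$ and

$$\hat{x}_0 = \frac{x_t + (1-\bar{\alpha}_t)(-x_t)}{\sqrt{\bar{\alpha}_t}} = \frac{\bar{\alpha}_t\, x_t}{\sqrt{\bar{\alpha}_t}} = \sqrt{\bar{\alpha}_t}\, x_t.$$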

Forward Diffusion Process

Observe how a 1D signal is progressively corrupted as the diffusion time $t$ increases. The plot shows the signal at several intermediate steps, illustrating the transition from structured data to pure noise. Compare linear and cosine noise schedules: the cosine schedule preserves more signal structure at intermediate times.
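The two schedules are easiest to compare through $\bar{\alpha}_t$, which controls how much of the original signal survives at time $t$. The NumPy sketch below uses commonly cited parameterisations (a linear $\beta_t$ schedule from $10^{-4}$ to $0.02$, and the cosine schedule of Nichol and Dhariwal (2021) with offset $s = 0.008$); these settings are assumptions, not necessarily those of the plot above:

```python
import numpy as np

T = 1000
t = np.arange(T + 1)

# Linear schedule: beta_t increases linearly; alpha_bar_t is the running product.
beta_lin = np.linspace(1e-4, 0.02, T)
abar_lin = np.concatenate([[1.0], np.cumprod(1.0 - beta_lin)])

# Cosine schedule: alpha_bar_t is defined directly from a squared cosine.
s = 0.008
f = np.cos((t / T + s) / (1 + s) * np.pi / 2) ** 2
abar_cos = f / f[0]

# At mid-process the cosine schedule retains far more signal (~0.49 vs ~0.08 at t = T/2).
for frac in (0.25, 0.5, 0.75):
    k = int(frac * T)
    print(f"t = {k}: linear alpha_bar = {abar_lin[k]:.4f}, cosine alpha_bar = {abar_cos[k]:.4f}")
```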


Historical Note: From Thermodynamics to Image Generation

2015–2021

The idea of using a diffusion process for generative modelling was introduced by Sohl-Dickstein et al. (2015), inspired by non-equilibrium statistical mechanics. The approach remained largely dormant until Ho et al. (2020) demonstrated that a simple noise-prediction objective produced image quality rivalling GANs. Independently, Song and Ermon (2019) developed score-based generative models via Langevin dynamics. The unification of these perspectives through stochastic differential equations by Song et al. (2021) established the modern framework used throughout this chapter.


Score Function

The gradient of the log-density of a probability distribution: $\mathbf{s}(\mathbf{x}) = \nabla_\mathbf{x}\log p(\mathbf{x})$. The score encodes the data distribution without requiring the normalisation constant.

Related: Denoising Score Matching, Tweedie Formula

Tweedie's Formula

An identity relating the posterior mean $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$ to the score function of the noisy distribution $p_t(\mathbf{x}_t)$. In the DDPM setting: $\hat{\mathbf{x}}_0 = \bigl(\mathbf{x}_t + (1-\bar{\alpha}_t)\nabla\log p_t(\mathbf{x}_t)\bigr)/\sqrt{\bar{\alpha}_t}$.

Related: Score Function, DDPM Forward Process

Network Function Evaluation (NFE)

A single forward pass through the score network $\mathbf{s}_\theta(\mathbf{x}_t, t)$. The total number of NFEs determines the computational cost of diffusion-based reconstruction. Standard DDPM requires $T$ NFEs; DPS requires $\sim 2T$ due to the additional backpropagation step.

Related: DDPM Forward Process, Diffusion Posterior Sampling (DPS)

Common Mistake: Confusing Noise Prediction and Score Prediction

Mistake:

Treating $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ and $\mathbf{s}_\theta(\mathbf{x}_t, t)$ as the same quantity.

Correction:

They are related by a sign and a scaling factor: $\mathbf{s}_\theta(\mathbf{x}_t, t) = -\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)/\sqrt{1-\bar{\alpha}_t}$. The noise predictor outputs the noise $\boldsymbol{\epsilon}$ that was added; the score points toward the data manifold. Mixing up the sign or forgetting the scaling factor produces reconstructions that diverge from the data.
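For standard-normal data both quantities are available in closed form, which gives a quick numerical check of the sign and scaling (the value of $\bar{\alpha}_t$ below is arbitrary):

```python
import numpy as np

abar_t = 0.3                       # arbitrary noise level with 0 < alpha_bar_t < 1
x_t = np.linspace(-2.0, 2.0, 5)    # a few evaluation points

# For x0 ~ N(0,1): x_t ~ N(0,1), so the true score of p_t is -x_t,
# and the ideal (Bayes-optimal) noise predictor is E[eps | x_t] = sqrt(1 - abar_t) * x_t.
true_score = -x_t
ideal_eps = np.sqrt(1.0 - abar_t) * x_t

# The conversion s = -eps / sqrt(1 - alpha_bar_t) recovers the score exactly.
print(np.allclose(-ideal_eps / np.sqrt(1.0 - abar_t), true_score))   # True
```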

Quick Check

Tweedie's formula gives $\hat{\mathbf{x}}_0 = \bigl(\mathbf{x}_t + (1-\bar{\alpha}_t)\nabla\log p_t(\mathbf{x}_t)\bigr)/\sqrt{\bar{\alpha}_t}$. At $t = 0$ (no noise, $\bar{\alpha}_0 = 1$), what does the formula reduce to?

$\hat{\mathbf{x}}_0 = \mathbf{x}_0$

$\hat{\mathbf{x}}_0 = \nabla\log p_0(\mathbf{x}_0)$

$\hat{\mathbf{x}}_0 = \mathbf{0}$

$\hat{\mathbf{x}}_0 = \mathbf{x}_0 + \nabla\log p_0(\mathbf{x}_0)$

Key Takeaway

Diffusion models combine three ingredients: (1) a forward process that gradually adds noise, (2) a noise-conditioned score network trained via denoising score matching, and (3) a reverse process that uses the learned score to denoise step by step. Tweedie's formula provides the bridge from the score at any noise level to a clean-image estimate, which is the key tool for incorporating measurements into the reverse process (Section 22.2).