Denoising Diffusion (DDPM)

Definition: Denoising Diffusion Probabilistic Model

DDPM adds noise gradually over $T$ steps (forward process):

$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\mathbf{I}\right)$$
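Because $q(\mathbf{x}_t \mid \mathbf{x}_0)$ is Gaussian, $\mathbf{x}_t$ can be drawn in a single shot rather than by applying $t$ noising steps. A minimal PyTorch sketch (the helper name `q_sample` is illustrative, not from a specific library):

```python
import torch

def q_sample(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor):
    """Draw x_t ~ q(x_t | x_0) in closed form, no iteration over t steps."""
    eps = torch.randn_like(x0)                           # epsilon ~ N(0, I)
    ab = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))  # reshape for broadcasting
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

# Toy usage: T = 1000 with the linear beta schedule from the DDPM paper.
T = 1000
beta = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - beta, dim=0)
x0 = torch.randn(8, 3, 32, 32)          # a batch of 8 "images"
t = torch.randint(0, T, (8,))           # index 0..T-1 stands for timestep 1..T
xt = q_sample(x0, t, alpha_bar)
```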

The model learns to reverse the process by predicting the noise:

$$L = \mathbb{E}_{t,\, \mathbf{x}_0,\, \boldsymbol{\varepsilon}} \left[\left\|\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}_\theta(\mathbf{x}_t, t)\right\|^2\right]$$

Sampling: start from $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and iteratively denoise.
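A sketch of that sampling loop in PyTorch, assuming `model(x, t)` is any network trained to predict the noise, and using the $\sigma_t^2 = \beta_t$ variance choice from Ho et al. (function names here are illustrative):

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, beta):
    """Ancestral sampling: start at x_T ~ N(0, I), take T reverse steps."""
    T = beta.shape[0]
    alpha = 1.0 - beta
    alpha_bar = torch.cumprod(alpha, dim=0)
    x = torch.randn(shape)                                # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps_hat = model(x, t_batch)                       # predicted noise
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
        mean = (x - beta[t] / (1.0 - alpha_bar[t]).sqrt() * eps_hat) / alpha[t].sqrt()
        z = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + beta[t].sqrt() * z                     # no noise on the last step
    return x
```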

Definition: Noise Schedule

The variance schedule $\beta_t$ controls the noise level:

$$\alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$$

Linear: $\beta_t$ increases linearly from $10^{-4}$ to $0.02$. Cosine: $\bar{\alpha}_t = \cos^2\!\left(\frac{t/T + s}{1+s} \cdot \frac{\pi}{2}\right)$ with a small offset $s$.
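A small sketch comparing the two schedules in PyTorch; the cosine version follows Nichol & Dhariwal, who also normalise by $f(0)$ so that $\bar{\alpha}$ starts near 1 (function names are illustrative):

```python
import math
import torch

def linear_alpha_bar(T: int) -> torch.Tensor:
    """alpha_bar from the linear schedule: beta_t from 1e-4 to 0.02."""
    beta = torch.linspace(1e-4, 0.02, T)
    return torch.cumprod(1.0 - beta, dim=0)

def cosine_alpha_bar(T: int, s: float = 0.008) -> torch.Tensor:
    """alpha_bar from the cosine schedule, normalised so it starts near 1."""
    t = torch.arange(T + 1) / T
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return f[1:] / f[0]

T = 1000
for name, ab in [("linear", linear_alpha_bar(T)), ("cosine", cosine_alpha_bar(T))]:
    print(name, ab[0].item(), ab[T // 2].item(), ab[-1].item())
# The cosine schedule decays more gently early on, preserving signal longer.
```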

Theorem: DDPM Simplified Loss

The variational lower bound for DDPM (with the per-timestep weights dropped) simplifies to:

$$L_{\text{simple}} = \mathbb{E}_{t \sim U[1,T],\ \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0},\mathbf{I})} \left[\left\|\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}_\theta\!\left(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\varepsilon},\ t\right)\right\|^2\right]$$

This is simply training a denoiser at random noise levels.

DDPM training is just denoising: add noise at a random level, predict the noise, and minimise MSE. The magic is in the iterative sampling.
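As a sketch, one training step of $L_{\text{simple}}$ in PyTorch (again, `model(x, t)` stands for any noise-prediction network and `ddpm_loss` is a hypothetical name):

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0: torch.Tensor, alpha_bar: torch.Tensor) -> torch.Tensor:
    """One step of L_simple: noise x0 at a random level, regress on the noise."""
    T = alpha_bar.shape[0]
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)   # t ~ U[1, T]
    eps = torch.randn_like(x0)                                  # target noise
    ab = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    xt = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps               # closed-form q(x_t | x_0)
    return F.mse_loss(model(xt, t), eps)                        # ||eps - eps_theta||^2
```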

[Interactive demo: Diffusion Forward and Reverse Process. Watch data get noised (forward) and denoised (reverse).]

[Interactive demo: Noise Schedule Comparison. Compare linear vs. cosine noise schedules.]