Noise2Noise, Noise2Self, Noise2Void
Can We Train a Denoiser Without Clean Data?
Supervised denoisers require paired data: a noisy input and a clean target. In many imaging domains, and especially in RF imaging, clean ground truth is expensive or impossible to obtain. The Noise2X family of methods asks: what is the minimum data requirement for training a denoiser?
The answer is surprising. Noise2Noise shows that noisy-noisy pairs suffice. Noise2Void goes further: a single noisy image is enough, provided the noise is pixel-independent. The key insight in all these methods is that the minimiser of the MSE loss is the conditional expectation of the target given the input. When the noise in the target is zero-mean and independent of the input, that conditional expectation is the same whether the target is clean or noisy.
Definition: Noise2Noise
Noise2Noise
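Noise2Noise trains a denoiser $f_\theta$ using pairs of noisy images of the same scene, without requiring clean ground truth:
$$\hat{\theta} = \arg\min_{\theta} \, \mathbb{E}\left[\|f_\theta(\mathbf{y}_1) - \mathbf{y}_2\|^2\right],$$
where $\mathbf{y}_1 = \mathbf{x} + \mathbf{n}_1$ and $\mathbf{y}_2 = \mathbf{x} + \mathbf{n}_2$ are two independent noisy observations of the same scene $\mathbf{x}$.
The optimal network converges to the conditional mean $\mathbb{E}[\mathbf{x} \mid \mathbf{y}_1]$, the same MMSE estimator as supervised training with clean targets.
Noise2Noise works because $\mathbb{E}[\mathbf{y}_2 \mid \mathbf{y}_1] = \mathbb{E}[\mathbf{x} \mid \mathbf{y}_1]$ when $\mathbf{n}_1$ and $\mathbf{n}_2$ are independent with zero mean. Replacing the clean target with a noisy one changes only the variance of the gradient, not its expectation.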
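The claim that a noisy target leaves the minimiser unchanged can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the scenes are hypothetical low-rank random vectors, the noise is white Gaussian, and the "network" is replaced by a linear denoiser fitted in closed form by least squares. The names `x`, `y1`, `y2`, `W_sup`, and `W_n2n` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_pixels, sigma = 20000, 16, 0.5

# Hypothetical clean scenes: random combinations of three basis vectors.
basis = rng.standard_normal((3, n_pixels))
x = rng.standard_normal((n_samples, 3)) @ basis

# Two noisy observations of each scene with independent zero-mean noise.
y1 = x + sigma * rng.standard_normal(x.shape)
y2 = x + sigma * rng.standard_normal(x.shape)

# Linear denoiser f(y) = y @ W, fitted by least squares.
W_sup, *_ = np.linalg.lstsq(y1, x, rcond=None)   # clean targets (supervised)
W_n2n, *_ = np.linalg.lstsq(y1, y2, rcond=None)  # noisy targets (Noise2Noise)

# Compare the two solutions and their error against the clean scenes.
rel_diff = np.linalg.norm(W_n2n - W_sup) / np.linalg.norm(W_sup)
mse_noisy = np.mean((y1 - x) ** 2)
mse_sup = np.mean((y1 @ W_sup - x) ** 2)
mse_n2n = np.mean((y1 @ W_n2n - x) ** 2)

print(f"relative weight difference: {rel_diff:.3f}")
print(f"MSE noisy input: {mse_noisy:.4f}, supervised: {mse_sup:.4f}, N2N: {mse_n2n:.4f}")
```

With enough samples the two weight matrices should nearly coincide; shrinking `n_samples` makes the Noise2Noise fit noisier, which mirrors the higher gradient variance of noisy targets.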
Theorem: Noise2Noise Achieves the MMSE Estimator
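Let $\mathbf{y}_1 = \mathbf{x} + \mathbf{n}_1$ and $\mathbf{y}_2 = \mathbf{x} + \mathbf{n}_2$, where $\mathbf{n}_1$ and $\mathbf{n}_2$ are independent, zero-mean noise. Then the minimiser of the Noise2Noise loss
$$\mathcal{L}_{\text{N2N}}(f) = \mathbb{E}\left[\|f(\mathbf{y}_1) - \mathbf{y}_2\|^2\right]$$
is identical to the minimiser of the supervised loss $\mathcal{L}_{\text{sup}}(f) = \mathbb{E}\left[\|f(\mathbf{y}_1) - \mathbf{x}\|^2\right]$:
$$\arg\min_f \mathcal{L}_{\text{N2N}}(f) = \arg\min_f \mathcal{L}_{\text{sup}}(f) = \mathbb{E}[\mathbf{x} \mid \mathbf{y}_1].$$
Replacing the clean target with a noisy version adds a term that depends on $\mathbf{n}_2$ but not on $f$. Since the noise is zero-mean and independent, the cross-term vanishes in expectation, and the minimiser remains the conditional mean.
Expand the Noise2Noise loss
$$\mathcal{L}_{\text{N2N}}(f) = \mathbb{E}\left[\|f(\mathbf{y}_1) - \mathbf{x} - \mathbf{n}_2\|^2\right] = \mathbb{E}\left[\|f(\mathbf{y}_1) - \mathbf{x}\|^2\right] - 2\,\mathbb{E}\left[\langle f(\mathbf{y}_1) - \mathbf{x},\, \mathbf{n}_2 \rangle\right] + \mathbb{E}\left[\|\mathbf{n}_2\|^2\right].$$
Cross-term vanishes
Since $\mathbf{n}_2$ is independent of $\mathbf{y}_1$ (hence of $f(\mathbf{y}_1)$) and of $\mathbf{x}$, with $\mathbb{E}[\mathbf{n}_2] = \mathbf{0}$:
$$\mathbb{E}\left[\langle f(\mathbf{y}_1) - \mathbf{x},\, \mathbf{n}_2 \rangle\right] = \langle \mathbb{E}[f(\mathbf{y}_1) - \mathbf{x}],\, \mathbb{E}[\mathbf{n}_2] \rangle = 0.$$
Conclude
Therefore $\mathcal{L}_{\text{N2N}}(f) = \mathcal{L}_{\text{sup}}(f) + \mathbb{E}\left[\|\mathbf{n}_2\|^2\right]$. The constant $\mathbb{E}\left[\|\mathbf{n}_2\|^2\right]$ does not depend on $f$, so $\arg\min_f \mathcal{L}_{\text{N2N}}(f) = \arg\min_f \mathcal{L}_{\text{sup}}(f)$.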
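The three steps of the proof can be verified empirically for any fixed estimator, not only the optimal one. The sketch below assumes uniform random scenes, white Gaussian noise, and an arbitrary shrinkage function `shrink` standing in for the denoiser; all of these are illustrative choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_pixels, sigma = 50000, 8, 0.3

x = rng.uniform(0.0, 1.0, size=(n_samples, n_pixels))  # clean scenes
n1 = sigma * rng.standard_normal(x.shape)              # input noise
n2 = sigma * rng.standard_normal(x.shape)              # target noise
y1, y2 = x + n1, x + n2

def shrink(y, alpha=0.7):
    """Arbitrary fixed estimator (not a trained network): shrink towards 0.5."""
    return 0.5 + alpha * (y - 0.5)

def mean_sq_norm(a):
    """Sample estimate of E[||a||^2] over the first axis."""
    return np.mean(np.sum(a ** 2, axis=1))

residual = shrink(y1) - x
loss_sup = mean_sq_norm(residual)                        # supervised loss
loss_n2n = mean_sq_norm(shrink(y1) - y2)                 # Noise2Noise loss
noise_power = mean_sq_norm(n2)                           # E[||n2||^2]
cross_term = np.mean(np.sum(residual * n2, axis=1))      # E[<f(y1) - x, n2>]

print(f"L_N2N = {loss_n2n:.4f}")
print(f"L_sup + noise power = {loss_sup + noise_power:.4f}")
print(f"cross-term = {cross_term:.5f}")
```

The expansion holds exactly for the sample averages, while the cross-term is only approximately zero and shrinks as the number of samples grows.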
Definition: Noise2Self and Noise2Void
Noise2Self and Noise2Void
Noise2Self and Noise2Void train denoisers from a single noisy image by exploiting the statistical independence of noise across pixels.
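Noise2Void masks a subset of pixels in the input and predicts them from the surrounding context:
$$\mathcal{L}_{\text{N2V}}(\theta) = \mathbb{E}\left[\sum_{i \in \mathcal{M}} \left(f_\theta(\tilde{\mathbf{y}}^{(i)})_i - y_i\right)^2\right],$$
where $\tilde{\mathbf{y}}^{(i)}$ denotes the noisy image $\mathbf{y}$ with pixel $i$ replaced by interpolation from neighbours, and $\mathcal{M}$ is the mask set.
Noise2Self generalises this via the concept of $\mathcal{J}$-invariance: a function $f$ is $\mathcal{J}$-invariant if, for every subset $J$ in a partition $\mathcal{J}$ of the pixels, $f(\mathbf{y})_J$ depends only on $\mathbf{y}_{J^c}$, the pixels outside $J$. For $\mathcal{J}$-invariant estimators and zero-mean noise that is independent across the subsets, the self-supervised loss equals the supervised loss plus a constant (the noise power) in expectation, so both losses have the same minimiser.
Noise2Void requires only a single noisy image for training: no pairs and no clean data. Of the methods discussed here, it has the weakest data requirement. However, it assumes pixel-independent noise, which fails for correlated noise (common in RF imaging after matched filtering).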
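The role of the blind spot can be illustrated without training anything. The sketch below assumes a hypothetical smooth periodic test image with white Gaussian noise and compares two fixed estimators: `donut_mean`, which averages the four neighbours and never reads the centre pixel, and the identity. Neither is the Noise2Void network; they only show why an estimator that can see the pixel it predicts makes the self-supervised loss uninformative.

```python
import numpy as np

rng = np.random.default_rng(2)
size, sigma = 128, 0.2

# Hypothetical smooth, periodic clean image and one noisy observation of it.
u = np.linspace(0.0, 2.0 * np.pi, size, endpoint=False)
x = 0.5 + 0.5 * np.sin(u)[:, None] * np.cos(2.0 * u)[None, :]
y = x + sigma * rng.standard_normal(x.shape)

def donut_mean(img):
    """Mean of the four nearest neighbours (periodic boundary), centre excluded."""
    return 0.25 * (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
                   + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))

def identity(img):
    """Estimator that reads the very pixel it predicts (no blind spot)."""
    return img

def losses(f):
    """Return (self-supervised loss vs. noisy y, supervised loss vs. clean x)."""
    est = f(y)
    return np.mean((est - y) ** 2), np.mean((est - x) ** 2)

self_donut, sup_donut = losses(donut_mean)
self_id, sup_id = losses(identity)

print(f"blind-spot: self = {self_donut:.4f}, sup + sigma^2 = {sup_donut + sigma**2:.4f}")
print(f"identity:   self = {self_id:.4f}, sup = {sup_id:.4f}")
```

For the blind-spot estimator the self-supervised loss tracks the supervised loss up to the noise power, whereas the identity drives the self-supervised loss to zero without removing any noise.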
Noise2Noise vs. Supervised Training
Compare the denoising PSNR of Noise2Noise training (noisy-noisy pairs) vs. supervised training (noisy-clean pairs) as a function of training set size and SNR. For Gaussian noise, the two methods converge to the same performance; the gap appears only for small training sets (higher gradient variance in N2N).
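For non-Gaussian noise (Poisson, speckle), what N2N requires is that the noise in the target $\mathbf{y}_2$ be zero-mean given the scene $\mathbf{x}$, that is $\mathbb{E}[\mathbf{y}_2 \mid \mathbf{x}] = \mathbf{x}$. Poisson noise satisfies this even though it is signal-dependent; the condition fails when the corruption biases the mean, for example multiplicative speckle that is not normalised to unit mean. Observe the performance gap in that regime.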
Example: Noise2Noise for Radar Imaging
A MIMO radar system collects two independent measurements of the same scene in consecutive coherent processing intervals (CPIs). How can Noise2Noise be applied, and what assumptions must hold?
Training data generation
Each measurement pair consists of two radar returns from the same scene with independent noise (thermal noise, clutter realisation).
Critical assumption: The noise must be independent between the two CPIs. Thermal noise is independent from one CPI to the next. Clutter, however, may be static or slowly varying: if the clutter is identical in both CPIs, Noise2Noise will treat it as signal.
Training
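Train a denoiser $f_\theta$ with loss: $\mathcal{L}(\theta) = \mathbb{E}\left[\|f_\theta(\mathbf{y}_1) - \mathbf{y}_2\|^2\right]$, where $\mathbf{y}_1$ and $\mathbf{y}_2$ are the measurements from the two CPIs.
After training, apply $f_\theta$ to any single measurement. The denoiser must handle complex-valued inputs (2-channel real/imaginary representation).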
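A minimal sketch of the 2-channel real/imaginary representation and the corresponding loss is given below. The helper names (`complex_to_channels`, `channels_to_complex`, `n2n_loss`, `thermal_noise`), the point-scatterer scene, and the noise level are all hypothetical; the trivial estimator $f(\mathbf{y}) = \mathbf{y}$ stands in for a trained network so that the example stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(3)

def complex_to_channels(z):
    """Stack real and imaginary parts on a new leading axis: (H, W) -> (2, H, W)."""
    return np.stack([z.real, z.imag], axis=0)

def channels_to_complex(c):
    """Inverse of complex_to_channels: (2, H, W) -> complex (H, W)."""
    return c[0] + 1j * c[1]

def n2n_loss(estimate, noisy_target):
    """Mean squared error between two complex images, computed on the 2 channels."""
    diff = complex_to_channels(estimate) - complex_to_channels(noisy_target)
    return np.mean(np.sum(diff ** 2, axis=0))

def thermal_noise(shape, sigma):
    """Circular complex Gaussian noise with total variance sigma^2 per pixel."""
    return sigma * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

# Hypothetical scene with two point scatterers, observed in two CPIs.
scene = np.zeros((32, 32), dtype=complex)
scene[10, 12] = 1.0 + 0.5j
scene[20, 5] = -0.8j
sigma = 0.1
cpi_1 = scene + thermal_noise(scene.shape, sigma)
cpi_2 = scene + thermal_noise(scene.shape, sigma)

channels = complex_to_channels(cpi_1)
roundtrip = channels_to_complex(channels)
loss_identity = n2n_loss(cpi_1, cpi_2)  # loss of the trivial estimator f(y) = y

print(channels.shape, f"loss of f(y) = y: {loss_identity:.4f}")
```

Summing the squared real and imaginary differences is the same as the squared magnitude of the complex difference, so the 2-channel loss keeps phase errors in the objective.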
RF-specific considerations
For multi-static RF imaging, independent noise realisations can come from:
- Multiple CPIs (time diversity)
- Sub-aperture processing (spatial diversity)
- Frequency sub-band splitting (frequency diversity)
The denoiser architecture should preserve phase information (use complex convolutions or 2-channel representation).
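For the time-diversity case, one possible way to turn a stack of repeated looks into training pairs is sketched below. It assumes that all looks image the same static scene with mutually independent noise; the function `make_n2n_pairs`, the array shapes, and the synthetic data are hypothetical. Looks from sub-apertures or frequency sub-bands would need to be brought to a common resolution and phase reference first, which this sketch does not attempt.

```python
import numpy as np
from itertools import permutations

def make_n2n_pairs(looks):
    """Build (input, target) pairs from K complex looks of shape (K, H, W).

    Every ordered pair (i, j) with i != j is used, so a look never serves
    as its own target. Returns two real arrays of shape (K*(K-1), 2, H, W)
    holding the real and imaginary parts as channels.
    """
    idx = np.array(list(permutations(range(looks.shape[0]), 2)))

    def to_channels(z):
        return np.stack([z.real, z.imag], axis=1)

    return to_channels(looks[idx[:, 0]]), to_channels(looks[idx[:, 1]])

rng = np.random.default_rng(4)
n_looks, height, width = 4, 8, 8

# Hypothetical static scene observed in four CPIs with independent noise.
scene = rng.standard_normal((height, width)) + 1j * rng.standard_normal((height, width))
noise = 0.1 * (rng.standard_normal((n_looks, height, width))
               + 1j * rng.standard_normal((n_looks, height, width)))
looks = scene[None, :, :] + noise

inputs, targets = make_n2n_pairs(looks)
print(inputs.shape, targets.shape)
```

Different pairs reuse the same looks, but within each pair the input and target noise are independent, which is the property the Noise2Noise loss relies on.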
Comparison of Self-Supervised Denoising Methods
| Method | Training Data | Noise Assumption | RF Imaging Source | Limitation |
|---|---|---|---|---|
| Supervised | Noisy-clean pairs | Any | N/A (no ground truth) | Requires ground truth |
| Noise2Noise | Noisy-noisy pairs | Zero-mean, independent | Multiple CPIs | Need repeated measurements |
| Noise2Void | Single noisy image | Pixel-independent | Single measurement | Fails for correlated noise |
| Noise2Self | Single noisy image | Independent across pixel subsets (J-invariant estimator) | Single measurement | Reduced resolution |
Quick Check
Why does Noise2Noise training converge to the MMSE estimator?
- The noisy targets average out to the clean image over many training samples
- The cross-term vanishes in expectation due to independence
- The network learns to subtract the noise from the target
- The MSE loss is convex, so any local minimum is global
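Correct. The N2N loss equals the supervised loss plus a constant because the cross-term vanishes. The minimiser is the same: the conditional mean of the clean image given the noisy input, $\mathbb{E}[\mathbf{x} \mid \mathbf{y}_1]$.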
Historical Note: From NVIDIA Research to Medical Imaging
Noise2Noise was developed at NVIDIA Research by Lehtinen et al. (2018). The original paper demonstrated results on natural images, MRI, and Monte Carlo rendered images. The work was motivated by the observation that in many practical scenarios (medical imaging, microscopy, astronomy) collecting two noisy measurements is far easier than obtaining a clean ground truth.
The paper's most striking result was that a denoiser trained using only noisy-noisy pairs matched the quality of a supervised denoiser trained on millions of clean-noisy pairs. This result was initially met with skepticism but is now well understood through the lens of conditional expectation theory.
Noise2Noise
A self-supervised training strategy that uses pairs of independent noisy observations of the same scene as input-target pairs, converging to the MMSE estimator without clean ground truth.
Related: Noise2Void
Noise2Void
A self-supervised denoising method that trains from a single noisy image by masking pixels and predicting them from context, requiring only pixel-independent noise.
Related: Noise2Noise
Key Takeaway
Noise2Noise achieves MMSE-optimal denoising using only noisy-noisy pairs, because the cross-term in the MSE expansion vanishes by independence. Noise2Void extends this to a single noisy image by exploiting pixel-independent noise. For RF imaging, independent noise realisations arise naturally from multiple CPIs, sub-aperture processing, or frequency diversity. The critical limitation is that the single-image methods fail for spatially correlated noise (for example, after matched filtering), while Noise2Noise fails when the noise is correlated between the two observations (for example, static clutter).