Chapter Summary
Key Points
1. Deep Image Prior (DIP) uses the CNN architecture as an implicit prior, reconstructing from a single measurement without training data. Spectral bias causes low-frequency signal to be learned before high-frequency noise; early stopping acts as regularisation. The Deep Decoder eliminates the need for early stopping via under-parameterisation.
2. Noise2Noise trains denoisers from noisy-noisy pairs (no clean data), converging to the MMSE estimator because the cross-term vanishes by independence. Noise2Void extends this to a single noisy image by exploiting pixel-independent noise, but fails for correlated noise (common after matched filtering in RF imaging).
3. SURE provides an unbiased MSE estimate without clean targets, using the denoiser's divergence as a correction term computed via a single Monte Carlo probe vector. SURE-trained denoisers match supervised quality for Gaussian noise. GSURE extends to inverse problems but is blind to the null space.
4. Equivariant imaging uses known signal symmetries (rotations, shifts, flips) as self-supervision, constraining the null space of the forward operator without ground truth. The equivariance loss creates "virtual measurements" that provide indirect observations of unmeasured subspace components.
5. Foundation models provide general-purpose priors that can be adapted to RF imaging via LoRA fine-tuning or operator conditioning (RAM). The domain gap between natural images and RF scenes requires careful adaptation; Caire's vision of a simulation-pretrained RF foundation model connects physics-based modelling with data-driven reconstruction.
6. Self-supervised methods form a spectrum of data requirements: DIP (one measurement), Noise2Void (one noisy image), SURE (noisy images with known noise), Noise2Noise (noisy pairs), EI (unpaired measurements + symmetries), foundation models (large pretraining + small adaptation). The choice depends on available data, noise statistics, and computational budget.
Looking Ahead
This chapter completes Part VI on learned reconstruction methods. The methods of Chapters 20--23 form a spectrum from fully supervised (Chapter 20) to fully unsupervised (this chapter), with increasing independence from training data but also increasing reliance on architectural priors and symmetry assumptions.
Part VII takes a fundamentally different perspective: rather than reconstructing images on a voxel grid, neural scene representations (NeRF, 3D Gaussian splatting, signed distance functions) model the 3D scene as a continuous function, enabling joint estimation of geometry, reflectivity, and material properties from RF measurements.