Deep Image Prior (DIP) and Deep Decoder
The Network Architecture IS the Prior
All methods in Chapters 20--22 require either paired training data (supervised) or a pretrained generative model (diffusion). Deep Image Prior (DIP) demonstrates a remarkable fact: the architecture of a neural network itself encodes a prior over natural images, even with random, untrained weights.
DIP fits a randomly initialised network to a single measurement by optimising the network weights. The key observation is that the network architecture imposes an implicit regularisation: natural images are learned faster than noise, so early stopping acts as the regulariser. This is particularly valuable for RF imaging, where ground-truth reflectivity maps are rarely available.
Definition: Deep Image Prior (DIP)
Deep Image Prior (DIP)
The Deep Image Prior reconstructs an image by optimising the weights of a generator network to fit the measurements:
$$\hat{\theta} = \arg\min_{\theta} \, \| y - A f_{\theta}(z) \|_2^2,$$
where $z \sim \mathcal{N}(0, I)$ is a fixed random input (not optimised) and $f_{\theta}$ is a U-Net or encoder-decoder CNN.
The reconstruction is $\hat{x} = f_{\hat{\theta}}(z)$, where $\hat{\theta}$ is obtained with early stopping to prevent overfitting to noise.
DIP requires no training data --- it reconstructs from a single measurement. The "prior" is encoded in the network architecture: convolutional layers favour spatially smooth, locally correlated images. This bias toward natural-looking images provides implicit regularisation.
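The behaviour described above can be reproduced in a few lines. The sketch below is a deliberately linear caricature of DIP, not the original implementation: the "network" is $f_\theta = K\theta$, where $K$ is a fixed smoothing convolution standing in for the CNN's bias toward smooth images. Gradient descent on the data fit then exhibits the characteristic rise-and-fall of PSNR that motivates early stopping.

```python
import numpy as np

# Linear caricature of DIP (an illustrative sketch, not the original method):
# the generator is f_theta = K @ theta, where K is a fixed smoothing
# convolution playing the role of the CNN's architectural prior.
rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
x_true = np.sin(2 * np.pi * 3 * t / n) + 0.5 * np.sin(2 * np.pi * 7 * t / n)
y = x_true + 0.5 * rng.standard_normal(n)   # noisy "measurement" (A = I here)

def K(v):
    # circulant smoothing with kernel [0.25, 0.5, 0.25]; K is symmetric
    return 0.5 * v + 0.25 * np.roll(v, 1) + 0.25 * np.roll(v, -1)

def psnr(x):
    return 10 * np.log10(np.ptp(x_true) ** 2 / np.mean((x - x_true) ** 2))

theta = np.zeros(n)
eta = 1.0
history = []
for it in range(3000):
    r = K(theta) - y          # residual on the measurements
    theta -= eta * K(r)       # gradient of ||y - K theta||^2 (K^T = K)
    history.append(psnr(K(theta)))

best = int(np.argmax(history))
print(f"best PSNR {history[best]:.1f} dB at iter {best}, "
      f"final PSNR {history[-1]:.1f} dB")
```

The smooth signal is captured within the first few iterations, while broadband noise leaks in only slowly through the high-frequency components of $K$; stopping near the PSNR peak denoises, while running to convergence reproduces the noisy data.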
Historical Note: The Accidental Discovery of DIP
Ulyanov, Vedaldi, and Lempitsky (2018) initially set out to study texture generation using untrained networks. They noticed that when fitting a CNN to a single noisy image, the network produced a clean version before learning the noise --- a behaviour that seemed to violate the expectation that over-parameterised networks immediately memorise their training data.
This observation led to the DIP paper, which argued that the CNN architecture itself acts as a regulariser. The result was surprising because it contradicted the prevailing view that neural network priors must be learned from data. DIP demonstrated that the inductive bias of convolutional architectures is, by itself, a powerful image prior.
Theorem: Spectral Bias of Deep Image Prior
During gradient descent optimisation of the DIP objective, the network learns low-frequency components of the target faster than high-frequency components. Specifically, for a network with convolutional layers of fixed kernel size, the Fourier coefficient of the output at frequency $k$ and optimisation step $t$ satisfies:
$$\hat{x}_t(k) = \left(1 - e^{-\eta \lambda(k)\, t}\right) \hat{y}(k),$$
where the convergence rate $\lambda(k)$ decreases with $|k|$ --- low frequencies converge first.
The network acts as a low-pass filter that progressively admits higher frequencies as training progresses. Signal (which is typically low-frequency dominant) is learned first; noise (which is broadband) is learned later. Stopping optimisation at the right time captures the signal while rejecting noise.
Neural tangent kernel analysis
In the infinite-width limit, the network's learning dynamics are characterised by the Neural Tangent Kernel (NTK). For convolutional networks, the NTK has a block-circulant structure whose eigenvalues decay with spatial frequency. Denoting the NTK eigenvalue at frequency $k$ by $\lambda(k)$, gradient descent converges as:
$$\hat{x}_t(k) = \left(1 - e^{-\eta \lambda(k)\, t}\right) \hat{y}(k).$$
Since $\lambda(k)$ decreases with $|k|$, low frequencies converge first.
Early stopping as regularisation
Stopping at iteration $t^\ast$ is equivalent to spectral regularisation with a frequency-dependent filter: $w(k) = 1 - e^{-\eta \lambda(k)\, t^\ast}$. High frequencies (small $\lambda(k)$) are suppressed. This is analogous to Tikhonov regularisation with parameter $\alpha \propto 1/(\eta\, t^\ast)$.
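The spectral-filter view can be checked numerically: apply the filter $1 - e^{-\eta \lambda(k) t}$ to the noisy spectrum and watch the reconstruction error as $t$ grows. The $\lambda(k)$ profile below is an assumed power-law decay, not a computed NTK, so this is a sketch of the mechanism rather than an exact analysis.

```python
import numpy as np

# Numerical check of the spectral-filter view of early stopping.
# lambda(k) below is an assumed power-law decay standing in for the NTK
# eigenvalues; the qualitative behaviour does not depend on its exact form.
rng = np.random.default_rng(1)
n = 256
k = np.fft.rfftfreq(n, d=1.0 / n)        # frequencies 0 .. n/2
lam = 1.0 / (1.0 + k) ** 2               # assumed NTK eigenvalue decay

x_true = np.sin(2 * np.pi * 3 * np.arange(n) / n)   # low-frequency signal
y = x_true + 0.5 * rng.standard_normal(n)           # broadband noise added
Y, X = np.fft.rfft(y), np.fft.rfft(x_true)

eta = 1.0
errs = []
for t in [1, 10, 100, 1000, 10000, 100000]:
    filt = 1.0 - np.exp(-eta * lam * t)  # frequency-dependent filter at step t
    Xt = filt * Y                        # spectrum of the iterate
    errs.append(np.linalg.norm(Xt - X) / np.linalg.norm(X))

# Error falls while the filter admits mainly the low-frequency signal,
# then rises again as high-frequency noise is let through.
print([round(e, 3) for e in errs])
```

The error is large at small $t$ (signal not yet admitted), reaches a minimum at an intermediate $t$, and grows again at large $t$ (noise fully admitted) --- exactly the rise-and-fall that makes early stopping act as regularisation.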
DIP Spectral Bias and Early Stopping
Visualise the DIP reconstruction process. The left subplot shows PSNR vs. iteration, exhibiting the characteristic "rise and fall": PSNR increases as the network learns the signal, peaks at the optimal stopping point, then decreases as the network overfits to noise.
The right subplot shows the power spectrum of the reconstruction at the current iteration, compared to the ground truth and the noisy measurement. Observe how low frequencies are recovered first. The Deep Decoder architecture has a stronger spectral bias (slower overfitting) than the U-Net.
Definition: Deep Decoder
Deep Decoder
The Deep Decoder is an under-parameterised variant of DIP that uses a decoder-only architecture (no skip connections, no encoder):
$$f_{\theta}(z) = B_{\mathrm{out}}\, \sigma\bigl(U B_d \cdots \sigma(U B_1 z)\bigr),$$
where $U$ is bilinear upsampling and each $1 \times 1$ convolution $B_i$ has $k$ channels. The number of parameters is deliberately kept much smaller than the number of pixels: $p \ll n$.
This under-parameterisation prevents the network from memorising noise, eliminating the need for early stopping.
The Deep Decoder provides a more principled alternative to DIP: the regularisation is architectural (under-parameterisation) rather than procedural (early stopping). However, the reconstruction quality is slightly lower because the reduced capacity also limits the network's ability to represent fine details.
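The under-parameterisation claim is easy to verify by counting. The sketch below tallies parameters for a Deep Decoder-style generator under assumed hyperparameters ($d$ layers of $1 \times 1$ convolutions with $k$ channels, resolution doubled by bilinear upsampling at each layer); biases and channel-normalisation parameters are omitted for simplicity.

```python
# Parameter count for a Deep Decoder-style generator (a sketch with
# assumed hyperparameters; biases and normalisation params are omitted).
def deep_decoder_params(k=64, depth=5, out_channels=1):
    per_layer = k * k                     # 1x1 conv mixes k -> k channels
    return depth * per_layer + k * out_channels   # plus final 1x1 conv

def output_pixels(input_side=16, depth=5):
    return (input_side * 2 ** depth) ** 2  # each upsampling doubles the side

p = deep_decoder_params()                  # number of parameters
n = output_pixels()                        # number of output pixels
print(p, n, round(p / n, 3))               # p << n: under-parameterised
```

With $k = 64$ channels and 5 layers the generator has roughly 20k parameters for a $512 \times 512$ output (about 8% of the pixel count), so it simply cannot memorise per-pixel noise.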
Example: Early Stopping in DIP: Practical Strategies
A DIP reconstruction of an RF reflectivity map is run for 10,000 iterations. The PSNR curve peaks at iteration 3,200 and falls to 22.1 dB at iteration 10,000. How do you determine the optimal stopping point in practice (without ground truth)?
The stopping problem
Without ground truth, we cannot compute PSNR. Three practical methods for determining the stopping point:
Measurement residual monitoring
Track the data residual $\| y - A f_{\theta_t}(z) \|_2^2$ vs. iteration. When this drops below $m\sigma^2$ (the expected noise energy for $m$ measurements with noise variance $\sigma^2$), the network has fit the signal; further iterations fit noise. Stop when $\| y - A f_{\theta_t}(z) \|_2^2 \approx m\sigma^2$.
Running average (exponential smoothing)
Average the recent iterates with an exponential moving average: $\bar{x}_t = \gamma\, \bar{x}_{t-1} + (1 - \gamma)\, x_t$. This smooths out the noise-fitting oscillations and reduces sensitivity to the exact stopping point. Typically $\gamma$ is close to 1 (e.g. $\gamma \approx 0.99$).
Cross-validation on held-out measurements
Split the measurements: fit using $y_{\mathrm{train}}$, monitor the residual on $y_{\mathrm{val}}$. The validation residual increases when the network begins overfitting. This is the most reliable method but requires sufficient measurements.
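The held-out strategy can be sketched concretely. As before, this uses a linear caricature of DIP (generator $f_\theta = K\theta$ with $K$ a fixed smoothing convolution, $A$ a subsampling of pixels); the split ratio and patience value are illustrative choices, not prescriptions.

```python
import numpy as np

# Sketch of validation-based early stopping, assuming a linear caricature
# of DIP: generator f_theta = K @ theta with K a fixed smoothing convolution.
# Measurements are split 90/10 into train and held-out validation sets.
rng = np.random.default_rng(2)
n = 256
x_true = np.sin(2 * np.pi * 3 * np.arange(n) / n)
y = x_true + 0.3 * rng.standard_normal(n)

idx = rng.permutation(n)
tr, va = idx[: int(0.9 * n)], idx[int(0.9 * n):]   # train / validation split

def K(v):  # circulant smoothing, symmetric (K^T = K)
    return 0.5 * v + 0.25 * np.roll(v, 1) + 0.25 * np.roll(v, -1)

theta = np.zeros(n)
eta = 1.0
best_val, best_x, patience = np.inf, None, 0
for it in range(3000):
    x = K(theta)
    r = np.zeros(n)
    r[tr] = x[tr] - y[tr]                 # residual on training entries only
    theta -= eta * K(r)                   # gradient step on the train fit
    val = np.mean((x[va] - y[va]) ** 2)   # held-out residual
    if val < best_val:
        best_val, best_x, patience = val, x.copy(), 0
    else:
        patience += 1
        if patience > 200:                # stop once validation stops improving
            break

best_mse = np.mean((best_x - x_true) ** 2)
print(f"stopped at iter {it}: true MSE of best-validation iterate {best_mse:.4f}")
```

The validation residual bottoms out once the signal is fit and rises as the network starts reproducing training noise; keeping the iterate with the lowest validation residual implements strategy 3 without any ground truth.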
Deep Image Prior Reconstruction
Complexity: $O(T \cdot C_{\mathrm{net}})$, where $C_{\mathrm{net}}$ is the cost of one forward+backward pass through the network. Typically $T \sim 10^3$--$10^4$ iterations. Unlike supervised methods that amortise training cost over many test images, DIP requires per-image optimisation. This makes it slow (minutes per image on GPU) but eliminates the need for any training data.
Common Mistake: DIP Overfits to Noise Without Early Stopping
Mistake:
Running DIP optimisation until convergence and using the final iterate as the reconstruction.
Correction:
DIP will overfit to noise if run long enough. The final iterate perfectly reproduces the noisy measurements ($A f_{\hat{\theta}}(z) = y$) but amplifies noise in the null space of $A$.
Always use early stopping, running-average smoothing, or the Deep Decoder to prevent overfitting. For RF imaging with unknown noise level, use cross-validation on held-out measurements.
Quick Check
In DIP, the random input $z$ is:
Optimised jointly with the network weights
Fixed throughout optimisation --- only $\theta$ is updated
Set equal to the noisy measurement
Learned from a separate training set
Correct. The input $z$ is sampled once from $\mathcal{N}(0, I)$ and remains fixed. The network weights $\theta$ are the only optimisation variables.
Why This Matters: DIP for RF Imaging Without Training Data
DIP is especially valuable for RF imaging because:
- No training data: RF reflectivity ground truth is almost never available. DIP reconstructs from a single measurement.
- Flexible forward model: The sensing matrix $A$ can be any linear operator --- partial Fourier, diffraction tomography, MIMO radar --- without retraining.
- Complex-valued extension: DIP naturally handles complex RF signals by using a 2-channel (real/imaginary) output.
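The 2-channel trick for complex signals amounts to a pair of lossless conversions. A minimal sketch (the helper names are illustrative, not a fixed API):

```python
import numpy as np

# Minimal illustration of the 2-channel (real/imaginary) representation
# used to handle complex RF signals with a real-valued network.
rng = np.random.default_rng(3)

def to_channels(x_complex):
    # complex array -> real array of shape (2, ...) for the network output
    return np.stack([x_complex.real, x_complex.imag])

def to_complex(channels):
    # reassemble the complex reflectivity from the 2-channel output
    return channels[0] + 1j * channels[1]

x = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
ch = to_channels(x)
print(ch.shape)                               # (2, 64, 64)
assert np.allclose(to_complex(ch), x)         # lossless round trip
```

The network itself stays entirely real-valued; only the first and last tensors are reinterpreted, so the data-fit term can be evaluated directly on the complex measurements.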
The main limitation is speed: per-image optimisation takes minutes, making DIP unsuitable for real-time RF imaging. However, it serves as an excellent baseline and can be combined with learned initialisation (meta-learning) for faster convergence.
Deep Image Prior (DIP)
A reconstruction method that uses the architecture of an untrained CNN as an implicit regulariser, optimising network weights to fit a single measurement with early stopping to prevent noise overfitting.
Related: Deep Decoder, Spectral Bias
Deep Decoder
An under-parameterised variant of DIP using a decoder-only architecture with far fewer parameters than pixels, eliminating the need for early stopping by preventing the network from memorising noise.
Related: Deep Image Prior (DIP)
Spectral Bias
The tendency of neural networks to learn low-frequency components of a target function before high-frequency components, arising from the eigenvalue structure of the Neural Tangent Kernel.
Related: Deep Image Prior (DIP)
Key Takeaway
Deep Image Prior uses the CNN architecture as an implicit prior, requiring no training data. Spectral bias causes signal to be learned before noise; early stopping acts as regularisation. The Deep Decoder eliminates early stopping via under-parameterisation but sacrifices some expressivity. DIP is especially valuable for RF imaging where ground-truth reflectivity data is scarce.