Exercises
ex23-01-dip-loss
Easy: Write the DIP loss function for a linear inverse problem $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with generator network $f_\theta$ and fixed input $\mathbf{z}$. What is being optimised: the input, the weights, or both?
The DIP loss measures the discrepancy between the measurements and the forward-modelled reconstruction.
Only $\theta$ is optimised; $\mathbf{z}$ is fixed after initial sampling.
DIP loss
$$\mathcal{L}(\theta) = \|\mathbf{y} - \mathbf{A}\, f_\theta(\mathbf{z})\|_2^2$$
Only the weights $\theta$ are optimised. The input $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ is sampled once and then held fixed, and the reconstruction is $\hat{\mathbf{x}} = f_{\theta^*}(\mathbf{z})$. $\square$
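A minimal sketch of this optimisation loop in PyTorch; the measurement matrix, network shape, and iteration budget below are illustrative assumptions, not part of the exercise:

```python
import torch

# Hypothetical setup: A is an (m, n) measurement matrix, y the measurements.
n = 64 * 64
A = torch.randn(2000, n) / n ** 0.5
y = A @ torch.rand(n)                    # stand-in measurements

# A small dense net stands in for the usual U-Net generator.
f = torch.nn.Sequential(torch.nn.Linear(128, 512), torch.nn.ReLU(),
                        torch.nn.Linear(512, n))
z = torch.randn(128)                     # sampled once, then frozen
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for it in range(2000):                   # early stopping matters (see ex23-06)
    opt.zero_grad()
    x_hat = f(z)                         # reconstruction from the fixed code z
    loss = ((A @ x_hat - y) ** 2).sum()  # DIP loss: only theta moves
    loss.backward()
    opt.step()
```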
ex23-02-n2n-proof
Easy: Prove that the Noise2Noise loss has the same minimiser as the supervised loss when $\mathbf{n}_1$ and $\mathbf{n}_2$ are independent zero-mean noise.
Write $\mathbf{y}_2 = \mathbf{x} + \mathbf{n}_2$ and expand the squared norm.
The cross-term vanishes due to independence and zero mean.
Expand the Noise2Noise loss
$$\mathbb{E}\,\|f_\theta(\mathbf{y}_1) - \mathbf{y}_2\|^2 = \mathbb{E}\,\|f_\theta(\mathbf{y}_1) - \mathbf{x}\|^2 - 2\,\mathbb{E}\,\langle f_\theta(\mathbf{y}_1) - \mathbf{x},\, \mathbf{n}_2\rangle + \mathbb{E}\,\|\mathbf{n}_2\|^2$$
Cross-term vanishes
Since $\mathbf{n}_2$ is independent of $\mathbf{n}_1$ (and hence of $\mathbf{y}_1$ and $f_\theta(\mathbf{y}_1)$) with $\mathbb{E}[\mathbf{n}_2] = \mathbf{0}$:
$$\mathbb{E}\,\langle f_\theta(\mathbf{y}_1) - \mathbf{x},\, \mathbf{n}_2\rangle = \langle \mathbb{E}[f_\theta(\mathbf{y}_1) - \mathbf{x}],\, \mathbb{E}[\mathbf{n}_2]\rangle = 0.$$
The N2N loss equals the supervised loss plus a constant $\mathbb{E}\,\|\mathbf{n}_2\|^2$. The minimiser over $\theta$ is the same. $\square$
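A quick numerical check of this identity (the signal model, the fixed estimator `f`, and the noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)              # "clean" signal samples
n1 = rng.normal(scale=0.5, size=x.shape)  # input noise
n2 = rng.normal(scale=0.5, size=x.shape)  # independent target noise

f = lambda y: 0.8 * y                     # any fixed estimator
n2n = np.mean((f(x + n1) - (x + n2)) ** 2)
sup = np.mean((f(x + n1) - x) ** 2)

# N2N loss ~= supervised loss + E||n2||^2 (here 0.5^2 = 0.25), for any f
print(n2n, sup + 0.25)
```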
ex23-03-sure-linear
Easy: For a linear denoiser $f(\mathbf{y}) = \mathbf{W}\mathbf{y}$, compute the divergence and write the SURE loss in closed form.
For a linear map, $\operatorname{div}_{\mathbf{y}} f(\mathbf{y}) = \operatorname{tr}(\mathbf{W})$.
Divergence of a linear map
$\partial f_i/\partial y_j = W_{ij}$. Therefore $\operatorname{div}_{\mathbf{y}} f(\mathbf{y}) = \sum_i W_{ii} = \operatorname{tr}(\mathbf{W})$, and the divergence is constant in $\mathbf{y}$, so no Monte Carlo estimate is needed.
SURE loss
$$\text{SURE}(\mathbf{W}) = \|\mathbf{W}\mathbf{y} - \mathbf{y}\|^2 + 2\sigma^2\operatorname{tr}(\mathbf{W}) - n\sigma^2$$
Minimising in expectation over a signal class with covariance $\mathbf{C}_x$ yields the Wiener filter $\mathbf{W}^* = \mathbf{C}_x(\mathbf{C}_x + \sigma^2\mathbf{I})^{-1}$. $\square$
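A numerical sanity check that this SURE expression is unbiased for the true MSE; the dimension, denoiser, and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 0.3
W = 0.9 * np.eye(n) + 0.01 * rng.normal(size=(n, n))  # arbitrary linear denoiser

x = rng.normal(size=n)
mse, sure = [], []
for _ in range(20_000):
    y = x + sigma * rng.normal(size=n)
    mse.append(np.sum((W @ y - x) ** 2))
    sure.append(np.sum((W @ y - y) ** 2)
                + 2 * sigma ** 2 * np.trace(W) - n * sigma ** 2)

print(np.mean(mse), np.mean(sure))   # agree up to Monte Carlo error
```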
ex23-04-ei-definition
Easy: Write the equivariant imaging loss for a reconstruction network $f_\theta$, forward operator $\mathbf{A}$, and a group $\mathcal{G} = \{T_g\}$ of discrete transformations. Explain the role of each term.
There are two terms: data consistency and equivariance.
Data consistency loss
$\mathcal{L}_{\text{DC}} = \|\mathbf{A}\, f_\theta(\mathbf{y}) - \mathbf{y}\|^2$ enforces agreement with the measurements; it constrains only the range-space component of the reconstruction (the part reachable via $\mathbf{A}^H$).
Equivariance loss
$\mathcal{L}_{\text{EI}} = \sum_{g \in \mathcal{G}} \|f_\theta(\mathbf{A}\, T_g \hat{\mathbf{x}}) - T_g \hat{\mathbf{x}}\|^2$ with $\hat{\mathbf{x}} = f_\theta(\mathbf{y})$ enforces equivariance: transformed reconstructions must be correctly recovered from their virtual measurements, which injects information about the null space of $\mathbf{A}$.
Total loss
$\mathcal{L} = \mathcal{L}_{\text{DC}} + \lambda\, \mathcal{L}_{\text{EI}}$, where $\lambda$ balances the two objectives.
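A sketch of how the two terms combine in code; the network, operator, and transform list are placeholders:

```python
import torch

def ei_loss(f, A, y, transforms, lam=1.0):
    """Equivariant-imaging objective: data consistency + equivariance.

    f          : reconstruction network, measurements -> image
    A          : callable forward operator, image -> measurements
    transforms : list of callables T_g acting on images
                 (e.g. lambda x: torch.roll(x, s, dims=-1) for shifts)
    """
    x_hat = f(y)
    loss_dc = ((A(x_hat) - y) ** 2).sum()        # range-space consistency
    loss_ei = 0.0
    for T in transforms:                          # virtual measurements
        x_t = T(x_hat)
        loss_ei = loss_ei + ((f(A(x_t)) - x_t) ** 2).sum()
    return loss_dc + lam * loss_ei
```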
ex23-05-foundation-gap
Easy: List three statistical differences between natural images and RF reflectivity maps that create a domain gap for foundation models. For each, suggest a mitigation strategy.
Think about dynamic range, complex values, and texture statistics.
Domain gap factors
- Dynamic range: natural images are 8-bit (0--255); RF reflectivity spans 40+ dB of dynamic range, often log-scaled. Mitigation: normalise to log-magnitude before feeding the foundation model.
- Complex values: natural images are real-valued (RGB); RF signals are complex (amplitude + phase). Mitigation: use a 2-channel (real/imaginary) representation, or separate magnitude/phase processing.
- Texture statistics: natural images have smooth textures and sharp edges; RF images have speckle (multiplicative noise), sidelobes, and grating lobes. Mitigation: fine-tune on simulated RF data, or use LoRA adaptation with a small RF dataset.
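A sketch of the first two mitigations in NumPy; the helper name and the 60 dB window are illustrative choices, not prescribed by the exercise:

```python
import numpy as np

def rf_to_net_input(x_complex, dyn_range_db=60.0):
    """Map complex RF reflectivity to network-friendly channels.

    Channel 0: log-magnitude, clipped to a fixed dynamic range and
               rescaled to [0, 1]. Channels 1-2: real/imaginary parts.
    """
    mag_db = 20 * np.log10(np.abs(x_complex) + 1e-12)
    mag_db = np.clip(mag_db - mag_db.max(), -dyn_range_db, 0.0)
    log_mag = mag_db / dyn_range_db + 1.0        # -> [0, 1]
    return np.stack([log_mag, x_complex.real, x_complex.imag])
```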
ex23-06-dip-overfitting
Medium: A DIP reconstruction of a $128 \times 128$ image uses a U-Net with 1.5 million parameters. The image has $16{,}384$ pixels. Explain why the network can overfit and estimate the number of iterations before overfitting begins. Compare with a Deep Decoder having 50K parameters.
The network has more parameters than pixels.
Overfitting begins when the network starts memorising the noise pattern.
Over-parameterisation
With 1.5M parameters and 16K pixels, the network is over-parameterised by roughly $90\times$. It has sufficient capacity to fit any target image, including pure noise.
Overfitting timeline
Empirically, DIP overfitting begins after a few thousand iterations. The spectral bias delays it: the low-frequency signal (which carries most of the energy) is fitted first, mid-frequency details next, and high-frequency noise last. The optimal stopping point depends on SNR: the noisier the data, the earlier the stop.
Deep Decoder comparison
A Deep Decoder with 50K parameters ($30\times$ fewer than the U-Net) lacks the capacity to memorise the noise; its restricted architecture acts as implicit regularisation. It converges without overfitting but may miss fine details. The U-Net DIP achieves higher peak PSNR (at the optimal stopping point) but requires careful early stopping.
ex23-07-sure-soft-threshold
Medium: Compute SURE for the soft-thresholding denoiser $f_\lambda(\mathbf{y})_i = \operatorname{sign}(y_i)\max(|y_i| - \lambda,\, 0)$. Find the optimal threshold $\lambda^*$ as a function of $\sigma$ and the signal statistics.
$\partial f_\lambda(\mathbf{y})_i/\partial y_i = \mathbb{1}\{|y_i| > \lambda\}$.
The optimal $\lambda$ balances bias (from thresholding signal) and variance (from noise leakage).
SURE for soft thresholding
The divergence is $\operatorname{div}_{\mathbf{y}} f_\lambda(\mathbf{y}) = \#\{i : |y_i| > \lambda\}$, the number of surviving coefficients. Substituting into SURE:
$$\text{SURE}(\lambda) = \sum_i \min(y_i^2, \lambda^2) - n\sigma^2 + 2\sigma^2\,\#\{i : |y_i| > \lambda\}.$$
Optimal threshold
For a sparse signal in Gaussian noise, the optimal threshold scales as $\lambda^* \approx \sigma\sqrt{2\ln n}$ (the universal threshold); the exact value depends on the sparsity fraction.
In practice, minimise SURE numerically: evaluate $\text{SURE}(\lambda)$ on a grid of $\lambda$ values and choose the minimiser. This requires no knowledge of $\mathbf{x}$.
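A minimal implementation of this grid search, using the SURE expression derived above; the sparse test signal and grid are arbitrary:

```python
import numpy as np

def sure_soft(y, lam, sigma):
    """SURE(lambda) for soft thresholding, as derived above."""
    n = y.size
    return (np.sum(np.minimum(y ** 2, lam ** 2))
            - n * sigma ** 2
            + 2 * sigma ** 2 * np.sum(np.abs(y) > lam))

rng = np.random.default_rng(2)
x = np.zeros(1000)
x[:50] = rng.normal(scale=3.0, size=50)       # sparse signal
sigma = 1.0
y = x + sigma * rng.normal(size=x.size)

grid = np.linspace(0.0, 4 * sigma, 200)
lam_star = grid[np.argmin([sure_soft(y, l, sigma) for l in grid])]
print(lam_star)                                # chosen with no access to x
```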
ex23-08-ei-shift-fourier
Medium: For a partial Fourier sensing matrix $\mathbf{A} = \mathbf{M}\mathbf{F}$ (with $\mathbf{M}$ a row-selection mask), show that a spatial shift by $s$ pixels corresponds to a phase rotation in Fourier space. Explain why this is useful for equivariant imaging.
Shift theorem: $\mathcal{F}\{x[n - s]\}[k] = e^{-2\pi i k s/N}\, X[k]$.
Fourier shift theorem
A circular shift by $s$ pixels: $(T_s\mathbf{x})[n] = x[(n - s) \bmod N]$. In the Fourier domain: $\widehat{T_s\mathbf{x}}[k] = e^{-2\pi i k s/N}\, \hat{x}[k]$.
Effect on measurements
$\mathbf{A}\, T_s\mathbf{x} = \mathbf{M}\mathbf{F}\, T_s\mathbf{x} = \mathbf{M}\,\mathbf{D}_s\,\mathbf{F}\mathbf{x}$, where $\mathbf{D}_s = \operatorname{diag}\!\big(e^{-2\pi i k s/N}\big)$.
The phase rotation does not change which frequencies are measured, but it modulates them differently. Different shifts create a system of equations at each measured frequency.
EI benefit
The EI constraint for multiple shifts forces the network to produce consistent reconstructions under different phase modulations. This implicitly constrains the unmeasured frequencies through the network's learned inductive bias.
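The shift theorem itself can be verified in a few lines of NumPy (signal length and shift are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, s = 128, 5
x = rng.normal(size=N)

lhs = np.fft.fft(np.roll(x, s))                        # F{shifted signal}
k = np.arange(N)
rhs = np.exp(-2j * np.pi * k * s / N) * np.fft.fft(x)  # phase-rotated spectrum
print(np.allclose(lhs, rhs))                           # True
```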
ex23-09-gsure-derivation
Medium: Derive GSURE for the inverse problem $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$. Show that it estimates the projected MSE and explain why it cannot constrain the null space.
Apply standard SURE to the composite map $\mathbf{y} \mapsto \mathbf{P}\, f_\theta(\mathbf{y})$ with $\mathbf{P} = \mathbf{A}^\dagger\mathbf{A}$.
Apply SURE to the projected estimate
Define $\mathbf{P} = \mathbf{A}^\dagger\mathbf{A}$, the projector onto the row space of $\mathbf{A}$. Applying standard SURE to the projected estimate $\mathbf{P}\, f_\theta(\mathbf{y})$:
$$\text{GSURE}(\theta) = \|\mathbf{P}\, f_\theta(\mathbf{y})\|^2 - 2\,\langle f_\theta(\mathbf{y}),\, \mathbf{A}^\dagger\mathbf{y}\rangle + 2\sigma^2\operatorname{tr}\!\big(\mathbf{A}^{\dagger T}\mathbf{J}_{f_\theta}(\mathbf{y})\big) + c,$$
where $c$ collects $\theta$-independent terms.
Divergence
where $\mathbf{J}_{f_\theta}(\mathbf{y})$ is the Jacobian of $f_\theta$. MC estimate: $\operatorname{tr}\!\big(\mathbf{A}^{\dagger T}\mathbf{J}_{f_\theta}\big) \approx (\mathbf{A}^\dagger\mathbf{b})^T \mathbf{J}_{f_\theta}\mathbf{b}$ with $\mathbf{b} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ (one Jacobian-vector product per probe).
Null-space blindness
GSURE estimates $\mathbb{E}\,\|\mathbf{P}(f_\theta(\mathbf{y}) - \mathbf{x})\|^2$, which depends only on the range-space error. If $\hat{\mathbf{x}}_2 = \hat{\mathbf{x}}_1 + \mathbf{v}$ with $\mathbf{v} \in \operatorname{null}(\mathbf{A})$, then $\mathbf{P}\hat{\mathbf{x}}_2 = \mathbf{P}\hat{\mathbf{x}}_1$, so GSURE is identical for $\hat{\mathbf{x}}_1$ and $\hat{\mathbf{x}}_2$. Additional regularisation (TV, EI, a learned prior) is needed to fix the null-space component.
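A sketch of the one-probe Monte Carlo trace term in PyTorch; the reconstruction map `f` and pseudoinverse matrix `A_pinv` are placeholders, and several probes can be averaged to reduce variance:

```python
import torch

def gsure_trace(f, A_pinv, y):
    """One-probe Hutchinson estimate of tr(A^{+T} J_f(y)),
    i.e. (A^+ b)^T (J_f(y) b) with b ~ N(0, I).

    f      : reconstruction map, measurements (m,) -> image (n,)
    A_pinv : (n, m) pseudoinverse of the forward operator
    """
    b = torch.randn_like(y)
    # Jacobian-vector product: directional derivative of f at y along b.
    # Pass create_graph=True if this term is differentiated during training.
    _, jvp = torch.autograd.functional.jvp(f, (y,), (b,))
    return (A_pinv @ b) @ jvp
```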
ex23-11-n2n-gradient-variance
Medium: Compare the gradient variance of Noise2Noise and supervised training. Show that N2N has higher gradient variance and explain the practical implications for training.
The N2N gradient has an extra noise term from the noisy target.
Gradient comparison
Supervised gradient: $\mathbf{g}_{\text{sup}} = 2\,\mathbf{J}_\theta^T\big(f_\theta(\mathbf{y}_1) - \mathbf{x}\big)$, with $\mathbf{J}_\theta = \partial f_\theta(\mathbf{y}_1)/\partial\theta$.
N2N gradient: $\mathbf{g}_{\text{N2N}} = 2\,\mathbf{J}_\theta^T\big(f_\theta(\mathbf{y}_1) - \mathbf{x} - \mathbf{n}_2\big)$.
The N2N gradient has an additional term $-2\,\mathbf{J}_\theta^T\mathbf{n}_2$.
Variance analysis
$\operatorname{Var}[\mathbf{g}_{\text{N2N}}] = \operatorname{Var}[\mathbf{g}_{\text{sup}}] + 4\sigma^2\,\mathbb{E}\,\|\mathbf{J}_\theta\|_F^2$.
The extra variance is proportional to $\sigma^2$ and to the squared Frobenius norm of the Jacobian.
Practical implications
Higher gradient variance means N2N requires: (1) a smaller learning rate, (2) a larger batch size, or (3) more training iterations to reach the same convergence. N2N typically needs noticeably more iterations than supervised training (the gap grows with the noise level), but the per-iteration cost is the same.
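A small numerical illustration with a scalar-parameter linear "network" (all sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma, trials = 100, 0.5, 20_000
x = rng.normal(size=n)
w = 0.7                                  # scalar "network": f_w(y) = w * y

g_sup, g_n2n = [], []
for _ in range(trials):
    y1 = x + sigma * rng.normal(size=n)
    n2 = sigma * rng.normal(size=n)
    # d/dw ||w*y1 - target||^2 = 2 * y1^T (w*y1 - target)
    g_sup.append(2 * y1 @ (w * y1 - x))
    g_n2n.append(2 * y1 @ (w * y1 - (x + n2)))

print(np.var(g_sup), np.var(g_n2n))      # N2N variance is strictly larger
```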
ex23-12-dip-tv
Hard: Combine DIP with total variation regularisation to create a more robust reconstruction. Write the modified loss, explain how TV interacts with DIP's spectral bias, and analyse whether early stopping is still needed.
Modified loss: $\mathcal{L}(\theta) = \|\mathbf{y} - \mathbf{A}\, f_\theta(\mathbf{z})\|^2 + \mu\,\text{TV}(f_\theta(\mathbf{z}))$.
TV penalises high-frequency content, reinforcing the spectral bias.
Modified DIP loss
$$\mathcal{L}(\theta) = \|\mathbf{y} - \mathbf{A}\, f_\theta(\mathbf{z})\|_2^2 + \mu \sum_{i,j}\sqrt{(\nabla_h\hat{\mathbf{x}})_{ij}^2 + (\nabla_v\hat{\mathbf{x}})_{ij}^2}, \qquad \hat{\mathbf{x}} = f_\theta(\mathbf{z})$$
Interaction with spectral bias
DIP's spectral bias learns low frequencies first. TV penalises high-frequency gradients, reinforcing this tendency. The combined effect:
- Early iterations: DIP learns low-frequency structure; TV is inactive (low gradient values).
- Mid iterations: DIP starts fitting mid-frequency details; TV selectively preserves edges while suppressing oscillations.
- Late iterations: TV prevents the network from fitting high-frequency noise, extending the useful training window.
Early stopping analysis
TV significantly reduces the need for early stopping: the PSNR curve exhibits a long plateau rather than the sharp peak seen without TV. However, over very long runs the network can still overfit in directions TV does not penalise (constant regions). In practice, DIP+TV is much more robust to the stopping criterion than vanilla DIP.
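A minimal isotropic TV term for the modified loss, in PyTorch; the image is a 2-D tensor, and the small epsilon is a standard numerical convenience that keeps the gradient defined at zero:

```python
import torch

def tv_iso(x, eps=1e-8):
    """Isotropic total variation of a 2-D image tensor."""
    dh = x[:, 1:] - x[:, :-1]      # horizontal differences, (H, W-1)
    dv = x[1:, :] - x[:-1, :]      # vertical differences,   (H-1, W)
    dh = dh[:-1, :]                # crop both to the common (H-1, W-1) grid
    dv = dv[:, :-1]
    return torch.sqrt(dh ** 2 + dv ** 2 + eps).sum()

# DIP+TV objective: ((A @ f(z) - y)**2).sum() + mu * tv_iso(f(z))
```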
ex23-13-ei-recovery-proof
Hard: Prove that for a partial Fourier matrix $\mathbf{A} = \mathbf{M}\mathbf{F}$ with $m < N$ measured frequencies, equivariant imaging with all $N$ circular shifts recovers the full signal (assuming a shift-invariant signal class).
Each shift creates measurements with different phase modulations at the same frequency locations.
Show that the system of equations is over-determined.
Shift creates virtual measurements
For shift $s$: $\mathbf{A}\, T_s\mathbf{x} = \mathbf{M}\,\mathbf{D}_s\,\mathbf{F}\mathbf{x}$, with $\mathbf{D}_s = \operatorname{diag}(e^{-2\pi i k s/N})$. The EI constraint requires correct reconstruction from these modulated measurements.
Full system
With shifts $s = 0, \dots, N-1$, the EI constraints create a system of $mN$ equations. For each measured frequency $k$, the shifts provide modulated observations with phases $e^{-2\pi i k s/N}$ for $s = 0, \dots, N-1$.
The reconstruction must be consistent with all these modulated views, which constrains not just the measured coefficients but also the unmeasured ones through the network.
Over-determined system
The system has $mN$ equations for $N$ unknowns (the Fourier coefficients). Since $mN \geq N$, this is over-determined, and the unique solution (for a perfect network) is the true signal. The condition for exact recovery is that the shifts generate enough "virtual diversity" to span all frequencies through the reconstruction network.
ex23-14-sure-nonGaussian
Hard: Extend SURE to Poisson noise. For $y_i \sim \text{Poisson}(x_i)$, derive an unbiased risk estimate analogous to SURE for Gaussian noise.
For Poisson: $\mathbb{E}[x\, h(y)] = \mathbb{E}[y\, h(y-1)]$ (Stein-type identity for Poisson).
Poisson Stein identity
For $y \sim \text{Poisson}(x)$: $\mathbb{E}[x\, h(y)] = \mathbb{E}[y\, h(y-1)]$ for any $h$ with $\mathbb{E}\,|y\, h(y-1)| < \infty$.
Equivalently: $\mathbb{E}[y\, h(y)] = \mathbb{E}[x\, h(y+1)]$.
Poisson SURE (PURE)
For a denoiser $f$ applied to Poisson data:
$$\text{PURE}(f) = \|f(\mathbf{y})\|^2 - 2\sum_i y_i\, f_i(\mathbf{y} - \mathbf{e}_i) + \sum_i y_i(y_i - 1),$$
where $\mathbf{e}_i$ is the $i$-th canonical vector and the last term is $f$-independent (an unbiased estimate of $\|\mathbf{x}\|^2$).
The difference term $\sum_i y_i\, f_i(\mathbf{y} - \mathbf{e}_i)$ plays the role of the Gaussian divergence: the derivative is replaced by a unit downward shift of each coordinate.
Practical limitation
PURE requires $n$ extra forward passes (one per pixel perturbation), which is much more expensive than MC-SURE (a single extra Jacobian-vector product). Efficient approximations evaluate the difference term on random subsets of pixels.
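A sketch of PURE with the optional subset approximation; the function name is illustrative, `f` is any denoiser on integer counts, and the rescaling makes the subsampled sum an unbiased stand-in for the full one:

```python
import numpy as np

def pure_loss(f, y, subset=None):
    """PURE for integer Poisson counts y, up to the f-independent
    constant sum_i y_i*(y_i - 1).  `subset` optionally restricts the
    expensive difference term to a random set of pixel indices."""
    n = y.size
    idx = range(n) if subset is None else subset
    term = 0.0
    for i in idx:
        if y[i] > 0:                    # y_i = 0 contributes nothing
            y_minus = y.copy()
            y_minus[i] -= 1             # unit downward shift, not a derivative
            term += y[i] * f(y_minus)[i]
    if subset is not None:
        term *= n / len(subset)         # unbiased rescaling of the subsampled sum
    return np.sum(f(y) ** 2) - 2 * term
```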
ex23-15-dip-complex
Hard: Design a DIP reconstruction for complex-valued SAR images that handles: (1) complex signals, (2) multiplicative speckle noise, and (3) phase preservation. Compare with standard real-valued DIP.
Use a 2-channel output for real and imaginary parts.
For speckle: work in log-domain to convert multiplicative noise to additive.
Complex DIP architecture
The network outputs 2 channels: $(\mathbf{u}, \mathbf{v}) = f_\theta(\mathbf{z})$. The complex image is $\hat{\mathbf{x}} = \mathbf{u} + i\mathbf{v}$.
Speckle-aware loss
For multiplicative speckle under Goodman's fully developed model (observed intensity $I_i = |\hat{x}_i|^2 s_i$ with unit-mean exponential $s_i$), combine the negative log-likelihood $\sum_i \big(\log|\hat{x}_i|^2 + I_i/|\hat{x}_i|^2\big)$ with a data-fidelity term on the raw measurements.
Phase preservation
Add a phase consistency loss on the measured frequencies, penalising the mismatch between $\angle(\mathbf{F}\hat{\mathbf{x}})$ and $\angle\mathbf{y}$. The complex DIP naturally preserves phase through the 2-channel representation.
Comparison
Real-valued DIP applied to the magnitude recovers no phase at all. Complex DIP attains comparable or better magnitude PSNR while keeping the phase error small on strong scatterers; the improvement comes from jointly optimising magnitude and phase.
ex23-16-ei-measurement-splitting
Hard: Combine equivariant imaging with measurement splitting for RF imaging with a partial Fourier forward model. Write the combined loss and analyse how each component contributes to null-space recovery.
Split the measured frequencies $\Omega = \Omega_1 \cup \Omega_2$: $\mathbf{y}_1 = \mathbf{A}_1\mathbf{x} + \mathbf{n}_1$ (training split), $\mathbf{y}_2 = \mathbf{A}_2\mathbf{x} + \mathbf{n}_2$ (held-out split).
Combined loss
$$\mathcal{L} = \underbrace{\|\mathbf{A}_1 f_\theta(\mathbf{y}_1) - \mathbf{y}_1\|^2}_{\text{data consistency}} + \underbrace{\|\mathbf{A}_2 f_\theta(\mathbf{y}_1) - \mathbf{y}_2\|^2}_{\text{measurement splitting}} + \lambda\sum_{g}\|f_\theta(\mathbf{A}\, T_g\hat{\mathbf{x}}) - T_g\hat{\mathbf{x}}\|^2, \qquad \hat{\mathbf{x}} = f_\theta(\mathbf{y}_1),$$
where $\mathbf{A}_i = \mathbf{M}_{\Omega_i}\mathbf{F}$.
Complementary contributions
- Measurement splitting constrains frequencies in $\Omega_2$ (cross-validates on held-out measurements).
- Data consistency constrains frequencies in $\Omega_1$.
- Equivariance constrains frequencies outside $\Omega$ (the null space) via symmetry-induced virtual measurements.
Together, they provide supervision for all frequency components.
Advantage
The combination is more robust than either method alone: measurement splitting handles non-symmetric scenes, while EI handles scenes where the measurement split is too sparse.
ex23-17-ram-conditioning
Challenge: Design a conditioning mechanism for a RAM-style foundation model that adapts to different RF imaging forward operators. The model should handle partial Fourier, diffraction tomography, and MIMO radar sensing matrices without retraining.
The forward operator can be encoded via its SVD, its PSF, or a learned embedding.
Consider both explicit conditioning (operator as input) and implicit conditioning (via the data consistency loss).
SVD-based conditioning
Encode $\mathbf{A}$ via its truncated SVD: $\mathbf{A} \approx \mathbf{U}_r\boldsymbol{\Sigma}_r\mathbf{V}_r^H$. Feed the singular values and a low-dimensional representation of $\mathbf{V}_r$ to the network as side information.
The network uses cross-attention or FiLM conditioning to modulate its features based on the operator embedding.
PSF conditioning
For shift-invariant operators, encode via its point spread function (PSF). The PSF is a compact 2D/3D function that captures the operator's spatial characteristics. This is natural for RF imaging where the PSF depends on array geometry and frequency.
Learned operator embedding
Train an operator encoder $E: \mathbf{A} \mapsto \mathbf{e} \in \mathbb{R}^d$ that maps any linear operator to a fixed-dimensional embedding. During RAM pretraining, the encoder learns to extract operator-relevant features (rank, condition number, spatial-frequency coverage).
Test-time adaptation
For new operators not seen during training: (1) compute the conditioning vector, (2) run the foundation model for an initial reconstruction, (3) fine-tune using SURE or EI losses for 50--100 iterations. This combines the foundation model's broad prior with operator-specific adaptation.
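A sketch of the FiLM variant in PyTorch; the module name and shapes are illustrative, and the embedding could come from singular values, a PSF code, or a learned encoder:

```python
import torch

class FiLM(torch.nn.Module):
    """Feature-wise linear modulation from an operator embedding."""

    def __init__(self, embed_dim, n_channels):
        super().__init__()
        # One linear map produces per-channel scale (gamma) and shift (beta).
        self.to_scale_shift = torch.nn.Linear(embed_dim, 2 * n_channels)

    def forward(self, feat, e):
        # feat: (B, C, H, W) feature map; e: (B, embed_dim) operator code
        gamma, beta = self.to_scale_shift(e).chunk(2, dim=-1)
        return gamma[..., None, None] * feat + beta[..., None, None]
```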
ex23-18-self-supervised-comparison
Challenge: Design a comprehensive comparison experiment for self-supervised methods in RF imaging. Compare DIP, Noise2Noise, SURE+PnP, equivariant imaging, and foundation model transfer on the same test set. Define evaluation metrics, data requirements, and predict which method wins in each regime.
Each method has different data requirements: DIP (1 measurement), N2N (paired noisy), SURE+PnP (noisy images), EI (unpaired measurements), FM (pretrained model).
Consider both low-SNR/high-SNR and sparse/extended scene regimes.
Experimental setup
Test set: 100 simulated RF scenes (50 sparse point-scatterer scenes, 50 extended). Forward model: partial Fourier at a fixed compression ratio. SNR: 10, 20, 30 dB.
Data requirements
| Method | Training data | Test-time data | Test-time compute |
|---|---|---|---|
| DIP | None | 1 meas. + optimisation | minutes/image |
| N2N | 1000 noisy pairs | 1 measurement | seconds |
| SURE+PnP | 1000 noisy images | 1 measurement | seconds |
| EI | 1000 unpaired meas. | 1 measurement | seconds |
| FM (LoRA) | ImageNet + 100 RF | 1 measurement | seconds |
Predicted winners
- Low SNR (10 dB), sparse: DIP (strong architectural bias for sparse signals)
- Low SNR (10 dB), extended: EI (symmetries provide null-space info)
- High SNR (30 dB), sparse: SURE+PnP (denoiser quality dominates)
- High SNR (30 dB), extended: N2N (strongest supervision among self-supervised)
- Domain shift at test time: FM with LoRA (robust to distribution changes)
Metrics
Primary: PSNR (magnitude), SSIM, phase error (degrees). Secondary: computational time, memory, hyperparameter sensitivity.
ex23-19-sure-pnp
Challenge: Design a SURE-based training procedure for the plug-and-play denoiser in a PnP-ADMM algorithm. The denoiser is trained end-to-end through the ADMM iterations using a SURE loss (no clean ground truth). Analyse convergence and the interaction between SURE training and ADMM convergence.
Unroll ADMM iterations and compute SURE on the final output.
The divergence must be computed through the entire unrolled ADMM.
Unrolled PnP-ADMM
Unroll $K$ ADMM iterations, applying the denoiser $D_\theta$ at each proximal step. The end-to-end map is $F_\theta: \mathbf{y} \mapsto \hat{\mathbf{x}}^{(K)}$.
SURE on the unrolled output
Apply GSURE to the end-to-end reconstruction:
$$\text{GSURE}(\theta) = \|\mathbf{P}\, F_\theta(\mathbf{y})\|^2 - 2\,\langle F_\theta(\mathbf{y}),\, \mathbf{A}^\dagger\mathbf{y}\rangle + 2\sigma^2\operatorname{tr}\!\big(\mathbf{A}^{\dagger T}\mathbf{J}_{F_\theta}(\mathbf{y})\big) + c.$$
The trace term is computed via backpropagation through the entire unrolled ADMM (MC estimate with one probe vector).
Convergence analysis
Two nested optimisation loops interact:
- Outer loop: gradient descent on to minimise GSURE.
- Inner loop: ADMM iterations for reconstruction.
For convergence: (1) the ADMM must converge for each fixed $\theta$ (this requires the denoiser to be firmly non-expansive), (2) the GSURE gradient must be unbiased (this requires Gaussian noise). In practice, a small fixed number of unrolled iterations is used, and the denoiser is regularised towards non-expansiveness via spectral normalisation.
ex23-20-ei-rf-multiview
Challenge: For a multi-static RF imaging system with $N_t$ transmitters and $N_r$ receivers, design an equivariant imaging framework that exploits both spatial symmetries and measurement redundancy. Analyse the null-space recovery guarantee as a function of the array geometry and the symmetry group.
The multi-static sensing matrix has a block structure: $\mathbf{A} = [\mathbf{A}_{1,1}^T\ \cdots\ \mathbf{A}_{N_t,N_r}^T]^T$, one block per Tx-Rx pair.
Consider both scene symmetries and array symmetries.
Multi-static forward model
Each Tx-Rx pair $(i, j)$ provides $\mathbf{y}_{ij} = \mathbf{A}_{ij}\mathbf{x} + \mathbf{n}_{ij}$, where $\mathbf{A}_{ij}$ depends on the transmit and receive steering vectors $\mathbf{a}_t(i)$ and $\mathbf{a}_r(j)$. The combined system has $N_t N_r$ measurement channels for $N$ voxels.
Scene symmetries
For scenes invariant under rotations by angle $2\pi/K$: the rotation $T_k$ permutes the voxel grid, and the EI loss enforces $f_\theta(\mathbf{A}\, T_k\hat{\mathbf{x}}) = T_k\hat{\mathbf{x}}$.
For the rotation to mix range and null spaces, the array must not have the same rotational symmetry as the scene (otherwise the measurements are invariant under rotation, providing no information).
Array-induced symmetries
If the array has $K$-fold symmetry (e.g., a uniform planar array with $K = 4$), the measurements of a rotated scene can be computed from the measurements of the original scene by permuting Tx-Rx indices. This provides "free" data augmentation that does not require re-measurement.
Recovery guarantee
Full recovery requires the group action to be transitive on the null space. For a uniform planar array with $N_t N_r$ elements at wavelength $\lambda$ and scene diameter $D$, the null space has dimension $d_{\text{null}} = N - \operatorname{rank}(\mathbf{A})$. The number of independent constraints contributed by the $|\mathcal{G}|$ group elements is at most $|\mathcal{G}|\operatorname{rank}(\mathbf{A})$ minus an overlap term that measures redundancy between transformed measurements. Recovery requires this count to exceed $d_{\text{null}}$.