Training Strategies for Imaging
The Loss Function Shapes the Reconstruction
The choice of loss function determines what the network optimises for, and hence the character of the reconstruction.
- MSE loss produces the posterior mean: smooth and unbiased, but blurry when the posterior is multimodal.
- Perceptual loss preserves textures and edges by measuring distance in VGG feature space rather than pixel space.
- Adversarial loss generates sharp, realistic-looking images, but risks hallucination: inventing features not in the true scene.
For RF imaging, where fidelity to the true scene is paramount (a false target in a radar image could trigger a false alarm), the choice of loss function carries significant practical consequences. This section surveys the three main families and their implications, then discusses data augmentation and transfer learning strategies.
Definition: MSE (Mean Squared Error) Loss
The MSE loss between reconstruction $\hat{\mathbf{c}} = f_\theta(\mathbf{y})$ and ground truth $\mathbf{c}$ is
$$\mathcal{L}_{\text{MSE}}(\theta) = \mathbb{E}\left[\,\|f_\theta(\mathbf{y}) - \mathbf{c}\|_2^2\,\right].$$
Minimising over the training distribution yields the posterior mean $\hat{\mathbf{c}}^{\text{MSE}}(\mathbf{y}) = \mathbb{E}[\mathbf{c} \mid \mathbf{y}]$.
MSE penalises all pixel errors equally. When the posterior is multimodal, i.e. multiple scenes are consistent with the measurements, the posterior mean lies between the modes, producing a blurry reconstruction.
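A small numerical sketch of this averaging effect (illustrative, not from the text): for a bimodal target distribution, the MSE-optimal point estimate is the sample mean, which lands between the two modes and matches neither plausible scene.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bimodal "posterior": two equally likely scenes, e.g. a point
# target at position -1 or at +1 (both consistent with the data).
samples = np.concatenate([
    rng.normal(-1.0, 0.05, 5000),
    rng.normal(+1.0, 0.05, 5000),
])

# Scan candidate point estimates and pick the MSE minimiser.
candidates = np.linspace(-2.0, 2.0, 401)
mse = [np.mean((samples - c) ** 2) for c in candidates]
best = candidates[int(np.argmin(mse))]

# The minimiser sits near 0, between the modes: a blurry compromise.
print(best, samples.mean())
```

The grid minimiser coincides (up to grid resolution) with the sample mean, which is exactly the posterior-mean behaviour described above.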
Definition: Perceptual Loss
The perceptual loss measures distance in the feature space of a pretrained classification network (typically VGG-16):
$$\mathcal{L}_{\text{perc}} = \sum_{\ell \in \mathcal{S}} \frac{1}{C_\ell H_\ell W_\ell} \left\|\phi_\ell(f_\theta(\mathbf{y})) - \phi_\ell(\mathbf{c})\right\|_2^2,$$
where $\phi_\ell$ extracts the feature map at VGG layer $\ell$, $\mathcal{S}$ is a set of selected layers, and $C_\ell, H_\ell, W_\ell$ are the channel, height, and width dimensions.
Perceptual loss encourages structural similarity at multiple scales. It preserves edges and textures better than MSE because VGG features are invariant to small spatial shifts that MSE penalises heavily. For RF scenes with point targets, the perceptual loss can preserve the sharp target signature while suppressing background clutter.
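A minimal sketch of the feature-space distance, with a stand-in random convolution bank in place of real VGG-16 activations (the `phi` function here is hypothetical; in practice one would use pretrained feature maps):

```python
import numpy as np

def phi(img):
    """Stand-in feature extractor: a fixed random 3x3 convolution bank.
    (Hypothetical; in practice phi_l are pretrained VGG-16 activations.)"""
    rng = np.random.default_rng(0)           # fixed filters across calls
    kernels = rng.normal(size=(4, 3, 3))     # C = 4 output channels
    H, W = img.shape
    feats = np.zeros((4, H - 2, W - 2))
    for c, k in enumerate(kernels):
        for i in range(H - 2):
            for j in range(W - 2):
                feats[c, i, j] = np.sum(img[i:i+3, j:j+3] * k)
    return feats

def perceptual_loss(recon, target):
    """Squared feature-space distance, normalised by C*H*W as in the definition."""
    fr, ft = phi(recon), phi(target)
    return np.sum((fr - ft) ** 2) / fr.size

rng = np.random.default_rng(1)
scene = rng.normal(size=(16, 16))
noisy = scene + rng.normal(0.0, 0.5, scene.shape)

print(perceptual_loss(scene, scene))   # identical images: 0.0
print(perceptual_loss(scene, noisy))   # degraded reconstruction: > 0
```

Swapping `phi` for actual VGG layer activations (and summing over several layers $\ell \in \mathcal{S}$) recovers the definition above.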
Definition: Adversarial (GAN) Loss
The adversarial loss trains a discriminator $D$ alongside the reconstruction network $f_\theta$:
$$\mathcal{L}_{\text{adv}} = \mathbb{E}_{\mathbf{c}}\left[\log D(\mathbf{c})\right] + \mathbb{E}_{\mathbf{y}}\left[\log\left(1 - D(f_\theta(\mathbf{y}))\right)\right].$$
The generator minimises while the discriminator maximises it. At equilibrium, the generator produces images that the discriminator cannot distinguish from true scenes.
Adversarial training produces the sharpest, most visually realistic reconstructions. The danger is hallucination: the network may generate plausible-looking features not present in the true scene. For scientific imaging (radar, SAR), this is problematic; hallucinated targets could trigger false alarms in detection pipelines.
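The minimax objective can be evaluated directly from discriminator outputs. A sketch of the standard (non-saturating) loss computation, assuming discriminator scores in $(0, 1)$:

```python
import numpy as np

def adversarial_losses(d_real, d_fake, eps=1e-12):
    """Standard GAN losses from discriminator outputs.

    d_real: D(c) on true scenes; d_fake: D(f_theta(y)) on reconstructions.
    Both are probabilities in (0, 1). Sketch of the textbook objective.
    """
    d_real = np.clip(d_real, eps, 1 - eps)
    d_fake = np.clip(d_fake, eps, 1 - eps)
    # Discriminator maximises E[log D(c)] + E[log(1 - D(f(y)))]
    disc_loss = -(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))
    # Non-saturating generator loss: minimise -E[log D(f(y))]
    gen_loss = -np.mean(np.log(d_fake))
    return disc_loss, gen_loss

# A confident discriminator (real ~ 1, fake ~ 0) has low loss,
# while the generator's loss is correspondingly high.
d_loss, g_loss = adversarial_losses(np.array([0.9, 0.95]),
                                    np.array([0.05, 0.10]))
print(d_loss, g_loss)
```

At equilibrium the discriminator outputs hover near 0.5 for both inputs, which is the point where the generator's samples are indistinguishable from true scenes.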
Theorem: Loss Functions and Bayesian Estimators
Under mild regularity conditions, the optimal reconstruction network trained with different losses converges to distinct Bayesian estimators:
| Loss | Optimal estimator |
|---|---|
| MSE ($\ell_2$) | Posterior mean $\mathbb{E}[\mathbf{c} \mid \mathbf{y}]$ |
| MAE ($\ell_1$) | Posterior median (component-wise) |
| Adversarial (Wasserstein) | Sample from the posterior $p(\mathbf{c} \mid \mathbf{y})$ |
Each loss function defines a different notion of "best." MSE minimises average squared error (yields the mean). MAE minimises average absolute error (yields the median, which is mode-seeking for heavy-tailed priors). The adversarial loss matches distributions, so the generator produces samples from the posterior rather than a single point estimate.
MSE $\to$ posterior mean
For a fixed $\mathbf{y}$, decompose the MSE:
$$\mathbb{E}\big[\|\hat{\mathbf{c}} - \mathbf{c}\|_2^2 \mid \mathbf{y}\big] = \big\|\hat{\mathbf{c}} - \mathbb{E}[\mathbf{c} \mid \mathbf{y}]\big\|_2^2 + \mathbb{E}\big[\|\mathbf{c} - \mathbb{E}[\mathbf{c} \mid \mathbf{y}]\|_2^2 \mid \mathbf{y}\big].$$
The second term is independent of $\hat{\mathbf{c}}$. Minimising over $\hat{\mathbf{c}}$ sets the first term to zero: $\hat{\mathbf{c}} = \mathbb{E}[\mathbf{c} \mid \mathbf{y}]$.
Adversarial $\to$ posterior samples
The optimal generator in a GAN with sufficient discriminator capacity produces samples from the data distribution. When conditioned on the measurements $\mathbf{y}$, this becomes the posterior $p(\mathbf{c} \mid \mathbf{y})$. Each forward pass produces a different sample (if noise is injected), unlike the deterministic MSE estimator.
Effect of Loss Function on Reconstruction Quality
Compare reconstructions produced by networks trained with different loss functions on the same RF imaging scene. The MSE-trained network produces smooth but blurry estimates. The perceptual-loss network preserves more texture and target sharpness. The adversarial-loss network produces the sharpest images but may introduce hallucinated point targets. The combined loss balances all three.
Examine the error maps carefully: MSE has the lowest pixel-wise error but the worst perceptual quality, while the adversarial loss has higher pixel-wise error but better structural similarity (SSIM).
Definition: SSIM Loss
The structural similarity index (SSIM) measures perceptual image quality by comparing luminance, contrast, and structure between $\hat{\mathbf{c}}$ and $\mathbf{c}$ in local windows:
$$\text{SSIM}(\hat{\mathbf{c}}, \mathbf{c}) = \frac{(2\mu_{\hat{c}}\mu_{c} + C_1)(2\sigma_{\hat{c}c} + C_2)}{(\mu_{\hat{c}}^2 + \mu_{c}^2 + C_1)(\sigma_{\hat{c}}^2 + \sigma_{c}^2 + C_2)},$$
where $\mu_{\hat{c}}, \mu_{c}, \sigma_{\hat{c}}^2, \sigma_{c}^2, \sigma_{\hat{c}c}$ are local means, variances, and cross-covariance, and $C_1, C_2$ are small constants for stability. The SSIM loss is $\mathcal{L}_{\text{SSIM}} = 1 - \text{SSIM}(\hat{\mathbf{c}}, \mathbf{c})$.
SSIM correlates better with human perceptual quality than MSE. For RF scenes, SSIM is particularly useful for evaluating target detection performance: it penalises blurry targets and sidelobe artefacts while being relatively tolerant of background noise variations.
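A single-window sketch of the SSIM formula above, using global rather than sliding-window statistics (the constants are illustrative placeholders, not the standard choices):

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Single-window SSIM sketch: global means/variances/cross-covariance.
    Practical implementations use local sliding windows instead."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()   # cross-covariance
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def ssim_loss(x, y):
    return 1.0 - ssim(x, y)

rng = np.random.default_rng(0)
scene = rng.random((32, 32))

print(ssim_loss(scene, scene))        # identical images: loss ~ 0
print(ssim_loss(scene, 1.0 - scene))  # inverted structure: loss > 1
```

The inverted image has negative cross-covariance, so SSIM goes negative and the loss exceeds 1, which is the structural-mismatch penalty that plain MSE would miss for matched means.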
Example: Combined Loss Function for RF Imaging
Design a combined loss function for training an MF-to-U-Net reconstruction network for a MIMO radar imaging system. The loss should balance pixel accuracy, structural quality, and data consistency.
Define the combined loss
$$\mathcal{L} = \lambda_{\text{MSE}}\,\mathcal{L}_{\text{MSE}} + \lambda_{\text{perc}}\,\mathcal{L}_{\text{perc}} + \lambda_{\text{SSIM}}\,\mathcal{L}_{\text{SSIM}} + \lambda_{\text{DC}}\,\mathcal{L}_{\text{DC}},$$
where $\mathcal{L}_{\text{DC}} = \|\mathbf{A}\, f_\theta(\hat{\mathbf{c}}^{\text{BP}}) - \mathbf{y}\|_2^2$ is the data-consistency loss penalising measurement mismatch.
Choose the weights for RF imaging
For scientific imaging where fidelity is paramount:
- $\lambda_{\text{MSE}}$: MSE for pixel accuracy
- $\lambda_{\text{perc}}$: mild perceptual regularisation to sharpen targets
- $\lambda_{\text{SSIM}}$: SSIM to preserve structural similarity
- $\lambda_{\text{DC}}$: strong data consistency
The high weight $\lambda_{\text{DC}}$ ensures that the reconstruction is physically plausible. The small perceptual weight prevents hallucination while improving sharpness.
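The weighted sum is straightforward to assemble in code. A sketch with placeholder weights (the values and the two-term simplification are illustrative, not the text's choices; the perceptual and SSIM terms would be added the same way):

```python
import numpy as np

def combined_loss(recon, target, A, y, weights):
    """Sketch of the combined objective: weighted MSE + data consistency.
    'weights' holds the lambda coefficients (placeholder values below)."""
    mse = np.mean((recon - target) ** 2)
    dc = np.sum(np.abs(A @ recon - y) ** 2)   # ||A c_hat - y||^2
    return weights["mse"] * mse + weights["dc"] * dc

rng = np.random.default_rng(0)
target = rng.random(16)          # true scene (flattened)
A = rng.normal(size=(8, 16))     # sensing matrix
y = A @ target                   # noiseless measurements
w = {"mse": 1.0, "dc": 1.0}      # placeholder lambda values

# The true scene has zero loss; a perturbed reconstruction is penalised
# by both the pixel term and the measurement-mismatch term.
print(combined_loss(target, target, A, y, w))
print(combined_loss(target + 0.1, target, A, y, w))
```

In training, `target` comes from the simulator and `recon` from $f_\theta(\hat{\mathbf{c}}^{\text{BP}})$; only the data-consistency term remains usable when ground truth is absent.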
Why avoid a pure adversarial loss?
For radar and SAR, adversarial losses are typically avoided because hallucinated targets (false alarms) are more dangerous than slightly blurry reconstructions. The MSE + perceptual + SSIM + data-consistency combination provides a good compromise between sharpness and reliability.
If adversarial loss is included at all, add it with a very small weight and monitor the measurement residual to detect hallucination.
Data Augmentation for Inverse Problems
Training data for RF imaging reconstruction consists of paired samples $(\mathbf{y}, \mathbf{c})$ generated from a simulator. Standard image augmentation (flips, rotations, crops) must be applied carefully in this setting:
Augmentation in scene space: Flipping the scene $\mathbf{c}$ must be accompanied by applying the corresponding transformation to the measurements $\mathbf{y}$. For operators with spatial symmetry (e.g., symmetric antenna arrays), some transformations can be applied exactly.
Augmentation in measurement space: Randomly masking measurement components at training time teaches the network to be robust to partial aperture coverage. This is particularly effective for MoDL, where the CG step adapts automatically to the different effective $\mathbf{A}$.
Noise-level augmentation: Training with a range of SNR values (e.g., 5–40 dB) prevents overfitting to a specific noise regime and produces networks that degrade gracefully at low SNR.
Scene diversity: RF imaging scenes are highly non-stationary (isolated point targets vs. extended objects vs. clutter). A diverse training set covering all these regimes is essential for generalisation.
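The noise-level augmentation above can be sketched as a simple per-sample corruption step (real-valued AWGN here for simplicity; RF measurements would use complex noise):

```python
import numpy as np

def augment_snr(y_clean, snr_db_range=(5.0, 40.0), rng=None):
    """Corrupt clean measurements with AWGN at a random SNR drawn
    uniformly from the training range (5-40 dB, as in the text)."""
    rng = rng if rng is not None else np.random.default_rng()
    snr_db = rng.uniform(*snr_db_range)
    sig_power = np.mean(np.abs(y_clean) ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), y_clean.shape)
    return y_clean + noise, snr_db

rng = np.random.default_rng(0)
y = rng.normal(size=1024)               # clean simulated measurements
y_noisy, snr = augment_snr(y, rng=rng)  # one augmented training sample
print(snr)                              # somewhere in [5, 40] dB
```

Drawing a fresh SNR per sample (rather than per epoch) is what keeps the network from specialising to any single noise regime.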
Transfer Learning from Optical to RF Domains
RF imaging networks are often trained on synthetic data because paired ground truth is unavailable in the field. Transfer learning can reduce the simulation-to-real gap:
Optical-to-RF transfer: Pretrain the U-Net backbone on large optical image datasets (ImageNet, COCO) where ground truth is abundant. Fine-tune on synthetic RF data with a physically accurate $\mathbf{A}$. The low-level feature detectors (edges, textures) transfer well; higher-level semantic features do not.
Simulation-to-real transfer: Train on high-fidelity simulated data with calibrated forward models, then fine-tune on a small set of real measurements (possibly without ground truth, using self-supervised losses from Chapter 23).
Transfer across array geometries: When the sensing geometry changes (different number of antennas, frequencies), the Gram matrix $\mathbf{A}^H\mathbf{A}$ changes. For MoDL, only the CG step changes; the denoiser may transfer without retraining. For MF-to-U-Net, full retraining is needed.
Domain randomisation: During training, randomise the sensing geometry (antenna positions, frequencies) so the network learns to be geometry-agnostic. At inference, condition on the true geometry via physics-informed channels (Section 20.3).
When Supervised Training is Impossible: The Real-World RF Scenario
The fundamental assumption of supervised training is the availability of paired ground-truth data $(\mathbf{y}, \mathbf{c})$. In real-world RF imaging deployments, this assumption frequently fails:
- The true scene $\mathbf{c}$ is unknown; that is precisely what we want to measure.
- Collecting calibrated ground truth requires a controlled environment that does not reflect real deployment conditions.
- Scenes are non-stationary (people, vehicles, weather changes) and ground truth changes faster than data can be labelled.
This motivates the self-supervised and unsupervised approaches developed in Chapter 23: self-supervised losses (Noise2Noise variants, equivariant imaging, SURE-based estimation) that require only measurement pairs without ground truth.
For the short term, the CommIT group approach is: train on synthetic data, validate with a small calibration target, deploy with domain adaptation. Understanding when and why synthetic-to-real transfer works is one of the central open questions in RF imaging.
Common Mistake: Adversarial Loss Hallucination in Scientific Imaging
Mistake:
Using a pure adversarial loss for radar or SAR image reconstruction without data-consistency constraints.
Correction:
GAN-trained networks can hallucinate realistic-looking features (targets, lesions) that do not exist in the true scene. For scientific applications:
- Always include a data-consistency term in the combined loss.
- Prefer MSE + perceptual over pure adversarial training.
- If using adversarial training, add measurement-consistency constraints as hard layers (DC layers) rather than soft penalties.
- Monitor the measurement residual on a held-out validation set to detect hallucination.
- For detection tasks, evaluate false alarm rate (FAR) alongside SSIM and PSNR; hallucinated targets inflate FAR even when pixel-wise metrics look acceptable.
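Monitoring the measurement residual, as recommended above, amounts to one line of linear algebra. A sketch (the threshold against which the residual is compared would come from the known noise level):

```python
import numpy as np

def measurement_residual(A, c_hat, y):
    """Normalised measurement residual ||A c_hat - y||_2 / ||y||_2.
    Values well above the expected noise floor flag possible hallucination."""
    return np.linalg.norm(A @ c_hat - y) / np.linalg.norm(y)

rng = np.random.default_rng(0)
A = rng.normal(size=(32, 64))    # sensing matrix
c_true = rng.normal(size=64)     # true scene
y = A @ c_true                   # noiseless measurements

# A consistent reconstruction has near-zero residual; one with a
# spurious (hallucinated) strong pixel does not.
c_hallucinated = c_true.copy()
c_hallucinated[10] += 5.0
print(measurement_residual(A, c_true, y))
print(measurement_residual(A, c_hallucinated, y))
```

A hallucinated feature that is invisible to SSIM or PSNR on a small patch still perturbs $\mathbf{A}\hat{\mathbf{c}}$ globally, which is why the residual is a useful validation-set alarm.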
Loss Functions for RF Imaging Reconstruction
| Loss | Optimal estimator | Strengths | Weaknesses for RF imaging |
|---|---|---|---|
| MSE | Posterior mean | Unbiased, mathematically tractable | Blurry for multimodal posteriors |
| MAE ($\ell_1$) | Posterior median | Robust to outliers | Non-smooth gradient, slower training |
| SSIM | Perceptual quality optimum | Preserves target structure | Non-convex, local optima |
| Perceptual (VGG) | Feature-space mean | Sharp targets, edge-preserving | Not a metric, hallucination risk |
| Adversarial | Posterior sample | Sharpest images | Hallucination, unstable training, false targets |
| Data-consistency | Measurement-feasible estimate | Physical plausibility | Does not promote sparsity or image quality alone |
| Combined (MSE+perc+DC) | Balanced tradeoff | Fidelity + quality + plausibility | Hyperparameter tuning required |
Quick Check
Why do MSE-trained networks tend to produce blurry reconstructions?
Because MSE penalises large errors too strongly
Because the optimal MSE estimate is the posterior mean, which averages over multiple plausible reconstructions
Because MSE ignores high-frequency components
Because the U-Net architecture cannot produce sharp images
When multiple scenes are consistent with the measurements (multimodal posterior), the posterior mean averages over all modes. This averaging produces a blurred image that lies "between" the modes rather than at any single one. The same phenomenon causes blurry face reconstructions in super-resolution and blurry target images in low-SNR radar.
Quick Check
When the sensing matrix changes (different array geometry), which approach requires the least retraining?
Direct inversion network
MF-to-U-Net
MoDL (shared denoiser with new CG step)
Physics-informed U-Net (PSF channel only)
In MoDL, the CG data-consistency step adapts automatically to any new $\mathbf{A}$, because $\mathbf{A}$ is provided explicitly at inference. The shared denoiser encodes scene-space priors that generalise across sensing geometries. Only when the scene statistics change significantly does the denoiser need retraining.
Perceptual loss
A loss function measuring distance between reconstructed and target images in the feature space of a pretrained network (VGG-16), rather than in pixel space. Encourages structural and textural similarity. See Definition: Perceptual Loss.
Related: Perceptual Loss, Adversarial (GAN) Loss
Hallucination (in reconstruction)
A failure mode of trained reconstruction networks (especially GAN-based) where the network generates plausible-looking features in the output that are not present in the true scene. In RF imaging, hallucinated targets are dangerous because they cause false alarms in detection systems. Mitigated by data-consistency losses and hard DC layers. See Common Mistake: Adversarial Loss Hallucination in Scientific Imaging.
Related: Adversarial Loss Hallucination in Scientific Imaging, Adversarial (GAN) Loss
Key Takeaway
- MSE loss yields the posterior mean: smooth but blurry when the posterior is multimodal.
- Perceptual loss preserves textures and edge sharpness by measuring distance in VGG feature space.
- Adversarial loss produces sharp images but risks hallucination, generating features not in the true scene. Avoid for radar and SAR without strong data-consistency constraints.
- For RF imaging, a combined loss (MSE + perceptual + SSIM + data-consistency) provides the best tradeoff between fidelity and image quality.
- Data augmentation must respect the physical relationship between scene and measurements: augment jointly in $(\mathbf{y}, \mathbf{c})$ space.
- Transfer learning reduces the sim-to-real gap: pretrain on synthetic data, fine-tune on real measurements. For MoDL, the denoiser transfers across geometries; for MF-to-U-Net, retraining is required.
- When ground truth is unavailable (the real-world RF scenario), supervised training fails, motivating Chapter 23's self-supervised methods.