Chapter Summary

Key Points

1. The MF-to-U-Net pipeline splits reconstruction into a fixed physics step ($\hat{\mathbf{c}}^{\text{BP}} = \mathbf{A}^{H}\mathbf{y}$) and a learned U-Net image-to-image mapping. The MF image decomposes as $\mathbf{G}\mathbf{c} + \tilde{\mathbf{w}}$, where $\tilde{\mathbf{w}} = \mathbf{A}^{H}\mathbf{w}$, requiring the network to simultaneously denoise and deconvolve (a numerical sketch of this decomposition follows the list).

2. For random sensing matrices, $\mathbf{G} \approx \frac{M}{N}\mathbf{I}$ and the task is essentially denoising, a regime where CNNs excel. For physically structured operators (phased arrays, MIMO radar, OFDM sensing), $\mathbf{G}$ has strong sidelobe structure and the back-projected noise is coloured by the same correlation structure. This is the sidelobe corruption problem.

3. The CommIT group finding: sidelobe artefacts and true scene features are statistically correlated through the shared Gram matrix $\mathbf{G}$. U-Nets trained on such data cannot reliably distinguish real targets from sidelobe ghosts, especially for high-dynamic-range scenes. This motivates data-consistency layers and physics-informed architectures.

4. Data-consistency (DC) layers enforce hard or soft measurement constraints after each network block by projecting back onto $\{\mathbf{c} : \|\mathbf{A}\mathbf{c} - \mathbf{y}\| \leq \epsilon\}$. For orthonormal-row $\mathbf{A}$, the DC layer replaces the measured components with $\mathbf{A}^{H}\mathbf{y}$ and preserves the network's prediction in the null space (see the DC sketch after this list).

5. MoDL alternates between a shared CNN denoiser and a conjugate-gradient data-consistency step with learned step sizes $\lambda_k$. At a fixed point, MoDL solves a regularised inverse problem whose implicit regulariser is defined by the denoiser (a minimal loop is sketched after this list).

6. Physics-informed post-processing augments the U-Net input with physics-derived channels: the PSF diagonal $\operatorname{diag}(\mathbf{G})$, the residual gradient $\mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}})$, and geometry embeddings. By the data-processing inequality, conditioning on more physics cannot increase the MMSE, and the reduction is strict unless $\mathbf{G} \propto \mathbf{I}$ (a channel-stacking sketch follows the list).

7. Loss functions determine what the network optimises for: MSE yields the posterior mean (blurry for multimodal posteriors), perceptual loss preserves texture and target sharpness, and adversarial loss risks hallucination. For RF imaging, a combined loss (MSE + perceptual + SSIM + data-consistency) is recommended; a weighted-sum sketch follows the list. Pure adversarial training is dangerous for detection applications.

8. When ground truth is unavailable (the real-world RF scenario), supervised training fails entirely, motivating self-supervised and unsupervised approaches (Chapter 23).
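
The following minimal numpy sketch illustrates key points 1 and 2. All shapes, scales, and variable names are illustrative assumptions, not the chapter's code: it back-projects the measurements with $\mathbf{A}^{H}$ and then inspects the Gram matrix $\mathbf{G} = \mathbf{A}^{H}\mathbf{A}$ that couples the MF image to the true scene.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 64, 256                                # measurements, scene pixels

# Random sensing matrix with approximately orthonormal rows, so that
# G = A^H A concentrates around (M/N) I.
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * N)

c = np.zeros(N, dtype=complex)
c[[10, 100, 200]] = [1.0, 0.5j, 2.0]          # sparse complex scene (illustrative)
w = 0.01 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = A @ c + w                                 # forward model y = A c + w

c_bp = A.conj().T @ y                         # matched filter / back-projection: A^H y
G = A.conj().T @ A                            # Gram matrix, so c_bp = G c + A^H w

# For random A the diagonal sits near M/N and the off-diagonal (sidelobe)
# terms are comparatively small, so c_bp is essentially a noisy scaled scene.
print("mean diagonal of G:", np.real(np.diag(G)).mean(), "vs M/N =", M / N)
print("max off-diagonal |G|:", np.abs(G - np.diag(np.diag(G))).max())
```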
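
Next, a sketch of the hard and soft DC projections from key point 4, assuming $\mathbf{A}$ has orthonormal rows ($\mathbf{A}\mathbf{A}^{H} = \mathbf{I}$); the function names and the soft-DC blending rule are illustrative assumptions.

```python
import numpy as np

def dc_hard(c_net, A, y):
    """Project c_net onto {c : A c = y} (exact when A A^H = I): the residual
    correction overwrites the measured components with A^H y and leaves the
    null-space component of the network prediction untouched."""
    return c_net + A.conj().T @ (y - A @ c_net)

def dc_soft(c_net, A, y, lam):
    """Soft DC: convex blend of prediction and measurements in the measured
    subspace; lam -> infinity recovers dc_hard."""
    return c_net + (lam / (1.0 + lam)) * A.conj().T @ (y - A @ c_net)

# Quick check with an exactly orthonormal-row A built via QR:
rng = np.random.default_rng(1)
M, N = 16, 64
Q, _ = np.linalg.qr(rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M)))
A = Q.conj().T                                   # M x N with A A^H = I
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
c_net = rng.standard_normal(N) + 1j * rng.standard_normal(N)
print(np.linalg.norm(A @ dc_hard(c_net, A, y) - y))   # ~ 0: measurements enforced
```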
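
A minimal MoDL-style loop for key point 5. The shared CNN denoiser is replaced here by a hypothetical complex soft-threshold stand-in, and the learned $\lambda_k$ by fixed scalars; only the alternation structure matches the text.

```python
import numpy as np

def denoise(c, tau=0.05):
    """Stand-in for the shared CNN denoiser (complex soft-threshold)."""
    mag = np.abs(c)
    return np.where(mag > tau, (1 - tau / np.maximum(mag, 1e-12)) * c, 0)

def cg_solve(apply_H, b, n_iter=20):
    """Conjugate gradients for the Hermitian positive-definite system H c = b."""
    c = np.zeros_like(b)
    r = b - apply_H(c)
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(n_iter):
        Hp = apply_H(p)
        alpha = rs / np.vdot(p, Hp).real
        c += alpha * p
        r -= alpha * Hp
        rs_new = np.vdot(r, r).real
        if rs_new < 1e-12:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return c

def modl(A, y, lams=(1.0, 1.0, 1.0)):
    c = A.conj().T @ y                               # initialise at the MF image
    for lam in lams:                                 # lam plays the role of lambda_k
        z = denoise(c)                               # prior (denoiser) step
        H = lambda v: A.conj().T @ (A @ v) + lam * v
        c = cg_solve(H, A.conj().T @ y + lam * z)    # CG data-consistency step:
    return c                                         # (A^H A + lam I) c = A^H y + lam z
```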
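
A sketch of the physics-augmented input from key point 6. The particular channel stack (magnitude, phase, residual, PSF diagonal) and the function name are assumptions for illustration.

```python
import numpy as np

def physics_channels(A, y, img_shape):
    """Build a (C, H, W) input stack from physics-derived quantities."""
    c_bp = A.conj().T @ y                                   # MF image A^H y
    residual = A.conj().T @ (y - A @ c_bp)                  # residual gradient A^H (y - A c_hat)
    psf_diag = np.real(np.einsum('mi,mi->i', A.conj(), A))  # diag(G) without forming G
    chans = [np.abs(c_bp), np.angle(c_bp), np.abs(residual), psf_diag]
    return np.stack([ch.reshape(img_shape) for ch in chans])
```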
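
An illustrative weighted-sum loss for key point 7. The perceptual and SSIM terms are replaced by a simple gradient-difference stand-in so the sketch stays self-contained; the weights are arbitrary assumptions.

```python
import numpy as np

def combined_loss(c_hat, c_true, A, y, w_mse=1.0, w_grad=0.1, w_dc=0.5):
    mse = np.mean(np.abs(c_hat - c_true) ** 2)             # posterior-mean term
    # Stand-in for the perceptual/SSIM structure terms: match first differences.
    grad = np.mean(np.abs(np.diff(c_hat) - np.diff(c_true)) ** 2)
    dc = np.mean(np.abs(A @ c_hat - y) ** 2)               # data-consistency penalty
    return w_mse * mse + w_grad * grad + w_dc * dc
```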

Looking Ahead

Chapter 21 takes the next logical step beyond MoDL: plug-and-play (PnP) algorithms replace the proximal operator in ADMM or PGD with a pre-trained denoiser, connecting the ideas of Section 20.2 to a rigorous variational framework. Unlike MoDL (which trains the denoiser jointly with the data-consistency step), PnP uses an off-the-shelf denoiser: the sensing operator $\mathbf{A}$ is never used during denoiser training. This flexibility allows the same denoiser to be reused across different RF imaging configurations, at the cost of requiring convergence theory for non-expansive denoisers. A minimal PnP iteration is sketched below.
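
As a preview, a minimal PnP proximal-gradient iteration; the denoiser below is a hypothetical soft-threshold stand-in for a pre-trained network, and the step size and iteration count are illustrative.

```python
import numpy as np

def pnp_pgd(A, y, denoiser, eta=0.5, n_iter=50):
    c = A.conj().T @ y                            # warm start at the MF image
    for _ in range(n_iter):
        grad = A.conj().T @ (A @ c - y)           # gradient of 0.5 ||A c - y||^2
        c = denoiser(c - eta * grad)              # denoiser replaces the prox step
    return c

# Example stand-in denoiser (complex soft-threshold); a trained CNN denoiser
# would normally go here. Usage: c_hat = pnp_pgd(A, y, soft) for some A, y.
soft = lambda c, tau=0.02: np.where(
    np.abs(c) > tau, (1 - tau / np.maximum(np.abs(c), 1e-12)) * c, 0)
```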