Physics-Informed Post-Processing

Giving the Network an Explicit Map of the Physics

The MF→U-Net pipeline uses $\mathbf{A}$ only once (to form $\hat{\mathbf{c}}^{\text{BP}}$) and discards it. MoDL re-applies $\mathbf{A}$ at each CG step but only for data consistency — the denoiser still operates without knowledge of the PSF.

A complementary strategy is physics-informed post-processing: augmenting the U-Net input with additional channels that encode the sensing geometry explicitly. Instead of learning to infer the PSF from training data alone, the network is handed the relevant physics on a plate:

  • The PSF image $\operatorname{diag}(\mathbf{G})$ (per-pixel energy of the Gram matrix)
  • The residual back-projection $\mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}_0)$
  • The noise level map $\sigma^2\operatorname{diag}(\mathbf{G})$
  • Geometry embeddings encoding array positions, frequencies, or Tx–Rx pairs

These inputs give the U-Net explicit information about the forward model without requiring iterative computations at test time.
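As a concrete sketch, the snippet below assembles such a multi-channel input in NumPy. The sizes, the sparse test scene, and the random complex matrix standing in for $\mathbf{A}$ are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 256, 64          # measurements, scene pixels (hypothetical sizes)
sigma = 0.1             # noise standard deviation

# Random complex matrix as a stand-in for the true RF sensing operator A,
# plus a sparse test scene c and noisy measurements y
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
c = np.zeros(N, dtype=complex)
c[[10, 40]] = [1.0, 0.5j]
noise = sigma * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
y = A @ c + noise

# Physics-derived input channels
c_bp = A.conj().T @ y                    # matched-filter / back-projection image
g_diag = np.sum(np.abs(A) ** 2, axis=0)  # PSF diagonal diag(G), one O(MN) pass
residual = A.conj().T @ (y - A @ c_bp)   # residual back-projection at c_bp
noise_map = sigma ** 2 * g_diag          # per-pixel noise level map

# Stack into the multi-channel network input (channels x pixels)
net_input = np.stack([np.abs(c_bp), g_diag, np.abs(residual), noise_map])
```

For brevity the complex channels enter as magnitudes here; in practice one would feed real/imaginary pairs so the network keeps phase information.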


Definition:

Physics-Informed Post-Processing Network

A physics-informed post-processing network augments the matched-filter image with additional physics-derived input channels:

$$\hat{\mathbf{c}} = f_\theta\!\bigl( \underbrace{\hat{\mathbf{c}}^{\text{BP}}}_{\text{MF image}},\;\; \underbrace{\operatorname{diag}(\mathbf{G})}_{\text{PSF diagonal}},\;\; \underbrace{\mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{\text{BP}})}_{\text{residual gradient}},\;\; \ldots \bigr).$$

The network input is a multi-channel tensor where each channel encodes a different aspect of the measurement physics. The architecture may remain a standard U-Net, ResNet, or attention-based model, but the input representation is enriched.

This approach occupies a middle ground between pure post-processing (Section 20.1) and full MoDL unrolling (Section 20.2). It incorporates physics without requiring differentiable forward/adjoint operators during inference, making it compatible with non-differentiable legacy simulators.

Example: Residual-Feedback Post-Processing

Consider a two-pass architecture:

Pass 1: $\hat{\mathbf{c}}_0 = f_{\theta_1}(\hat{\mathbf{c}}^{\text{BP}})$

Pass 2: $\hat{\mathbf{c}} = f_{\theta_2}\!\bigl(\hat{\mathbf{c}}_0,\; \mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}_0)\bigr)$

Show that the second input to pass 2 is the negative gradient of the data-fidelity term $\frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{c}\|^2$ evaluated at $\hat{\mathbf{c}}_0$, and explain the connection to MoDL.
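The identity asked for in the exercise can be checked numerically. In the sketch below (random complex $\mathbf{A}$, $\mathbf{y}$, and $\hat{\mathbf{c}}_0$ are stand-ins), central finite differences on the real and imaginary parts of $\mathbf{c}$ recover the gradient of the data-fidelity term, and it matches $-\mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}_0)$:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 30, 8
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
c0 = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def loss(c):
    """Data-fidelity term 0.5 * ||y - A c||^2."""
    r = y - A @ c
    return 0.5 * np.real(r.conj() @ r)

feedback = A.conj().T @ (y - A @ c0)   # pass-2 input channel

# Finite-difference gradient w.r.t. the real and imaginary parts of c0,
# packed as a complex vector (Wirtinger-style)
eps = 1e-6
fd_grad = np.zeros(N, dtype=complex)
for k in range(N):
    e = np.zeros(N)
    e[k] = 1.0
    d_re = (loss(c0 + eps * e) - loss(c0 - eps * e)) / (2 * eps)
    d_im = (loss(c0 + 1j * eps * e) - loss(c0 - 1j * eps * e)) / (2 * eps)
    fd_grad[k] = d_re + 1j * d_im

# The feedback channel is the NEGATIVE gradient of the data-fidelity term
assert np.allclose(feedback, -fd_grad, atol=1e-4)
```

This is exactly the descent direction a MoDL data-consistency step would follow; pass 2 simply receives it as an input channel instead of applying it iteratively.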

Theorem: Conditioning on PSF Reduces the Deconvolution Task

Let $f_\theta^{\text{blind}}(\hat{\mathbf{c}}^{\text{BP}})$ and $f_\theta^{\text{informed}}(\hat{\mathbf{c}}^{\text{BP}}, \operatorname{diag}(\mathbf{G}))$ denote reconstruction networks without and with PSF conditioning, respectively. Under a Gaussian scene prior $\mathbf{c} \sim \mathcal{CN}(\mathbf{0}, \sigma_x^2\mathbf{I})$ and noise $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$, the MMSE estimator given $(\hat{\mathbf{c}}^{\text{BP}}, \operatorname{diag}(\mathbf{G}))$ achieves strictly lower MSE than the MMSE estimator given only $\hat{\mathbf{c}}^{\text{BP}}$, unless $\mathbf{G}$ is a multiple of $\mathbf{I}$.

If the network does not know the PSF, it must estimate it implicitly from the input image — a harder task. When $\mathbf{G}$ is provided as an additional channel, the network can focus on denoising and regularisation, delegating the geometry-dependent deconvolution to a simple closed-form computation.
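A toy Monte-Carlo experiment illustrates the theorem under a simplified per-pixel model $\hat{c}^{\text{BP}}_i = g_i c_i + w_i$ with $w_i \sim \mathcal{N}(0, g_i\sigma^2)$ (a diagonal-$\mathbf{G}$ caricature of the full model, chosen only for illustration): the Wiener estimator that knows the per-pixel gain $g_i$ beats the one that only knows the average gain.

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, N = 20000, 32
sigma_x, sigma = 1.0, 0.5
g = rng.uniform(0.2, 2.0, size=N)     # per-pixel PSF diagonal, varying across the scene

# Toy diagonal-G model: bp_i = g_i c_i + w_i,  w_i ~ N(0, g_i sigma^2)
c = sigma_x * rng.standard_normal((n_trials, N))
bp = g * c + np.sqrt(g) * sigma * rng.standard_normal((n_trials, N))

# Informed Wiener estimator (knows g per pixel) vs blind (only knows the mean gain)
informed = (sigma_x**2 * g) / (sigma_x**2 * g**2 + sigma**2 * g) * bp
g_bar = g.mean()
blind = (sigma_x**2 * g_bar) / (sigma_x**2 * g_bar**2 + sigma**2 * g_bar) * bp

mse_informed = np.mean((informed - c) ** 2)
mse_blind = np.mean((blind - c) ** 2)
assert mse_informed < mse_blind   # PSF conditioning strictly helps unless g is constant
```

If all $g_i$ were equal (i.e. $\mathbf{G} \propto \mathbf{I}$ in this caricature), the two estimators would coincide, matching the exception in the theorem.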

Historical Note: Physics-Informed Neural Networks in Imaging — A Brief History

2016–2024

The idea of embedding physics into neural networks for imaging predates the deep learning era. In X-ray computed tomography, filtered back-projection (FBP) — a physics-derived preprocessing step — has been combined with learned post-processors since at least 2017 (Jin et al.).

The term "physics-informed neural network" (PINN) was popularised by Raissi et al. (2019) in the context of PDE-constrained learning, but its spirit in imaging is older: the deep cascade architecture of Schlemper et al. (2018) for MRI reconstruction already embedded the measurement operator into the network graph.

For RF imaging specifically, the challenge is that the sensing operator $\mathbf{A}$ varies with array geometry, frequency, and scene position — making it more like a family of forward models than a single fixed one. Geometry-conditioned networks that take array positions as part of their input represent the current frontier, bridging learned reconstruction with the array signal processing tradition.


Conditioning on Sensing Geometry for Generalisation

A physics-informed network that takes only the matched-filter image and the PSF diagonal still cannot generalise to entirely new sensing geometries without retraining, because the full off-diagonal structure of $\mathbf{G}$ matters for deconvolution and is not captured by $\operatorname{diag}(\mathbf{G})$ alone.

Geometry-conditioned networks take an additional input that encodes the sensing configuration:

  • Explicit antenna positions: $\{(\mathbf{s}_{i}, \mathbf{r}_{j})\}_{i,j}$ as a fixed-length embedding computed from the array geometry.
  • Frequency set: the set of OFDM subcarrier frequencies used.
  • PSF slice: a small representative region of $\mathbf{G}$ near the scene centre (cheaper to compute than the full Gram matrix).

These embeddings are fed into the network via feature-wise linear modulation (FiLM) or as additional input channels. The result is a single network that generalises across operator families, adapting its deconvolution strategy on the fly.
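A minimal NumPy sketch of the FiLM variant (the layer sizes and the linear maps `W_gamma`, `W_beta` are hypothetical; in practice they would be learned jointly with the backbone, and the modulation would sit inside the U-Net):

```python
import numpy as np

rng = np.random.default_rng(3)

def film_modulate(features, geometry_embedding, W_gamma, W_beta):
    """Feature-wise linear modulation: scale and shift each feature channel
    by parameters predicted from the geometry embedding."""
    gamma = geometry_embedding @ W_gamma   # per-channel scale, shape (C,)
    beta = geometry_embedding @ W_beta     # per-channel shift, shape (C,)
    return gamma[:, None] * features + beta[:, None]

# Hypothetical sizes: 16 feature channels over 64 pixels, 8-dim geometry embedding
C, P, E = 16, 64, 8
features = rng.standard_normal((C, P))    # intermediate network activations
embedding = rng.standard_normal(E)        # encodes antenna positions / frequencies
W_gamma = rng.standard_normal((E, C)) * 0.1
W_beta = rng.standard_normal((E, C)) * 0.1

modulated = film_modulate(features, embedding, W_gamma, W_beta)
```

Because the modulation parameters depend only on the geometry embedding, changing the array configuration changes how every feature channel is scaled, without retraining the convolutional weights.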

This is the architecture principle behind conditional reconstruction networks for RF imaging — a research direction currently explored by the CommIT group.

Physics-Informed Post-Processing with PSF Conditioning

Compare reconstruction quality as physics-derived channels are added to the network input. "MF only" uses just $\hat{\mathbf{c}}^{\text{BP}}$. "MF + PSF" adds the PSF diagonal $\operatorname{diag}(\mathbf{G})$ as a second channel. "MF + PSF + residual" further adds the residual gradient $\mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{\text{BP}})$.

For the physical sensing matrix, each additional channel measurably reduces the reconstruction error — especially near bright point targets where sidelobe artefacts are strongest.


Common Mistake: Computing the Full Gram Matrix Instead of Just What Is Needed

Mistake:

Naively computing the full $N \times N$ Gram matrix $\mathbf{G} = \mathbf{A}^{H}\mathbf{A}$ at inference time to provide the PSF as a physics channel to the network.

Correction:

For realistic scene sizes ($N = 64 \times 64 = 4096$ voxels), $\mathbf{G}$ has $N^2 \approx 16.8$ million complex entries, which is prohibitively expensive to form and store.

In practice, only the PSF of a central point reflector is needed: this is the column of $\mathbf{G}$ corresponding to the scene centre, computed as $\mathbf{A}^{H}(\mathbf{A}\mathbf{e}_{N/2})$ — two matrix–vector products, never forming $\mathbf{G}$. This single column captures the dominant sidelobe structure for shift-invariant (or slowly varying) PSFs.

For the PSF diagonal: $\operatorname{diag}(\mathbf{G}) = \|\mathbf{A}\|_{\text{col}}^2$ (column-wise squared norms of $\mathbf{A}$) — computed in one pass in $\mathcal{O}(MN)$ time.
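Both shortcuts can be checked in a few lines (the random complex $\mathbf{A}$ is a stand-in for the physical sensing matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 512, 4096                     # e.g. a 64x64 scene
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(M)

# PSF diagonal in O(MN): column-wise squared norms of A -- no Gram matrix formed
g_diag = np.sum(np.abs(A) ** 2, axis=0)

# Central PSF column via two matrix-vector products: A^H (A e_{N/2})
e_center = np.zeros(N)
e_center[N // 2] = 1.0
psf_center = A.conj().T @ (A @ e_center)

# Consistency: the centre of the PSF column is the corresponding diagonal entry of G
assert np.isclose(psf_center[N // 2].real, g_diag[N // 2])
```

Neither computation ever touches the $N^2$ entries of $\mathbf{G}$; both scale linearly in the number of measurements and voxels.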

Quick Check

Why does providing the PSF diagonal $\operatorname{diag}(\mathbf{G})$ as an additional input channel help a post-processing network?

  • It allows the network to compute the full inverse of $\mathbf{G}$
  • It tells the network the per-pixel SNR, enabling geometry-aware weighting
  • It forces the network to produce measurement-consistent output
  • It replaces the need for a data-consistency layer

Key Takeaway

  1. Physics-informed post-processing augments the U-Net input with physics-derived channels: PSF diagonal, residual gradient, noise level map, and geometry embeddings.

  2. Residual feedback $\mathbf{A}^{H}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}})$ gives the network an explicit data-consistency correction direction, bridging pure post-processing and iterative MoDL.

  3. By the data-processing inequality, conditioning on the PSF strictly reduces the MMSE (unless $\mathbf{G} \propto \mathbf{I}$).

  4. Geometry-conditioned networks use array position and frequency embeddings to generalise across operator configurations without retraining.

  5. The full Gram matrix is never needed: the PSF diagonal is $\operatorname{diag}(\mathbf{G}) = \|\mathbf{A}\|_{\text{col}}^2$, computed in $\mathcal{O}(MN)$.