Physics-Informed and Equivariant Networks

Why Embed Physics in Neural Networks?

The networks we have seen so far (U-Nets, unfolded algorithms, diffusion models) learn from data alone. When data is scarce or the problem has known physical structure, we can do better by embedding the governing equations directly into the network architecture or loss function. This section covers three approaches:

  1. PINNs: Enforce PDEs (Helmholtz, Maxwell) as soft constraints in the loss.
  2. Equivariant networks: Build symmetries (rotation, translation) into the architecture.
  3. Fourier Neural Operators: Learn resolution-independent PDE solution operators in the Fourier domain.

Definition:

Physics-Informed Neural Network (PINN)

A Physics-Informed Neural Network trains a neural network $u_\theta(\mathbf{r}, t)$ to approximate the solution of a PDE by incorporating the PDE residual into the loss:

$$\mathcal{L}(\theta) = \underbrace{\frac{1}{N_d}\sum_{i=1}^{N_d} \left|u_\theta(\mathbf{r}_i, t_i) - u_i^{\text{data}}\right|^2}_{\text{data loss}} + \underbrace{\frac{\lambda_{\text{PDE}}}{N_c}\sum_{j=1}^{N_c} \left|\mathcal{F}[u_\theta](\mathbf{r}_j, t_j)\right|^2}_{\text{PDE residual loss}},$$

where $\mathcal{F}[u] = 0$ is the PDE (e.g., the Helmholtz equation), $\{(\mathbf{r}_i, t_i, u_i^{\text{data}})\}$ are the observed data points, and $\{(\mathbf{r}_j, t_j)\}$ are collocation points where the PDE is enforced.

The PDE residual is computed using automatic differentiation: spatial and temporal derivatives of $u_\theta$ are exact, not finite-difference approximations.

PINNs are mesh-free: no discretisation grid is needed. The collocation points can be sampled adaptively, concentrating in regions where the residual is large.
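
To make the autodiff point concrete, here is a minimal PyTorch sketch (the network shape and sizes are arbitrary illustrative choices) of computing exact derivatives of $u_\theta$ at freely sampled collocation points, here for a 1D homogeneous Helmholtz residual:

```python
import torch
import torch.nn as nn

# Illustrative smooth network u_theta(x); tanh keeps higher derivatives well-behaved
u_theta = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

# Mesh-free collocation points: sampled anywhere in the domain, no grid required
x = torch.rand(256, 1, requires_grad=True)
u = u_theta(x)

# Exact first derivative du/dx (create_graph=True allows differentiating again)
(u_x,) = torch.autograd.grad(u.sum(), x, create_graph=True)

# Exact second derivative d2u/dx2, as needed for a Helmholtz residual
(u_xx,) = torch.autograd.grad(u_x.sum(), x, create_graph=True)

kappa = 2 * torch.pi                # one wavelength across the unit domain
residual = u_xx + kappa**2 * u      # 1D homogeneous Helmholtz residual
loss_pde = (residual ** 2).mean()   # differentiable with respect to theta
```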

Definition:

PINN for the Helmholtz Equation

For time-harmonic RF wave propagation, the governing equation is the Helmholtz equation:

$$\nabla^2 u(\mathbf{r}) + \kappa^2\,\epsilon_r(\mathbf{r})\,u(\mathbf{r}) = -s(\mathbf{r}),$$

where $u$ is the total field, $\kappa = 2\pi/\lambda$ is the wavenumber, $\epsilon_r$ is the relative permittivity, and $s$ is the source term.

A PINN for the Helmholtz equation parameterises $u_\theta(\mathbf{r})$ (complex-valued) and minimises

$$\mathcal{L} = \frac{1}{N_d}\sum_i \left|u_\theta(\mathbf{r}_i) - u_i^{\text{meas}}\right|^2 + \frac{\lambda}{N_c}\sum_j \left|\nabla^2 u_\theta(\mathbf{r}_j) + \kappa^2 \epsilon_r(\mathbf{r}_j)\,u_\theta(\mathbf{r}_j) + s(\mathbf{r}_j)\right|^2.$$

For the inverse problem (unknown $\epsilon_r$), both $u_\theta$ and $\epsilon_r$ (or a neural parameterisation thereof) are jointly optimised.
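
A minimal PyTorch sketch of this residual (the function names, the two-channel real/imaginary field representation, and the real-valued $\epsilon_r$ are all illustrative assumptions, not a fixed API):

```python
import torch

def laplacian(u, r):
    """Exact Laplacian of a scalar field u at points r, via nested autodiff."""
    (grad_u,) = torch.autograd.grad(u.sum(), r, create_graph=True)
    lap = torch.zeros_like(u)
    for d in range(r.shape[1]):
        lap = lap + torch.autograd.grad(grad_u[:, d].sum(), r, create_graph=True)[0][:, d]
    return lap

def helmholtz_residual(u_net, eps_net, r, kappa, source):
    """Residual of nabla^2 u + kappa^2 * eps_r * u + s at collocation points r.

    r must have requires_grad=True; the complex field u is stored as two real
    channels [Re u, Im u], and eps_r is assumed real (lossless medium).
    """
    u = u_net(r)                   # shape (N, 2)
    eps = eps_net(r).squeeze(-1)   # shape (N,)
    s = source(r)                  # shape (N, 2)
    res = [laplacian(u[:, c], r) + kappa**2 * eps * u[:, c] + s[:, c] for c in (0, 1)]
    return torch.stack(res, dim=-1)
```

For the inverse problem, `eps_net` is a second trainable network; a full training loop built on this residual is sketched in the algorithm at the end of this section.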

Example: PINN for 2D Inverse Scattering

Set up a PINN to recover the permittivity distribution $\epsilon_r(\mathbf{r})$ of an unknown object from scattered-field measurements at $N_d = 50$ receiver locations, using $N_{\text{inc}} = 8$ incident plane waves at frequency $f = 3$ GHz.

Common Mistake: PINN Spectral Bias

Mistake:

Using a standard MLP (without positional encoding or Fourier features) for a PINN solving a high-frequency Helmholtz equation, and finding the network learns only the low-frequency components.

Correction:

Standard MLPs suffer from spectral bias: they learn low-frequency components much faster than high-frequency ones. For the Helmholtz equation at frequency $f$, the solution oscillates at the spatial scale $\lambda = c/f$; if $\lambda$ is small relative to the domain, a plain MLP fails to learn the oscillations in any practical training time.

Solutions:

  • Fourier feature encoding: $\gamma(\mathbf{r}) = [\sin(\mathbf{B}\mathbf{r}), \cos(\mathbf{B}\mathbf{r})]$ with $\mathbf{B}$ sampled from $\mathcal{N}(0, \sigma^2)$, where $\sigma \propto 1/\lambda$ (see the sketch after this list).
  • Multi-scale architecture: Separate networks for different frequency bands.
  • Curriculum training: Start with low frequency and gradually increase.
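
A minimal sketch of the first remedy (hyperparameter values are illustrative; the construction follows the random Fourier features defined above):

```python
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    """Encoding gamma(r) = [sin(Br), cos(Br)] with fixed random B ~ N(0, sigma^2)."""
    def __init__(self, in_dim, n_features, sigma):
        super().__init__()
        # B is sampled once and frozen; sigma sets the bandwidth (sigma ~ 1/lambda)
        self.register_buffer("B", sigma * torch.randn(n_features, in_dim))

    def forward(self, r):
        proj = r @ self.B.T                                   # (N, n_features)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

# Prepend the encoding so the MLP sees oscillatory features at the right scale
encoding = FourierFeatures(in_dim=2, n_features=128, sigma=10.0)
u_theta = nn.Sequential(encoding,
                        nn.Linear(256, 64), nn.Tanh(),
                        nn.Linear(64, 2))   # two channels: [Re u, Im u]
```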

Definition:

Equivariance and Equivariant Networks

A function $f \colon \mathcal{X} \to \mathcal{Y}$ is equivariant to a group $G$ of transformations if applying $T \in G$ to the input produces a corresponding transformation $\rho_T$ of the output:

$$f(T \cdot \mathbf{x}) = \rho_T \cdot f(\mathbf{x}) \quad \text{for all } T \in G.$$

Special case (invariance): $\rho_T = \mathrm{id}$ for all $T$, i.e., the output is unchanged by the input transformation.

Equivariant neural networks build symmetries into the architecture:

| Symmetry group | Transformation | Architecture |
|---|---|---|
| $\mathbb{Z}^2$ | Spatial shifts | Standard CNN |
| $SO(2)$ | 2D rotation | Steerable CNN |
| $SO(3)$ | 3D rotation | Spherical CNN |
| $E(3)$ | Euclidean (rotation + translation) | EGNN, PaiNN |

The key idea: replace the standard convolution kernel (defined on $\mathbb{R}^2$) with a kernel on the group $G$, ensuring that the output transforms predictably under group actions.
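
The $\mathbb{Z}^2$ row of the table can be verified numerically in a few lines; the sketch below assumes circular padding so that shifts wrap around and the equivariance is exact, and checks that shifting the input of a convolution shifts its output identically:

```python
import torch
import torch.nn as nn

# Standard conv layer; circular padding makes translation equivariance exact
conv = nn.Conv2d(1, 8, kernel_size=3, padding=1, padding_mode="circular")

x = torch.randn(1, 1, 32, 32)
shift = (5, -3)

shift_then_conv = conv(torch.roll(x, shifts=shift, dims=(-2, -1)))
conv_then_shift = torch.roll(conv(x), shifts=shift, dims=(-2, -1))

# f(T x) == T f(x): here rho_T is the same shift T applied to the output
print(torch.allclose(shift_then_conv, conv_then_shift, atol=1e-5))  # expected: True
```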

For RF imaging, the relevant symmetries include rotation equivariance for SAR (the scene reflectivity should not depend on the orientation of the imaging geometry) and translation equivariance (which standard CNNs already provide).

Definition:

Fourier Neural Operator (FNO)

The Fourier Neural Operator learns a mapping between function spaces, $\mathcal{G}_\theta \colon a(\mathbf{r}) \mapsto u(\mathbf{r})$, by parameterising the integral kernel in the Fourier domain:

$$(\mathcal{K}_\theta v)(\mathbf{r}) = \mathcal{F}^{-1}\!\left[\mathbf{R}_\theta \cdot \mathcal{F}[v]\right](\mathbf{r}),$$

where $\mathcal{F}$ is the Fourier transform and $\mathbf{R}_\theta \in \mathbb{C}^{d_v \times d_v \times k_{\max}}$ is a learnable weight tensor applied to the first $k_{\max}$ Fourier modes. The full FNO layer is

$$v^{(\ell+1)}(\mathbf{r}) = \sigma\!\left(\mathbf{W}^{(\ell)} v^{(\ell)}(\mathbf{r}) + \bigl(\mathcal{K}_\theta^{(\ell)} v^{(\ell)}\bigr)(\mathbf{r})\right).$$

Cost: $O(N \log N)$ per layer, dominated by the FFT. Resolution independence: the Fourier modes are truncated at $k_{\max}$, independent of the discretisation size $N$.

The FNO is the neural-operator analogue of a spectral method in numerical PDEs. For the Helmholtz equation, where convolution with the Green's function becomes a simple multiplication in the Fourier domain, the FNO is a particularly natural architecture.
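
A minimal PyTorch sketch of the spectral convolution $\mathcal{K}_\theta$ and the full layer (the layer sizes and initialisation scale are illustrative choices; grids with $H, W \ge 2k_{\max}$ are assumed):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralConv2d(nn.Module):
    """Mix the lowest k_max Fourier modes with learned complex weights (K_theta)."""
    def __init__(self, channels, k_max):
        super().__init__()
        scale = 1.0 / channels
        # One complex channels-by-channels matrix per retained 2D mode; two blocks
        # cover the positive and negative frequencies along the first axis of rfft2
        self.w_pos = nn.Parameter(scale * torch.randn(channels, channels, k_max, k_max, dtype=torch.cfloat))
        self.w_neg = nn.Parameter(scale * torch.randn(channels, channels, k_max, k_max, dtype=torch.cfloat))
        self.k_max = k_max

    def forward(self, v):                       # v: (batch, channels, H, W)
        H, W = v.shape[-2:]
        v_hat = torch.fft.rfft2(v)              # (batch, channels, H, W//2 + 1)
        out = torch.zeros_like(v_hat)
        k = self.k_max
        out[:, :, :k, :k] = torch.einsum("bixy,ioxy->boxy", v_hat[:, :, :k, :k], self.w_pos)
        out[:, :, -k:, :k] = torch.einsum("bixy,ioxy->boxy", v_hat[:, :, -k:, :k], self.w_neg)
        return torch.fft.irfft2(out, s=(H, W))  # back to the spatial grid

class FNOLayer(nn.Module):
    """One FNO layer: v -> sigma(W v + K_theta v), with W a pointwise linear map."""
    def __init__(self, channels, k_max):
        super().__init__()
        self.spectral = SpectralConv2d(channels, k_max)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, v):
        return F.gelu(self.pointwise(v) + self.spectral(v))
```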

Theorem: Universal Approximation for Neural Operators

Let $\mathcal{G}^\dagger \colon \mathcal{A} \to \mathcal{U}$ be a continuous operator between Banach spaces of functions on a bounded domain $\Omega$. For any compact set $K \subset \mathcal{A}$ and any $\epsilon > 0$, there exists a neural operator $\mathcal{G}_\theta$ with finitely many parameters such that

$$\sup_{a \in K} \|\mathcal{G}_\theta(a) - \mathcal{G}^\dagger(a)\|_{\mathcal{U}} < \epsilon.$$

Specifically, for an FNO with $L$ layers and $k_{\max}$ Fourier modes, the approximation error scales as $O(k_{\max}^{-s})$ for operators $\mathcal{G}^\dagger$ with Sobolev regularity $s$.

Just as standard neural networks are universal approximators for functions (vectors $\to$ vectors), neural operators are universal approximators for operators (functions $\to$ functions). The FNO's Fourier truncation provides spectral convergence for smooth operators.

FNO vs. CNN Convergence on Helmholtz Inverse Problem

Compare the test error of an FNO and a standard U-Net CNN on the Helmholtz inverse problem (permittivity $\to$ scattered field) as a function of training-set size and number of Fourier modes. The FNO converges faster and generalises better, especially at higher frequencies, where the solution has more spatial oscillations. The FNO's resolution independence means it can be trained on a $64 \times 64$ grid and evaluated on a $128 \times 128$ grid without retraining.
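
Resolution independence can be checked directly with the `FNOLayer` sketch from the FNO definition above: the same weights apply at any grid size, because only the lowest $k_{\max}$ modes carry parameters (the sizes below are illustrative):

```python
layer = FNOLayer(channels=32, k_max=12)

v_coarse = torch.randn(1, 32, 64, 64)    # training resolution
v_fine = torch.randn(1, 32, 128, 128)    # evaluation resolution

print(layer(v_coarse).shape)   # torch.Size([1, 32, 64, 64])
print(layer(v_fine).shape)     # torch.Size([1, 32, 128, 128])
```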


Example: FNO as a Differentiable Forward Model for RF Imaging

Describe how to use an FNO to replace the full-wave solver in an iterative reconstruction algorithm for RF imaging.
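
One possible sketch of such a loop, assuming a hypothetical `fno_forward` surrogate already trained to map permittivity maps to measured fields (the regulariser and the constraint are illustrative choices):

```python
import torch

def reconstruct(fno_forward, y_meas, grid_shape, n_iters=200, lr=1e-2):
    """Gradient-based reconstruction through a frozen, differentiable FNO surrogate."""
    eps = torch.ones(1, 1, *grid_shape, requires_grad=True)  # initial guess: free space
    opt = torch.optim.Adam([eps], lr=lr)
    for _ in range(n_iters):
        y_pred = fno_forward(eps)               # O(N log N) stand-in for the full-wave solver
        loss = ((y_pred - y_meas) ** 2).mean()  # data misfit
        # Crude smoothness prior on the permittivity map (illustrative)
        loss = loss + 1e-3 * (eps.diff(dim=-1).abs().mean() + eps.diff(dim=-2).abs().mean())
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            eps.clamp_(min=1.0)                 # physical constraint: eps_r >= 1
    return eps.detach()
```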

Why This Matters: Physics-Informed Methods for RF Imaging

Physics-informed approaches address key challenges in RF imaging:

  • PINNs for inverse scattering: Recover $\epsilon_r(\mathbf{r})$ from sparse scattering data by enforcing the Helmholtz equation as a soft constraint; this is particularly valuable when the measurement geometry is irregular and standard discretisation is awkward.

  • Equivariant networks for SAR: Steerable CNNs achieve rotation-equivariant SAR target recognition without data augmentation — the network generalises to unseen azimuth angles by construction.

  • FNO as fast forward model: Replace the $O(N^2)$ matrix-vector product $\mathbf{A}\mathbf{c}$ with an $O(N \log N)$ FNO evaluation, enabling real-time iterative reconstruction.

  • FNO for resolution transfer: Train on coarse grids (fast), evaluate on fine grids (detailed) — exploiting the FNO's resolution independence.


Quick Check

What is the key advantage of an FNO over a PINN for solving multiple instances of the Helmholtz equation with different permittivity distributions?

  • FNO is more accurate for a single instance
  • FNO learns the solution operator, amortising the cost across instances
  • FNO does not require training data
  • FNO has fewer parameters

Common Mistake: FNO Out-of-Distribution Generalisation

Mistake:

Training an FNO on simple permittivity distributions (circles, rectangles) and expecting it to generalise to complex real-world scenes (buildings, furniture) without testing.

Correction:

Neural operators, like all learned models, generalise only within the distribution of the training data. An FNO trained on simple geometries may fail dramatically on complex scenes with sharp corners, thin structures, or high-contrast interfaces.

Remedies: (i) Train on a diverse dataset that covers the expected scene complexity. (ii) Use physics-based fine-tuning: initialise with the FNO prediction, then refine with a few iterations of the physics-based solver. (iii) Hybrid approaches: FNO for the smooth part, physics solver for the high-contrast details.

Historical Note: From Galerkin to PINNs

1998--2021

The idea of using neural networks to solve PDEs dates to Lagaris et al. (1998), who used small feedforward networks with boundary-condition constraints. The approach remained niche until Raissi, Perdikaris, and Karniadakis (2019) demonstrated that modern deep networks with automatic differentiation could solve complex forward and inverse PDE problems, coining the term "Physics-Informed Neural Networks." The Fourier Neural Operator (Li et al., 2021) shifted the paradigm from solving individual PDE instances to learning the solution operator, enabling real-time PDE solving with spectral accuracy.


PINN Training for Helmholtz Inverse Scattering

Complexity: $O(N_{\text{epochs}} \cdot (N_d + N_c) \cdot P)$, where $P$ is the network parameter count. The second-order autodiff for the Laplacian roughly doubles the backpropagation cost.
Input:  scattered field data {(r_i, E^s_{k,i})}, incident fields {E^i_k},
        collocation points {r_j}, wavenumber kappa
Output: permittivity map eps_r(r)
 1. Initialise field networks u_theta^(k) and permittivity network eps_phi
 2. for epoch = 1, ..., N_epochs do
 3.     Sample a mini-batch of data points and collocation points
 4.     for each incident wave k do
 5.         Evaluate u_theta^(k)(r_j) at the collocation points
 6.         Compute the Laplacian via autodiff: nabla^2 u_theta^(k)(r_j)
 7.         PDE residual: R_j^(k) = nabla^2 u_theta^(k)(r_j) + kappa^2 * eps_phi(r_j) * u_theta^(k)(r_j) + s_k(r_j)
 8.     end for
 9.     L_data = (1/N_d) * sum_k sum_i |u_theta^(k)(r_i) - E^s_{k,i}|^2
10.     L_pde  = (lambda/N_c) * sum_k sum_j |R_j^(k)|^2
11.     L = L_data + L_pde
12.     Update theta, phi via Adam(grad L)
13. end for
14. return eps_phi(r) for r in the domain

Curriculum scheduling of $\lambda$ (low initially, increasing over epochs) helps avoid getting trapped in a data-only minimum.
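
A compact PyTorch sketch of this algorithm, reusing the `helmholtz_residual` sketch from earlier in this section (one field network per incident wave and the linear curriculum ramp on $\lambda$ are assumptions of this sketch):

```python
import torch

def train_pinn(u_nets, eps_net, r_data, Es_data, sources, r_domain, kappa,
               n_epochs=5000, n_coll=1024, lam_max=1.0, warmup=1000):
    """Jointly optimise field networks u_nets[k] and the permittivity network eps_net."""
    params = list(eps_net.parameters())
    for net in u_nets:
        params += list(net.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    for epoch in range(n_epochs):
        # Curriculum scheduling: ramp the PDE weight up over the first `warmup` epochs
        lam = lam_max * min(1.0, (epoch + 1) / warmup)
        # Mini-batch of collocation points, with gradients w.r.t. positions enabled
        idx = torch.randint(0, r_domain.shape[0], (n_coll,))
        r_c = r_domain[idx].requires_grad_(True)
        loss_data, loss_pde = 0.0, 0.0
        for k, (net, s_k) in enumerate(zip(u_nets, sources)):
            R = helmholtz_residual(net, eps_net, r_c, kappa, s_k)   # PDE residual
            loss_pde = loss_pde + (R ** 2).mean()
            loss_data = loss_data + ((net(r_data[k]) - Es_data[k]) ** 2).mean()
        loss = loss_data + lam * loss_pde
        opt.zero_grad()
        loss.backward()
        opt.step()
    return eps_net   # evaluate eps_net over the domain for the permittivity map
```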

PINN (Physics-Informed Neural Network)

A neural network trained with a loss function that includes the residual of a governing PDE, evaluated at collocation points via automatic differentiation. Enables mesh-free PDE solving and inverse problems.

Related: Physics-Informed Neural Network (PINN)

FNO (Fourier Neural Operator)

A neural operator architecture that parameterises integral kernels in the Fourier domain, achieving $O(N \log N)$ cost and resolution-independent PDE solving.

Related: Fourier Neural Operator (FNO)

Key Takeaway

PINNs embed PDE constraints (Helmholtz, Maxwell) directly into the neural network loss, enabling mesh-free inverse scattering but suffering from spectral bias at high frequencies. Equivariant networks build physical symmetries into the architecture, improving data efficiency for SAR and 3D imaging. The Fourier Neural Operator learns the PDE solution operator in the Fourier domain, enabling real-time, resolution-independent forward modelling that can replace expensive full-wave solvers in iterative reconstruction algorithms.