Exercises
ex-ch28-01
Easy: The fundamental matrix $F$ has rank 2. How many independent constraints does a single point correspondence provide on $F$? How many correspondences are needed for the 8-point algorithm?
The epipolar constraint $\mathbf{x}'^\top F\mathbf{x} = 0$ is linear in the entries of $F$.
$F$ has 9 entries but is defined only up to scale.
Constraint count
Each correspondence $(\mathbf{x}, \mathbf{x}')$ gives one scalar equation: $\mathbf{x}'^\top F\mathbf{x} = 0$. This is linear in the 9 entries of $F$.
Degrees of freedom
$F$ has 9 entries, but it is defined up to scale (only the direction in the 9D space matters), giving 8 DOF. The rank-2 constraint ($\det F = 0$) reduces this to 7. The 8-point algorithm ignores the rank constraint and uses 8 correspondences to solve for $F$ linearly (up to scale), then enforces rank 2 by SVD truncation.
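A minimal numpy sketch of this two-step recipe (illustrative; Hartley normalisation, which a practical implementation should include, is omitted for brevity):

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from n >= 8 correspondences x1[i] <-> x2[i] (pixel coords)."""
    n = x1.shape[0]
    A = np.zeros((n, 9))
    for i in range(n):
        u, v = x1[i]
        up, vp = x2[i]
        # Row encodes x2^T F x1 = 0, linear in the 9 entries of F (row-major).
        A[i] = [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1.0]
    # Null vector of A (up to scale): right singular vector of smallest sigma.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```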
ex-ch28-02
Easy: Given the essential matrix $E = [\mathbf{t}]_\times R$, show that $\mathbf{t}^\top E = \mathbf{0}^\top$ (the epipole is in the left null space of $E$).
$[\mathbf{t}]_\times\mathbf{t} = \mathbf{t}\times\mathbf{t} = \mathbf{0}$.
Direct computation
$\mathbf{t}^\top E = \mathbf{t}^\top[\mathbf{t}]_\times R = -(\mathbf{t}\times\mathbf{t})^\top R = \mathbf{0}^\top$.
The epipole $\mathbf{e}'$ (the projection of camera 1's centre onto image 2) is proportional to $\mathbf{t}$ and therefore lies in the left null space of $E$.
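A quick numeric check of the identity (illustrative numpy with a random pose):

```python
import numpy as np

def skew(t):
    # Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v).
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

rng = np.random.default_rng(0)
t = rng.standard_normal(3)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q * np.sign(np.linalg.det(Q))   # proper rotation, det(R) = +1
E = skew(t) @ R
print(t @ E)   # ~ [0, 0, 0] up to floating-point error
```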
ex-ch28-03
Medium: In bundle adjustment, the Jacobian has a block-sparse structure. For a problem with $m$ cameras and $n$ 3D points, each point observed in a subset of the cameras, derive the sizes of the Hessian blocks in the Schur complement formulation and explain why eliminating points first is efficient.
The Hessian is $H = J^\top J$. Partition $H$ into camera-camera, point-point, and camera-point blocks.
The point-point block is block-diagonal (each point's observations are independent of other points).
Hessian structure
Let the camera parameters be $\mathbf{c} \in \mathbb{R}^{6m}$ (6 DOF per camera) and the point parameters $\mathbf{p} \in \mathbb{R}^{3n}$. The Hessian partitions as:
$$H = \begin{pmatrix} H_{cc} & H_{cp} \\ H_{cp}^\top & H_{pp} \end{pmatrix},$$
where $H_{cc} \in \mathbb{R}^{6m\times 6m}$ (camera-camera), $H_{pp} \in \mathbb{R}^{3n\times 3n}$ (point-point, block-diagonal with $3\times3$ blocks), and $H_{cp} \in \mathbb{R}^{6m\times 3n}$ (camera-point).
Schur complement
Eliminating points gives the reduced camera system $S\,\delta\mathbf{c} = \mathbf{b}_c - H_{cp}H_{pp}^{-1}\mathbf{b}_p$ with Schur complement $S = H_{cc} - H_{cp}H_{pp}^{-1}H_{cp}^\top$. Since $H_{pp}$ is block-diagonal, inverting it costs $O(n)$ independent $3\times3$ inversions. The reduced system is only $6m\times6m$, which for $m \ll n$ cameras is easily solvable. Eliminating cameras first would instead require factorising a dense $3n\times3n$ system, far larger.
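A structural sketch of the point-first elimination in numpy (dense storage for clarity; real BA solvers exploit the sparsity of $H_{cp}$):

```python
import numpy as np

def schur_solve(Hcc, Hcp, Hpp_blocks, bc, bp):
    """Solve the BA normal equations by eliminating points first.

    Hcc: (6m, 6m); Hcp: (6m, 3n); Hpp_blocks: (n, 3, 3) diagonal blocks;
    bc, bp: right-hand sides.
    """
    n = Hpp_blocks.shape[0]
    Hpp_inv = np.zeros((3 * n, 3 * n))
    for i in range(n):   # O(n) independent 3x3 inversions
        Hpp_inv[3 * i:3 * i + 3, 3 * i:3 * i + 3] = np.linalg.inv(Hpp_blocks[i])
    # Reduced camera system: S dc = rc, with S only 6m x 6m.
    S = Hcc - Hcp @ Hpp_inv @ Hcp.T
    rc = bc - Hcp @ Hpp_inv @ bp
    dc = np.linalg.solve(S, rc)
    # Back-substitute for the point updates.
    dp = Hpp_inv @ (bp - Hcp.T @ dc)
    return dc, dp
```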
ex-ch28-04
Easy: State the rendering equation for a Lambertian surface (constant BRDF $f_r = \rho/\pi$) under a single directional light source from direction $\boldsymbol{\omega}_\ell$ with irradiance $E_\ell$. Simplify the integral.
A directional light source is a delta function in the incoming radiance.
Substitution
For a directional light: $L_i(\boldsymbol{\omega}_i) = E_\ell\,\delta(\boldsymbol{\omega}_i - \boldsymbol{\omega}_\ell)$. Substituting into the rendering equation:
$$L_o(\boldsymbol{\omega}_o) = \int_\Omega \frac{\rho}{\pi}\,L_i(\boldsymbol{\omega}_i)\cos\theta_i\,d\boldsymbol{\omega}_i = \frac{\rho}{\pi}\,E_\ell\cos\theta_\ell.$$
This is the classic Lambert's cosine law: the reflected radiance is proportional to the cosine of the incidence angle $\theta_\ell$ and independent of the viewing direction $\boldsymbol{\omega}_o$.
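As a one-line check, the simplified shading model in Python (function and argument names are illustrative):

```python
import numpy as np

def lambertian_radiance(rho, n, omega_l, E_l):
    # L_o = (rho / pi) * E_l * max(0, cos(theta_l)); no omega_o dependence.
    cos_theta_l = max(0.0, float(np.dot(n, omega_l)))
    return rho / np.pi * E_l * cos_theta_l
```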
ex-ch28-05
Medium: Derive the discrete volume rendering formula from the continuous integral $C = \int_0^\infty T(t)\,\sigma(t)\,\mathbf{c}(t)\,dt$ with $T(t) = \exp\!\big(-\int_0^t \sigma(s)\,ds\big)$, by assuming constant density and colour within each interval $[t_i, t_{i+1}]$ of length $\delta_i$.
Within interval $i$: $\sigma(t) = \sigma_i$ and $\mathbf{c}(t) = \mathbf{c}_i$ for $t \in [t_i, t_{i+1}]$.
Per-interval integral
$$\int_{t_i}^{t_{i+1}} T(t)\,\sigma_i\,\mathbf{c}_i\,dt = T_i\,\mathbf{c}_i\int_0^{\delta_i}\sigma_i\,e^{-\sigma_i u}\,du = T_i\,\mathbf{c}_i\big(1 - e^{-\sigma_i\delta_i}\big).$$
Transmittance recursion
$T_{i+1} = T_i\,e^{-\sigma_i\delta_i}$, with $T_1 = 1$. Therefore $T_i = \prod_{j<i} e^{-\sigma_j\delta_j} = \exp\!\big(-\sum_{j<i}\sigma_j\delta_j\big)$.
Full sum
With $\alpha_i = 1 - e^{-\sigma_i\delta_i}$: $\hat{C} = \sum_i T_i\,\alpha_i\,\mathbf{c}_i$. $\blacksquare$
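The resulting compositing rule in numpy (the standard exclusive-cumulative-product implementation):

```python
import numpy as np

def composite(sigma, delta, color):
    """Discrete volume rendering: C_hat = sum_i T_i * alpha_i * c_i.

    sigma: (n,) densities; delta: (n,) interval lengths; color: (n, 3).
    """
    alpha = 1.0 - np.exp(-sigma * delta)                    # per-interval opacity
    # Exclusive cumulative product: T_i = prod_{j<i} (1 - alpha_j), T_1 = 1.
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return (T * alpha) @ color
```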
ex-ch28-06
Medium: Under the Born approximation, the RF forward model for a single Tx-Rx pair at frequency $f_k$ is:
$$y_k = \int_V G(\mathbf{r}_{\text{rx}}, \mathbf{r}; f_k)\,\chi(\mathbf{r})\,G(\mathbf{r}, \mathbf{r}_{\text{tx}}; f_k)\,d\mathbf{r},$$
where $\chi(\mathbf{r})$ is the scene reflectivity and $G$ is the background Green's function. Show that the full measurement vector $\mathbf{y}$ is linear in $\chi$, and identify the sensing matrix $A$.
Stack measurements from multiple Tx-Rx pairs and frequencies into a vector.
Vectorisation
For Tx $p$, Rx $q$, frequency $f_k$: discretising the volume into $N$ voxels of volume $\Delta V$ gives $y_{pqk} = \sum_{n=1}^N A_{(pqk),n}\,\chi_n$, where $A_{(pqk),n} = G(\mathbf{r}_q, \mathbf{r}_n; f_k)\,G(\mathbf{r}_n, \mathbf{r}_p; f_k)\,\Delta V$.
Matrix form
Stacking all measurements: $\mathbf{y} = A\boldsymbol{\chi}$, where $A \in \mathbb{C}^{M\times N}$ with $M = N_{\text{tx}}N_{\text{rx}}N_f$ rows and $N$ (voxel) columns. This is a linear system in $\boldsymbol{\chi}$.
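A sketch of the sensing-matrix assembly in numpy, assuming the free-space scalar Green's function $G(\mathbf{r}, \mathbf{r}') = e^{-jk|\mathbf{r}-\mathbf{r}'|}/(4\pi|\mathbf{r}-\mathbf{r}'|)$ as the background model (the helper name and loop ordering are illustrative):

```python
import numpy as np

def born_matrix(tx_pos, rx_pos, voxels, freqs, dV, c=3e8):
    """Rows indexed by (freq, tx, rx); columns by voxel. Positions: (·, 3) arrays."""
    rows = []
    for f in freqs:
        k = 2 * np.pi * f / c
        for p in tx_pos:
            for q in rx_pos:
                d_tx = np.linalg.norm(voxels - p, axis=1)
                d_rx = np.linalg.norm(voxels - q, axis=1)
                G_tx = np.exp(-1j * k * d_tx) / (4 * np.pi * d_tx)
                G_rx = np.exp(-1j * k * d_rx) / (4 * np.pi * d_rx)
                rows.append(G_rx * G_tx * dV)   # one row of A
    return np.array(rows)   # (N_f * N_tx * N_rx, N_voxels), complex
```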
ex-ch28-07
Hard: Derive the adjoint-method gradient $\partial\mathcal{L}/\partial\boldsymbol{\theta}$ for a scalar loss $\mathcal{L}(\mathbf{x})$ where $\mathbf{x}$ solves the MoM system $Z(\boldsymbol{\theta})\,\mathbf{x} = \mathbf{b}$. Show that only one additional linear solve is needed, regardless of the dimension of $\boldsymbol{\theta}$.
Use the chain rule and implicit differentiation of the linear system.
Define $\boldsymbol{\lambda}$ as the solution of the adjoint system $Z^\top\boldsymbol{\lambda} = (\partial\mathcal{L}/\partial\mathbf{x})^\top$.
Chain rule
$\dfrac{\partial\mathcal{L}}{\partial\theta_i} = \dfrac{\partial\mathcal{L}}{\partial\mathbf{x}}\,\dfrac{\partial\mathbf{x}}{\partial\theta_i}$ for each component $\theta_i$ of $\boldsymbol{\theta}$.
Implicit differentiation
From $Z\mathbf{x} = \mathbf{b}$: $\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x} + Z\,\dfrac{\partial\mathbf{x}}{\partial\theta_i} = \mathbf{0}$, so $\dfrac{\partial\mathbf{x}}{\partial\theta_i} = -Z^{-1}\,\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x}$.
Adjoint substitution
Substituting: $\dfrac{\partial\mathcal{L}}{\partial\theta_i} = -\dfrac{\partial\mathcal{L}}{\partial\mathbf{x}}\,Z^{-1}\,\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x} = -\boldsymbol{\lambda}^\top\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x}$,
where $\boldsymbol{\lambda}$ solves the single adjoint system $Z^\top\boldsymbol{\lambda} = (\partial\mathcal{L}/\partial\mathbf{x})^\top$. This solve is independent of $i$, so all $\dim(\boldsymbol{\theta})$ gradient components cost only one additional adjoint solve. $\blacksquare$
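A real-valued numpy sketch of the resulting recipe (for complex MoM systems, replace the transposes with conjugate transposes):

```python
import numpy as np

def adjoint_gradient(Z, b, dZ_dtheta, dL_dx):
    """Gradient of L(x) with Z(theta) x = b, via one adjoint solve.

    dZ_dtheta: list of dZ/dtheta_i matrices; dL_dx: gradient of L at x.
    """
    x = np.linalg.solve(Z, b)
    lam = np.linalg.solve(Z.T, dL_dx)          # the single adjoint solve
    return np.array([-lam @ (dZi @ x) for dZi in dZ_dtheta])
```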
ex-ch28-08
Easy: List three key differences between optical and RF rendering that affect the choice of forward model. For each, state which approximation is valid in the optical regime but breaks down in the RF regime.
Think about wavelength, coherence, and interaction model.
Three differences
- Wavelength vs. feature size: optical $\lambda \sim 500$ nm $\ll$ scene features; RF $\lambda \sim$ mm to 10 cm, comparable to feature size. Ray optics is valid in the optical regime, not in RF.
- Coherence: optical rendering sums intensities ($I = \sum_k I_k$); RF must sum complex fields ($E = \sum_k E_k e^{j\phi_k}$), producing interference.
- BRDF vs. scattering cross-section: the BRDF assumes surfaces that are smooth at the wavelength scale; at RF wavelengths, surfaces have sub-wavelength structure requiring wave-theoretic scattering models.
ex-ch28-09
Easy: A 77 GHz automotive radar has angular resolution $\Delta\theta \approx 1^\circ$. At a range of 50 m, what is the cross-range resolution? Compare with a camera pixel subtending $\approx 0.01^\circ$.
Cross-range resolution: $\Delta x = R\,\Delta\theta$ (with $\Delta\theta$ in radians).
Radar cross-range
$\Delta x_{\text{radar}} = 50 \times (1^\circ \times \pi/180) \approx 50 \times 0.0175 \approx 0.87$ m.
Camera cross-range
$\Delta x_{\text{cam}} = 50 \times (0.01^\circ \times \pi/180) \approx 0.0087$ m $\approx 0.9$ cm.
Comparison
The camera has roughly $100\times$ better angular resolution than the radar. This demonstrates the complementarity: radar provides range and velocity, camera provides angular detail.
ex-ch28-10
Medium: Prove that for two conditionally independent sensor measurements $\mathbf{y}_1$ and $\mathbf{y}_2$ given scene parameter $\boldsymbol{\theta}$, the Fisher information matrix is additive: $\mathcal{I}_{12}(\boldsymbol{\theta}) = \mathcal{I}_1(\boldsymbol{\theta}) + \mathcal{I}_2(\boldsymbol{\theta})$.
Use $p(\mathbf{y}_1, \mathbf{y}_2 \mid \boldsymbol{\theta}) = p(\mathbf{y}_1 \mid \boldsymbol{\theta})\,p(\mathbf{y}_2 \mid \boldsymbol{\theta})$.
Take the log and compute the second derivative.
Log-likelihood decomposition
$\log p(\mathbf{y}_1, \mathbf{y}_2 \mid \boldsymbol{\theta}) = \log p(\mathbf{y}_1 \mid \boldsymbol{\theta}) + \log p(\mathbf{y}_2 \mid \boldsymbol{\theta})$.
FIM as negative Hessian
$\mathcal{I}_{12} = -\mathbb{E}\big[\nabla_{\boldsymbol{\theta}}^2 \log p(\mathbf{y}_1, \mathbf{y}_2 \mid \boldsymbol{\theta})\big] = -\mathbb{E}\big[\nabla_{\boldsymbol{\theta}}^2 \log p(\mathbf{y}_1 \mid \boldsymbol{\theta})\big] - \mathbb{E}\big[\nabla_{\boldsymbol{\theta}}^2 \log p(\mathbf{y}_2 \mid \boldsymbol{\theta})\big]$.
Conclusion
$\mathcal{I}_{12} = \mathcal{I}_1 + \mathcal{I}_2$. $\blacksquare$
The independence ensures no cross-terms appear.
ex-ch28-11
Medium: In BEV fusion, the camera-to-BEV transformation requires estimating per-pixel depth. If the depth estimate has an error $\Delta d$, how does this translate to a positional error in the BEV plane for a pixel at image coordinates $(u, v)$ with focal length $f$?
Back-project: $\mathbf{X} = d\,K^{-1}(u, v, 1)^\top$.
Back-projection
The 3D point is $\mathbf{X} = \big(\tfrac{(u - c_x)\,d}{f},\ \tfrac{(v - c_y)\,d}{f},\ d\big)$. The BEV coordinates $(x, z)$ follow from $x = (u - c_x)\,d/f$ and $z = d$.
Error propagation
$\Delta z = \Delta d$ and $\Delta x = \tfrac{u - c_x}{f}\,\Delta d$. The along-range error equals $\Delta d$ directly; the cross-range error scales with the pixel's off-axis angle $(u - c_x)/f$.
Example
For $d = 50$ m, $\Delta d = 2$ m, $f = 1000$ pixels, $u - c_x = 500$ pixels: $\Delta z = 2$ m, $\Delta x = 1$ m. This motivates fusing with radar, which provides accurate range directly.
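The worked example as a few lines of Python:

```python
f = 1000.0             # focal length (pixels)
u_cx = 500.0           # pixel offset u - c_x (pixels)
dd = 2.0               # depth error (m)

dz = dd                # along-range error -> 2.0 m
dx = (u_cx / f) * dd   # cross-range error -> 1.0 m
```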
ex-ch28-12
Hard: For a PINN solving the 2D Helmholtz equation $\nabla^2 u + k^2 u = 0$ on $[0, 1]^2$, with $u$ parameterised by a 4-layer MLP with tanh activations, derive the form of the PDE residual loss and explain how the second-order spatial derivatives $u_{xx}$ and $u_{yy}$ are computed via automatic differentiation.
Two nested calls to autograd.
The cost is $O(1)$ backward passes per collocation point per derivative.
PDE residual
At collocation point $(x_j, y_j)$:
$$\mathcal{L}_{\text{PDE}} = \frac{1}{N_c}\sum_{j=1}^{N_c}\big|u_{xx}(x_j, y_j) + u_{yy}(x_j, y_j) + k^2\,u(x_j, y_j)\big|^2.$$
Automatic differentiation
- First call: `grad_x = autograd(u, x)` gives $\partial u/\partial x$.
- Second call: `grad_xx = autograd(grad_x, x)` gives $\partial^2 u/\partial x^2$.
- Similarly for $\partial^2 u/\partial y^2$.
- Each call costs one backward pass per scalar output.
- Total cost: a small constant multiple of the forward pass per collocation point (see the sketch below).
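A PyTorch sketch of the nested calls (the `autograd` pseudo-calls above map onto `torch.autograd.grad`; the summing trick is valid because each collocation point's output depends only on that point's input):

```python
import torch

def helmholtz_residual(model, xy, k):
    """PDE residual u_xx + u_yy + k^2 u at collocation points xy: (N, 2)."""
    xy = xy.clone().requires_grad_(True)
    u = model(xy)                                                  # (N, 1)
    # First call: du/dx and du/dy in one backward pass.
    g = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]     # (N, 2)
    # Second calls: second derivatives.
    u_xx = torch.autograd.grad(g[:, 0].sum(), xy, create_graph=True)[0][:, 0]
    u_yy = torch.autograd.grad(g[:, 1].sum(), xy, create_graph=True)[0][:, 1]
    return u_xx + u_yy + k**2 * u.squeeze(-1)
```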
ex-ch28-13
Hard: The Fourier Neural Operator applies a learnable filter $R_{\mathbf{k}} \in \mathbb{C}^{C\times C}$ to each of the first $k_{\max}$ Fourier modes. For an input on an $N\times N$ grid with $C$ channels, compute the total FLOPs per FNO layer and compare with a standard $3\times3$ convolution layer.
The FFT costs $O(N^2\log N)$ per channel. The filter multiplication costs $O(k_{\max}^2 C^2)$.
FNO layer cost
- FFT: $O(N^2\log N)$ per channel $\times$ $C$ channels $= O(C\,N^2\log N)$.
- Filter: $k_{\max}^2$ retained modes, each a $C\times C$ matrix multiply $= O(k_{\max}^2 C^2)$.
- IFFT: $O(C\,N^2\log N)$.
- Local linear layer: $O(N^2 C^2)$.
- Total: $O(C\,N^2\log N + k_{\max}^2 C^2 + N^2 C^2)$.
Comparison with CNN
A $3\times3$ convolution costs $O(9\,N^2C^2)$. For $N = 64$, $C = 64$, $k_{\max} = 16$: FNO $\approx 4$ M FLOPs (FFT + filter; $\approx 21$ M including the local linear layer), CNN $\approx 151$ M FLOPs. The FNO is more efficient because it captures long-range interactions via the Fourier domain without large kernel sizes.
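The counts above as a small Python calculator (order-of-magnitude estimates that ignore complex-arithmetic constant factors):

```python
import math

def fno_layer_flops(N, C, k_max):
    fft = 2 * C * N**2 * math.log2(N)   # FFT + IFFT over C channels
    filt = k_max**2 * C**2              # per-mode C x C multiplies
    local = N**2 * C**2                 # pointwise linear layer
    return fft + filt + local

def conv3x3_flops(N, C):
    return 9 * N**2 * C**2

print(fno_layer_flops(64, 64, 16) / 1e6)   # ~21 M (FFT + filter alone ~4 M)
print(conv3x3_flops(64, 64) / 1e6)         # ~151 M
```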
ex-ch28-14
Hard: A steerable CNN uses kernels expressed in the circular harmonic basis $\kappa_m(r, \phi) = \kappa(r)\,e^{jm\phi}$. Show that under a rotation of the input by angle $\alpha$, the output feature map of order $m$ is multiplied by $e^{jm\alpha}$, demonstrating exact rotation equivariance.
Under rotation: $\kappa_m(R_\alpha\mathbf{u}) = e^{jm\alpha}\,\kappa_m(\mathbf{u})$.
The convolution integral inherits the phase shift.
Rotated kernel
$\kappa_m(R_\alpha\mathbf{u}) = \kappa(r)\,e^{jm(\phi + \alpha)} = e^{jm\alpha}\,\kappa_m(\mathbf{u})$, since rotating the argument by $\alpha$ adds $\alpha$ to its polar angle.
Convolution under rotation
Let $f^\alpha(\mathbf{x}) = f(R_{-\alpha}\mathbf{x})$ be the rotated input. Then:
$$(f^\alpha * \kappa_m)(\mathbf{x}) = \int f(R_{-\alpha}\mathbf{y})\,\kappa_m(\mathbf{x} - \mathbf{y})\,d\mathbf{y}.$$
Substituting $\mathbf{y}' = R_{-\alpha}\mathbf{y}$, so that $\mathbf{x} - \mathbf{y} = R_\alpha(R_{-\alpha}\mathbf{x} - \mathbf{y}')$:
$$(f^\alpha * \kappa_m)(\mathbf{x}) = \int f(\mathbf{y}')\,e^{jm\alpha}\,\kappa_m(R_{-\alpha}\mathbf{x} - \mathbf{y}')\,d\mathbf{y}' = e^{jm\alpha}\,(f * \kappa_m)(R_{-\alpha}\mathbf{x}).$$
Equivariance
The output is the rotated original feature map multiplied by the phase $e^{jm\alpha}$: order-$m$ features transform under the corresponding irreducible representation of $SO(2)$, which is exact rotation equivariance. $\blacksquare$
ex-ch28-15
Challenge: Consider an FNO trained to map permittivity to scattered field on a $64\times64$ grid. The FNO uses $k_{\max} = 16$ Fourier modes. Explain how the trained FNO can be evaluated on a $256\times256$ grid without retraining, and analyse the conditions under which this resolution transfer is accurate.
The FNO weights are defined in the Fourier domain, independent of spatial resolution.
Zero-pad the higher modes in the FFT.
Resolution transfer mechanism
The FNO filter $R_{\mathbf{k}}$ acts on the first $k_{\max}$ Fourier modes. On a $64\times64$ grid, the FFT has modes up to $|k| = 32$; on a $256\times256$ grid, modes up to $|k| = 128$. The FNO applies the same weights $R_{\mathbf{k}}$ to modes $|k| < k_{\max}$ and zeros modes $|k| \ge k_{\max}$, regardless of grid size.
Accuracy conditions
The transfer is accurate when:
- The operator is well-approximated by its first $k_{\max}$ Fourier modes (smooth operator).
- The input on the finer grid does not introduce significant energy in modes $|k| \ge k_{\max}$ (smooth inputs).
- The local linear layer generalises (it acts pointwise, so it is resolution-independent by construction).
When it fails
For inputs with sharp features (edges, point scatterers) at scales below $\sim 1/k_{\max}$, the higher modes carry significant energy. The FNO misses these, producing smoothed outputs. Remedy: increase $k_{\max}$ or use a hybrid FNO + physics-based correction.
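A single-channel numpy sketch of the mode-truncated spectral filter; the same weight blocks apply unchanged at any resolution (the two corner blocks mirror how 2D FNO implementations index the `rfft2` spectrum):

```python
import numpy as np

def spectral_conv(u, W_pos, W_neg, k_max):
    """One FNO spectral filter on a single-channel field u: (N, N)."""
    U = np.fft.rfft2(u)                              # (N, N//2 + 1) modes
    V = np.zeros_like(U)
    V[:k_max, :k_max] = W_pos * U[:k_max, :k_max]    # low positive-row modes
    V[-k_max:, :k_max] = W_neg * U[-k_max:, :k_max]  # low negative-row modes
    return np.fft.irfft2(V, s=u.shape)               # higher modes stay zero

rng = np.random.default_rng(0)
k_max = 16
W_pos = rng.standard_normal((k_max, k_max)) + 1j * rng.standard_normal((k_max, k_max))
W_neg = rng.standard_normal((k_max, k_max)) + 1j * rng.standard_normal((k_max, k_max))
out64 = spectral_conv(rng.standard_normal((64, 64)), W_pos, W_neg, k_max)
out256 = spectral_conv(rng.standard_normal((256, 256)), W_pos, W_neg, k_max)  # same weights
```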
ex-ch28-16
Medium: In a PINN for inverse scattering with $N_i$ incident waves and $N_r$ receivers per wave, the data loss has $N_iN_r$ terms. If we use $N_c$ collocation points, how should the PDE weight $\lambda_{\text{PDE}}$ be chosen to balance data and PDE losses?
Consider the magnitudes: data residuals are $O(|E|)$, PDE residuals are $O(k^2|E|)$.
A common heuristic: $\lambda_{\text{PDE}} \sim 1/k^4$.
Scale analysis
Data loss: $\mathcal{L}_{\text{data}} = \frac{1}{N_iN_r}\sum|E_{\text{pred}} - E_{\text{meas}}|^2 = O(|E|^2)$. PDE loss: $\mathcal{L}_{\text{PDE}} = \frac{1}{N_c}\sum|\nabla^2E + k^2E|^2 = O(k^4|E|^2)$.
Balancing
For the two terms to contribute comparably: $\lambda_{\text{PDE}} \sim 1/k^4$. At 3 GHz ($\lambda_0 = 0.1$ m): $k = 2\pi/\lambda_0 \approx 63$ rad/m, so $\lambda_{\text{PDE}} \sim k^{-4} \approx 6\times10^{-8}$.
Practical scheduling
Start with $\lambda_{\text{PDE}} = 0$ (pure data fitting) and increase it to the target value over the first 30% of training. This curriculum avoids the PDE constraint dominating early training, when the network outputs are far from the solution.
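The ramp as a small helper (the 30% warm-up fraction is the heuristic stated above):

```python
def pde_weight(step, total_steps, lam_target, warmup_frac=0.3):
    # Linear ramp from 0 to lam_target over the first 30% of training.
    warmup = warmup_frac * total_steps
    return lam_target * min(1.0, step / warmup)
```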
ex-ch28-17
Challenge: Design a multi-modal fusion system that gracefully handles missing modalities. Specifically, for a radar + camera + LiDAR system, propose an architecture where the network can operate with any subset of modalities available, and prove that the resulting detector's performance degrades monotonically as modalities are removed (it never gets worse by adding a modality).
Use independent encoders + attention-based fusion with masking.
The information-theoretic bound $I(\mathbf{y}; \mathbf{z}_1, \mathbf{z}_2) \ge I(\mathbf{y}; \mathbf{z}_1)$ provides the theoretical guarantee.
Architecture
- Independent encoders: $\mathbf{z}_R = f_R(\mathbf{x}_R)$, $\mathbf{z}_C = f_C(\mathbf{x}_C)$, $\mathbf{z}_L = f_L(\mathbf{x}_L)$.
- Masked attention fusion: $\mathbf{z} = \mathrm{Attn}(\mathbf{q}, \{\mathbf{z}_i : m_i = 1\})$, where the mask $\mathbf{m} \in \{0, 1\}^3$ indicates available modalities.
- The detection head operates on the fused feature $\mathbf{z}$.
Training with random dropout
During training, randomly mask each modality with probability $p$. This forces the network to learn to exploit whatever modalities are available; a minimal fusion sketch follows.
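A single-head PyTorch sketch of masked fusion (names are illustrative; it assumes at least one modality is present per sample):

```python
import torch

def masked_fusion(query, tokens, mask):
    """Attention over available modality tokens only.

    query: (B, D); tokens: (B, M, D) per-modality features;
    mask: (B, M) with 1 = modality present.
    """
    scores = (tokens @ query.unsqueeze(-1)).squeeze(-1)     # (B, M)
    scores = scores.masked_fill(mask == 0, float('-inf'))   # drop missing modalities
    attn = torch.softmax(scores, dim=-1)                    # renormalises over present ones
    return (attn.unsqueeze(-1) * tokens).sum(dim=1)         # fused feature (B, D)
```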
Monotonicity guarantee
Under optimal Bayesian fusion (which attention approximates): $I(\mathbf{y}; \mathbf{z}_1, \dots, \mathbf{z}_k, \mathbf{z}_{k+1}) \ge I(\mathbf{y}; \mathbf{z}_1, \dots, \mathbf{z}_k)$ by the chain rule and the non-negativity of (conditional) mutual information. The learned fusion approximates this optimal combination, and modality dropout during training ensures the approximation is tight for all subsets.
ex-ch28-18
Hard: The spectral bias of MLPs means that a standard PINN for the Helmholtz equation at wavenumber $k_0$ converges slowly for the high-frequency components of the solution. Quantify this: for an MLP with ReLU activations, what is the expected convergence rate for the $k$-th Fourier mode of the PINN solution?
The Neural Tangent Kernel (NTK) of a ReLU MLP has eigenvalues that decay polynomially, $\lambda_k \sim k^{-\alpha}$, in the mode index $k$.
The convergence rate of gradient descent on the $k$-th mode is proportional to the NTK eigenvalue $\lambda_k$.
NTK eigenvalue decay
For a ReLU MLP, the NTK eigenvalues decay polynomially in the Fourier mode, $\lambda_k \sim k^{-\alpha}$, where the exponent $\alpha$ (typically between 2 and 3) depends on the network depth and input dimension.
Convergence rate
Under gradient descent with learning rate $\eta$, the error in mode $k$ decays as $e^{-\eta\lambda_k t}$, so the time to converge scales as $1/\lambda_k \sim k^{\alpha}$. For mode 10 relative to mode 1 this ratio is $10^{\alpha}$: mode 10 converges between two and three orders of magnitude slower than mode 1.
Mitigation
Fourier feature encoding $\gamma(\mathbf{x}) = [\cos(2\pi B\mathbf{x}), \sin(2\pi B\mathbf{x})]$ flattens the NTK spectrum, making $\lambda_k$ approximately uniform for $k$ up to the encoding bandwidth $\sigma$ and eliminating the spectral bias for the targeted frequency range.
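A numpy sketch of the encoding (symbols as above; the scale $\sigma$ of the random frequency matrix $B$ is the bandwidth knob):

```python
import numpy as np

def fourier_features(x, B):
    # gamma(x) = [cos(2 pi B x), sin(2 pi B x)]; x: (n, d), B: (d, m).
    proj = 2 * np.pi * x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
B = 10.0 * rng.standard_normal((2, 128))   # sigma = 10 sets the target bandwidth
```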
ex-ch28-19
Challenge: Formulate a joint multi-view geometry + differentiable RF rendering framework for distributed ISAC: $N_{\text{tx}}$ transmitters and $N_{\text{rx}}$ receivers at unknown positions observe a scene. The goal is to jointly estimate the scene (reflectivity $\boldsymbol{\chi}$) and the sensor positions (analogous to bundle adjustment in SfM).
The sensing matrix depends on the sensor positions: $A = A(\mathbf{P})$ with $\mathbf{P} = \{\mathbf{p}_i\}$.
Joint optimisation over $\boldsymbol{\chi}$ and the positions $\mathbf{P}$.
Forward model
$\mathbf{y} = A(\mathbf{P})\,\boldsymbol{\chi} + \mathbf{n}$, where the entries of $A(\mathbf{P})$ are products of Green's functions evaluated at the sensor positions (as in ex-ch28-06).
Joint cost function
$$\min_{\boldsymbol{\chi},\,\mathbf{P}}\ \|\mathbf{y} - A(\mathbf{P})\boldsymbol{\chi}\|_2^2 + \lambda\|\boldsymbol{\chi}\|_1 + \mu\,\mathcal{R}_{\text{pos}}(\mathbf{P}),$$
where $\mathcal{R}_{\text{pos}}$ regularises positions (e.g., proximity to nominal values from GPS/blueprint).
Alternating optimisation
- Fix positions, solve for $\boldsymbol{\chi}$ (LASSO/ADMM).
- Fix $\boldsymbol{\chi}$, update positions via gradient descent (differentiating $A(\mathbf{P})$ w.r.t. $\mathbf{P}$).
- Iterate.
This is the RF analogue of bundle adjustment: jointly estimating the "scene" (reflectivity) and the "camera poses" (sensor positions) from measurements. A minimal sketch of the alternating scheme follows.
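A compact numpy sketch (illustrative simplifications: ridge in place of LASSO for the scene step, and a finite-difference position gradient standing in for analytic differentiation; `build_A` is a hypothetical assembler such as the one sketched in ex-ch28-06):

```python
import numpy as np

def joint_estimate(y, build_A, P0, lam=0.1, lr=1e-3, iters=20):
    """Alternating scene/position estimation (RF bundle-adjustment sketch)."""
    P = P0.copy()
    chi = None
    for _ in range(iters):
        # Scene step: fix positions, solve a regularised linear system.
        A = build_A(P)
        chi = np.linalg.solve(A.conj().T @ A + lam * np.eye(A.shape[1]),
                              A.conj().T @ y)
        # Position step: finite-difference gradient of the data misfit.
        r0 = np.linalg.norm(y - A @ chi) ** 2
        eps, grad = 1e-6, np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            P_eps = P.copy()
            P_eps[idx] += eps
            r1 = np.linalg.norm(y - build_A(P_eps) @ chi) ** 2
            grad[idx] = (r1 - r0) / eps
        P = P - lr * grad
    return chi, P
```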
ex-ch28-20
Challenge: Prove the universal approximation theorem for neural operators (Theorem 28.5) in the 1D case: for a continuous operator $\mathcal{G}: H^s \to L^2$ on the periodic interval, show that truncation to the first $k_{\max}$ Fourier modes followed by a universal approximator in $\mathbb{R}^{2k_{\max}+1}$ can approximate $\mathcal{G}$ to arbitrary accuracy on compact sets.
Use the Fourier series truncation error for $H^s$ functions.
Apply the standard universal approximation theorem in finite dimensions.
Fourier truncation
For $u \in H^s$, the truncation $P_{k_{\max}}u$ to modes $|k| \le k_{\max}$ satisfies $\|u - P_{k_{\max}}u\|_{L^2} \le C\,k_{\max}^{-s}\,\|u\|_{H^s}$.
Finite-dimensional approximation
The mapping from truncated input coefficients to truncated output coefficients of $\mathcal{G}$ is a continuous function $\hat{g}: \mathbb{R}^{2k_{\max}+1} \to \mathbb{R}^{2k_{\max}+1}$ (complex modes identified with real pairs). By the universal approximation theorem, there exists a neural network $\Phi_\theta$ with $\sup\|\hat{g} - \Phi_\theta\| < \epsilon/3$ on the compact set.
Total error bound
Choosing $k_{\max}$ large enough makes the input and output truncation errors each $< \epsilon/3$ (uniformly on the compact set, by the $H^s$ bound), and the network error of $\Phi_\theta$ contributes $< \epsilon/3$. By the triangle inequality, the total error is $< \epsilon$. $\blacksquare$