Exercises
ex-ch28-01
Easy: The fundamental matrix $F$ has rank 2. How many independent constraints does a single point correspondence provide on $F$? How many correspondences are needed for the 8-point algorithm?
The epipolar constraint $\mathbf{x}'^\top F\mathbf{x} = 0$ is linear in the entries of $F$.
$F$ has 9 entries but is defined only up to scale.
Constraint count
Each correspondence $(\mathbf{x}, \mathbf{x}')$ gives one scalar equation: $\mathbf{x}'^\top F\mathbf{x} = 0$. This is linear in the 9 entries of $F$.
Degrees of freedom
$F$ has 9 entries, but it is defined up to scale (only the direction in the 9D space matters), giving 8 DOF. The rank-2 constraint ($\det F = 0$) reduces this to 7. The 8-point algorithm ignores the rank constraint and uses 8 correspondences to solve for $F$ linearly (up to scale), then enforces rank 2 by SVD truncation.
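A minimal numpy sketch of this two-step recipe (illustrative; Hartley normalisation, which a practical implementation should include, is omitted for brevity):

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from n >= 8 correspondences x1[i] <-> x2[i] (pixel coords)."""
    n = x1.shape[0]
    A = np.zeros((n, 9))
    for i in range(n):
        u, v = x1[i]
        up, vp = x2[i]
        # Row encodes x2^T F x1 = 0, linear in the 9 entries of F (row-major).
        A[i] = [up * u, up * v, up, vp * u, vp * v, vp, u, v, 1.0]
    # Null vector of A (up to scale): right singular vector of smallest sigma.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```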
ex-ch28-02
Easy: Given the essential matrix $E = [\mathbf{t}]_\times R$, show that $\mathbf{t}^\top E = \mathbf{0}^\top$ (the epipole is in the left null space of $E$).
$[\mathbf{t}]_\times\mathbf{t} = \mathbf{t}\times\mathbf{t} = \mathbf{0}$.
Direct computation
$\mathbf{t}^\top E = \mathbf{t}^\top[\mathbf{t}]_\times R = -(\mathbf{t}\times\mathbf{t})^\top R = \mathbf{0}^\top$.
The epipole $\mathbf{e}'$ (the projection of camera 1's centre onto image 2) is proportional to $\mathbf{t}$ and therefore lies in the left null space of $E$.
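A quick numeric check of the identity (illustrative numpy with a random pose):

```python
import numpy as np

def skew(t):
    # Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v).
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

rng = np.random.default_rng(0)
t = rng.standard_normal(3)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R = Q * np.sign(np.linalg.det(Q))   # proper rotation, det(R) = +1
E = skew(t) @ R
print(t @ E)   # ~ [0, 0, 0] up to floating-point error
```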
ex-ch28-03
Medium: In bundle adjustment, the Jacobian has a block-sparse structure. For a problem with $m$ cameras and $n$ 3D points, each point observed in a subset of the cameras, derive the sizes of the Hessian blocks in the Schur complement formulation and explain why eliminating points first is efficient.
The Hessian is $H = J^\top J$. Partition $H$ into camera-camera, point-point, and camera-point blocks.
The point-point block is block-diagonal (each point's observations are independent of other points).
Hessian structure
Let the camera parameters be $\mathbf{c} \in \mathbb{R}^{6m}$ (6 DOF per camera) and the point parameters $\mathbf{p} \in \mathbb{R}^{3n}$. The Hessian partitions as:
$$H = \begin{pmatrix} H_{cc} & H_{cp} \\ H_{cp}^\top & H_{pp} \end{pmatrix},$$
where $H_{cc} \in \mathbb{R}^{6m\times 6m}$ (camera-camera), $H_{pp} \in \mathbb{R}^{3n\times 3n}$ (point-point, block-diagonal with $3\times3$ blocks), and $H_{cp} \in \mathbb{R}^{6m\times 3n}$ (camera-point).
Schur complement
Eliminating points gives the reduced camera system $S\,\delta\mathbf{c} = \mathbf{b}_c - H_{cp}H_{pp}^{-1}\mathbf{b}_p$ with Schur complement $S = H_{cc} - H_{cp}H_{pp}^{-1}H_{cp}^\top$. Since $H_{pp}$ is block-diagonal, inverting it costs $O(n)$ independent $3\times3$ inversions. The reduced system is only $6m\times6m$, which for $m \ll n$ cameras is easily solvable. Eliminating cameras first would instead require factorising a dense $3n\times3n$ system, far larger.
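A structural sketch of the point-first elimination in numpy (dense storage for clarity; real BA solvers exploit the sparsity of $H_{cp}$):

```python
import numpy as np

def schur_solve(Hcc, Hcp, Hpp_blocks, bc, bp):
    """Solve the BA normal equations by eliminating points first.

    Hcc: (6m, 6m); Hcp: (6m, 3n); Hpp_blocks: (n, 3, 3) diagonal blocks;
    bc, bp: right-hand sides.
    """
    n = Hpp_blocks.shape[0]
    Hpp_inv = np.zeros((3 * n, 3 * n))
    for i in range(n):   # O(n) independent 3x3 inversions
        Hpp_inv[3 * i:3 * i + 3, 3 * i:3 * i + 3] = np.linalg.inv(Hpp_blocks[i])
    # Reduced camera system: S dc = rc, with S only 6m x 6m.
    S = Hcc - Hcp @ Hpp_inv @ Hcp.T
    rc = bc - Hcp @ Hpp_inv @ bp
    dc = np.linalg.solve(S, rc)
    # Back-substitute for the point updates.
    dp = Hpp_inv @ (bp - Hcp.T @ dc)
    return dc, dp
```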
ex-ch28-04
Easy: State the rendering equation for a Lambertian surface (constant BRDF $f_r = \rho/\pi$) under a single directional light source from direction $\boldsymbol{\omega}_\ell$ with irradiance $E_\ell$. Simplify the integral.
A directional light source is a delta function in the incoming radiance.
Substitution
For a directional light: $L_i(\boldsymbol{\omega}_i) = E_\ell\,\delta(\boldsymbol{\omega}_i - \boldsymbol{\omega}_\ell)$. Substituting into the rendering equation:
$$L_o(\boldsymbol{\omega}_o) = \int_\Omega \frac{\rho}{\pi}\,L_i(\boldsymbol{\omega}_i)\cos\theta_i\,d\boldsymbol{\omega}_i = \frac{\rho}{\pi}\,E_\ell\cos\theta_\ell.$$
This is the classic Lambert's cosine law: the reflected radiance is proportional to the cosine of the incidence angle $\theta_\ell$ and independent of the viewing direction $\boldsymbol{\omega}_o$.
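As a one-line check, the simplified shading model in Python (function and argument names are illustrative):

```python
import numpy as np

def lambertian_radiance(rho, n, omega_l, E_l):
    # L_o = (rho / pi) * E_l * max(0, cos(theta_l)); no omega_o dependence.
    cos_theta_l = max(0.0, float(np.dot(n, omega_l)))
    return rho / np.pi * E_l * cos_theta_l
```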
ex-ch28-05
Medium: Derive the discrete volume rendering formula from the continuous integral $C = \int_0^\infty T(t)\,\sigma(t)\,\mathbf{c}(t)\,dt$ with $T(t) = \exp\!\big(-\int_0^t \sigma(s)\,ds\big)$, by assuming constant density and colour within each interval $[t_i, t_{i+1}]$ of length $\delta_i$.
Within interval $i$: $\sigma(t) = \sigma_i$ and $\mathbf{c}(t) = \mathbf{c}_i$ for $t \in [t_i, t_{i+1}]$.
Per-interval integral
$$\int_{t_i}^{t_{i+1}} T(t)\,\sigma_i\,\mathbf{c}_i\,dt = T_i\,\mathbf{c}_i\int_0^{\delta_i}\sigma_i\,e^{-\sigma_i u}\,du = T_i\,\mathbf{c}_i\big(1 - e^{-\sigma_i\delta_i}\big).$$
Transmittance recursion
$T_{i+1} = T_i\,e^{-\sigma_i\delta_i}$, with $T_1 = 1$. Therefore $T_i = \prod_{j<i} e^{-\sigma_j\delta_j} = \exp\!\big(-\sum_{j<i}\sigma_j\delta_j\big)$.
Full sum
With $\alpha_i = 1 - e^{-\sigma_i\delta_i}$: $\hat{C} = \sum_i T_i\,\alpha_i\,\mathbf{c}_i$. $\blacksquare$
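The resulting compositing rule in numpy (the standard exclusive-cumulative-product implementation):

```python
import numpy as np

def composite(sigma, delta, color):
    """Discrete volume rendering: C_hat = sum_i T_i * alpha_i * c_i.

    sigma: (n,) densities; delta: (n,) interval lengths; color: (n, 3).
    """
    alpha = 1.0 - np.exp(-sigma * delta)                    # per-interval opacity
    # Exclusive cumulative product: T_i = prod_{j<i} (1 - alpha_j), T_1 = 1.
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return (T * alpha) @ color
```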
ex-ch28-06
Medium: Under the Born approximation, the RF forward model for a single Tx-Rx pair at frequency $f_k$ is:
$$y_k = \int_V G(\mathbf{r}_{\text{rx}}, \mathbf{r}; f_k)\,\chi(\mathbf{r})\,G(\mathbf{r}, \mathbf{r}_{\text{tx}}; f_k)\,d\mathbf{r},$$
where $\chi(\mathbf{r})$ is the scene reflectivity and $G$ is the background Green's function. Show that the full measurement vector $\mathbf{y}$ is linear in $\chi$, and identify the sensing matrix $A$.
Stack measurements from multiple Tx-Rx pairs and frequencies into a vector.
Vectorisation
For Tx $p$, Rx $q$, frequency $f_k$: discretising the volume into $N$ voxels of volume $\Delta V$ gives $y_{pqk} = \sum_{n=1}^N A_{(pqk),n}\,\chi_n$, where $A_{(pqk),n} = G(\mathbf{r}_q, \mathbf{r}_n; f_k)\,G(\mathbf{r}_n, \mathbf{r}_p; f_k)\,\Delta V$.
Matrix form
Stacking all measurements: $\mathbf{y} = A\boldsymbol{\chi}$, where $A \in \mathbb{C}^{M\times N}$ with $M = N_{\text{tx}}N_{\text{rx}}N_f$ rows and $N$ (voxel) columns. This is a linear system in $\boldsymbol{\chi}$.
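A sketch of the sensing-matrix assembly in numpy, assuming the free-space scalar Green's function $G(\mathbf{r}, \mathbf{r}') = e^{-jk|\mathbf{r}-\mathbf{r}'|}/(4\pi|\mathbf{r}-\mathbf{r}'|)$ as the background model (the helper name and loop ordering are illustrative):

```python
import numpy as np

def born_matrix(tx_pos, rx_pos, voxels, freqs, dV, c=3e8):
    """Rows indexed by (freq, tx, rx); columns by voxel. Positions: (·, 3) arrays."""
    rows = []
    for f in freqs:
        k = 2 * np.pi * f / c
        for p in tx_pos:
            for q in rx_pos:
                d_tx = np.linalg.norm(voxels - p, axis=1)
                d_rx = np.linalg.norm(voxels - q, axis=1)
                G_tx = np.exp(-1j * k * d_tx) / (4 * np.pi * d_tx)
                G_rx = np.exp(-1j * k * d_rx) / (4 * np.pi * d_rx)
                rows.append(G_rx * G_tx * dV)   # one row of A
    return np.array(rows)   # (N_f * N_tx * N_rx, N_voxels), complex
```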
ex-ch28-07
Hard: Derive the adjoint-method gradient $\partial\mathcal{L}/\partial\boldsymbol{\theta}$ for a scalar loss $\mathcal{L}(\mathbf{x})$ where $\mathbf{x}$ solves the MoM system $Z(\boldsymbol{\theta})\,\mathbf{x} = \mathbf{b}$. Show that only one additional linear solve is needed, regardless of the dimension of $\boldsymbol{\theta}$.
Use the chain rule and implicit differentiation of the linear system.
Define $\boldsymbol{\lambda}$ as the solution of the adjoint system $Z^\top\boldsymbol{\lambda} = (\partial\mathcal{L}/\partial\mathbf{x})^\top$.
Chain rule
$\dfrac{\partial\mathcal{L}}{\partial\theta_i} = \dfrac{\partial\mathcal{L}}{\partial\mathbf{x}}\,\dfrac{\partial\mathbf{x}}{\partial\theta_i}$ for each component $\theta_i$ of $\boldsymbol{\theta}$.
Implicit differentiation
From $Z\mathbf{x} = \mathbf{b}$: $\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x} + Z\,\dfrac{\partial\mathbf{x}}{\partial\theta_i} = \mathbf{0}$, so $\dfrac{\partial\mathbf{x}}{\partial\theta_i} = -Z^{-1}\,\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x}$.
Adjoint substitution
Substituting: $\dfrac{\partial\mathcal{L}}{\partial\theta_i} = -\dfrac{\partial\mathcal{L}}{\partial\mathbf{x}}\,Z^{-1}\,\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x} = -\boldsymbol{\lambda}^\top\dfrac{\partial Z}{\partial\theta_i}\,\mathbf{x}$,
where $\boldsymbol{\lambda}$ solves the single adjoint system $Z^\top\boldsymbol{\lambda} = (\partial\mathcal{L}/\partial\mathbf{x})^\top$. This solve is independent of $i$, so all $\dim(\boldsymbol{\theta})$ gradient components cost only one additional adjoint solve. $\blacksquare$
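A real-valued numpy sketch of the resulting recipe (for complex MoM systems, replace the transposes with conjugate transposes):

```python
import numpy as np

def adjoint_gradient(Z, b, dZ_dtheta, dL_dx):
    """Gradient of L(x) with Z(theta) x = b, via one adjoint solve.

    dZ_dtheta: list of dZ/dtheta_i matrices; dL_dx: gradient of L at x.
    """
    x = np.linalg.solve(Z, b)
    lam = np.linalg.solve(Z.T, dL_dx)          # the single adjoint solve
    return np.array([-lam @ (dZi @ x) for dZi in dZ_dtheta])
```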
ex-ch28-08
Easy: List three key differences between optical and RF rendering that affect the choice of forward model. For each, state which approximation is valid in the optical regime but breaks down in the RF regime.
Think about wavelength, coherence, and interaction model.
Three differences
- Wavelength vs. feature size: optical $\lambda \sim 500$ nm $\ll$ scene features; RF $\lambda \sim$ mm to 10 cm, comparable to feature size. Ray optics is valid in the optical regime, not in RF.
- Coherence: optical rendering sums intensities ($I = \sum_k I_k$); RF must sum complex fields ($E = \sum_k E_k e^{j\phi_k}$), producing interference.
- BRDF vs. scattering cross-section: the BRDF assumes surfaces that are smooth at the wavelength scale; at RF wavelengths, surfaces have sub-wavelength structure requiring wave-theoretic scattering models.
ex-ch28-09
Easy: A 77 GHz automotive radar has angular resolution $\Delta\theta \approx 1^\circ$. At a range of 50 m, what is the cross-range resolution? Compare with a camera pixel subtending $\approx 0.01^\circ$.
Cross-range resolution: $\Delta x = R\,\Delta\theta$ (with $\Delta\theta$ in radians).
Radar cross-range
$\Delta x_{\text{radar}} = 50 \times (1^\circ \times \pi/180) \approx 50 \times 0.0175 \approx 0.87$ m.
Camera cross-range
$\Delta x_{\text{cam}} = 50 \times (0.01^\circ \times \pi/180) \approx 0.0087$ m $\approx 0.9$ cm.
Comparison
The camera has roughly $100\times$ better angular resolution than the radar. This demonstrates the complementarity: radar provides range and velocity, camera provides angular detail.
ex-ch28-10
Medium: Prove that for two conditionally independent sensor measurements $\mathbf{y}_1$ and $\mathbf{y}_2$ given scene parameter $\boldsymbol{\theta}$, the Fisher information matrix is additive: $\mathcal{I}_{12}(\boldsymbol{\theta}) = \mathcal{I}_1(\boldsymbol{\theta}) + \mathcal{I}_2(\boldsymbol{\theta})$.
Use $p(\mathbf{y}_1, \mathbf{y}_2 \mid \boldsymbol{\theta}) = p(\mathbf{y}_1 \mid \boldsymbol{\theta})\,p(\mathbf{y}_2 \mid \boldsymbol{\theta})$.
Take the log and compute the second derivative.
Log-likelihood decomposition
$\log p(\mathbf{y}_1, \mathbf{y}_2 \mid \boldsymbol{\theta}) = \log p(\mathbf{y}_1 \mid \boldsymbol{\theta}) + \log p(\mathbf{y}_2 \mid \boldsymbol{\theta})$.
FIM as negative Hessian
$\mathcal{I}_{12} = -\mathbb{E}\big[\nabla_{\boldsymbol{\theta}}^2 \log p(\mathbf{y}_1, \mathbf{y}_2 \mid \boldsymbol{\theta})\big] = -\mathbb{E}\big[\nabla_{\boldsymbol{\theta}}^2 \log p(\mathbf{y}_1 \mid \boldsymbol{\theta})\big] - \mathbb{E}\big[\nabla_{\boldsymbol{\theta}}^2 \log p(\mathbf{y}_2 \mid \boldsymbol{\theta})\big]$.
Conclusion
$\mathcal{I}_{12} = \mathcal{I}_1 + \mathcal{I}_2$. $\blacksquare$
The independence ensures no cross-terms appear.
ex-ch28-11
Medium: In BEV fusion, the camera-to-BEV transformation requires estimating per-pixel depth. If the depth estimate has an error $\Delta d$, how does this translate to a positional error in the BEV plane for a pixel at image coordinates $(u, v)$ with focal length $f$?
Back-project: $\mathbf{X} = d\,K^{-1}(u, v, 1)^\top$.
Back-projection
The 3D point is $\mathbf{X} = \big(\tfrac{(u - c_x)\,d}{f},\ \tfrac{(v - c_y)\,d}{f},\ d\big)$. The BEV coordinates $(x, z)$ follow from $x = (u - c_x)\,d/f$ and $z = d$.
Error propagation
$\Delta z = \Delta d$ and $\Delta x = \tfrac{u - c_x}{f}\,\Delta d$. The along-range error equals $\Delta d$ directly; the cross-range error scales with the pixel's off-axis angle $(u - c_x)/f$.
Example
For $d = 50$ m, $\Delta d = 2$ m, $f = 1000$ pixels, $u - c_x = 500$ pixels: $\Delta z = 2$ m, $\Delta x = 1$ m. This motivates fusing with radar, which provides accurate range directly.
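The worked example as a few lines of Python:

```python
f = 1000.0             # focal length (pixels)
u_cx = 500.0           # pixel offset u - c_x (pixels)
dd = 2.0               # depth error (m)

dz = dd                # along-range error -> 2.0 m
dx = (u_cx / f) * dd   # cross-range error -> 1.0 m
```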
ex-ch28-12
Hard: For a PINN solving the 2D Helmholtz equation $\nabla^2 u + k^2 u = 0$ on $[0, 1]^2$, with $u$ parameterised by a 4-layer MLP with tanh activations, derive the form of the PDE residual loss and explain how the second-order spatial derivatives $u_{xx}$ and $u_{yy}$ are computed via automatic differentiation.
Two nested calls to autograd.
The cost is $O(1)$ backward passes per collocation point per derivative.
PDE residual
At collocation point $(x_j, y_j)$:
$$\mathcal{L}_{\text{PDE}} = \frac{1}{N_c}\sum_{j=1}^{N_c}\big|u_{xx}(x_j, y_j) + u_{yy}(x_j, y_j) + k^2\,u(x_j, y_j)\big|^2.$$
Automatic differentiation
- First call: `grad_x = autograd(u, x)` gives $\partial u/\partial x$.
- Second call: `grad_xx = autograd(grad_x, x)` gives $\partial^2 u/\partial x^2$.
- Similarly for $\partial^2 u/\partial y^2$.
- Each call costs one backward pass per scalar output.
- Total cost: a small constant multiple of the forward pass per collocation point (see the sketch below).
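A PyTorch sketch of the nested calls (the `autograd` pseudo-calls above map onto `torch.autograd.grad`; the summing trick is valid because each collocation point's output depends only on that point's input):

```python
import torch

def helmholtz_residual(model, xy, k):
    """PDE residual u_xx + u_yy + k^2 u at collocation points xy: (N, 2)."""
    xy = xy.clone().requires_grad_(True)
    u = model(xy)                                                  # (N, 1)
    # First call: du/dx and du/dy in one backward pass.
    g = torch.autograd.grad(u.sum(), xy, create_graph=True)[0]     # (N, 2)
    # Second calls: second derivatives.
    u_xx = torch.autograd.grad(g[:, 0].sum(), xy, create_graph=True)[0][:, 0]
    u_yy = torch.autograd.grad(g[:, 1].sum(), xy, create_graph=True)[0][:, 1]
    return u_xx + u_yy + k**2 * u.squeeze(-1)
```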
ex-ch28-13
Hard: The Fourier Neural Operator applies a learnable filter $R_{\mathbf{k}} \in \mathbb{C}^{C\times C}$ to each of the first $k_{\max}$ Fourier modes. For an input on an $N\times N$ grid with $C$ channels, compute the total FLOPs per FNO layer and compare with a standard $3\times3$ convolution layer.
The FFT costs $O(N^2\log N)$ per channel. The filter multiplication costs $O(k_{\max}^2 C^2)$.
FNO layer cost
- FFT: $O(N^2\log N)$ per channel $\times$ $C$ channels $= O(C\,N^2\log N)$.
- Filter: $k_{\max}^2$ retained modes, each a $C\times C$ matrix multiply $= O(k_{\max}^2 C^2)$.
- IFFT: $O(C\,N^2\log N)$.
- Local linear layer: $O(N^2 C^2)$.
- Total: $O(C\,N^2\log N + k_{\max}^2 C^2 + N^2 C^2)$.
Comparison with CNN
A $3\times3$ convolution costs $O(9\,N^2C^2)$. For $N = 64$, $C = 64$, $k_{\max} = 16$: FNO $\approx 4$ M FLOPs (FFT + filter; $\approx 21$ M including the local linear layer), CNN $\approx 151$ M FLOPs. The FNO is more efficient because it captures long-range interactions via the Fourier domain without large kernel sizes.
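The counts above as a small Python calculator (order-of-magnitude estimates that ignore complex-arithmetic constant factors):

```python
import math

def fno_layer_flops(N, C, k_max):
    fft = 2 * C * N**2 * math.log2(N)   # FFT + IFFT over C channels
    filt = k_max**2 * C**2              # per-mode C x C multiplies
    local = N**2 * C**2                 # pointwise linear layer
    return fft + filt + local

def conv3x3_flops(N, C):
    return 9 * N**2 * C**2

print(fno_layer_flops(64, 64, 16) / 1e6)   # ~21 M (FFT + filter alone ~4 M)
print(conv3x3_flops(64, 64) / 1e6)         # ~151 M
```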
ex-ch28-14
Hard: A steerable CNN uses kernels expressed in the circular harmonic basis $\kappa_m(r, \phi) = \kappa(r)\,e^{jm\phi}$. Show that under a rotation of the input by angle $\alpha$, the output feature map of order $m$ is multiplied by $e^{jm\alpha}$, demonstrating exact rotation equivariance.
Under rotation: $\kappa_m(R_\alpha\mathbf{u}) = e^{jm\alpha}\,\kappa_m(\mathbf{u})$.
The convolution integral inherits the phase shift.
Rotated kernel
$\kappa_m(R_\alpha\mathbf{u}) = \kappa(r)\,e^{jm(\phi + \alpha)} = e^{jm\alpha}\,\kappa_m(\mathbf{u})$, since rotating the argument by $\alpha$ adds $\alpha$ to its polar angle.
Convolution under rotation
Let $f^\alpha(\mathbf{x}) = f(R_{-\alpha}\mathbf{x})$ be the rotated input. Then:
$$(f^\alpha * \kappa_m)(\mathbf{x}) = \int f(R_{-\alpha}\mathbf{y})\,\kappa_m(\mathbf{x} - \mathbf{y})\,d\mathbf{y}.$$
Substituting $\mathbf{y}' = R_{-\alpha}\mathbf{y}$, so that $\mathbf{x} - \mathbf{y} = R_\alpha(R_{-\alpha}\mathbf{x} - \mathbf{y}')$:
$$(f^\alpha * \kappa_m)(\mathbf{x}) = \int f(\mathbf{y}')\,e^{jm\alpha}\,\kappa_m(R_{-\alpha}\mathbf{x} - \mathbf{y}')\,d\mathbf{y}' = e^{jm\alpha}\,(f * \kappa_m)(R_{-\alpha}\mathbf{x}).$$
Equivariance
The output is the rotated original feature map multiplied by the phase $e^{jm\alpha}$: order-$m$ features transform under the corresponding irreducible representation of $SO(2)$, which is exact rotation equivariance. $\blacksquare$
ex-ch28-15
Challenge: Consider an FNO trained to map permittivity to scattered field on a $64\times64$ grid. The FNO uses $k_{\max} = 16$ Fourier modes. Explain how the trained FNO can be evaluated on a $256\times256$ grid without retraining, and analyse the conditions under which this resolution transfer is accurate.
The FNO weights are defined in the Fourier domain, independent of spatial resolution.
Zero-pad the higher modes in the FFT.
Resolution transfer mechanism
The FNO filter $R_{\mathbf{k}}$ acts on the first $k_{\max}$ Fourier modes. On a $64\times64$ grid, the FFT has modes up to $|k| = 32$; on a $256\times256$ grid, modes up to $|k| = 128$. The FNO applies the same weights $R_{\mathbf{k}}$ to modes $|k| < k_{\max}$ and zeros modes $|k| \ge k_{\max}$, regardless of grid size.
Accuracy conditions
The transfer is accurate when:
- The operator is well-approximated by its first $k_{\max}$ Fourier modes (smooth operator).
- The input on the finer grid does not introduce significant energy in modes $|k| \ge k_{\max}$ (smooth inputs).
- The local linear layer generalises (it acts pointwise, so it is resolution-independent by construction).
When it fails
For inputs with sharp features (edges, point scatterers) at scales below $\sim 1/k_{\max}$, the higher modes carry significant energy. The FNO misses these, producing smoothed outputs. Remedy: increase $k_{\max}$ or use a hybrid FNO + physics-based correction.
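A single-channel numpy sketch of the mode-truncated spectral filter; the same weight blocks apply unchanged at any resolution (the two corner blocks mirror how 2D FNO implementations index the `rfft2` spectrum):

```python
import numpy as np

def spectral_conv(u, W_pos, W_neg, k_max):
    """One FNO spectral filter on a single-channel field u: (N, N)."""
    U = np.fft.rfft2(u)                              # (N, N//2 + 1) modes
    V = np.zeros_like(U)
    V[:k_max, :k_max] = W_pos * U[:k_max, :k_max]    # low positive-row modes
    V[-k_max:, :k_max] = W_neg * U[-k_max:, :k_max]  # low negative-row modes
    return np.fft.irfft2(V, s=u.shape)               # higher modes stay zero

rng = np.random.default_rng(0)
k_max = 16
W_pos = rng.standard_normal((k_max, k_max)) + 1j * rng.standard_normal((k_max, k_max))
W_neg = rng.standard_normal((k_max, k_max)) + 1j * rng.standard_normal((k_max, k_max))
out64 = spectral_conv(rng.standard_normal((64, 64)), W_pos, W_neg, k_max)
out256 = spectral_conv(rng.standard_normal((256, 256)), W_pos, W_neg, k_max)  # same weights
```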
ex-ch28-16
Medium: In a PINN for inverse scattering with $N_i$ incident waves and $N_r$ receivers per wave, the data loss has $N_iN_r$ terms. If we use $N_c$ collocation points, how should the PDE weight $\lambda_{\text{PDE}}$ be chosen to balance data and PDE losses?
Consider the magnitudes: data residuals are $O(|E|)$, PDE residuals are $O(k^2|E|)$.
A common heuristic: $\lambda_{\text{PDE}} \sim 1/k^4$.
Scale analysis
Data loss: $\mathcal{L}_{\text{data}} = \frac{1}{N_iN_r}\sum|E_{\text{pred}} - E_{\text{meas}}|^2 = O(|E|^2)$. PDE loss: $\mathcal{L}_{\text{PDE}} = \frac{1}{N_c}\sum|\nabla^2E + k^2E|^2 = O(k^4|E|^2)$.
Balancing
For the two terms to contribute comparably: $\lambda_{\text{PDE}} \sim 1/k^4$. At 3 GHz ($\lambda_0 = 0.1$ m): $k = 2\pi/\lambda_0 \approx 63$ rad/m, so $\lambda_{\text{PDE}} \sim k^{-4} \approx 6\times10^{-8}$.
Practical scheduling
Start with $\lambda_{\text{PDE}} = 0$ (pure data fitting) and increase it to the target value over the first 30% of training. This curriculum avoids the PDE constraint dominating early training, when the network outputs are far from the solution.
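The ramp as a small helper (the 30% warm-up fraction is the heuristic stated above):

```python
def pde_weight(step, total_steps, lam_target, warmup_frac=0.3):
    # Linear ramp from 0 to lam_target over the first 30% of training.
    warmup = warmup_frac * total_steps
    return lam_target * min(1.0, step / warmup)
```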
ex-ch28-17
Challenge: Design a multi-modal fusion system that gracefully handles missing modalities. Specifically, for a radar + camera + LiDAR system, propose an architecture where the network can operate with any subset of modalities available, and prove that the resulting detector's performance degrades monotonically as modalities are removed (it never gets worse by adding a modality).
Use independent encoders + attention-based fusion with masking.
The information-theoretic bound $I(\mathbf{y}; \mathbf{z}_1, \mathbf{z}_2) \ge I(\mathbf{y}; \mathbf{z}_1)$ provides the theoretical guarantee.
Architecture
- Independent encoders: $\mathbf{z}_R = f_R(\mathbf{x}_R)$, $\mathbf{z}_C = f_C(\mathbf{x}_C)$, $\mathbf{z}_L = f_L(\mathbf{x}_L)$.
- Masked attention fusion: $\mathbf{z} = \mathrm{Attn}(\mathbf{q}, \{\mathbf{z}_i : m_i = 1\})$, where the mask $\mathbf{m} \in \{0, 1\}^3$ indicates available modalities.
- The detection head operates on the fused feature $\mathbf{z}$.
Training with random dropout
During training, randomly mask each modality with probability $p$. This forces the network to learn to exploit whatever modalities are available; a minimal fusion sketch follows.
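A single-head PyTorch sketch of masked fusion (names are illustrative; it assumes at least one modality is present per sample):

```python
import torch

def masked_fusion(query, tokens, mask):
    """Attention over available modality tokens only.

    query: (B, D); tokens: (B, M, D) per-modality features;
    mask: (B, M) with 1 = modality present.
    """
    scores = (tokens @ query.unsqueeze(-1)).squeeze(-1)     # (B, M)
    scores = scores.masked_fill(mask == 0, float('-inf'))   # drop missing modalities
    attn = torch.softmax(scores, dim=-1)                    # renormalises over present ones
    return (attn.unsqueeze(-1) * tokens).sum(dim=1)         # fused feature (B, D)
```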
Monotonicity guarantee
Under optimal Bayesian fusion (which attention approximates): $I(\mathbf{y}; \mathbf{z}_1, \dots, \mathbf{z}_k, \mathbf{z}_{k+1}) \ge I(\mathbf{y}; \mathbf{z}_1, \dots, \mathbf{z}_k)$ by the chain rule and the non-negativity of (conditional) mutual information. The learned fusion approximates this optimal combination, and modality dropout during training ensures the approximation is tight for all subsets.
ex-ch28-18
Hard: The spectral bias of MLPs means that a standard PINN for the Helmholtz equation at wavenumber $k_0$ converges slowly for the high-frequency components of the solution. Quantify this: for an MLP with ReLU activations, what is the expected convergence rate for the $k$-th Fourier mode of the PINN solution?
The Neural Tangent Kernel (NTK) of a ReLU MLP has eigenvalues that decay polynomially, $\lambda_k \sim k^{-\alpha}$, in the mode index $k$.
The convergence rate of gradient descent on the $k$-th mode is proportional to the NTK eigenvalue $\lambda_k$.
NTK eigenvalue decay
For a ReLU MLP, the NTK eigenvalues decay polynomially in the Fourier mode, $\lambda_k \sim k^{-\alpha}$, where the exponent $\alpha$ (typically between 2 and 3) depends on the network depth and input dimension.
Convergence rate
Under gradient descent with learning rate $\eta$, the error in mode $k$ decays as $e^{-\eta\lambda_k t}$, so the time to converge scales as $1/\lambda_k \sim k^{\alpha}$. For mode 10 relative to mode 1 this ratio is $10^{\alpha}$: mode 10 converges between two and three orders of magnitude slower than mode 1.
Mitigation
Fourier feature encoding $\gamma(\mathbf{x}) = [\cos(2\pi B\mathbf{x}), \sin(2\pi B\mathbf{x})]$ flattens the NTK spectrum, making $\lambda_k$ approximately uniform for $k$ up to the encoding bandwidth $\sigma$ and eliminating the spectral bias for the targeted frequency range.
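A numpy sketch of the encoding (symbols as above; the scale $\sigma$ of the random frequency matrix $B$ is the bandwidth knob):

```python
import numpy as np

def fourier_features(x, B):
    # gamma(x) = [cos(2 pi B x), sin(2 pi B x)]; x: (n, d), B: (d, m).
    proj = 2 * np.pi * x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
B = 10.0 * rng.standard_normal((2, 128))   # sigma = 10 sets the target bandwidth
```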
ex-ch28-19
Challenge: Formulate a joint multi-view geometry + differentiable RF rendering framework for distributed ISAC: $N_{\text{tx}}$ transmitters and $N_{\text{rx}}$ receivers at unknown positions observe a scene. The goal is to jointly estimate the scene (reflectivity $\boldsymbol{\chi}$) and the sensor positions (analogous to bundle adjustment in SfM).
The sensing matrix depends on the sensor positions: $A = A(\mathbf{P})$ with $\mathbf{P} = \{\mathbf{p}_i\}$.
Joint optimisation over $\boldsymbol{\chi}$ and the positions $\mathbf{P}$.
Forward model
$\mathbf{y} = A(\mathbf{P})\,\boldsymbol{\chi} + \mathbf{n}$, where the entries of $A(\mathbf{P})$ are products of Green's functions evaluated at the sensor positions (as in ex-ch28-06).
Joint cost function
$$\min_{\boldsymbol{\chi},\,\mathbf{P}}\ \|\mathbf{y} - A(\mathbf{P})\boldsymbol{\chi}\|_2^2 + \lambda\|\boldsymbol{\chi}\|_1 + \mu\,\mathcal{R}_{\text{pos}}(\mathbf{P}),$$
where $\mathcal{R}_{\text{pos}}$ regularises positions (e.g., proximity to nominal values from GPS/blueprint).
Alternating optimisation
- Fix positions, solve for $\boldsymbol{\chi}$ (LASSO/ADMM).
- Fix $\boldsymbol{\chi}$, update positions via gradient descent (differentiating $A(\mathbf{P})$ w.r.t. $\mathbf{P}$).
- Iterate.
This is the RF analogue of bundle adjustment: jointly estimating the "scene" (reflectivity) and the "camera poses" (sensor positions) from measurements. A minimal sketch of the alternating scheme follows.
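A compact numpy sketch (illustrative simplifications: ridge in place of LASSO for the scene step, and a finite-difference position gradient standing in for analytic differentiation; `build_A` is a hypothetical assembler such as the one sketched in ex-ch28-06):

```python
import numpy as np

def joint_estimate(y, build_A, P0, lam=0.1, lr=1e-3, iters=20):
    """Alternating scene/position estimation (RF bundle-adjustment sketch)."""
    P = P0.copy()
    chi = None
    for _ in range(iters):
        # Scene step: fix positions, solve a regularised linear system.
        A = build_A(P)
        chi = np.linalg.solve(A.conj().T @ A + lam * np.eye(A.shape[1]),
                              A.conj().T @ y)
        # Position step: finite-difference gradient of the data misfit.
        r0 = np.linalg.norm(y - A @ chi) ** 2
        eps, grad = 1e-6, np.zeros_like(P)
        for idx in np.ndindex(P.shape):
            P_eps = P.copy()
            P_eps[idx] += eps
            r1 = np.linalg.norm(y - build_A(P_eps) @ chi) ** 2
            grad[idx] = (r1 - r0) / eps
        P = P - lr * grad
    return chi, P
```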
ex-ch28-20
Challenge: Prove the universal approximation theorem for neural operators (Theorem 28.5) in the 1D case: for a continuous operator $\mathcal{G}: H^s \to L^2$ on the periodic interval, show that truncation to the first $k_{\max}$ Fourier modes followed by a universal approximator in $\mathbb{R}^{2k_{\max}+1}$ can approximate $\mathcal{G}$ to arbitrary accuracy on compact sets.
Use the Fourier series truncation error for $H^s$ functions.
Apply the standard universal approximation theorem in finite dimensions.
Fourier truncation
For $u \in H^s$, the truncation $P_{k_{\max}}u$ to modes $|k| \le k_{\max}$ satisfies $\|u - P_{k_{\max}}u\|_{L^2} \le C\,k_{\max}^{-s}\,\|u\|_{H^s}$.
Finite-dimensional approximation
The mapping from truncated input coefficients to truncated output coefficients of $\mathcal{G}$ is a continuous function $\hat{g}: \mathbb{R}^{2k_{\max}+1} \to \mathbb{R}^{2k_{\max}+1}$ (complex modes identified with real pairs). By the universal approximation theorem, there exists a neural network $\Phi_\theta$ with $\sup\|\hat{g} - \Phi_\theta\| < \epsilon/3$ on the compact set.
Total error bound
Choosing $k_{\max}$ large enough makes the input and output truncation errors each $< \epsilon/3$ (uniformly on the compact set, by the $H^s$ bound), and the network error of $\Phi_\theta$ contributes $< \epsilon/3$. By the triangle inequality, the total error is $< \epsilon$. $\blacksquare$