Neural Radiance Fields Recap
Why NeRF Matters for RF Imaging
Neural radiance fields (NeRFs) represent a 3D scene as a continuous function parameterised by a neural network, rather than as a discrete voxel grid or mesh. This shift from discrete to continuous is precisely what makes NeRF attractive for RF imaging: the scene is queried at arbitrary positions, and the rendering integral is differentiable end-to-end. Before we adapt NeRF for radio frequencies, we need a solid understanding of how it works in its native optical domain.
The golden thread of this chapter: NeRF replaces the voxel grid and the sensing matrix with a continuous neural scene function, but to work for RF it must respect the physics of complex-valued fields, diffraction, and specular scattering.
Definition: Neural Radiance Field (NeRF)
Neural Radiance Field (NeRF)
A Neural Radiance Field represents a 3D scene as a continuous volumetric function parameterised by an MLP:

$$F_\Theta : (\mathbf{x}, \mathbf{d}) \mapsto (\sigma, \mathbf{c}),$$

where $\mathbf{x} \in \mathbb{R}^3$ is a 3D position, $\mathbf{d} \in \mathbb{S}^2$ is a viewing direction, $\sigma \ge 0$ is the volume density (opacity per unit length), and $\mathbf{c} \in [0,1]^3$ is the view-dependent colour (RGB).
The density depends only on position (geometry is view-independent), while the colour depends on both position and direction (appearance may be view-dependent): $\sigma = \sigma(\mathbf{x})$, $\mathbf{c} = \mathbf{c}(\mathbf{x}, \mathbf{d})$.
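This split can be sketched as a toy two-headed network. This is illustrative only: the paper's MLP is deeper (8 layers of 256 units with a skip connection), and `nerf_query` and its weight shapes here are hypothetical.

```python
import numpy as np

def nerf_query(x, d, params):
    """Toy NeRF scene function: density from position only,
    colour from position and viewing direction."""
    W_pos, w_sigma, W_dir, W_rgb = params
    h = np.tanh(x @ W_pos)                           # position-only trunk
    sigma = np.logaddexp(0.0, h @ w_sigma)           # softplus keeps density >= 0
    h_dir = np.tanh(np.concatenate([h, d]) @ W_dir)  # direction enters late
    rgb = 1.0 / (1.0 + np.exp(-(h_dir @ W_rgb)))     # sigmoid -> [0, 1]^3
    return float(sigma), rgb

# Random weights just to exercise the shapes (H hidden units)
rng = np.random.default_rng(0)
H = 16
params = (rng.normal(size=(3, H)), rng.normal(size=H),
          rng.normal(size=(H + 3, H)), rng.normal(size=(H, 3)))
sigma, rgb = nerf_query(np.array([0.1, -0.2, 0.3]),
                        np.array([0.0, 0.0, 1.0]), params)
```

Because the direction $\mathbf{d}$ is injected only after the density head, changing the viewing direction changes the colour but never the geometry.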
Definition: Differentiable Volume Rendering
Differentiable Volume Rendering
The colour of a pixel is computed by casting a ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ from the camera origin $\mathbf{o}$ through the pixel and integrating:

$$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,\mathrm{d}t,$$

where the transmittance is

$$T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,\mathrm{d}s\right),$$

representing the probability that the ray travels from $t_n$ to $t$ without being absorbed.
Numerical approximation. With $N$ stratified samples $t_1 < \dots < t_N$ along the ray:

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right),$$

where $\delta_i = t_{i+1} - t_i$. This computation is differentiable with respect to the network parameters $\Theta$ via backpropagation.
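The quadrature above fits in a few lines of numpy. A sketch; `render_ray` is a hypothetical helper, not a library function:

```python
import numpy as np

def render_ray(sigmas, colours, deltas):
    """Quadrature volume rendering along one ray.
    sigmas: (N,) densities; colours: (N, 3) RGB; deltas: (N,) intervals."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                  # alpha_i
    # T_i = exp(-sum_{j<i} sigma_j delta_j): an *exclusive* cumulative product
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = trans * alphas                                 # w_i = T_i alpha_i
    return weights @ colours, weights                        # pixel colour, w_i
```

An opaque first sample returns (essentially) its own colour, empty space renders black with zero weights, and the weights never sum to more than one.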
Theorem: Volume Rendering as Weighted Combination
Define the alpha value $\alpha_i = 1 - e^{-\sigma_i \delta_i}$ at sample $i$. Then the quadrature approximation can be written as:

$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} w_i \mathbf{c}_i, \qquad w_i = T_i \alpha_i = \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j).$$

The weights satisfy $\sum_{i=1}^{N} w_i \le 1$, with equality when the ray is fully absorbed before $t_N$. These weights define a probability distribution over sample positions, which enables hierarchical sampling in the fine network.
Exponential-to-product conversion
We have $T_i = \exp\!\left(-\sum_{j<i} \sigma_j \delta_j\right) = \prod_{j<i} e^{-\sigma_j \delta_j} = \prod_{j<i} (1 - \alpha_j)$.
Telescoping sum
Notice $w_i = T_i \alpha_i = T_i - T_{i+1}$ since $T_{i+1} = T_i (1 - \alpha_i)$. Therefore $\sum_{i=1}^{N} w_i = T_1 - T_{N+1} = 1 - T_{N+1} \le 1$.
Probabilistic interpretation
The normalised weights $\hat{w}_i = w_i / \sum_j w_j$ define a categorical distribution over sample indices. Hierarchical sampling draws additional points from this distribution, concentrating samples near surfaces where the density is high.
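A minimal version of this resampling step can be written as inverse-transform sampling. This sketch simplifies to a categorical draw at bin midpoints (the paper inverts a piecewise-constant CDF over the bins), and `sample_pdf` is a hypothetical name:

```python
import numpy as np

def sample_pdf(midpoints, weights, n_samples, rng):
    """Draw fine-network sample positions from the distribution
    defined by the coarse weights (inverse-transform sampling)."""
    pdf = weights / (weights.sum() + 1e-10)     # normalise w_i
    cdf = np.cumsum(pdf)
    u = rng.uniform(size=n_samples)             # uniform draws in [0, 1)
    idx = np.searchsorted(cdf, u)               # invert the CDF
    return midpoints[np.clip(idx, 0, len(midpoints) - 1)]
```

When the coarse weights concentrate at one bin (a surface), every fine sample lands there, which is exactly the intended behaviour.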
Definition: Positional Encoding
Positional Encoding
The positional encoding maps low-dimensional coordinates to a higher-dimensional space, enabling the MLP to learn high-frequency functions. For a scalar $p$:

$$\gamma(p) = \left(\sin(2^0 \pi p), \cos(2^0 \pi p), \ldots, \sin(2^{L-1} \pi p), \cos(2^{L-1} \pi p)\right).$$

For a 3D position $\mathbf{x}$, each coordinate is encoded independently, yielding a $6L$-dimensional vector. Typical values: $L = 10$ for position ($60$-dimensional), $L = 4$ for direction ($24$-dimensional).
Without positional encoding, the MLP's spectral bias causes it to learn only low-frequency functions, producing blurry reconstructions. The encoding effectively lifts the input into a space where the MLP can represent sharp edges and fine details.
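The encoding itself is a small, fixed transform. A sketch; `positional_encoding` is a hypothetical helper:

```python
import numpy as np

def positional_encoding(p, L):
    """gamma(p): sin/cos features at frequencies 2^0*pi .. 2^(L-1)*pi.
    p: (D,) coordinates -> (2*L*D,) feature vector."""
    freqs = (2.0 ** np.arange(L)) * np.pi       # 2^k * pi
    angles = freqs[:, None] * p[None, :]        # (L, D) phase matrix
    return np.concatenate([np.sin(angles), np.cos(angles)]).ravel()

x_enc = positional_encoding(np.array([0.3, -0.1, 0.7]), L=10)  # 60-dim
```

With $L = 10$ a 3D position becomes a 60-dimensional feature, and with $L = 4$ a direction becomes 24-dimensional, matching the typical values above.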
Positional Encoding Frequencies
Visualise how positional encoding lifts a 1D (or 3D) input into a higher-dimensional feature space. Each frequency band captures spatial variation at a different scale. Low $L$ produces smooth representations; high $L$ captures fine detail but may overfit noise in the measurements.
Example: Training a NeRF from Posed Images
Describe the training procedure for a NeRF given posed images $\{(I_k, P_k)\}_{k=1}^{K}$, where $P_k$ contains the camera extrinsics and intrinsics.
Camera pose estimation
If poses are unknown, run structure-from-motion (e.g., COLMAP) on the image set to recover $\{P_k\}$. For RF applications, Tx/Rx positions are typically known.
Ray sampling and coarse rendering
For each batch, sample rays uniformly from all images. Along each ray, draw $N_c$ stratified samples. Query the coarse MLP: $(\sigma_i, \mathbf{c}_i) = F_{\Theta_c}(\gamma(\mathbf{x}_i), \gamma(\mathbf{d}))$. Volume-render to get $\hat{C}_c(\mathbf{r})$.
Hierarchical sampling and fine rendering
Use the coarse weights $w_i$ as a probability distribution to draw $N_f$ additional samples near surfaces. Query the fine MLP on all $N_c + N_f$ samples and render $\hat{C}_f(\mathbf{r})$.
Loss and optimisation
Minimise the combined photometric loss

$$\mathcal{L} = \sum_{\mathbf{r} \in \mathcal{R}} \left\| \hat{C}_c(\mathbf{r}) - C(\mathbf{r}) \right\|_2^2 + \left\| \hat{C}_f(\mathbf{r}) - C(\mathbf{r}) \right\|_2^2$$

with Adam at a learning rate of $5 \times 10^{-4}$, training for $\sim 100$k--$300$k iterations ($\sim 1$--$2$ days on a single GPU). $\square$
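The loss step can be made concrete given rendered coarse and fine colours. This is a sketch assuming per-batch arrays of RGB values; ray generation and the optimiser are elided, and `photometric_loss` is a hypothetical name:

```python
import numpy as np

def photometric_loss(c_coarse, c_fine, c_true):
    """Sum of squared errors for both networks. The coarse term keeps the
    coarse MLP trained so its weights remain useful for resampling.
    Each argument: (B, 3) batch of RGB values."""
    return float(((c_coarse - c_true) ** 2).sum()
                 + ((c_fine - c_true) ** 2).sum())
```

Note that both renders are supervised against the same ground-truth pixel: there is no separate target for the coarse network.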
Definition: Instant-NGP: Multi-Resolution Hash Encoding
Instant-NGP: Multi-Resolution Hash Encoding
Instant-NGP (Mueller et al., 2022) replaces the slow positional encoding + deep MLP with a multi-resolution hash table:
- The scene is discretised into $L$ resolution levels, with per-level grid resolution $N_\ell = \lfloor N_{\min} \cdot b^{\ell} \rfloor$ for a growth factor $b$.
- Grid vertices are mapped to a hash table of size $T$ with feature vectors of dimension $F$ per entry.
- For a query point $\mathbf{x}$, trilinear interpolation retrieves features at each level; the concatenated $LF$-dimensional vector feeds a small MLP (2 layers, 64 units).
Result: Training drops from hours to seconds. Rendering reaches near-real-time rates. The key insight is that hash collisions are tolerable --- they are resolved by gradient-based optimisation, which assigns different feature vectors to colliding entries based on the training loss.
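A stripped-down sketch of the lookup path, under two stated simplifications: nearest-vertex lookup instead of trilinear interpolation, and the XOR spatial hash with the primes from the paper. `encode` and its signature are hypothetical:

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_index(vertex, table_size):
    """XOR spatial hash of an integer grid vertex (wraps modulo table size)."""
    return int(np.bitwise_xor.reduce(vertex.astype(np.uint64) * PRIMES)
               % np.uint64(table_size))

def encode(x, tables, resolutions):
    """Concatenate per-level features for a point x in [0, 1]^3.
    tables: list of (T, F) arrays; resolutions: grid size per level."""
    feats = [tab[hash_index(np.round(x * res).astype(int), len(tab))]
             for tab, res in zip(tables, resolutions)]
    return np.concatenate(feats)                # (L * F,) input to a tiny MLP

rng = np.random.default_rng(0)
n_levels, T, F = 4, 2 ** 10, 2
tables = [rng.normal(size=(T, F)) for _ in range(n_levels)]
resolutions = [16, 32, 64, 128]
feat = encode(np.array([0.25, 0.5, 0.75]), tables, resolutions)  # shape (8,)
```

In the real system the `tables` entries are trainable parameters: gradients flow through the retrieved features, which is how colliding entries get resolved during optimisation.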
Definition: Mip-NeRF: Anti-Aliased Volume Rendering
Mip-NeRF: Anti-Aliased Volume Rendering
Mip-NeRF (Barron et al., 2021) addresses aliasing artifacts in NeRF by replacing point samples with conical frustums:
- Each pixel subtends a cone (not a ray) through the scene.
- The cone is divided into frustums between sample boundaries.
- Each frustum is approximated by a 3D Gaussian $\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.
- The positional encoding is replaced by an integrated positional encoding (IPE): the expected value of the encoding over the Gaussian, computed in closed form.
This eliminates the scale ambiguity of point-sampled NeRF and improves quality for both close-up and distant views.
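The closed form amounts to a Gaussian damping of each frequency band, via the identity $\mathbb{E}[\sin(aX)] = \sin(a\mu)\,e^{-a^2\sigma^2/2}$ for $X \sim \mathcal{N}(\mu, \sigma^2)$. This sketch assumes a diagonal covariance and the pi-free frequencies $2^{\ell}$ used by Mip-NeRF; `integrated_pe` is a hypothetical name:

```python
import numpy as np

def integrated_pe(mu, var, L):
    """Expected sin/cos encoding under N(mu, diag(var)), using
    E[sin(aX)] = sin(a mu) exp(-a^2 var / 2) (likewise for cos)."""
    freqs = 2.0 ** np.arange(L)                          # 2^0 .. 2^(L-1)
    angles = freqs[:, None] * mu[None, :]                # (L, D)
    damping = np.exp(-0.5 * (freqs ** 2)[:, None] * var[None, :])
    return np.concatenate([np.sin(angles) * damping,
                           np.cos(angles) * damping]).ravel()
```

As the Gaussian grows (a distant, wide frustum), the high-frequency features are damped toward zero: far-away content is encoded at a coarser effective scale, which is the anti-aliasing mechanism.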
For RF imaging, Mip-NeRF's cone-tracing philosophy maps naturally to modelling antenna beam widths: a wide beam illuminates a cone, not a ray, and the received signal integrates over that cone.
Historical Note: The NeRF Revolution (2020-2024)
When Mildenhall et al. published NeRF in 2020, the paper's ability to synthesise photorealistic novel views from a handful of photographs took the computer vision community by storm. Within two years, over 500 papers extended NeRF to dynamic scenes, large-scale environments, generative modelling, and --- crucially for this chapter --- RF signal propagation.
The original NeRF required 12 hours of training per scene. Instant-NGP (Mueller et al., 2022) compressed this to under 5 seconds. 3D Gaussian Splatting (Kerbl et al., 2023) then achieved real-time rendering at comparable quality. The speed of this evolution --- from a novel idea to a mature, deployable technology in under four years --- is remarkable in computational science.
Common Mistake: NeRF Is Slow Without Acceleration
Mistake:
Assuming that the original NeRF can be used for real-time RF propagation prediction in deployed systems.
Correction:
The original NeRF requires a few hundred MLP evaluations per ray, and a single RSS prediction casts multiple rays. For real-time applications, use either:
- Instant-NGP (hash encoding, orders-of-magnitude speedup);
- Baked NeRF (pre-compute density on a sparse grid); or
- 3D Gaussian Splatting (explicit primitives, real-time frame rates).
For RF applications where training happens offline and inference queries are few (e.g., channel prediction for a handful of Tx-Rx pairs), the original NeRF's speed is acceptable.
Quick Check
In the NeRF architecture, which quantity depends on the viewing direction $\mathbf{d}$?
Volume density $\sigma$
Colour $\mathbf{c}$
Both density and colour
Neither
Correct. View-dependent colour enables the model to capture specular highlights and other appearance effects that change with viewing angle.
Volume Density
The function $\sigma(\mathbf{x})$ representing the differential probability of a ray being absorbed at point $\mathbf{x}$. Units: inverse length (m$^{-1}$). In the NeRF context, high density indicates opaque material (surfaces), while $\sigma = 0$ indicates free space.
Related: Transmittance
Transmittance
The quantity $T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,\mathrm{d}s\right)$ representing the fraction of light (or RF energy) that survives propagation from $t_n$ to $t$ along a ray. $T = 1$ means no absorption; $T = 0$ means full absorption.
Related: Volume Density
Positional Encoding
A fixed mapping $\gamma$ that lifts scalar coordinates to a high-dimensional space using sinusoidal functions at exponentially increasing frequencies. This overcomes the spectral bias of MLPs and enables learning of high-frequency scene features.
Related: Hash Encoding (Instant-NGP)
Hash Encoding (Instant-NGP)
A learnable spatial encoding that stores feature vectors in a multi-resolution hash table. Query points are mapped to hash table entries via spatial hashing at each resolution level, and features are retrieved by trilinear interpolation. Replaces positional encoding with orders-of-magnitude speedup.
Related: Positional Encoding
Key Takeaway
NeRF represents 3D scenes as continuous volumetric fields parameterised by an MLP with positional encoding. Differentiable volume rendering integrates density and colour along camera rays, enabling end-to-end training from posed images. Instant-NGP and Mip-NeRF address NeRF's speed and aliasing limitations, respectively --- both improvements are directly relevant to the RF adaptations in the following sections.