Scalability to Large Scenes

From Rooms to Cities

Most neural scene representation methods (NeRF, SDF, 3DGS) have been demonstrated on single rooms or small areas. Scaling to building-scale, campus-scale, or city-scale RF imaging presents fundamental challenges in memory, computation, and data collection. The right representation for a campus may not yet exist -- and finding it is where the next generation of researchers will make their contributions.

Definition: Scalability Bottlenecks

Scaling RF imaging to large scenes faces three bottlenecks:

  1. Memory: a single room ($10 \times 10 \times 3$ m) at 5 cm resolution requires a $200 \times 200 \times 60$ grid ($2.4$M voxels). A campus ($500 \times 500 \times 30$ m) requires $10{,}000 \times 10{,}000 \times 600$ ($6 \times 10^{10}$ voxels) -- infeasible for any explicit representation.

  2. Computation: rendering a single ray through a campus-scale scene requires evaluating thousands of Gaussians or MLP queries, scaling linearly with scene volume.

  3. Data collection: measuring RF propagation across a campus requires thousands of Tx-Rx locations, weeks of measurement time, and coordination across multiple buildings.
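To make the memory numbers above concrete, a minimal sketch that reproduces the voxel counts for the room and campus scenes (dimensions and the 5 cm resolution are taken directly from the text):

```python
# Sketch: voxel counts behind the memory bottleneck above.

def voxel_count(dims_m, resolution_m):
    """Number of voxels for a box with sides dims_m at the given resolution."""
    n = 1
    for d in dims_m:
        n *= int(round(d / resolution_m))
    return n

room = voxel_count((10, 10, 3), 0.05)       # 200 x 200 x 60 grid
campus = voxel_count((500, 500, 30), 0.05)  # 10,000 x 10,000 x 600 grid

print(f"room:   {room:.2e} voxels")   # ~2.4e6
print(f"campus: {campus:.2e} voxels") # ~6.0e10
```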

Figure: Computational Cost vs Scene Resolution. Compares the memory and computation scaling of voxel grids, neural fields (NeRF), and hierarchical representations as scene size increases. Hierarchical methods maintain tractable cost by adapting resolution to the query location.
Definition: Hierarchical Scene Representations

Hierarchical methods address scalability by dividing the scene:

  • Spatial partitioning: octrees, block-NeRFs, or tiled 3DGS partition the scene into manageable blocks, each with its own representation.

  • Level of detail (LOD): coarse representation for distant regions, fine representation for nearby regions. For RF: coarse = path loss model, fine = neural field.

  • Streaming: load only the relevant blocks into GPU memory as the query region moves.

The composition rule for adjacent blocks must preserve physical consistency: signals crossing block boundaries must maintain phase continuity and amplitude consistency.
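The level-of-detail rule above (coarse path loss model for distant regions, fine neural field nearby) can be sketched as a simple distance-based dispatch; the 50 m threshold and the function names are illustrative assumptions, not from the text:

```python
import math

# Sketch of LOD selection for RF: fine neural field near the query
# region, coarse path-loss model elsewhere. Threshold is illustrative.

def select_model(query_xy, tx_xy, fine_radius_m=50.0):
    """Pick which representation answers a query at query_xy."""
    d = math.dist(query_xy, tx_xy)
    return "neural_field" if d <= fine_radius_m else "path_loss"

print(select_model((10, 0), (0, 0)))   # nearby -> fine model
print(select_model((400, 0), (0, 0)))  # distant -> coarse model
```

In a block-partitioned scene the same dispatch decides which blocks to stream into GPU memory for a given query region.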

Theorem: Complexity of Hierarchical vs Flat Representations

For a scene of side length $L$ at resolution $\Delta x$, a flat voxel grid requires $\mathcal{O}((L/\Delta x)^3)$ memory. An octree with maximum depth $D = \log_2(L/\Delta x)$ and $K$ occupied leaf nodes requires $\mathcal{O}(K)$ memory, where $K \leq (L/\Delta x)^2$ for scenes with surface-like structure (walls, floors). The memory reduction is:

$$\frac{\text{Flat}}{\text{Octree}} = \frac{(L/\Delta x)^3}{(L/\Delta x)^2} = \frac{L}{\Delta x}.$$

For a campus at 5 cm: reduction factor $= 500/0.05 = 10{,}000$.

Indoor scenes are mostly empty space. An octree stores only non-empty regions, and surfaces are 2D manifolds embedded in 3D space, so the occupied leaf count scales with surface area rather than volume. The saving is proportional to the scene's linear extent divided by the resolution.
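The surface-scaling argument can be checked on a toy occupancy grid: planar walls occupy $\mathcal{O}(n^2)$ voxels of an $n^3$ grid, so the occupied fraction falls as $1/n$. A minimal sketch (the four-wall scene is an assumption for illustration):

```python
import numpy as np

# Toy check of the surface-scaling claim: a cubic scene of side n voxels
# containing a few one-voxel-thick walls. Occupied voxels grow like n^2
# while the full grid grows like n^3, so the fraction decays as 1/n.

def occupied_fraction(n, num_walls=4):
    occ = np.zeros((n, n, n), dtype=bool)
    for i in np.linspace(0, n - 1, num_walls, dtype=int):
        occ[i, :, :] = True  # one planar wall
    return occ.sum() / occ.size  # = num_walls / n for distinct walls

for n in (32, 64, 128):
    print(n, occupied_fraction(n))
```

An octree (or any sparse structure) that stores only the occupied voxels inherits this $n^2$ scaling directly.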

Example: Campus-Scale Digital Twin Design

Design a hierarchical RF digital twin for a $500 \times 500$ m university campus with 20 buildings. The twin must support 5G channel prediction at 28 GHz within a 32 GB GPU memory budget.
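One way to start such a design is a back-of-envelope memory tally. Only the 20 buildings and the 32 GB budget come from the example statement; the per-component sizes below are hypothetical assumptions for illustration:

```python
# Back-of-envelope memory tally for the campus example. The per-component
# sizes are assumed (hypothetical), not taken from the text.

GB = 1024**3
MB = 1024**2

num_buildings = 20
fine_block = 200 * MB    # assumed: one fine neural block per building
coarse_model = 50 * MB   # assumed: campus-wide coarse path-loss grid
working_set = 8 * GB     # assumed: forward model + optimiser working set

total = num_buildings * fine_block + coarse_model + working_set
print(f"total: {total / GB:.1f} GB (budget: 32 GB)")
```

Under these assumptions the fine blocks dominate the scene representation, so streaming only the blocks near the active query region is what keeps the twin inside the budget as the campus grows.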

Open Problems in Scalable RF Imaging

  • Block boundary artifacts: how to seamlessly blend neural representations at block boundaries without phase or amplitude discontinuities?

  • Distributed training: can multiple base stations collaboratively build a campus-scale digital twin from local measurements, without sharing raw data (privacy)?

  • Incremental mapping: how to extend the map as new areas are explored, without retraining the entire representation?

  • Compression: learned scene representations must be compressed for storage and transmission. What are the rate-distortion limits for RF scene compression?

  • Multi-frequency consistency: a campus has sub-6 GHz macro cells and mmWave small cells. Can a single representation serve both frequency bands with shared geometry but frequency-dependent reflectivities?

🔧 Engineering Note

GPU Memory Budget for Large-Scale RF Imaging

Modern GPUs provide 24--80 GB memory. A practical budget allocation for campus-scale RF digital twins:

  • Scene representation: 1--4 GB (Gaussians or neural field weights).
  • Forward model: 2--8 GB (sensing operator, precomputed kernels).
  • Optimiser state: 2--8 GB (Adam moments, gradient buffers).
  • Working memory: 4--16 GB (intermediate computations).

Total: 9--36 GB, fitting within a single A100 (80 GB) or requiring model parallelism across two consumer GPUs ($2 \times 24$ GB). The forward model is often the largest component; the Kronecker structure from Chapter 7 is essential for keeping it tractable.
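Summing the component ranges above confirms the total (the numbers are taken directly from the engineering note):

```python
# Tally of the GPU memory budget components listed above, in GB.

budget_gb = {
    "scene representation": (1, 4),
    "forward model": (2, 8),
    "optimiser state": (2, 8),
    "working memory": (4, 16),
}

lo_total = sum(lo for lo, _ in budget_gb.values())
hi_total = sum(hi for _, hi in budget_gb.values())
print(f"total: {lo_total}--{hi_total} GB")  # 9--36 GB
```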

Common Mistake: Phase Discontinuity at Block Boundaries

Mistake:

Independently training adjacent blocks without enforcing continuity constraints. The result: phase jumps at block boundaries that corrupt channel predictions for signals crossing multiple blocks.

Correction:

Enforce overlap regions where adjacent blocks share measurements. Add a boundary consistency loss: $\mathcal{L}_{\mathrm{boundary}} = \|\boldsymbol{\gamma}_A(\mathbf{x}) - \boldsymbol{\gamma}_B(\mathbf{x})\|^2$ for $\mathbf{x}$ in the overlap zone. Alternatively, use a global phase reference (e.g., from a reference anchor AP).
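A minimal sketch of this boundary consistency loss, where `gamma_a` and `gamma_b` stand in for the two blocks' complex reflectivity predictions sampled at shared points in the overlap zone (the names are illustrative, not an API from the text):

```python
import numpy as np

# Boundary consistency loss between two adjacent blocks, evaluated on
# complex reflectivity samples at shared overlap points.

def boundary_loss(gamma_a: np.ndarray, gamma_b: np.ndarray) -> float:
    """Squared L2 mismatch between adjacent blocks over overlap samples."""
    return float(np.sum(np.abs(gamma_a - gamma_b) ** 2))

rng = np.random.default_rng(0)
gamma_a = rng.standard_normal(64) + 1j * rng.standard_normal(64)
gamma_b = gamma_a + 0.01 * rng.standard_normal(64)  # nearly consistent block
print(boundary_loss(gamma_a, gamma_b))  # small residual mismatch
```

Because the loss acts on complex values, it penalises phase jumps as well as amplitude mismatches, which is exactly the failure mode described above.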

Hierarchical Scene Representation

A multi-resolution scene representation that allocates fine detail to regions of interest and coarse approximations to distant or empty regions. Enables scaling to campus and city-scale scenes within finite memory budgets.

Related: Primitive-Based Scene Representation

Key Takeaway

Memory, computation, and data collection are the three scalability bottlenecks. Hierarchical representations (octrees, block-NeRFs, tiled 3DGS) with level-of-detail reduce memory from $\mathcal{O}(N^3)$ to $\mathcal{O}(N^2)$ for surface-dominated scenes. Block boundary consistency, distributed training, and multi-frequency support remain significant open problems.