RFCanvas: Vision-RF Fusion with Tensorial Fields
Bridging Vision and RF
A fundamental limitation of RF-3DGS (Section 26.2) is that it requires dense RF measurements --- hundreds of spatially distributed power samples. In many scenarios, we have abundant visual data (camera images, LiDAR point clouds) but only a handful of RF measurements. RFCanvas (Chen et al., 2024) exploits this asymmetry: it initialises a 3D Gaussian scene from visual data and then adapts it to predict RF propagation using as few as 10--20 RF measurements.
The key insight is that visual geometry strongly constrains RF propagation: walls block signals, reflective surfaces create multipath, and open spaces allow line-of-sight. RFCanvas encodes this prior knowledge through multi-modal initialisation and tensorial RF fields.
Definition: Tensorial RF Field
Tensorial RF Field
A tensorial RF field represents the RF attributes of each Gaussian as a low-rank tensor decomposition. Instead of storing a single scalar power $\rho$, each Gaussian carries a compact tensor $\mathcal{T}$ that encodes the directional and frequency-dependent RF response:
$$\mathcal{T} = \sum_{r=1}^{R} \mathbf{a}_r \otimes \mathbf{f}_r \otimes \mathbf{p}_r,$$
where $\mathbf{a}_r$ encodes angular dependence (discretised azimuth/elevation), $\mathbf{f}_r$ encodes frequency dependence, and $\mathbf{p}_r$ encodes polarisation. The rank $R$ controls the trade-off between expressiveness and compactness.
The RF power at location $\mathbf{x}$ from direction $(\theta, \phi)$ at frequency $f$ is obtained by evaluating the tensor at the appropriate indices and passing the result through the splatting equation.
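A minimal numpy sketch of how such a rank-$R$ (CP) factorisation can be stored and evaluated per Gaussian; the grid sizes and rank here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 16x8 angular grid (azimuth x elevation), 4 frequency
# bins, 2 polarisations, CP rank R = 3.
n_az, n_el, n_freq, n_pol, rank = 16, 8, 4, 2, 3

# One set of CP factors per Gaussian.
A = rng.standard_normal((rank, n_az * n_el))  # angular factor
F = rng.standard_normal((rank, n_freq))       # frequency factor
P = rng.standard_normal((rank, n_pol))        # polarisation factor

def rf_response(az_idx, el_idx, f_idx, p_idx):
    """Evaluate the low-rank tensor at the given angular, frequency, and
    polarisation indices: a sum over rank of the factor products."""
    a = A[:, az_idx * n_el + el_idx]
    return float(np.sum(a * F[:, f_idx] * P[:, p_idx]))

# Storage cost: R*(n_az*n_el + n_freq + n_pol) scalars instead of the
# full n_az*n_el*n_freq*n_pol tensor.
compact = rank * (n_az * n_el + n_freq + n_pol)
full = n_az * n_el * n_freq * n_pol
print(compact, full)  # 402 1024
```

The compact form stores 402 scalars versus 1024 for the dense tensor in this toy configuration; the gap widens rapidly with finer angular grids.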
Definition: Spherical Harmonic Directional Power Model
Spherical Harmonic Directional Power Model
RFCanvas models the directional power pattern of each Gaussian using spherical harmonics (SH):
$$\rho(\theta, \phi) = \sum_{\ell=0}^{L} \sum_{m=-\ell}^{\ell} c_{\ell m}\, Y_{\ell m}(\theta, \phi),$$
where $Y_{\ell m}$ are the real spherical harmonics of degree $\ell$ and order $m$, and $c_{\ell m}$ are learnable coefficients. The maximum degree $L$ determines the angular resolution:
- $L = 0$: isotropic (omnidirectional scatterer),
- $L = 2$: captures the main lobe and one sidelobe,
- $L = 4$: captures specular reflection patterns.
This is the same SH representation used for view-dependent colour in optical 3DGS, adapted to model the angular distribution of scattered RF power.
In practice, $L = 2$ or $L = 3$ suffices for most RF scenarios because RF scattering is less angularly complex than optical reflectance. Higher-order SH are useful only for highly directional scatterers such as metallic plates.
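Low-degree real SH can be evaluated in closed form. The sketch below implements degree $L = 1$ only (the constants are the standard real-SH ones, the same as in optical 3DGS view-dependent colour); the coefficient values are illustrative:

```python
# Real spherical harmonics up to degree L = 1.
C0 = 0.28209479177387814   # Y_00 = 1 / (2 * sqrt(pi))
C1 = 0.4886025119029199    # prefactor of the three degree-1 harmonics

def directional_power(coeffs, d):
    """Scattered power in unit direction d = (x, y, z), given SH
    coefficients coeffs = [c00, c1m-1, c10, c11] (L = 1, 4 terms)."""
    x, y, z = d
    return (coeffs[0] * C0
            + coeffs[1] * C1 * y
            + coeffs[2] * C1 * z
            + coeffs[3] * C1 * x)

# Isotropic scatterer: only c00 non-zero -> same power in every direction.
iso = [1.0 / C0, 0.0, 0.0, 0.0]
print(round(directional_power(iso, (0, 0, 1)), 6))  # 1.0
print(round(directional_power(iso, (1, 0, 0)), 6))  # 1.0

# A z-oriented degree-1 term skews power toward +z (a crude main lobe).
lobed = [1.0 / C0, 0.0, 1.0, 0.0]
```

Higher degrees add $(L+1)^2 - L^2 = 2L + 1$ basis functions each, which is where the sharper specular patterns come from.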
Theorem: Sample Complexity Reduction from Visual Priors
Let $\mathcal{G}_{\mathrm{vis}}$ be a Gaussian scene initialised from visual data (LiDAR + camera), and let $\mathcal{G}_{\mathrm{rand}}$ be a randomly initialised scene. For a target power prediction accuracy $\epsilon$, the numbers of RF measurements required satisfy:
$$N_{\mathrm{vis}}(\epsilon) \;\lesssim\; \frac{d_{\mathrm{RF}}}{d_{\mathrm{total}}}\, N_{\mathrm{rand}}(\epsilon),$$
where $d_{\mathrm{RF}}$ is the effective dimensionality of the RF-specific parameters (power, phase, SH coefficients) and $d_{\mathrm{total}}$ is the total parameter dimensionality including geometry.
In typical scenarios where geometry accounts for the large majority of the effective parameters, this implies $N_{\mathrm{vis}} \ll N_{\mathrm{rand}}$ --- an order-of-magnitude reduction in required RF measurements.
Visual data pins down the geometry (Gaussian positions and shapes). Only the RF-specific attributes (how much power each surface scatters) need to be learned from RF measurements. Since geometry accounts for most of the scene complexity, the RF measurements need only determine a much smaller parameter set.
Parameter decomposition
Decompose the full parameter set as $\Theta = \Theta_{\mathrm{geo}} \cup \Theta_{\mathrm{RF}}$, where $\Theta_{\mathrm{geo}}$ collects the geometric parameters (positions, rotations, scales) and $\Theta_{\mathrm{RF}}$ the RF-specific ones (opacity, power, SH coefficients). With visual initialisation, $\Theta_{\mathrm{geo}}$ is fixed (or fine-tuned with a small learning rate).
Effective dimensionality
The RF-specific parameters per Gaussian are: 1 opacity + 1 power + $(L+1)^2$ SH coefficients $\approx$ 10--20 scalars (for $L = 2$--$3$). The geometric parameters are: 3 position + 4 quaternion + 3 scale = 10 scalars. So $d_{\mathrm{RF}}/d_{\mathrm{total}} \approx 0.5$--$0.65$ per Gaussian by raw count.
Counting constraint
But the geometric parameters have much higher effective dimensionality, because small positional errors propagate through the rendering equation quadratically. Accounting for this sensitivity, the visual prior reduces the sample complexity by a factor proportional to $d_{\mathrm{total}}/d_{\mathrm{RF}}$.
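The raw per-Gaussian counts above can be checked in a few lines; the $(L+1)^2$ formula for the number of SH coefficients is standard, and the opacity/power/geometry counts follow the decomposition in the text:

```python
# Per-Gaussian parameter counts.
geo = 3 + 4 + 3                    # position + quaternion + scale = 10

def rf_params(L):
    """Opacity + power + (L+1)^2 SH coefficients."""
    return 1 + 1 + (L + 1) ** 2

for L in (2, 3):
    d_rf = rf_params(L)
    d_total = d_rf + geo
    print(L, d_rf, round(d_rf / d_total, 2))
# L=2: 11 RF scalars, ratio 0.52;  L=3: 18 RF scalars, ratio 0.64
```

By raw count the RF share is only about half to two thirds; the order-of-magnitude measurement saving comes from the higher *effective* dimensionality of the geometric parameters noted above.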
Example: RFCanvas Pipeline for Indoor RF Mapping
Describe the full RFCanvas pipeline for reconstructing an indoor RF power map using LiDAR, camera, and 20 RF measurements.
Step 1 --- Visual scene reconstruction
Run COLMAP on the camera images to obtain camera poses and a sparse 3D point cloud. Fuse with LiDAR depth maps to densify the point cloud. Train an initial optical 3DGS from the images (standard pipeline).
Step 2 --- RF attribute initialisation
For each optimised visual Gaussian, initialise RF attributes:
- Power $\rho$: estimate from material classification (LiDAR intensity + camera texture). Concrete walls get high reflectivity, glass moderate, open air near-zero.
- SH coefficients: $c_{\ell m} = 0$ for $\ell > 0$ (start isotropic).
- Opacity $\alpha$: inherit from the visual opacity.
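A sketch of this initialisation step. The material-to-reflectivity table is hypothetical, chosen only to match the qualitative ordering in the text (concrete high, glass moderate, air near-zero):

```python
# Hypothetical material -> initial RF reflectivity table (illustrative
# values, not from the paper).
MATERIAL_REFLECTIVITY = {
    "concrete": 0.8,
    "metal": 0.95,
    "glass": 0.4,
    "wood": 0.3,
    "air": 0.0,
}

def init_rf_attributes(material, visual_opacity, sh_dim=9):
    """Initialise the RF attributes of one visual Gaussian (Step 2)."""
    power = MATERIAL_REFLECTIVITY.get(material, 0.5)  # default for unknown
    sh = [0.0] * sh_dim          # zero higher-order SH -> isotropic start
    alpha = visual_opacity       # inherit opacity from the visual model
    return {"power": power, "sh": sh, "alpha": alpha}

g = init_rf_attributes("concrete", visual_opacity=0.9)
print(g["power"], g["alpha"])  # 0.8 0.9
```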
Step 3 --- RF fine-tuning
Freeze the geometric parameters $\Theta_{\mathrm{geo}}$. Optimise only $\Theta_{\mathrm{RF}}$ to minimise the dB-scale power prediction loss on the 20 RF measurements:
$$\mathcal{L}_{\mathrm{RF}} = \frac{1}{20} \sum_{i=1}^{20} \bigl| P_{\mathrm{pred}}(\mathbf{x}_i) - P_{\mathrm{meas}}(\mathbf{x}_i) \bigr|,$$
with powers expressed in dB. After fine-tuning, the model predicts power at novel locations with a small dB-scale MAE.
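A toy sketch of this fine-tuning step, under the simplifying assumption that with geometry frozen the splatting operator reduces to a fixed linear map from per-Gaussian powers to received powers (real splatting is nonlinear). Sizes, seed, and learning rate are all illustrative:

```python
import numpy as np

def db(p):
    """Linear power -> dB, clamped to avoid log(0)."""
    return 10.0 * np.log10(np.maximum(p, 1e-12))

def rf_loss(pred, meas):
    """dB-scale mean absolute error between predicted and measured power."""
    return float(np.mean(np.abs(db(pred) - db(meas))))

# Toy forward model: a fixed matrix S maps per-Gaussian RF powers rho
# to received powers at the measurement locations.
rng = np.random.default_rng(1)
S = rng.uniform(0.0, 1.0, size=(20, 50))   # 20 measurements, 50 Gaussians
rho_true = rng.uniform(0.1, 1.0, size=50)
meas = S @ rho_true

rho = np.full(50, 0.1)                     # deliberately poor initialisation
init_loss = rf_loss(S @ rho, meas)
lr = 0.05
for _ in range(500):
    pred = S @ rho
    # Subgradient of the dB MAE via the chain rule:
    # d|db(pred) - db(meas)| / d pred = sign(...) * 10 / (ln 10 * pred)
    g = np.sign(db(pred) - db(meas)) * (10.0 / np.log(10.0)) / pred
    rho -= lr * (S.T @ g) / len(meas)
    rho = np.clip(rho, 1e-3, None)         # keep powers physical
print(init_loss > rf_loss(S @ rho, meas))  # True: fine-tuning reduced the loss
```

Only the 50 power scalars are updated here, mirroring the frozen-geometry regime of Step 3.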
Step 4 --- Optional geometric refinement
Unfreeze $\Theta_{\mathrm{geo}}$ with a smaller learning rate and continue training. This fine-tunes the geometry to account for RF-invisible structures (e.g., glass walls visible to the camera but transparent to RF) or RF-opaque structures invisible to the camera (e.g., metallic ducts behind plasterboard).
Few-Shot RF Prediction Quality
The interactive demo plots how the power prediction error decreases as the number of RF measurements increases, comparing random initialisation with visual-prior initialisation (the RFCanvas approach).
Common Mistake: The Vision-to-RF Gap
Mistake:
Assuming that visual geometry directly translates to RF propagation without adaptation --- e.g., that optically reflective surfaces are also RF reflective.
Correction:
The correspondence between visual and RF properties is imperfect:
- Glass is optically transparent but highly reflective at mmWave frequencies.
- Plasterboard walls are opaque to cameras but partially transparent to sub-6 GHz signals.
- Vegetation appears as dense visual structure but is nearly transparent to low-frequency RF.
- Metallic surfaces are highly reflective in both domains (good correspondence).
RFCanvas handles this gap through the RF fine-tuning step: Gaussians at glass surfaces learn high RF opacity despite low visual opacity, and vice versa. Without this adaptation step, the visual prior alone can produce large dB-scale prediction errors.
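The gap can be made concrete with a small table of hypothetical optical versus mmWave RF opacities; the numbers are illustrative only, chosen to encode the qualitative behaviour listed above:

```python
# Illustrative (hypothetical) opacities: optical vs RF at mmWave.
MATERIALS = {
    #               optical  RF
    "glass":        (0.05,   0.9),   # transparent to light, reflective at mmWave
    "plasterboard": (1.0,    0.4),   # opaque to cameras, partly RF-transparent
    "vegetation":   (0.9,    0.1),   # dense visually, nearly RF-transparent
    "metal":        (1.0,    0.98),  # good cross-domain correspondence
}

def rf_opacity_error(material):
    """Error made by copying the visual opacity directly into the RF model."""
    optical, rf = MATERIALS[material]
    return abs(optical - rf)

worst = max(MATERIALS, key=rf_opacity_error)
print(worst, round(rf_opacity_error(worst), 2))  # glass 0.85
```

Glass is the worst offender in this toy table, which is exactly why the fine-tuning step must be allowed to override the visual prior.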
Multi-Modal Sensor Requirements
RFCanvas requires co-registered multi-modal data:
- Camera images (enough views for COLMAP): standard RGB.
- LiDAR scans (optional but recommended): densify the point cloud and provide metric depth. A single 360-degree scan often suffices.
- RF measurements (10--20): received power with known Tx and Rx positions; position accuracy at the centimetre level (indoor) or metre level (outdoor).
The sensors need NOT be synchronised in time. The visual data can be collected once, and RF measurements can be added incrementally as they become available.
- Camera and LiDAR must be calibrated (extrinsic transformation known).
- RF measurement positions must be in the same coordinate frame as the visual reconstruction.
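Bringing RF measurement positions into the visual frame is a rigid-body transform once the extrinsics are known. A minimal sketch with a hypothetical calibration (a 90-degree yaw plus a translation):

```python
import numpy as np

# Hypothetical extrinsic: rotation + translation taking the RF measurement
# frame into the visual-reconstruction frame (assumed known from calibration).
theta = np.deg2rad(90.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 0.0, 0.5])

def to_visual_frame(p_rf):
    """Map a Tx/Rx position from the RF frame into the visual frame."""
    return R @ np.asarray(p_rf) + t

p = to_visual_frame([1.0, 0.0, 0.0])
print(np.round(p, 6))  # -> approximately [1, 1, 0.5]
```

Because the sensors need not be time-synchronised, this spatial registration is the only coupling between the RF and visual data streams.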
Historical Note: Multi-Modal RF Environment Reconstruction
2007--2024. The idea of using visual data to aid RF propagation prediction has a long history. Early work by Degli-Esposti et al. (2007) used building geometry from geographic databases to initialise ray tracers. The METIS 5G project (2015) used LiDAR-derived 3D city models for sub-6 GHz propagation. With the advent of neural scene representations, the fusion became tighter: DeepRay (He et al., 2022) used NeRF-style representations initialised from images. RFCanvas (2024) represents the state of the art in multi-modal fusion, combining the geometric fidelity of 3DGS with the efficiency of few-shot RF adaptation.
Quick Check
In RFCanvas, why are spherical harmonics used for the directional power pattern of each Gaussian?
- To compress the representation and reduce memory usage
- To model the angular dependence of scattered RF power from each surface element
- To enforce rotational invariance of the scene model
- To enable frequency-domain processing of the RF signal
RF scattering from surfaces is directionally dependent: a smooth wall reflects specularly while a rough surface scatters more diffusely. SH provide a smooth, differentiable basis for representing these angular patterns, with the order controlling the angular resolution. This is the same reason optical 3DGS uses SH for view-dependent colour.
Tensorial RF Field
A compact representation of the directional and frequency-dependent RF response of a Gaussian scatterer using low-rank tensor decomposition. The tensor factors encode angular, frequency, and polarisation dimensions separately, enabling efficient storage and rendering.
Related: Radio Radiance Field, Splatting
Key Takeaway
RFCanvas demonstrates that visual data (camera + LiDAR) provides a powerful geometric prior for RF scene reconstruction, reducing the required RF measurements by an order of magnitude or more compared to RF-only methods. The key innovations are: (1) multi-modal initialisation from visual 3DGS, (2) tensorial RF fields with spherical harmonics for directional scattering, and (3) a two-stage training that freezes geometry during RF adaptation. The vision-to-RF gap remains a fundamental challenge: material properties that differ between optical and RF domains require careful handling.