Primitive-Based Scene Representations

Scenes Have Structure

Indoor environments are not random collections of voxels. They are composed of walls (planes), pillars (cylinders), desks (boxes), and pipes (cylinders). Voxel grids and neural fields ignore this structure, using millions of parameters to represent what could be described by dozens of geometric primitives. This section explores a research direction that we believe has significant untapped potential: representing RF scenes as compositions of geometric primitives, and learning to decompose RF measurements into these building blocks.

Definition:

Primitive-Based Scene Decomposition

A scene is represented as a union of KK geometric primitives:

Ξ³^(x)=βˆ‘k=1KΞ±kβ‹…1Pk(x),\hat{\boldsymbol{\gamma}}(\mathbf{x}) = \sum_{k=1}^{K} \alpha_k \cdot \mathbb{1}_{\mathcal{P}_k}(\mathbf{x}),

where each primitive Pk\mathcal{P}_k is defined by:

  • Type Ο„k∈{box,cylinder,plane,sphere}\tau_k \in \{\text{box}, \text{cylinder}, \text{plane}, \text{sphere}\}
  • Pose (pk,Rk)(\mathbf{p}_k, \mathbf{R}_k): position and orientation
  • Scale sk\mathbf{s}_k: dimensions (width, height, depth)
  • Reflectivity Ξ±k∈C\alpha_k \in \mathbb{C}: complex-valued RF response

The full parameter set per primitive is ΞΈk=(Ο„k,pk,Rk,sk,Ξ±k)\boldsymbol{\theta}_k = (\tau_k, \mathbf{p}_k, \mathbf{R}_k, \mathbf{s}_k, \alpha_k), typically 10--15 parameters. A room with 30 primitives requires ∼450\sim 450 parameters -- compare to 10610^6 voxels or 10610^6 neural network weights.

Definition:

Forward Model for Primitive Scenes

The RF forward model for a primitive-based scene:

y=βˆ‘k=1KΞ±k A[Pk]+w,\mathbf{y} = \sum_{k=1}^{K} \alpha_k \, \mathbf{A}[\mathcal{P}_k] + \mathbf{w},

where A[Pk]\mathbf{A}[\mathcal{P}_k] is the measurement response of primitive kk, computed by integrating the sensing kernel over the primitive's volume:

[A[Pk]]m=∫Pkam(x) dx.[\mathbf{A}[\mathcal{P}_k]]_m = \int_{\mathcal{P}_k} a_m(\mathbf{x}) \, d\mathbf{x}.

For canonical primitives (boxes, cylinders), these integrals have closed-form or efficient semi-analytical expressions in the wavenumber domain. This makes the forward model differentiable with respect to all primitive parameters, enabling gradient-based optimisation.

Theorem: Representation Compression Ratio

Let a scene consist of KK axis-aligned boxes in R3\mathbb{R}^3. A voxel representation at resolution Ξ”x\Delta x requires N=(L/Ξ”x)3N = (L/\Delta x)^3 parameters for a scene of side length LL. The primitive representation requires 7K7K parameters (3 position, 3 scale, 1 reflectivity per box). The compression ratio is:

ρ=N7K=(L/Ξ”x)37K.\rho = \frac{N}{7K} = \frac{(L/\Delta x)^3}{7K}.

For a 10Γ—10Γ—310\times 10\times 3 m room at 5 cm resolution with K=30K = 30 primitives: ρ=2.4Γ—106/210β‰ˆ11,400\rho = 2.4 \times 10^6 / 210 \approx 11{,}400.

Primitive representations exploit the structured, low-dimensional nature of indoor scenes. The compression ratio grows cubically with the inverse resolution, making primitives increasingly advantageous for high-resolution reconstruction.

PrimitiveΒ DecompositionΒ ofΒ aΒ 2DΒ Scene\text{Primitive Decomposition of a 2D Scene}

Demonstrates how a 2D scene (room floor plan) can be approximated by rectangular primitives. Adjust the number of primitives to see the trade-off between compactness and approximation accuracy.

Parameters
10
0.1

Definition:

Optimisation for Primitive Recovery

Given measurements y\mathbf{y}, recover the primitives by minimising:

min⁑K,{ΞΈk}βˆ₯yβˆ’βˆ‘k=1KΞ±kA[Pk]βˆ₯22+Ξ»K,\min_{K, \{\boldsymbol{\theta}_k\}} \left\|\mathbf{y} - \sum_{k=1}^{K} \alpha_k \mathbf{A}[\mathcal{P}_k]\right\|_2^2 + \lambda K,

where Ξ»K\lambda K penalises the number of primitives (Occam's razor). This is a mixed discrete-continuous problem:

  • Discrete: number KK and types {Ο„k}\{\tau_k\} of primitives.
  • Continuous: poses, scales, and reflectivities.

Approaches: (1) greedy addition (add one primitive at a time, optimise, repeat); (2) differentiable relaxation of the indicator function via sigmoid approximation; (3) neural network that predicts primitives from the matched-filter image.

Example: Primitive Decomposition of an Office

An office contains 4 walls, 2 desks, 1 metal filing cabinet, and 1 cylindrical pillar. Parameterise this scene as geometric primitives and compute the total parameter count. Compare with a voxel grid at 5 cm resolution for a 6Γ—8Γ—36 \times 8 \times 3 m room.

Connection to CAD and BIM

Architecture and construction use Computer-Aided Design (CAD) and Building Information Models (BIM) that describe buildings as collections of geometric primitives -- exactly our representation. This creates a powerful synergy:

  • BIM as prior: if a BIM model of the building exists (increasingly common for new construction), it provides a strong geometric prior. The RF imaging task reduces to estimating the electromagnetic properties (reflectivity, permittivity) of known geometric elements.

  • RF-to-BIM: conversely, RF imaging could reconstruct a BIM-like model from wireless measurements, enabling "as-built" documentation without laser scanning.

  • Digital twin fusion: a BIM-initialised primitive representation, refined by RF measurements, creates a physically grounded digital twin for channel prediction and network planning.

Definition:

Differentiable Primitive Rendering

To optimise primitive parameters via gradient descent, the indicator function 1Pk(x)\mathbb{1}_{\mathcal{P}_k}(\mathbf{x}) must be differentiable. Replace the hard indicator with a smooth approximation:

1~Pk(x)=σ ⁣(βˆ’dk(x)Ο΅),\tilde{\mathbb{1}}_{\mathcal{P}_k}(\mathbf{x}) = \sigma\!\left(-\frac{d_k(\mathbf{x})}{\epsilon}\right),

where dk(x)d_k(\mathbf{x}) is the signed distance function (SDF) of primitive kk and Οƒ(β‹…)\sigma(\cdot) is the sigmoid function. As Ο΅β†’0\epsilon \to 0, this recovers the hard indicator.

The SDF of canonical primitives is known in closed form:

  • Box with half-extents s\mathbf{s}: d(x)=βˆ₯max⁑(∣xβˆ’pβˆ£βˆ’s,0)βˆ₯2d(\mathbf{x}) = \|\max(|\mathbf{x} - \mathbf{p}| - \mathbf{s}, \mathbf{0})\|_2
  • Cylinder: d(x)=x2+y2βˆ’rd(\mathbf{x}) = \sqrt{x^2 + y^2} - r
  • Plane with normal n^\hat{\mathbf{n}}: d(x)=n^β‹…(xβˆ’p)d(\mathbf{x}) = \hat{\mathbf{n}} \cdot (\mathbf{x} - \mathbf{p})

This enables end-to-end gradient flow from the measurement loss to the primitive parameters.

Greedy Primitive Addition Algorithm

Complexity: O(Kβ‹…Titerβ‹…Cfwd)\mathcal{O}(K \cdot T_{\mathrm{iter}} \cdot C_{\mathrm{fwd}}) where CfwdC_{\mathrm{fwd}} is the cost of one forward model evaluation.
Input: measurements y, sensing operator A, tolerance epsilon
Output: primitive set {P_1, ..., P_K}
1: residual r <- y
2: K <- 0
3: while ||r||_2 > epsilon do
4: K <- K + 1
5: // Initialise new primitive from matched-filter of residual
6: image_mf <- A^H * r
7: (p_K, s_K, tau_K) <- detect_strongest_primitive(image_mf)
8: alpha_K <- <A[P_K], r> / ||A[P_K]||^2
9: // Joint refinement of all K primitives
10: for iter = 1 to max_iter do
11: theta <- theta - eta * gradient(||y - sum_k alpha_k A[P_k]||^2)
12: end for
13: r <- y - sum_{k=1}^{K} alpha_k A[P_k]
14: end while
15: return {P_1, ..., P_K}

The algorithm alternates between adding new primitives (step 7) and jointly refining all parameters (step 10-12). The matched-filter image guides initialisation, while gradient descent handles the continuous optimisation. The penalty Ξ»K\lambda K in the stopping criterion (step 3) implicitly controls the number of primitives.

Historical Note: Constructive Solid Geometry in Computer Graphics

1960s-1980s

The idea of representing complex shapes as Boolean combinations of simple primitives dates to the 1960s in computer graphics, known as Constructive Solid Geometry (CSG). Requicha's 1980 framework formalised CSG as unions, intersections, and differences of halfspaces. The RF imaging primitive decomposition is a probabilistic, measurement-driven variant of CSG: instead of designing primitives by hand, we infer them from electromagnetic measurements. The connection to CSG suggests that Boolean operations (e.g., a room is a large box minus window openings) could further compress the representation.

Common Mistake: Non-Convexity of Primitive Recovery

Mistake:

Expecting gradient descent on the primitive loss to converge to the global optimum from a random initialisation.

Correction:

The primitive recovery problem is highly non-convex: the number of primitives is discrete, and the loss has many local minima (e.g., swapping two primitives gives the same loss). Use matched-filter-guided initialisation, greedy addition, and multiple random restarts. A neural network that predicts an initial primitive set from the matched-filter image can provide a much better starting point.

Open Research Directions for Primitive Representations

This is a largely unexplored research direction for RF imaging. Key open questions include:

  • Primitive dictionary design: which set of primitive types best covers indoor environments? Should we include curved surfaces, L-shapes, or learn the dictionary from data?

  • RF response modelling: can we compute the RF response A[Pk]\mathbf{A}[\mathcal{P}_k] in closed form for non-trivial primitives? The wavenumber-domain integral over a box is a product of sinc functions; what about cylinders and wedges?

  • Number estimation: how to determine KK from data? The BIC or minimum description length (MDL) principle may provide principled criteria.

  • Non-rigid primitives: deformable primitives (e.g., a superquadric with shape parameters) could capture more complex geometries with few additional parameters.

  • Multi-frequency: do primitives at different frequencies share the same geometry but different reflectivities? This constraint could dramatically improve reconstruction.

We believe this direction offers a fertile intersection of RF imaging, computer vision, and computational geometry.

πŸŽ“CommIT Contribution(2025)

Primitive-Based RF Scene Decomposition (Research Concept)

A. Rezaei, G. Caire β€” CommIT Group Internal Report

The idea of representing RF scenes as compositions of geometric primitives originated in the CommIT group. The key insight is that indoor scenes have low-dimensional geometric structure that voxel grids and neural fields fail to exploit. By parameterising scenes as unions of boxes, cylinders, and planes with complex-valued reflectivities, the reconstruction problem reduces from millions of unknowns to tens of parameters. This enables: (1) physically interpretable reconstructions aligned with building floor plans; (2) extreme compression ratios (>10,000Γ—>10{,}000\times); (3) natural integration with BIM/CAD models for digital twin applications. The research programme includes differentiable rendering of RF primitive responses, greedy decomposition algorithms, and neural networks for primitive prediction.

primitive-decompositionscene-representationdigital-twin

Primitive-Based Scene Representation

A scene representation that decomposes the environment into a union of simple geometric shapes (boxes, cylinders, planes), each with position, orientation, scale, and reflectivity. Achieves extreme compression compared to voxel grids.

Related: Dynamic Scene Imaging

Key Takeaway

Primitive-based representations exploit the geometric structure of indoor scenes, achieving compression ratios exceeding 10,000Γ—10{,}000\times over voxel grids. The approach enables physically interpretable reconstructions, natural BIM/CAD integration, and differentiable optimisation. The main challenges are the non-convexity of primitive recovery and the need for efficient closed-form RF response computation. This is a largely open research direction with significant potential.