3D Gaussian Splatting Recap

Why Gaussian Splatting for RF?

In Chapters 24 and 25 we studied implicit neural scene representations (NeRFs, SDFs) and differentiable rendering. These methods achieve remarkable reconstruction quality but suffer from a fundamental limitation: rendering requires hundreds of MLP evaluations per ray, making real-time inference impractical. The question that motivates this chapter is: can we represent RF scenes with an explicit, GPU-friendly primitive that enables both fast training and real-time rendering?

The answer comes from 3D Gaussian Splatting (3DGS), introduced by Kerbl et al. at SIGGRAPH 2023. Instead of encoding the scene in network weights, 3DGS uses a collection of anisotropic 3D Gaussians as explicit primitives. Each Gaussian carries its own position, shape, opacity, and appearance — and rendering reduces to projecting ("splatting") these Gaussians onto the image plane via standard GPU rasterisation pipelines.

Definition:

3D Gaussian Primitive

A 3D Gaussian primitive is defined by the tuple $(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \alpha, \mathbf{f})$, where:

  • $\boldsymbol{\mu} \in \mathbb{R}^3$ is the centre (mean position),
  • $\boldsymbol{\Sigma} \in \mathbb{R}^{3 \times 3}$ is a positive-definite covariance matrix encoding shape and orientation,
  • $\alpha \in [0, 1]$ is the opacity,
  • $\mathbf{f}$ is a feature vector encoding appearance (colour, or in the RF setting, scattering attributes).

The spatial influence of the primitive is given by the 3D Gaussian density:

$$G(\mathbf{p}) = \exp\!\left(-\tfrac{1}{2}(\mathbf{p} - \boldsymbol{\mu})^\mathsf{T} \boldsymbol{\Sigma}^{-1} (\mathbf{p} - \boldsymbol{\mu})\right).$$

The covariance is parameterised as $\boldsymbol{\Sigma} = \mathbf{R}\mathbf{S}\mathbf{S}^\mathsf{T}\mathbf{R}^\mathsf{T}$ with rotation matrix $\mathbf{R} \in \mathrm{SO}(3)$ (stored as a unit quaternion) and diagonal scale matrix $\mathbf{S} = \operatorname{diag}(s_x, s_y, s_z)$.
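This parameterisation is easy to check numerically. The sketch below (NumPy, assuming a $(w, x, y, z)$ quaternion convention) builds $\boldsymbol{\Sigma}$ from a quaternion and a scale vector, then evaluates the unnormalised density $G(\mathbf{p})$:

```python
import numpy as np

def quat_to_rotmat(q):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # renormalise: optimisers drift off the unit sphere
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance(q, s):
    """Sigma = R S S^T R^T -- positive semi-definite by construction."""
    R, S = quat_to_rotmat(q), np.diag(s)
    return R @ S @ S.T @ R.T

def gaussian_density(p, mu, Sigma):
    """G(p) = exp(-0.5 (p - mu)^T Sigma^{-1} (p - mu))."""
    d = p - mu
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# Identity rotation with anisotropic scales gives a diagonal covariance
Sigma = covariance(np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.1, 0.2, 0.3]))
mu = np.zeros(3)
```

Because $\boldsymbol{\Sigma}$ is assembled from a rotation and squared scales, any gradient step on $\mathbf{q}$ and $\mathbf{s}$ yields a valid covariance, which is exactly why 3DGS optimises these factors rather than the six free entries of $\boldsymbol{\Sigma}$ directly.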

Definition:

3D Gaussian Splatting Scene Representation

A 3DGS scene consists of $N$ Gaussian primitives:

$$\mathcal{G} = \{(\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k, \alpha_k, \mathbf{f}_k)\}_{k=1}^N.$$

The scene is fully characterised by the parameter set $\Theta = \{\boldsymbol{\mu}_k, \mathbf{q}_k, \mathbf{s}_k, \alpha_k, \mathbf{f}_k\}_{k=1}^N$, where $\mathbf{q}_k \in \mathbb{R}^4$ is the unit quaternion encoding the rotation and $\mathbf{s}_k = (s_x, s_y, s_z)_k$ encodes the scale. Typical scenes use $N \sim 10^5$ to $10^6$ Gaussians for high-quality reconstruction.

Definition:

Differentiable Rasterisation (Splatting)

Differentiable rasterisation renders an image by projecting each 3D Gaussian onto the image plane and compositing in depth order. Given a camera with world-to-camera transform $\mathbf{W}$ and projection Jacobian $\mathbf{J}$, the 3D covariance projects to a 2D covariance:

$$\boldsymbol{\Sigma}' = \mathbf{J}\mathbf{W}\boldsymbol{\Sigma}\mathbf{W}^\mathsf{T}\mathbf{J}^\mathsf{T}.$$

The rendered value at pixel $\mathbf{u}$ is:

$$\hat{C}(\mathbf{u}) = \sum_{k \in \mathcal{N}(\mathbf{u})} \mathbf{f}_k \, \alpha_k \, G_k(\mathbf{u}) \prod_{j < k} \bigl(1 - \alpha_j \, G_j(\mathbf{u})\bigr),$$

where $G_k(\mathbf{u})$ is the 2D Gaussian evaluated at $\mathbf{u}$, the product runs over Gaussians closer to the camera (front-to-back ordering), and $\mathcal{N}(\mathbf{u})$ is the set of Gaussians whose projected footprint overlaps $\mathbf{u}$.

The entire pipeline is differentiable: gradients of a photometric loss $\mathcal{L}$ flow back through the compositing and projection to update every Gaussian parameter in $\Theta$.
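Both steps of this definition are short enough to sketch directly. In the snippet below (NumPy), the pinhole Jacobian and the early-termination threshold are illustrative choices rather than details taken from the reference rasteriser:

```python
import numpy as np

def perspective_jacobian(t, fx, fy):
    """First-order (affine) approximation of perspective projection
    at camera-space point t = (tx, ty, tz)."""
    tx, ty, tz = t
    return np.array([
        [fx / tz, 0.0,     -fx * tx / tz**2],
        [0.0,     fy / tz, -fy * ty / tz**2],
    ])

def project_covariance(Sigma, W, t, fx, fy):
    """2D image-plane covariance: Sigma' = J W Sigma W^T J^T."""
    J = perspective_jacobian(t, fx, fy)
    return J @ W @ Sigma @ W.T @ J.T

def composite_pixel(features, alphas, gauss_vals):
    """Front-to-back alpha compositing at one pixel; inputs sorted near-to-far."""
    C, T = np.zeros_like(features[0], dtype=float), 1.0
    for f, a, g in zip(features, alphas, gauss_vals):
        w = a * g            # effective opacity alpha_k * G_k(u)
        C += T * w * f       # contribution weighted by accumulated transmittance
        T *= 1.0 - w         # attenuate transmittance for Gaussians behind
        if T < 1e-4:         # early termination, as in tile-based rasterisers
            break
    return C

# Isotropic Gaussian 2 m in front of an identity camera (fx = fy = 1)
Sigma2d = project_covariance(np.eye(3), np.eye(3), np.array([0.0, 0.0, 2.0]), 1.0, 1.0)
# Two Gaussians at one pixel: a half-opaque red one in front of an opaque green one
pixel = composite_pixel([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])],
                        [0.5, 1.0], [1.0, 1.0])
```

The running transmittance `T` is precisely the product term in $\hat{C}(\mathbf{u})$; once it falls near zero, Gaussians further back cannot contribute, which is what makes early termination safe.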

Theorem: Alpha Compositing as Discretised Volume Rendering

The front-to-back alpha compositing of 3DGS:

$$\hat{C}(\mathbf{u}) = \sum_{k=1}^K \mathbf{f}_k \, \alpha_k \, G_k(\mathbf{u}) \prod_{j=1}^{k-1} \bigl(1 - \alpha_j \, G_j(\mathbf{u})\bigr)$$

is the Riemann-sum discretisation of the NeRF volume rendering integral (Chapter 24):

$$C(\mathbf{r}) = \int_0^\infty T(t) \, \sigma(\mathbf{r}(t)) \, \mathbf{c}(\mathbf{r}(t), \mathbf{d}) \, dt, \quad T(t) = \exp\!\left(-\int_0^t \sigma(\mathbf{r}(s))\,ds\right),$$

where each Gaussian's contribution $\alpha_k G_k(\mathbf{u})$ corresponds to the opacity $1 - \exp(-\sigma_k \delta_k)$ at the $k$-th sample along the ray.

Both NeRF and 3DGS solve the same rendering problem. NeRF samples the density field along each ray and integrates; 3DGS projects analytic Gaussians onto the image plane and sums. The connection is that splatting a Gaussian is equivalent to evaluating the volume rendering integral analytically for a Gaussian density profile.
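The correspondence is an exact algebraic identity, not an approximation at the level of the quadrature: since $\exp(-\sum_j \sigma_j \delta_j) = \prod_j (1 - \alpha_j)$ when $\alpha_j = 1 - \exp(-\sigma_j \delta_j)$, NeRF's quadrature and alpha compositing give the same pixel value. A quick numerical check (NumPy, scalar colours for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8
sigma = rng.uniform(0.1, 2.0, K)   # densities sigma_k at the K ray samples
delta = rng.uniform(0.05, 0.2, K)  # sample spacings delta_k
c = rng.uniform(0.0, 1.0, K)       # per-sample colours (scalar for brevity)

# NeRF quadrature: T_k = exp(-sum_{j<k} sigma_j delta_j)
T = np.exp(-np.concatenate(([0.0], np.cumsum(sigma * delta)[:-1])))
C_nerf = np.sum(T * (1.0 - np.exp(-sigma * delta)) * c)

# Alpha compositing with alpha_k = 1 - exp(-sigma_k delta_k)
alpha = 1.0 - np.exp(-sigma * delta)
T_prod = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
C_splat = np.sum(T_prod * alpha * c)
```

Both accumulators agree to machine precision for any choice of densities and spacings; what differs between NeRF and 3DGS is how those per-sample opacities are obtained, not the compositing itself.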


Definition:

Adaptive Density Control

3DGS uses adaptive density control to add, remove, and split Gaussians during training:

  1. Densification by cloning: Gaussians in under-reconstructed regions (high positional gradient, $\|\partial \mathcal{L}/\partial \boldsymbol{\mu}_k\| > \tau_\mu$) are cloned, i.e. duplicated with a small offset.
  2. Densification by splitting: Large Gaussians covering too much area ($\|\mathbf{s}_k\| > \tau_s$) are split into two smaller Gaussians.
  3. Pruning: Gaussians with opacity below a threshold ($\alpha_k < \epsilon_\alpha$) are removed.
  4. Opacity reset: Periodically, all opacities are reduced to encourage removing unnecessary Gaussians.

This adaptive scheme is critical: it allows the representation to allocate resolution where the scene is complex and remain sparse elsewhere.
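One clone/split/prune pass can be sketched as below (NumPy). The split factor of 1.6 follows the published default, but the thresholds, offsets, and bookkeeping here are simplified for illustration and do not mirror the reference implementation:

```python
import numpy as np

def density_control(mu, s, alpha, grad_mu,
                    tau_mu=2e-4, tau_s=0.05, eps_alpha=0.005, rng=None):
    """One clone/split/prune pass over N Gaussians (illustrative thresholds).
    mu: (N,3) centres, s: (N,3) scales, alpha: (N,) opacities,
    grad_mu: (N,3) accumulated positional gradients."""
    rng = rng or np.random.default_rng(0)
    g = np.linalg.norm(grad_mu, axis=1)
    big = s.max(axis=1) > tau_s

    clone = (g > tau_mu) & ~big          # under-reconstructed and small -> clone
    split = (g > tau_mu) & big           # under-reconstructed and large -> split

    # Clone: duplicate with a small random offset (the original copy is kept below)
    mu_c = mu[clone] + 0.01 * rng.standard_normal((clone.sum(), 3))
    s_c, a_c = s[clone], alpha[clone]

    # Split: two children sampled inside the parent, scales shrunk by 1.6
    mu_s = (np.repeat(mu[split], 2, axis=0)
            + rng.standard_normal((2 * split.sum(), 3)) * np.repeat(s[split], 2, axis=0))
    s_s = np.repeat(s[split] / 1.6, 2, axis=0)
    a_s = np.repeat(alpha[split], 2)

    # Prune near-transparent Gaussians; split parents are replaced by their children
    keep = ~split & (alpha > eps_alpha)
    return (np.concatenate([mu[keep], mu_c, mu_s]),
            np.concatenate([s[keep], s_c, s_s]),
            np.concatenate([alpha[keep], a_c, a_s]))

# Tiny example: G0 transparent (pruned), G1 large with high gradient (split), G2 kept
mu0 = np.zeros((3, 3))
s0 = np.array([[0.01] * 3, [0.10] * 3, [0.01] * 3])
a0 = np.array([0.001, 0.9, 0.9])
g0 = np.array([[0.0] * 3, [1e-3, 0.0, 0.0], [0.0] * 3])
mu1, s1, a1 = density_control(mu0, s0, a0, g0)  # -> G2 plus the two children of G1
```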

3DGS Training Pipeline

Complexity: $O(N \cdot P)$ per iteration, where $N$ is the number of Gaussians and $P$ the number of pixels. Tile-based rasterisation reduces this to $O(N + T \cdot K)$, where $T$ is the number of tiles and $K$ the average number of Gaussians per tile.
Input: multi-view images $\{I_i, \Pi_i\}_{i=1}^M$ with camera poses $\Pi_i$; initial point cloud from SfM (e.g., COLMAP)
Output: optimised Gaussian set $\mathcal{G}^*$
1. Initialise $\mathcal{G}^{(0)}$ from the SfM points: $\boldsymbol{\mu}_k$ = point position, $\mathbf{s}_k$ from nearest-neighbour distances, $\alpha_k = 0.1$, $\mathbf{f}_k$ from the point colour
2. for epoch $= 1, \ldots, E$ do
3.     Sample a training view $i \sim \mathrm{Uniform}(\{1, \ldots, M\})$
4.     Render $\hat{I}_i = \mathrm{Rasterise}(\mathcal{G}, \Pi_i)$ (differentiable splatting)
5.     Compute the loss $\mathcal{L} = (1 - \lambda_{\mathrm{SSIM}})\,\|\hat{I}_i - I_i\|_1 + \lambda_{\mathrm{SSIM}}\,\mathcal{L}_{\mathrm{SSIM}}(\hat{I}_i, I_i)$
6.     Backpropagate $\nabla_\Theta \mathcal{L}$ through the differentiable rasteriser
7.     Update $\Theta \leftarrow \Theta - \eta\,\nabla_\Theta \mathcal{L}$ (Adam optimiser)
8.     if epoch $\bmod D = 0$ then
9.         Apply adaptive density control (clone, split, prune)
10.     end if
11. end for

Training is an order of magnitude faster than for NeRF because rendering uses rasterisation (a forward pass through sorted tiles) rather than ray marching (hundreds of MLP queries per ray).
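The loss on line 5 can be sketched as follows. The SSIM here uses global image statistics for brevity rather than the 11x11 Gaussian-windowed SSIM of the reference implementation, and $\lambda_{\mathrm{SSIM}} = 0.2$ is assumed as the default weight:

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Single-window SSIM over whole-image statistics (simplified: the
    reference implementation uses an 11x11 Gaussian window per pixel)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def gs_loss(pred, target, lam=0.2):
    """L = (1 - lambda) * L1 + lambda * (1 - SSIM)."""
    l1 = np.abs(pred - target).mean()
    return (1 - lam) * l1 + lam * (1.0 - ssim_global(pred, target))

img = np.random.default_rng(1).uniform(0.0, 1.0, (64, 64, 3))
```

A perfect render gives zero loss (SSIM of an image with itself is exactly 1), and the SSIM term penalises structural errors that a pure L1 loss would average away.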

Example: 3DGS vs NeRF --- Rendering Speed Comparison

Compare the rendering speed and training time of 3DGS with the original NeRF and with Instant-NGP on a standard benchmark (e.g., the Mip-NeRF 360 dataset at 1080p resolution).

Historical Note: From EWA Splatting to 3DGS

2000--2023

Point-based rendering and splatting have a long history in computer graphics. Zwicker et al. (2001) introduced EWA (elliptical weighted average) splatting, which projects 3D ellipsoids onto the image plane as 2D ellipses --- essentially the same geometric operation that 3DGS uses. Pfister et al. (2000) developed surfels (surface elements) as oriented disc primitives. What Kerbl et al. (2023) contributed was the combination of (1) differentiable rasterisation enabling gradient-based optimisation, (2) adaptive density control for automatic resolution allocation, and (3) spherical harmonics for view-dependent appearance --- turning a rendering primitive into a learnable scene representation.

The speed advantage of splatting over ray marching was well known in the graphics community. The insight of 3DGS was that this speed advantage could be combined with analysis-through-synthesis optimisation to create an explicit scene representation that rivals the quality of implicit neural representations.


2D Gaussian Splatting Demonstration

Visualise how a collection of 2D Gaussians renders an image through alpha compositing. Adjust the number of Gaussians, their scale, and opacity to see how the representation quality changes.


Quick Check

In 3D Gaussian Splatting, the covariance matrix $\boldsymbol{\Sigma}_k$ is parameterised as $\boldsymbol{\Sigma}_k = \mathbf{R}_k \mathbf{S}_k \mathbf{S}_k^\mathsf{T} \mathbf{R}_k^\mathsf{T}$. Why is this parameterisation preferred over directly optimising $\boldsymbol{\Sigma}_k$?

  • It reduces the number of free parameters from 6 to 3
  • It guarantees that $\boldsymbol{\Sigma}_k$ remains positive semi-definite during optimisation
  • It allows faster matrix inversion
  • It enables the use of spherical harmonics for view dependence

Common Mistake: 3DGS Is Not True Volumetric Rendering

Mistake:

Assuming that 3DGS performs exact volumetric integration along each ray, equivalent to NeRF.

Correction:

3DGS uses a rasterisation-based approximation: each Gaussian is projected to 2D and composited in depth order. This is a Riemann-sum approximation to the volume rendering integral, not an exact integration. The approximation can produce artifacts when Gaussians overlap significantly in depth or when the scene has complex occlusion patterns. For RF applications, where the "scene" is a collection of scatterers rather than a continuous density field, this approximation is often acceptable.

Splatting

A rendering technique where 3D primitives (typically ellipsoids or Gaussians) are projected ("splatted") onto a 2D image plane. Each primitive contributes a weighted footprint to the image, and overlapping contributions are composited. Splatting is the dual of ray casting: instead of shooting rays through pixels and querying the scene, the scene projects itself onto the image.

Related: Differentiable Rendering

Differentiable Rendering

A rendering pipeline designed so that the gradient of a loss function (comparing rendered and observed images) can be computed with respect to all scene parameters via backpropagation. This enables gradient-based optimisation of scene geometry, appearance, and camera parameters from image observations alone.

Related: Splatting, Analysis Through Synthesis

Analysis Through Synthesis

An inverse-problem strategy where scene parameters are estimated by synthesising (rendering) observations from a parameterised model and comparing them to actual measurements. The parameters are then updated to minimise the discrepancy. In the 3DGS context, the Gaussian parameters are the "scene model" and differentiable rasterisation is the "synthesis" step.

Related: Differentiable Rendering

Key Takeaway

3D Gaussian Splatting represents scenes as explicit collections of anisotropic 3D Gaussians, each carrying position, shape, opacity, and appearance attributes. Differentiable rasterisation enables gradient-based optimisation from multi-view images, achieving training times of minutes and rendering at $> 100$ FPS --- orders of magnitude faster than NeRF. The alpha compositing formula is a discretisation of the same volume rendering integral used by NeRF, connecting the two representations theoretically.

3D Gaussian Splatting: Alpha Compositing Pipeline

Three 3D Gaussians are projected onto the image plane and composited front-to-back via alpha blending: $C = \sum_i c_i \alpha_i \prod_{j<i}(1-\alpha_j)$. Each Gaussian contributes opacity and colour to the final pixel. This differentiable rasterisation pipeline --- orders of magnitude faster than volumetric ray marching --- is the foundation of RF-3DGS, RFCanvas, and RadarSplat.