Multi-View Stereo and Structure from Motion
Why Multi-View Geometry Matters for RF
Every RF imaging system with multiple Tx-Rx pairs is, at its core, a multi-view system. Each Tx-Rx pair "views" the scene from a different geometric perspective, just as cameras in a stereo rig observe the same scene from different positions. The mathematical machinery of multi-view geometry (epipolar constraints, the fundamental matrix, bundle adjustment) transfers directly to RF with one critical difference: RF measurements are coherent (phase-bearing), while camera images are incoherent (intensity only).
This section develops the optical multi-view framework; Section 28.3 adapts it to RF wave propagation.
Definition: Pinhole Camera Model and Projection
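A pinhole camera maps a 3D point $\mathbf{X} \in \mathbb{R}^3$ in world coordinates to a 2D image point $\mathbf{x} \in \mathbb{R}^2$ via perspective projection:
$$\lambda\,\tilde{\mathbf{x}} = \mathbf{K}\,[\mathbf{R} \mid \mathbf{t}]\,\tilde{\mathbf{X}},$$
where $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{X}}$ are homogeneous coordinates, $\lambda$ is the projective depth, $\mathbf{R}$ and $\mathbf{t}$ are the camera extrinsics (rotation and translation), and the intrinsic matrix is:
$$\mathbf{K} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},$$
with focal lengths $f_x, f_y$, principal point $(c_x, c_y)$, and skew $s$ (usually 0).
The full projection matrix $\mathbf{P} = \mathbf{K}\,[\mathbf{R} \mid \mathbf{t}]$ has 11 degrees of freedom (6 extrinsic + 5 intrinsic).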
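As a minimal sketch of the projection above, the following NumPy snippet builds the $3 \times 4$ projection matrix and applies the perspective divide. The intrinsic values (focal length 800 px, principal point at (320, 240)) and the helper name `project` are illustrative assumptions, not values from this chapter.

```python
import numpy as np

def project(K, R, t, X_world):
    """Project Nx3 world points to Nx2 pixel coordinates with P = K [R | t]."""
    X_world = np.atleast_2d(X_world).astype(float)
    P = K @ np.hstack([R, t.reshape(3, 1)])                   # 3x4 projection matrix
    X_h = np.hstack([X_world, np.ones((len(X_world), 1))])    # homogeneous coordinates
    x_h = X_h @ P.T                                           # Nx3, scaled by depth
    return x_h[:, :2] / x_h[:, 2:3]                           # perspective divide

# Hypothetical intrinsics: zero skew, square pixels.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)        # camera aligned with the world axes
t = np.zeros(3)      # camera at the world origin

pixels = project(K, R, t, [[0.0, 0.0, 4.0], [0.5, -0.25, 2.0]])
print(pixels)
```

A point on the optical axis lands on the principal point, and the second point is shifted in proportion to its lateral offset divided by its depth.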
Definition: Epipolar Geometry and the Fundamental Matrix
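Given two cameras observing the same 3D point $\mathbf{X}$, the projections $\tilde{\mathbf{x}}_1$ and $\tilde{\mathbf{x}}_2$ in the two images satisfy the epipolar constraint:
$$\tilde{\mathbf{x}}_2^\top \mathbf{F}\,\tilde{\mathbf{x}}_1 = 0,$$
where $\mathbf{F} \in \mathbb{R}^{3 \times 3}$ is the fundamental matrix (rank 2, 7 degrees of freedom).
Geometric interpretation: The point $\tilde{\mathbf{x}}_1$ in image 1 constrains the corresponding point $\tilde{\mathbf{x}}_2$ in image 2 to lie on the epipolar line $\mathbf{l}_2 = \mathbf{F}\,\tilde{\mathbf{x}}_1$. This reduces stereo matching from a 2D search to a 1D search.
When the cameras are calibrated (intrinsics known), the fundamental matrix factors as:
$$\mathbf{F} = \mathbf{K}_2^{-\top}\,\mathbf{E}\,\mathbf{K}_1^{-1},$$
where $\mathbf{E} = [\mathbf{t}]_\times \mathbf{R}$ is the essential matrix (5 DOF), with $[\mathbf{t}]_\times$ the skew-symmetric matrix of the baseline translation $\mathbf{t}$.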
The fundamental matrix encodes the relative geometry between two views. It can be estimated linearly from eight or more point correspondences (8-point algorithm), or from exactly seven with the 7-point algorithm, which exploits the rank-2 constraint.
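The linear estimate can be sketched in a few lines of NumPy. The example below is a simplified normalised 8-point algorithm run on noise-free synthetic correspondences; the camera intrinsics, relative pose, point distribution, and all function names are assumptions made for illustration. It is not a substitute for a robust estimator, which would wrap this step in outlier rejection.

```python
import numpy as np

def skew(v):
    """Return the 3x3 skew-symmetric matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def normalise(pts):
    """Shift the centroid to the origin and scale the mean distance to sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ T.T, T

def eight_point(x1, x2):
    """Estimate F (unit Frobenius norm) from N >= 8 pixel correspondences."""
    n1, T1 = normalise(x1)
    n2, T2 = normalise(x2)
    # Row k is kron(x2_k, x1_k), so that row @ F.ravel() equals x2_k^T F x1_k.
    A = np.stack([np.kron(b, a) for a, b in zip(n1, n2)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)    # right singular vector of smallest singular value
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt       # enforce rank 2
    F = T2.T @ F @ T1                             # undo the normalisation
    return F / np.linalg.norm(F)

# Synthetic two-view setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([-0.5, 0.05, 0.1])
X = np.column_stack([rng.uniform(-1, 1, 20),
                     rng.uniform(-1, 1, 20),
                     rng.uniform(4, 8, 20)])

def to_pixels(X_cam):
    x = X_cam @ K.T
    return x[:, :2] / x[:, 2:3]

x1 = to_pixels(X)              # camera 1 at the world origin
x2 = to_pixels(X @ R.T + t)    # camera 2 with relative pose (R, t)

F_est = eight_point(x1, x2)

# Ground truth from the calibrated factorisation F = K^-T [t]_x R K^-1.
K_inv = np.linalg.inv(K)
F_true = K_inv.T @ skew(t) @ R @ K_inv
F_true /= np.linalg.norm(F_true)

h1 = np.hstack([x1, np.ones((len(x1), 1))])
h2 = np.hstack([x2, np.ones((len(x2), 1))])
residuals = np.abs(np.einsum("ni,ij,nj->n", h2, F_est, h1))
print("max |x2^T F x1| =", residuals.max())
```

Because $\mathbf{F}$ is defined only up to scale and sign, the estimate should be compared with the ground truth after normalising both to unit norm and allowing a sign flip.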
Theorem: Properties of the Essential Matrix
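The essential matrix $\mathbf{E} = [\mathbf{t}]_\times \mathbf{R}$ satisfies:
- $\mathbf{E}$ has rank 2 and exactly two equal nonzero singular values.
- The SVD of $\mathbf{E}$ is $\mathbf{E} = \mathbf{U}\,\mathrm{diag}(\sigma, \sigma, 0)\,\mathbf{V}^\top$, where $\sigma = \|\mathbf{t}\|$.
- Given $\mathbf{E}$, the rotation $\mathbf{R}$ and translation $\mathbf{t}$ can be recovered (up to a four-fold ambiguity resolved by the positive-depth constraint).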
The essential matrix captures the rigid-body geometry between two calibrated cameras. Its rank-2 structure reflects the fact that a single point correspondence constrains but does not determine the 3D point: one degree of freedom (depth) remains.
Rank of $[\mathbf{t}]_\times$
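The matrix $[\mathbf{t}]_\times$ is a $3 \times 3$ skew-symmetric matrix of rank 2 (for $\mathbf{t} \neq \mathbf{0}$), with null space $\mathrm{span}\{\mathbf{t}\}$.
Rank of $\mathbf{E}$
Since $\mathbf{R}$ is full rank, $\mathrm{rank}(\mathbf{E}) = \mathrm{rank}([\mathbf{t}]_\times \mathbf{R}) = \mathrm{rank}([\mathbf{t}]_\times) = 2$.
Singular values
$\mathbf{E}\mathbf{E}^\top = [\mathbf{t}]_\times \mathbf{R}\mathbf{R}^\top [\mathbf{t}]_\times^\top = [\mathbf{t}]_\times [\mathbf{t}]_\times^\top$. Since $[\mathbf{t}]_\times [\mathbf{t}]_\times^\top = \|\mathbf{t}\|^2 \mathbf{I} - \mathbf{t}\mathbf{t}^\top$, the eigenvalues of this matrix are $\|\mathbf{t}\|^2$ (multiplicity 2) and $0$. Since multiplication by the orthogonal matrix $\mathbf{R}$ preserves singular values, $\sigma_1 = \sigma_2 = \|\mathbf{t}\|$ and $\sigma_3 = 0$.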
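The theorem and its proof can be checked numerically. The sketch below builds an essential matrix from an assumed rotation and translation, confirms the singular-value pattern, and then recovers the pose by enumerating the four SVD-based candidates and keeping the one that places a triangulated point in front of both cameras. The pose values, the test point, and the helper names are hypothetical. The translation is recovered with unit length.

```python
import numpy as np

def skew(v):
    """Return the 3x3 skew-symmetric matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotation(axis, angle):
    """Rodrigues formula: rotation about `axis` by `angle` radians."""
    k = skew(np.asarray(axis, dtype=float) / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)

def decompose_essential(E):
    """Return the four (R, t) candidates, with t of unit length."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:      # keep U and V proper rotations
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t),
            (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]

def triangulate(R, t, x1, x2):
    """Linear (DLT) triangulation from normalised homogeneous points."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    A = np.vstack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def select_pose(E, x1, x2):
    """Pick the candidate with positive depth in both cameras."""
    for R, t in decompose_essential(E):
        X = triangulate(R, t, x1, x2)
        if X[2] > 0 and (R @ X + t)[2] > 0:
            return R, t
    raise ValueError("no candidate satisfies the positive-depth constraint")

# Assumed ground-truth relative pose and one scene point.
R_true = rotation([0.2, 1.0, -0.1], 0.3)
t_true = np.array([1.0, 0.2, -0.3])
E = skew(t_true) @ R_true

singular_values = np.linalg.svd(E, compute_uv=False)
print(singular_values, np.linalg.norm(t_true))

X = np.array([0.4, -0.3, 5.0])
X2 = R_true @ X + t_true
x1, x2 = X / X[2], X2 / X2[2]        # normalised image coordinates
R_est, t_est = select_pose(E, x1, x2)
```

A single correspondence is enough to disambiguate in this noise-free setting. With real data one would typically vote over many correspondences.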
Epipolar Geometry Visualisation
Visualise epipolar geometry for a stereo camera pair. A 3D point projects onto both images; the epipolar lines show the geometric constraint between correspondences. Increasing the baseline separates the epipoles further and increases the disparity (the displacement between corresponding points), which improves depth estimation precision.
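To make the baseline-precision relationship concrete, the sketch below assumes a rectified stereo pair (parallel optical axes, an assumption beyond what the visualisation states), for which disparity is $d = fB/Z$ and a first-order error analysis gives $|\delta Z| \approx Z^2/(fB)\,|\delta d|$. The focal length, depth, and half-pixel matching error are illustrative values.

```python
def disparity_px(depth_m, focal_px, baseline_m):
    """Disparity of a point at the given depth for a rectified stereo pair."""
    return focal_px * baseline_m / depth_m

def depth_error_m(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: |dZ| ~ Z^2 / (f B) * |dd|."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px

# Hypothetical setup: f = 800 px, target at 5 m, 0.5 px matching error.
for baseline in (0.1, 0.2, 0.4):
    d = disparity_px(5.0, 800.0, baseline)
    err = depth_error_m(5.0, 800.0, baseline, 0.5)
    print(f"B = {baseline:.1f} m: disparity = {d:.1f} px, depth error = {err:.4f} m")
```

Under these assumptions, doubling the baseline doubles the disparity and halves the depth error.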
Definition: Structure from Motion (SfM)
Structure from Motion jointly estimates 3D scene structure (a sparse point cloud) and camera poses from a collection of unposed images:
- Feature extraction: Detect and describe keypoints in each image (e.g., SIFT, SuperPoint).
- Feature matching: Find correspondences between image pairs.
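- Geometric verification: Filter matches using epipolar geometry (the fundamental matrix satisfies $\tilde{\mathbf{x}}_2^\top \mathbf{F}\,\tilde{\mathbf{x}}_1 = 0$ for corresponding points).
- Bundle adjustment: Jointly optimise 3D point positions and camera parameters by minimising the reprojection error:
$$\min_{\{\mathbf{P}_i\},\,\{\mathbf{X}_j\}} \sum_{i,j} \rho\!\left(\|\mathbf{x}_{ij} - \pi(\mathbf{P}_i, \mathbf{X}_j)\|^2\right),$$
where $\mathbf{x}_{ij}$ is the observed position of point $j$ in image $i$, $\pi$ is the projection function, and $\rho$ is a robust loss (e.g., Huber).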
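The robust reprojection cost can be sketched as follows. This is a minimal illustration, assuming a shared intrinsic matrix, a Huber loss with the common $\tfrac{1}{2}r^2$ convention in its quadratic region, and a hypothetical list of `(camera, point, pixel)` observation triples. It evaluates the cost only and does not perform the optimisation.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of a single 3D point X (shape (3,)) to pixels."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def huber(sq_residual, delta=1.0):
    """Huber loss applied to a squared residual norm s = ||r||^2."""
    r = np.sqrt(sq_residual)
    return 0.5 * sq_residual if r <= delta else delta * (r - 0.5 * delta)

def reprojection_cost(K, poses, points, observations, delta=1.0):
    """Sum of robustified reprojection errors over all observations."""
    total = 0.0
    for cam, pt, uv in observations:
        R, t = poses[cam]
        residual = uv - project(K, R, t, points[pt])
        total += huber(residual @ residual, delta)
    return total

# Hypothetical scene: two cameras, three points, shared intrinsics.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
poses = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([-0.5, 0.0, 0.0]))]
points = np.array([[0.0, 0.0, 4.0], [0.5, -0.25, 2.0], [-1.0, 0.5, 5.0]])

# Exact observations for every (camera, point) pair.
observations = [(i, j, project(K, *poses[i], points[j]))
                for i in range(2) for j in range(3)]
clean_cost = reprojection_cost(K, poses, points, observations)

# Corrupt one observation by 10 px to mimic a mismatched feature.
cam, pt, uv = observations[0]
observations[0] = (cam, pt, uv + np.array([10.0, 0.0]))
outlier_cost = reprojection_cost(K, poses, points, observations)
print(clean_cost, outlier_cost)
```

With a threshold of 1 px the 10 px outlier contributes 9.5 to the cost rather than the 50 it would contribute under a pure squared loss, which is why a robust $\rho$ limits the influence of bad matches.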
SfM is the standard preprocessing step for NeRF and 3DGS: it provides the camera poses needed for training. COLMAP is the most widely used SfM pipeline.
Definition: Bundle Adjustment
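Bundle adjustment is a nonlinear least-squares optimisation that jointly refines 3D point positions and camera parameters. Let $\boldsymbol{\theta} = (\{\mathbf{P}_i\}, \{\mathbf{X}_j\})$ denote the unknowns. The cost function is:
$$C(\boldsymbol{\theta}) = \sum_{(i,j) \in \mathcal{V}} \rho\!\left(\|\mathbf{x}_{ij} - \pi(\mathbf{P}_i, \mathbf{X}_j)\|^2\right),$$
where $\mathbf{x}_{ij}$ is the observed image point, $\pi$ is the projection function, $\rho$ is a robust loss, and $\mathcal{V}$ is the set of visibility pairs $(i, j)$ (point $j$ seen in camera $i$).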
The Jacobian of this system has a sparse block structure (each observation depends on exactly one point and one camera), enabling the Schur complement trick: eliminate point variables first, then solve a reduced system over camera variables only.
Levenberg-Marquardt is the standard solver, with cost per iteration $\mathcal{O}(d^3 N_c^3)$, where $d$ is the camera parameter dimension and $N_c$ is the number of cameras.
The Schur complement trick reduces a system with millions of 3D points and hundreds of cameras to a dense system of size $d N_c \times d N_c$, making bundle adjustment tractable for large-scale SfM.
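The elimination step can be illustrated on a small synthetic problem. The sketch below fills a Jacobian with random entries that respect the block sparsity described above (each 2-row observation touches one camera block and one point block), forms the damped normal equations, and solves them through the reduced camera system. The problem sizes, the damping value, and the variable names are assumptions. A production solver would exploit sparsity rather than build dense matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cams, d, n_pts = 3, 6, 40          # hypothetical problem size
n_cam_vars = d * n_cams

# Jacobian with bundle-adjustment sparsity: one camera and one point per observation.
J = np.zeros((2 * n_cams * n_pts, n_cam_vars + 3 * n_pts))
row = 0
for i in range(n_cams):
    for j in range(n_pts):
        J[row:row + 2, d * i:d * (i + 1)] = rng.normal(size=(2, d))
        col = n_cam_vars + 3 * j
        J[row:row + 2, col:col + 3] = rng.normal(size=(2, 3))
        row += 2
r = rng.normal(size=J.shape[0])      # stand-in residual vector

# Damped normal equations of one Levenberg-Marquardt step: (J^T J + lam I) delta = -J^T r.
lam = 1e-2
H = J.T @ J + lam * np.eye(J.shape[1])
g = -J.T @ r

H_cc = H[:n_cam_vars, :n_cam_vars]   # camera-camera block
H_cp = H[:n_cam_vars, n_cam_vars:]   # camera-point coupling
H_pp = H[n_cam_vars:, n_cam_vars:]   # point-point block (block diagonal, 3x3 blocks)

# Invert the point block one 3x3 block at a time.
H_pp_inv = np.zeros_like(H_pp)
for j in range(n_pts):
    blk = slice(3 * j, 3 * j + 3)
    H_pp_inv[blk, blk] = np.linalg.inv(H_pp[blk, blk])

# Reduced camera system (Schur complement), then back-substitute for the points.
S = H_cc - H_cp @ H_pp_inv @ H_cp.T
rhs = g[:n_cam_vars] - H_cp @ H_pp_inv @ g[n_cam_vars:]
delta_cams = np.linalg.solve(S, rhs)
delta_pts = H_pp_inv @ (g[n_cam_vars:] - H_cp.T @ delta_cams)

delta_direct = np.linalg.solve(H, g)  # reference: solve the full system at once
print(S.shape, np.allclose(np.concatenate([delta_cams, delta_pts]), delta_direct))
```

The only dense solve is over the 18 camera variables, while the 120 point variables are handled by independent 3x3 inversions.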
Example: The COLMAP SfM Pipeline
Describe the steps to go from a set of uncalibrated photographs to camera poses suitable for training a NeRF.
Image collection
Capture a set of images of the scene from diverse viewpoints, with sufficient overlap between adjacent views.
SfM with COLMAP
- Feature extraction (SIFT): detect keypoints per image.
- Exhaustive matching: compare all image pairs and find correspondences.
- Incremental SfM: register cameras one by one; after each camera is added, run bundle adjustment to refine all parameters.
- Output: camera intrinsics $\mathbf{K}$, extrinsics $[\mathbf{R}_i \mid \mathbf{t}_i]$ for each image $i$, and a sparse 3D point cloud.
NeRF/3DGS training
Use COLMAP poses and the original images as training data. For 3DGS, initialise the Gaussians from COLMAP's sparse point cloud. For NeRF, the poses define the ray origins and directions for volume rendering.
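How a pose defines a ray can be sketched directly from the pinhole model. The snippet assumes the world-to-camera convention used in this section ($\mathbf{x}_\text{cam} = \mathbf{R}\mathbf{X} + \mathbf{t}$, camera looking along $+z$). Individual NeRF codebases often use other axis conventions, so treat the function `pixel_rays` and its conventions as illustrative rather than as any particular library's API.

```python
import numpy as np

def pixel_rays(K, R, t, pixels):
    """Return the camera centre and unit ray directions (world frame) for Nx2 pixels."""
    pixels = np.atleast_2d(pixels).astype(float)
    origin = -R.T @ t                                  # camera centre in world coordinates
    pix_h = np.hstack([pixels, np.ones((len(pixels), 1))])
    dirs_cam = pix_h @ np.linalg.inv(K).T              # back-project: K^-1 [u, v, 1]^T
    dirs_world = dirs_cam @ R                          # row-vector form of R^T d
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    return origin, dirs_world

# Hypothetical camera: rotated 0.2 rad about the y-axis and translated.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.2
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.1, -0.2, 0.5])

# Project a known world point, then cast the ray back through its pixel.
X = np.array([0.3, 0.1, 4.0])
x_cam = R @ X + t
pixel = (K @ x_cam)[:2] / x_cam[2]

origin, dirs = pixel_rays(K, R, t, pixel)
point_on_ray = origin + np.linalg.norm(X - origin) * dirs[0]
print(point_on_ray)
```

Walking along the ray by the distance from the camera centre to the point should return the original world point, which is a quick consistency check for pose conventions.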
Why This Matters: Multi-View Geometry in RF Imaging
The geometry of multi-view imaging has direct parallels in RF:
- SfM ↔ Array calibration: Estimating antenna positions and orientations from calibration measurements is analogous to camera pose estimation in SfM.
- Epipolar geometry ↔ Range-azimuth ambiguity: The fundamental matrix constrains where a correspondence can appear; similarly, the range-azimuth ambiguity in monostatic radar constrains where a scatterer can be localised.
- Bundle adjustment ↔ Autofocus: Joint estimation of scene and nuisance parameters (camera poses / phase errors) from measurements.
- Stereo disparity ↔ Bistatic range difference: In stereo vision, depth is recovered from disparity; in bistatic radar, the target position is recovered from the range difference between Tx and Rx paths.
These parallels motivate adapting computer vision's mature 3D reconstruction pipeline to RF imaging problems.
Quick Check
The fundamental matrix has 7 degrees of freedom. What is the minimum number of point correspondences needed to estimate it (using the classical linear method)?
5
7
8
11
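With the classical linear method (the 8-point algorithm), each correspondence contributes one linear equation in the nine entries of $\mathbf{F}$, which is defined only up to scale, so 8 correspondences are needed. The 7-point algorithm exploits the rank-2 constraint on $\mathbf{F}$ to find a solution from exactly 7 correspondences, but it is not a purely linear method.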
Historical Note: From Photogrammetry to Computer Vision
1981--1997
Epipolar geometry was first formalised in the context of aerial photogrammetry in the early 20th century, where overlapping photographs from aircraft were used to create topographic maps. The fundamental matrix was given a systematic treatment by Luong and Faugeras in 1996, extending earlier work on the essential matrix (Longuet-Higgins, 1981) to uncalibrated cameras. Hartley's 1997 revisiting of the 8-point algorithm demonstrated that careful normalisation of point coordinates makes the linear estimation of $\mathbf{F}$ practical and numerically stable.
Epipolar Line
The line in image 2 on which the projection of a 3D point must lie, given its projection in image 1. Computed as $\mathbf{l}_2 = \mathbf{F}\,\tilde{\mathbf{x}}_1$.
Bundle Adjustment
Nonlinear least-squares refinement of 3D point positions and camera parameters by minimising the total reprojection error across all views and observed points.
Common Mistake: SfM Scale Ambiguity
Mistake:
Assuming that SfM recovers metric (absolute) scale from images alone.
Correction:
Monocular SfM recovers structure and motion only up to an unknown global scale factor. The fundamental matrix encodes epipolar geometry but not the absolute baseline length. To recover metric scale, you need at least one known distance (a calibration object) or additional sensor data (GPS, IMU, known object size). In RF imaging, the carrier wavelength provides a natural scale reference that optical SfM lacks.
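The ambiguity is easy to demonstrate: scaling every 3D point and every camera translation by the same factor leaves all projections unchanged. The sketch below uses a hypothetical two-camera setup and an arbitrary scale factor of 3.7.

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points to Nx2 pixels for a camera with pose (R, t)."""
    x = (X @ R.T + t) @ K.T
    return x[:, :2] / x[:, 2:3]

rng = np.random.default_rng(1)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.15
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([-0.4, 0.0, 0.1])
X = np.column_stack([rng.uniform(-1, 1, 10),
                     rng.uniform(-1, 1, 10),
                     rng.uniform(4, 8, 10)])

scale = 3.7  # arbitrary global scale factor

# Camera 1 sits at the origin, camera 2 has pose (R, t).
view1 = project(K, np.eye(3), np.zeros(3), X)
view2 = project(K, R, t, X)

# Scale the whole scene and the baseline together.
view1_scaled = project(K, np.eye(3), np.zeros(3), scale * X)
view2_scaled = project(K, R, scale * t, scale * X)

print(np.allclose(view1, view1_scaled), np.allclose(view2, view2_scaled))
```

Since the images are identical, no image-only method can distinguish the two reconstructions, which is why an external reference is required for metric scale.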
Key Takeaway
Multi-view geometry (epipolar constraints, the fundamental/essential matrix, and bundle adjustment) provides the mathematical backbone for 3D reconstruction from 2D observations. These concepts transfer directly to RF imaging: Tx-Rx pairs are "cameras," range measurements replace pixel disparities, and autofocus is the RF analog of bundle adjustment.