Multi-View Stereo and Structure from Motion

Why Multi-View Geometry Matters for RF

Every RF imaging system with multiple Tx-Rx pairs is, at its core, a multi-view system. Each Tx-Rx pair "views" the scene from a different geometric perspective β€” just as cameras in a stereo rig observe the same scene from different positions. The mathematical machinery of multi-view geometry β€” epipolar constraints, the fundamental matrix, bundle adjustment β€” transfers directly to RF with one critical difference: RF measurements are coherent (phase-bearing), while camera images are incoherent (intensity only).

This section develops the optical multi-view framework; Section 28.3 adapts it to RF wave propagation.

Definition:

Pinhole Camera Model and Projection

A pinhole camera maps a 3D point $\mathbf{P} = [X, Y, Z]^\mathsf{T}$ in world coordinates to a 2D image point $\mathbf{x} = [u, v]^\mathsf{T}$ via perspective projection:

$$\tilde{\mathbf{x}} = \mathbf{K}[\mathbf{R} \mid \mathbf{t}]\,\tilde{\mathbf{P}},$$

where $\tilde{\mathbf{x}} \in \mathbb{R}^3$ and $\tilde{\mathbf{P}} \in \mathbb{R}^4$ are homogeneous coordinates, $\mathbf{R} \in SO(3)$ and $\mathbf{t} \in \mathbb{R}^3$ are the camera extrinsics (rotation and translation), and the intrinsic matrix is:

$$\mathbf{K} = \begin{bmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix},$$

with focal lengths $f_x, f_y$, principal point $(c_x, c_y)$, and skew $s$ (usually 0).

The full $3 \times 4$ projection matrix $\mathbf{P} = \mathbf{K}[\mathbf{R} \mid \mathbf{t}]$ has 11 degrees of freedom (6 extrinsic + 5 intrinsic).
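As a concrete sketch, the projection above takes a few lines of NumPy. The intrinsics, pose, and 3D point below are illustrative values, not taken from the text:

```python
import numpy as np

# Illustrative intrinsics: fx = fy = 800, principal point (320, 240), no skew
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                      # extrinsics: camera aligned with world axes
t = np.zeros(3)

def project(K, R, t, P):
    """Perspective projection x ~ K [R | t] P, returning pixel coordinates."""
    p_cam = R @ P + t              # world frame -> camera frame
    x_h = K @ p_cam                # homogeneous image point
    return x_h[:2] / x_h[2]        # perspective divide

P = np.array([0.5, -0.25, 2.0])    # a 3D point 2 m in front of the camera
u, v = project(K, R, t, P)
```

The perspective divide in the last line of `project` is what makes the mapping nonlinear in $Z$: points twice as far move half as much in the image.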

Definition:

Epipolar Geometry and the Fundamental Matrix

Given two cameras observing the same 3D point $\mathbf{P}$, the projections $\mathbf{x}_1$ and $\mathbf{x}_2$ in the two images satisfy the epipolar constraint:

$$\tilde{\mathbf{x}}_2^\mathsf{T}\,\mathbf{F}\,\tilde{\mathbf{x}}_1 = 0,$$

where $\mathbf{F} \in \mathbb{R}^{3 \times 3}$ is the fundamental matrix (rank 2, 7 degrees of freedom).

Geometric interpretation: the point $\mathbf{x}_1$ in image 1 constrains the corresponding point $\mathbf{x}_2$ in image 2 to lie on the epipolar line $\ell_2 = \mathbf{F}\,\tilde{\mathbf{x}}_1$. This reduces stereo matching from a 2D search over the image to a 1D search along that line.

When the cameras are calibrated (intrinsics $\mathbf{K}_1, \mathbf{K}_2$ known), the fundamental matrix factors as:

$$\mathbf{F} = \mathbf{K}_2^{-\mathsf{T}}\,\mathbf{E}\,\mathbf{K}_1^{-1},$$

where $\mathbf{E} = [\mathbf{t}]_\times \mathbf{R}$ is the essential matrix (5 DOF), with $[\mathbf{t}]_\times$ the skew-symmetric matrix of the baseline translation.

The fundamental matrix encodes the relative geometry between two views. It can be estimated linearly from $\geq 8$ point correspondences (the 8-point algorithm), or from exactly 7 with the 7-point algorithm, which enforces the rank-2 constraint $\det \mathbf{F} = 0$ and yields up to three solutions.

Theorem: Properties of the Essential Matrix

The essential matrix $\mathbf{E} = [\mathbf{t}]_\times \mathbf{R}$ satisfies:

  1. $\mathbf{E}$ has rank 2, and its two nonzero singular values are equal.
  2. The SVD of $\mathbf{E}$ is $\mathbf{E} = \mathbf{U}\,\mathrm{diag}(\sigma, \sigma, 0)\,\mathbf{V}^\mathsf{T}$, where $\sigma = \|\mathbf{t}\|$.
  3. Given $\mathbf{E}$, the rotation and translation can be recovered (up to a four-fold ambiguity resolved by the positive-depth constraint).

The essential matrix captures the rigid-body geometry between two calibrated cameras. Its rank-2 structure reflects the fact that a single point correspondence constrains but does not determine the 3D point: one degree of freedom (depth) remains.
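The theorem's recovery procedure can be sketched as follows: build $\mathbf{E}$ from an assumed pose, then read off the four candidate $(\mathbf{R}, \mathbf{t})$ pairs from its SVD. In a real pipeline the positive-depth (cheirality) test on triangulated points selects the physical solution; here the candidates are only enumerated:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x of a 3-vector v."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

# Assumed relative pose: small rotation about z, unit-length baseline
th = 0.1
R_true = np.array([[np.cos(th), -np.sin(th), 0],
                   [np.sin(th),  np.cos(th), 0],
                   [0, 0, 1.0]])
t_true = np.array([1.0, 0.2, 0.0])
t_true /= np.linalg.norm(t_true)

E = skew(t_true) @ R_true
U, S, Vt = np.linalg.svd(E)
# Properties 1-2: S = (sigma, sigma, 0) with sigma = ||t|| = 1 here

# Property 3: four candidates (two rotations x two baseline signs)
W = np.array([[0, -1.0, 0], [1, 0, 0], [0, 0, 1]])  # 90-degree z-rotation
R_cands = [U @ W @ Vt, U @ W.T @ Vt]
R_cands = [r if np.linalg.det(r) > 0 else -r for r in R_cands]  # proper rotations
t_cands = [U[:, 2], -U[:, 2]]   # translation spans the left null space of E
```

With an exact $\mathbf{E}$, the true rotation appears in `R_cands` and the true baseline direction in `t_cands`; the sign fix on the determinant mirrors what standard implementations do when the SVD returns reflections.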

Epipolar Geometry Visualisation

Visualise epipolar geometry for a stereo camera pair. A 3D point projects onto both images; the epipolar lines show the geometric constraint between correspondences. Increasing the baseline separates the epipoles further and increases the disparity (the displacement between corresponding points), which improves depth estimation precision.


Definition:

Structure from Motion (SfM)

Structure from Motion jointly estimates 3D scene structure (a sparse point cloud) and camera poses from a collection of unposed images:

  1. Feature extraction: Detect and describe keypoints in each image (e.g., SIFT, SuperPoint).
  2. Feature matching: Find correspondences between image pairs.
  3. Geometric verification: Filter matches using epipolar geometry (corresponding points must satisfy $\tilde{\mathbf{x}}_2^\mathsf{T} \mathbf{F}\,\tilde{\mathbf{x}}_1 = 0$).
  4. Incremental reconstruction: Initialise structure from a well-conditioned two-view pair, then alternately register new cameras (via PnP) and triangulate new points.
  5. Bundle adjustment: Jointly optimise 3D point positions $\{\mathbf{P}_j\}$ and camera parameters $\{\mathbf{K}_k, \mathbf{R}_k, \mathbf{t}_k\}$ by minimising the reprojection error:

$$\min_{\{\mathbf{P}_j\}, \{\mathbf{R}_k, \mathbf{t}_k\}} \sum_{k,j} \rho\!\left(\|\pi(\mathbf{K}_k, \mathbf{R}_k, \mathbf{t}_k, \mathbf{P}_j) - \mathbf{x}_{k,j}\|^2\right),$$

where $\pi$ is the projection function and $\rho$ is a robust loss (e.g., Huber).
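The robust reprojection objective can be sketched directly. The camera values and the 100-pixel outlier below are illustrative; the point is that the Huber loss penalises a gross mismatch linearly rather than quadratically, limiting its influence on the optimisation:

```python
import numpy as np

def huber(sq_err, delta=1.0):
    """Huber loss rho applied to a squared residual ||e||^2."""
    r = np.sqrt(sq_err)
    return np.where(r <= delta, sq_err, 2 * delta * r - delta**2)

def reprojection_cost(K, R, t, points, obs, delta=1.0):
    """Sum of robust reprojection errors for one camera's observations."""
    p = (K @ (R @ points.T + t[:, None])).T   # pi(K, R, t, P_j)
    proj = p[:, :2] / p[:, 2:]                # perspective divide
    sq = np.sum((proj - obs) ** 2, axis=1)    # ||pi(...) - x_kj||^2
    return float(huber(sq, delta).sum())

# Assumed camera; observations generated by exact projection
K = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.zeros(3)
pts = np.array([[0.0, 0.0, 5.0], [0.3, -0.2, 6.0]])
p = (K @ pts.T).T
obs = p[:, :2] / p[:, 2:]
cost_clean = reprojection_cost(K, R, t, pts, obs)    # 0: perfect matches

obs_out = obs.copy()
obs_out[0, 0] += 100.0                               # gross matching outlier
cost_outlier = reprojection_cost(K, R, t, pts, obs_out)
```

With a quadratic loss the outlier would contribute $100^2 = 10{,}000$ to the cost; under Huber (here $\delta = 1$) it contributes only $2\delta r - \delta^2 = 199$.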

SfM is the standard preprocessing step for NeRF and 3DGS: it provides the camera poses needed for training. COLMAP is the most widely used SfM pipeline.

Definition:

Bundle Adjustment

Bundle adjustment is a nonlinear least-squares optimisation that jointly refines 3D point positions and camera parameters. Let $\theta = (\{\mathbf{P}_j\}, \{\mathbf{R}_k, \mathbf{t}_k\})$ denote the unknowns. The cost function is:

$$\min_\theta \sum_{(k,j) \in \mathcal{V}} \rho\!\left(\|\pi_k(\mathbf{P}_j; \theta) - \mathbf{x}_{k,j}\|^2\right),$$

where $\mathcal{V}$ is the set of visibility pairs (point $j$ seen in camera $k$).

The Jacobian of this system has a sparse block structure (each observation depends on exactly one point and one camera), enabling the Schur complement trick: eliminate point variables first, then solve a reduced system over camera variables only.

Levenberg-Marquardt is the standard solver, with cost per iteration $O(|\mathcal{V}|\,c^2 + n_c^3)$, where $c$ is the camera parameter dimension and $n_c$ is the number of cameras.

The Schur complement trick reduces a system with millions of 3D points and hundreds of cameras to a dense system of size $\sim 6 n_c$, making bundle adjustment tractable for large-scale SfM.
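The elimination order described above can be sketched on a toy damped Gauss-Newton system with bundle adjustment's sparsity pattern (all dimensions and values are synthetic): because each observation touches one camera and one point, $\mathbf{H}_{pp}$ is $3 \times 3$ block-diagonal and cheap to invert, leaving a reduced system of only $6 n_c$ unknowns:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cam, n_pts = 3, 40
nc, npts = 6 * n_cam, 3 * n_pts

# Jacobian with BA sparsity: each observation row touches one camera
# (6 columns) and one point (3 columns); here each point is seen once (toy)
rows = []
for j in range(n_pts):
    k = j % n_cam
    row = np.zeros((2, nc + npts))
    row[:, 6 * k:6 * k + 6] = rng.standard_normal((2, 6))
    row[:, nc + 3 * j:nc + 3 * j + 3] = rng.standard_normal((2, 3))
    rows.append(row)
J = np.vstack(rows)
H = J.T @ J + 1e-3 * np.eye(nc + npts)   # LM damping keeps H invertible
g = rng.standard_normal(nc + npts)       # gradient (right-hand side)

# Schur complement: eliminate point variables first
Hcc, Hcp, Hpp = H[:nc, :nc], H[:nc, nc:], H[nc:, nc:]
Hpp_inv = np.zeros_like(Hpp)
for j in range(n_pts):                   # Hpp is 3x3 block-diagonal
    s = slice(3 * j, 3 * j + 3)
    Hpp_inv[s, s] = np.linalg.inv(Hpp[s, s])
S = Hcc - Hcp @ Hpp_inv @ Hcp.T          # reduced system, size 6 * n_cam
dc = np.linalg.solve(S, g[:nc] - Hcp @ Hpp_inv @ g[nc:])   # camera update
dp = Hpp_inv @ (g[nc:] - Hcp.T @ dc)                       # back-substitute points
```

The update `(dc, dp)` is algebraically identical to solving the full system `H x = g` directly, but the dense solve involves only the small camera block.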

Example: The COLMAP SfM Pipeline

Describe the steps to go from a set of uncalibrated photographs to camera poses suitable for training a NeRF.

Why This Matters: Multi-View Geometry in RF Imaging

The geometry of multi-view imaging has direct parallels in RF:

  • SfM $\leftrightarrow$ Array calibration: Estimating antenna positions and orientations from calibration measurements is analogous to camera pose estimation in SfM.

  • Epipolar geometry $\leftrightarrow$ Range-azimuth ambiguity: The fundamental matrix constrains where a correspondence can appear; similarly, the range-azimuth ambiguity in monostatic radar constrains where a scatterer can be localised.

  • Bundle adjustment $\leftrightarrow$ Autofocus: Both jointly estimate the scene and nuisance parameters (camera poses in vision, phase errors in RF) from the measurements.

  • Stereo disparity $\leftrightarrow$ Bistatic range difference: In stereo vision, depth is recovered from disparity; in bistatic radar, the target position is recovered from the difference between the Tx and Rx path lengths.

These parallels motivate adapting computer vision's mature 3D reconstruction pipeline to RF imaging problems.


Quick Check

The fundamental matrix $\mathbf{F}$ has 7 degrees of freedom. What is the minimum number of point correspondences needed to estimate it (using the classical linear method)?

5

7

8

11

Historical Note: From Photogrammetry to Computer Vision

1981--1997

Epipolar geometry was first formalised in the context of aerial photogrammetry in the early 20th century, where overlapping photographs from aircraft were used to create topographic maps. The fundamental matrix was introduced by Luong and Faugeras in 1996, unifying earlier work on the essential matrix (Longuet-Higgins, 1981) with uncalibrated cameras. The 8-point algorithm, rediscovered by Hartley in 1997, demonstrated that careful normalisation of point coordinates makes linear estimation of $\mathbf{F}$ practical and numerically stable.

Epipolar Line

The line in image 2 on which the projection of a 3D point must lie, given its projection in image 1. Computed as $\ell_2 = \mathbf{F}\,\tilde{\mathbf{x}}_1$.

Related: Epipolar Geometry and the Fundamental Matrix

Bundle Adjustment

Nonlinear least-squares refinement of 3D point positions and camera parameters by minimising the total reprojection error across all views and observed points.

Related: Bundle Adjustment

Common Mistake: SfM Scale Ambiguity

Mistake:

Assuming that SfM recovers metric (absolute) scale from images alone.

Correction:

Monocular SfM recovers structure and motion only up to an unknown global scale factor. The fundamental matrix encodes epipolar geometry but not the absolute baseline length. To recover metric scale, you need at least one known distance (a calibration object) or additional sensor data (GPS, IMU, known object size). In RF imaging, the carrier wavelength provides a natural scale reference that optical SfM lacks.

Key Takeaway

Multi-view geometry β€” epipolar constraints, the fundamental/essential matrix, and bundle adjustment β€” provides the mathematical backbone for 3D reconstruction from 2D observations. These concepts transfer directly to RF imaging: Tx-Rx pairs are "cameras," range measurements replace pixel disparities, and autofocus is the RF analog of bundle adjustment.