Fast Algorithms for Structured Operators

The Computational Bottleneck

Every iterative reconstruction algorithm β€” ISTA, FISTA, ADMM, Chambolle--Pock β€” requires repeated evaluation of the forward operator $\mathbf{A}$ and its adjoint $\mathbf{A}^{H}$. For a 2D imaging problem with $M$ measurements and $Q$ voxels, a naive matrix-vector product costs $O(MQ)$ floating-point operations. When $M = 2048$ and $Q = 16{,}384$, this is manageable. But in 3D problems or dense multi-frequency configurations, $M$ and $Q$ can each exceed $10^6$, and storing the full matrix β€” let alone multiplying by it β€” becomes infeasible.

The key observation is that the sensing operator in RF imaging is not an arbitrary dense matrix. It has structure inherited from the physics: separable array geometries produce Kronecker products, uniform grids produce Fourier structure, and the combination of the two yields algorithms that are orders of magnitude faster than the naive approach.

Definition: Kronecker Product

For $\mathbf{A}_{1} \in \mathbb{C}^{M_1 \times N_1}$ and $\mathbf{A}_{2} \in \mathbb{C}^{M_2 \times N_2}$, the Kronecker product $\mathbf{A}_{1} \otimes \mathbf{A}_{2} \in \mathbb{C}^{M_1 M_2 \times N_1 N_2}$ is defined by

$$(\mathbf{A}_{1} \otimes \mathbf{A}_{2})_{(i_1-1)M_2 + i_2,\;(j_1-1)N_2 + j_2} = [\mathbf{A}_{1}]_{i_1,j_1}\,[\mathbf{A}_{2}]_{i_2,j_2}.$$

Equivalently, $\mathbf{A}_{1} \otimes \mathbf{A}_{2}$ is the block matrix whose $(i_1, j_1)$-block is $[\mathbf{A}_{1}]_{i_1,j_1}\,\mathbf{A}_{2}$.

The Kronecker product appears naturally whenever the sensing geometry is separable β€” for example, when the transmit array, receive array, and frequency grid factor independently. This is the typical situation in our RF imaging system (The Kronecker Structure and Sensing Operator Properties).

Theorem: The Vec Trick for Kronecker Products

Let $\mathbf{A}_{1} \in \mathbb{C}^{M_1 \times N_1}$, $\mathbf{A}_{2} \in \mathbb{C}^{M_2 \times N_2}$, and $\mathbf{X} \in \mathbb{C}^{N_2 \times N_1}$. Then

$$(\mathbf{A}_{1} \otimes \mathbf{A}_{2})\,\text{vec}(\mathbf{X}) = \text{vec}(\mathbf{A}_{2}\,\mathbf{X}\,\mathbf{A}_{1}^{T}).$$

Instead of forming the $M_1 M_2 \times N_1 N_2$ Kronecker product and multiplying by the $N_1 N_2$-dimensional vectorized image, we can reshape the image into a matrix and multiply from both sides. The right-hand side involves two smaller matrix products, which is dramatically cheaper.
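As a sanity check, the identity can be verified numerically in a few lines of NumPy (the sizes are arbitrary; note that $\text{vec}$ is column-major, i.e. Fortran-order, flattening):

```python
import numpy as np

rng = np.random.default_rng(0)
M1, N1, M2, N2 = 3, 5, 4, 6
A1 = rng.standard_normal((M1, N1)) + 1j * rng.standard_normal((M1, N1))
A2 = rng.standard_normal((M2, N2)) + 1j * rng.standard_normal((M2, N2))
X = rng.standard_normal((N2, N1)) + 1j * rng.standard_normal((N2, N1))

# Naive: form the full Kronecker product and multiply the vectorized image.
# vec() stacks columns, which is Fortran ("F") order in NumPy.
naive = np.kron(A1, A2) @ X.flatten(order="F")

# Vec trick: two small matrix products, then vectorize the result.
fast = (A2 @ X @ A1.T).flatten(order="F")

print(np.allclose(naive, fast))  # True
```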

Complexity Reduction via the Vec Trick

The naive product $(\mathbf{A}_{1} \otimes \mathbf{A}_{2})\,\text{vec}(\mathbf{X})$ costs $O(M_1 M_2 N_1 N_2)$ operations and requires storing the full Kronecker product.

The factored computation $\text{vec}(\mathbf{A}_{2}\,\mathbf{X}\,\mathbf{A}_{1}^{T})$ costs:

  • $\mathbf{A}_{2} \mathbf{X}$: $O(M_2 N_2 N_1)$
  • Result $\times\,\mathbf{A}_{1}^{T}$: $O(M_2 M_1 N_1)$
  • Total: $O(M_2 N_2 N_1 + M_1 M_2 N_1) = O(N_1(M_2 N_2 + M_1 M_2))$

For our RF imaging system with $M_1 = 32$, $M_2 = 64$, and $N_1 = N_2 = 128$, the naive approach requires $\sim 3.4 \times 10^7$ operations per product, while the factored approach requires $\sim 1.3 \times 10^6$ β€” roughly a $25\times$ reduction, applied at every iteration.

Storage is equally dramatic: the full matrix occupies $M_1 M_2 \times N_1 N_2 = 2048 \times 16{,}384 \approx 33.5$ million complex entries, while the two factors together need only $M_1 N_1 + M_2 N_2 = 32 \times 128 + 64 \times 128 = 12{,}288$ entries.

Example: Kronecker Structure in the RF Imaging Sensing Operator

Consider a monostatic RF imaging system where the sensing operator for a single subcarrier has the form

$$\mathbf{A} = \mathbf{D}_{\text{freq}} \cdot (\mathbf{X}^T \otimes \mathbf{I}_{N_r}) \cdot \text{diag}(\text{vec}(\mathbf{B} \odot \mathbf{C})),$$

where $\mathbf{X}$ is the pilot matrix and $\mathbf{B}, \mathbf{C}$ contain steering vector entries.

For a uniform linear array imaging a rectangular grid, show that the core of the operator β€” the steering-vector part β€” has Kronecker structure, and identify the factors.

Definition: The Fast Fourier Transform (FFT)

The discrete Fourier transform (DFT) of $\mathbf{x} \in \mathbb{C}^N$ is

$$[\mathbf{F}\mathbf{x}]_k = \sum_{n=0}^{N-1} x_n\,e^{-j 2\pi kn/N}, \quad k = 0, \ldots, N-1.$$

The FFT computes $\mathbf{F}\mathbf{x}$ in $O(N \log N)$ operations using the Cooley-Tukey divide-and-conquer factorization, compared to $O(N^2)$ for the direct sum.

When the Kronecker factor $\mathbf{A}_{x}$ is a DFT matrix (as occurs for uniform spatial sampling), the matrix-vector product $\mathbf{A}_{x}^{T} \mathbf{x}$ reduces to an FFT. This is the case whenever the imaging grid has uniform spacing in the direction corresponding to $\mathbf{A}_{x}$.

In the Kronecker-factored computation $\text{vec}(\mathbf{A}_{y} \mathbf{C} \mathbf{A}_{x}^{T})$, if both factors are DFT matrices, each column-wise or row-wise multiplication is an FFT. The total cost drops to $O(Q_x Q_y \log Q_x + Q_x Q_y \log Q_y) = O(Q \log Q)$.
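A small NumPy check confirms that the two-sided product with DFT-matrix factors is exactly a 2D FFT (grid sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
Qx, Qy = 8, 16
# np.fft.fft of the identity yields the (symmetric) DFT matrix.
Fx = np.fft.fft(np.eye(Qx))
Fy = np.fft.fft(np.eye(Qy))
C = rng.standard_normal((Qy, Qx)) + 1j * rng.standard_normal((Qy, Qx))

# Vec-trick form with explicit DFT matrices.
explicit = Fy @ C @ Fx.T

# Same result via the 2D FFT, at O(Q log Q) total cost.
fast = np.fft.fft2(C)

print(np.allclose(explicit, fast))  # True
```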

Definition: Non-Uniform FFT (NUFFT)

The non-uniform FFT evaluates sums of the form

$$f(\boldsymbol{\kappa}_m) = \sum_{q=1}^{Q} c_q\, e^{-j\,\boldsymbol{\kappa}_m^T \mathbf{p}_q}, \quad m = 1, \ldots, M,$$

where the frequency points $\{\boldsymbol{\kappa}_m\}$ lie on a non-uniform grid, in $O(Q \log Q + M)$ operations.

The NUFFT proceeds in three steps:

  1. Gridding: Convolve the non-uniform samples onto an oversampled uniform grid using a compactly supported kernel (e.g., Kaiser-Bessel or Gaussian).
  2. FFT: Apply a standard FFT on the oversampled grid.
  3. Deapodization: Correct for the convolution kernel in the transform domain.

Accuracy depends on the oversampling factor (typically $2\times$) and the kernel width (typically 6--12 points). For imaging applications, relative errors below $10^{-6}$ are standard.

The NUFFT is essential for RF imaging because the wavenumber samples $\{\boldsymbol{\kappa}_m\}$ determined by the array geometry and carrier frequencies rarely fall on a uniform grid. The NUFFT avoids the need to interpolate measurements onto a regular grid, which would introduce additional approximation error.
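The three steps can be sketched in 1D with a Gaussian kernel. This is a simplified Dutt--Rokhlin-style construction, written for clarity rather than speed; the parameter choices follow common practice, and production libraries such as FINUFFT use more efficient kernels:

```python
import numpy as np

def nufft_type2_1d(c, x, oversamp=2, msp=12):
    """Evaluate f(x_j) = sum_{k=-N/2}^{N/2-1} c_k exp(-i k x_j) at
    nonuniform points x_j in [0, 2*pi), via Gaussian gridding."""
    N = len(c)
    Mr = oversamp * N                       # oversampled uniform grid size
    tau = np.pi * msp / (N * N * oversamp * (oversamp - 0.5))
    k = np.arange(-N // 2, N // 2)
    # 1. Deapodization: divide out the Gaussian kernel's Fourier transform.
    ctil = c * np.exp(k**2 * tau) / (2 * np.sqrt(np.pi * tau))
    # 2. FFT on the oversampled uniform grid.
    A = np.zeros(Mr, dtype=complex)
    A[k % Mr] = ctil
    ftil = np.fft.fft(A)                    # ftil[j] = sum_k ctil_k e^{-2*pi*i*k*j/Mr}
    # 3. Gridding: Gaussian-weighted sum over the nearest fine-grid points.
    f = np.zeros(len(x), dtype=complex)
    h = 2 * np.pi / Mr
    for j, xj in enumerate(x):
        j0 = int(np.floor(xj / h))
        jj = np.arange(j0 - msp + 1, j0 + msp + 1)
        u = xj - jj * h
        f[j] = h * np.sum(ftil[jj % Mr] * np.exp(-u**2 / (4 * tau)))
    return f

# Compare against the direct O(NM) sum.
rng = np.random.default_rng(0)
N, M = 128, 64
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = rng.uniform(0, 2 * np.pi, M)
k = np.arange(-N // 2, N // 2)
direct = np.exp(-1j * np.outer(x, k)) @ c
err = np.linalg.norm(nufft_type2_1d(c, x) - direct) / np.linalg.norm(direct)
print(err)   # relative error, typically well below 1e-6 with these parameters
```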

Historical Note: The FFT: A Discovery Rediscovered

1805--1965

The Cooley-Tukey FFT algorithm, published in 1965, reduced the DFT from O(N2)O(N^2) to O(Nlog⁑N)O(N \log N) and is often cited as one of the ten most important algorithms of the twentieth century. But the idea was not new: Gauss had described an equivalent procedure in 1805 for interpolating asteroid orbits, and Runge published a similar scheme in 1903. The 1965 rediscovery by Cooley and Tukey was driven by Cold War signal processing needs β€” specifically, detecting nuclear tests from seismic data. The timing was perfect: digital computers had become fast enough to exploit the algorithm, and the applications were urgent enough to fund the research.

Theorem: Adjoint of a Kronecker-Structured Operator

If $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$, then the adjoint (conjugate transpose) factors as

$$\mathbf{A}^{H} = \mathbf{A}_{1}^{H} \otimes \mathbf{A}_{2}^{H},$$

and the adjoint application satisfies

$$\mathbf{A}^{H} \text{vec}(\mathbf{Y}) = \text{vec}(\mathbf{A}_{2}^{H} \mathbf{Y} \overline{\mathbf{A}_{1}}).$$

In particular, both the forward operator $\mathbf{A}$ and the adjoint $\mathbf{A}^{H}$ can be evaluated using the vec trick, with no additional memory or algorithmic complexity.

The adjoint of a Kronecker product is the Kronecker product of the adjoints β€” the factorization is preserved. This is essential because every proximal algorithm requires both $\mathbf{A}\mathbf{c}$ (forward) and $\mathbf{A}^{H} \mathbf{y}$ (adjoint) at every iteration.

Kronecker-Factored Forward and Adjoint Operations

Complexity: $O(N_1(M_2 N_2 + M_1 M_2))$ for forward; same for adjoint
Input: factors $\mathbf{A}_{1} \in \mathbb{C}^{M_1 \times N_1}$, $\mathbf{A}_{2} \in \mathbb{C}^{M_2 \times N_2}$; image vector $\mathbf{c} \in \mathbb{C}^{N_1 N_2}$
Output: $\mathbf{y} = (\mathbf{A}_{1} \otimes \mathbf{A}_{2})\mathbf{c}$
1. Reshape $\mathbf{c}$ into $\mathbf{C} \in \mathbb{C}^{N_2 \times N_1}$
2. $\mathbf{T} \leftarrow \mathbf{A}_{2}\,\mathbf{C}$ (cost $O(M_2 N_2 N_1)$)
3. $\mathbf{R} \leftarrow \mathbf{T}\,\mathbf{A}_{1}^{T}$ (cost $O(M_2 M_1 N_1)$)
4. $\mathbf{y} \leftarrow \text{vec}(\mathbf{R})$
---
Adjoint: $\hat{\mathbf{c}} = (\mathbf{A}_{1}^{H} \otimes \mathbf{A}_{2}^{H})\mathbf{y}$
1. Reshape $\mathbf{y}$ into $\mathbf{Y} \in \mathbb{C}^{M_2 \times M_1}$
2. $\mathbf{T} \leftarrow \mathbf{A}_{2}^{H} \mathbf{Y}$
3. $\mathbf{R} \leftarrow \mathbf{T}\,\overline{\mathbf{A}_{1}}$
4. $\hat{\mathbf{c}} \leftarrow \text{vec}(\mathbf{R})$

When $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ are DFT matrices, steps 2--3 reduce to batched FFTs, further lowering the cost to $O(N_1 N_2 \log \max(N_1, N_2))$.
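The two procedures translate directly into NumPy, together with the adjoint consistency ("dot") test that any forward/adjoint pair should pass (sizes here are illustrative):

```python
import numpy as np

def kron_forward(A1, A2, c):
    """y = (A1 kron A2) c via the vec trick."""
    N1, N2 = A1.shape[1], A2.shape[1]
    C = c.reshape((N2, N1), order="F")      # undo the column-major vec
    return (A2 @ C @ A1.T).flatten(order="F")

def kron_adjoint(A1, A2, y):
    """c_hat = (A1 kron A2)^H y via the vec trick."""
    M1, M2 = A1.shape[0], A2.shape[0]
    Y = y.reshape((M2, M1), order="F")
    return (A2.conj().T @ Y @ A1.conj()).flatten(order="F")

rng = np.random.default_rng(2)
A1 = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))
A2 = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
c = rng.standard_normal(30) + 1j * rng.standard_normal(30)
y = rng.standard_normal(12) + 1j * rng.standard_normal(12)

# Dot test: <A c, y> must equal <c, A^H y>.
lhs = np.vdot(y, kron_forward(A1, A2, c))
rhs = np.vdot(kron_adjoint(A1, A2, y), c)
print(np.allclose(lhs, rhs))  # True
```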

Kronecker Vec Trick: Speedup over Naive Product

Compare the wall-clock time of the naive dense matrix-vector product $(\mathbf{A}_{1} \otimes \mathbf{A}_{2})\mathbf{c}$ against the factored computation $\text{vec}(\mathbf{A}_{2} \mathbf{C} \mathbf{A}_{1}^{T})$ as the problem dimension grows. The speedup is most dramatic when the factor dimensions are balanced ($M_1 \approx M_2$, $N_1 \approx N_2$).

(Interactive demo; parameter: maximum size of each Kronecker factor, default 64.)

Example: NUFFT vs DFT for Non-Uniform Wavenumber Samples

In a bistatic RF imaging configuration, the wavenumber samples $\{\boldsymbol{\kappa}_m\}$ are determined by the transmitter and receiver positions and do not lie on a uniform grid. Compare the direct evaluation

$$[\mathbf{A}\mathbf{c}]_m = \sum_{q=1}^{Q} c_q\,e^{-j\boldsymbol{\kappa}_m^T \mathbf{p}_q}$$

against the NUFFT approximation in terms of accuracy and computational cost for $Q = 128^2$ grid points and $M = 2048$ measurements.

NUFFT Approximation Error vs Kernel Width

Evaluate the relative error of the NUFFT approximation compared to direct DFT evaluation as a function of the interpolation kernel width $W$. This demonstrates that $W = 6$--$12$ suffices for imaging applications.


Common Mistake: Kronecker Factor Ordering

Mistake:

Writing $\mathbf{A} = \mathbf{A}_{2} \otimes \mathbf{A}_{1}$ instead of $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$ and then applying the vec trick as $\text{vec}(\mathbf{A}_{2} \mathbf{X} \mathbf{A}_{1}^{T})$.

Correction:

The Kronecker product is not commutative: $\mathbf{A}_{1} \otimes \mathbf{A}_{2} \neq \mathbf{A}_{2} \otimes \mathbf{A}_{1}$ in general. The vec trick maps the first factor to the right-multiplication and the second factor to the left-multiplication:

$$(\mathbf{A}_{1} \otimes \mathbf{A}_{2})\text{vec}(\mathbf{X}) = \text{vec}(\mathbf{A}_{2} \mathbf{X} \mathbf{A}_{1}^{T}).$$

Swapping the factors produces a permuted result. Always verify by checking the dimensions: $\mathbf{X}$ must be $N_2 \times N_1$ (its row count matches the second factor's column count).
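A small NumPy illustration of the ordering convention and of what goes wrong when the factors are swapped (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A1 = rng.standard_normal((3, 5))   # M1 x N1
A2 = rng.standard_normal((4, 6))   # M2 x N2
X = rng.standard_normal((6, 5))    # N2 x N1: rows match A2's columns

correct = np.kron(A1, A2) @ X.flatten(order="F")
vec_trick = (A2 @ X @ A1.T).flatten(order="F")
print(np.allclose(correct, vec_trick))   # True

# Swapping the factors changes the operator: kron(A2, A1) needs X shaped
# N1 x N2, and its output is a permutation of the correct result.
swapped = np.kron(A2, A1) @ X.T.flatten(order="F")
print(np.allclose(correct, swapped))     # False (entries permuted)
```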

Theorem: Spectral Properties of Kronecker Products

Let $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ have singular values $\{\sigma_i^{(1)}\}$ and $\{\sigma_j^{(2)}\}$, respectively. Then:

  1. The singular values of $\mathbf{A}_{1} \otimes \mathbf{A}_{2}$ are $\{\sigma_i^{(1)} \sigma_j^{(2)}\}$ for all pairs $(i,j)$.

  2. $\|\mathbf{A}_{1} \otimes \mathbf{A}_{2}\| = \|\mathbf{A}_{1}\| \cdot \|\mathbf{A}_{2}\|$.

  3. $\kappa(\mathbf{A}_{1} \otimes \mathbf{A}_{2}) = \kappa(\mathbf{A}_{1}) \cdot \kappa(\mathbf{A}_{2})$, where $\kappa$ denotes the condition number.

The condition number of the full sensing operator is the product of the condition numbers of the factors. If each factor is moderately ill-conditioned (say $\kappa = 100$), the Kronecker product can be severely ill-conditioned ($\kappa = 10^4$). This multiplicative growth of ill-conditioning is a fundamental challenge in structured inverse problems.
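All three properties are easy to check numerically; a small NumPy sketch with random square factors:

```python
import numpy as np

rng = np.random.default_rng(4)
A1 = rng.standard_normal((4, 4))
A2 = rng.standard_normal((5, 5))

s1 = np.linalg.svd(A1, compute_uv=False)   # descending order
s2 = np.linalg.svd(A2, compute_uv=False)
s_kron = np.linalg.svd(np.kron(A1, A2), compute_uv=False)

# 1. Singular values of the product are all pairwise products sigma_i * sigma_j.
pairwise = np.sort(np.outer(s1, s2).ravel())[::-1]
print(np.allclose(s_kron, pairwise))                     # True

# 2. The spectral norm multiplies.
print(np.isclose(s_kron[0], s1[0] * s2[0]))              # True

# 3. So does the condition number.
cond = lambda s: s[0] / s[-1]
print(np.isclose(cond(s_kron), cond(s1) * cond(s2)))     # True
```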

Quick Check

For $\mathbf{A}_{1} \in \mathbb{C}^{32 \times 128}$ and $\mathbf{A}_{2} \in \mathbb{C}^{64 \times 128}$, the Kronecker product $\mathbf{A}_{1} \otimes \mathbf{A}_{2}$ is a matrix of size:

  • $2048 \times 16{,}384$
  • $96 \times 256$
  • $4096 \times 32{,}768$
  • $64 \times 128$

Quick Check

When applying the vec trick to compute $(\mathbf{A}_{1} \otimes \mathbf{A}_{2})\text{vec}(\mathbf{X})$ with $\mathbf{A}_{1} \in \mathbb{C}^{32 \times 128}$ and $\mathbf{A}_{2} \in \mathbb{C}^{64 \times 128}$, what shape should $\mathbf{X}$ be?

  • $128 \times 128$
  • $128 \times 32$
  • $64 \times 32$
  • $32 \times 128$

Definition: Multi-Factor Kronecker Products

For three or more factors, the Kronecker product extends associatively:

$$\mathbf{A}_{1} \otimes \mathbf{A}_{2} \otimes \mathbf{A}_{3} = (\mathbf{A}_{1} \otimes \mathbf{A}_{2}) \otimes \mathbf{A}_{3}.$$

The vec trick generalizes to tensor contractions. For the three-factor case (e.g., frequency $\times$ Rx $\times$ Tx in the RF imaging model), reshape $\mathbf{c}$ into a 3D tensor $\mathcal{C} \in \mathbb{C}^{N_3 \times N_2 \times N_1}$ and apply each factor along the corresponding mode:

$$\mathcal{Y} = \mathcal{C} \times_1 \mathbf{A}_{3} \times_2 \mathbf{A}_{2} \times_3 \mathbf{A}_{1},$$

where $\times_k$ denotes the mode-$k$ product. The total cost is $O(N_1 N_2 N_3 (M_1 + M_2 + M_3))$ instead of $O(M_1 M_2 M_3 \cdot N_1 N_2 N_3)$.
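The mode-product form is compactly expressed with `np.einsum`; a sketch of the three-factor forward operator against the naive triple Kronecker product (factor sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
A1 = rng.standard_normal((2, 3))   # M1 x N1
A2 = rng.standard_normal((4, 5))   # M2 x N2
A3 = rng.standard_normal((6, 7))   # M3 x N3
N1, N2, N3 = 3, 5, 7
c = rng.standard_normal(N1 * N2 * N3)

# Naive: form the full three-factor Kronecker product (48 x 105 here).
naive = np.kron(np.kron(A1, A2), A3) @ c

# Mode products: reshape c into an N3 x N2 x N1 tensor (column-major vec
# convention) and contract each factor along its own mode.
C = c.reshape((N3, N2, N1), order="F")
Y = np.einsum("ad,be,cf,fed->cba", A1, A2, A3, C)
fast = Y.flatten(order="F")

print(np.allclose(naive, fast))   # True
```

In practice the three contractions would be applied one mode at a time (e.g. with `np.tensordot`) to realize the $O(N_1 N_2 N_3 (M_1 + M_2 + M_3))$ cost; the single `einsum` call is shown here for clarity.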

In the RF imaging system of The Kronecker Structure and Sensing Operator Properties, the three factors correspond to the transmit steering ($\mathbf{A}_{\text{Tx}}$), receive steering ($\mathbf{A}_{\text{Rx}}$), and frequency phase ($\mathbf{A}_{\text{freq}}$) components.

Common Mistake: Near-Field Breaks Exact Kronecker Structure

Mistake:

Assuming that the sensing operator always has exact Kronecker structure, even when the array is in the near-field of the imaging region.

Correction:

The Kronecker factorization $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$ is exact only in the far-field regime, where the steering vector separates into azimuth and elevation components. In the near-field, the steering vector for each antenna depends on the full 3D distance $\|\mathbf{p}_n - \mathbf{p}_q\|$, which does not factor.

For our system, with Fresnel number $\approx 0.36$ (Section 5 of the code reference), near-field effects are non-negligible. In practice, one can:

  1. Use the Kronecker structure as a preconditioner (fast approximate inverse).
  2. Store and apply the exact operator, but use the Kronecker approximation for gradient computations where high accuracy is less critical.
  3. Apply a NUFFT-based approach that handles the non-uniform sampling directly.

Why This Matters: Kronecker Structure in MIMO Channel Estimation

The Kronecker structure exploited here for RF imaging is the same structure that appears in MIMO channel estimation. When the transmit and receive correlation matrices are separable ($\mathbf{R} = \mathbf{R}_r \otimes \mathbf{R}_t$), the vectorized channel $\text{vec}(\mathbf{H})$ has a Kronecker-structured covariance. The LMMSE channel estimator then exploits this factorization in exactly the same way as our vec trick β€” the two-sided matrix multiplication replaces a large matrix inversion.

The insight that separable structure reduces computational cost from $O(N^4)$ to $O(N^3)$ in MIMO estimation (Kronecker Channel Models) is the same insight applied here to reduce the imaging operator cost.

Kronecker product

The tensor product of two matrices $\mathbf{A} \otimes \mathbf{B}$, formed by replacing each entry $a_{ij}$ of $\mathbf{A}$ with the block $a_{ij}\mathbf{B}$. For $\mathbf{A} \in \mathbb{C}^{M_1 \times N_1}$ and $\mathbf{B} \in \mathbb{C}^{M_2 \times N_2}$, the result has dimensions $M_1 M_2 \times N_1 N_2$.

Related: {{Ref:Gloss Vec Operator}}

Vec operator

The vectorization $\text{vec}(\mathbf{X})$ stacks the columns of $\mathbf{X} \in \mathbb{C}^{m \times n}$ into a single vector in $\mathbb{C}^{mn}$. The inverse operation reshapes a vector back into a matrix.

Related: {{Ref:Gloss Kronecker Product}}

Non-uniform FFT (NUFFT)

An algorithm that evaluates Fourier sums at non-uniformly spaced frequency points in $O(N \log N + M)$ time, where $N$ is the number of source points and $M$ is the number of target points. It uses gridding, an FFT, and deapodization.

Key Takeaway

The sensing operator $\mathbf{A}$ in RF imaging inherits Kronecker structure from the separable array geometry. The vec trick converts a dense matrix-vector product into two smaller matrix multiplications, reducing cost by orders of magnitude. When the factors are DFT matrices (uniform grids), the FFT provides an additional logarithmic speedup. For non-uniform wavenumber sampling, the NUFFT achieves near-optimal complexity with controllable approximation error. Every iterative reconstruction algorithm benefits directly from these fast operator evaluations.