Preconditioning the Sensing Operator

Why Preconditioning Accelerates Reconstruction

Iterative reconstruction algorithms (gradient descent, ISTA, ADMM, OAMP) converge at a rate determined by the condition number of $\mathbf{A}^{H}\mathbf{A}$. When $\kappa(\mathbf{A}) \gg 1$ --- typical for RF imaging with incomplete angular coverage --- convergence can be painfully slow: the number of iterations scales linearly with $\kappa$ for first-order methods and as $\sqrt{\kappa}$ for accelerated methods. Preconditioning transforms the problem so that the effective condition number is closer to 1, dramatically reducing the iteration count. The Kronecker structure of $\mathbf{A}$ makes certain preconditioners computationally cheap.
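The linear-in-$\kappa$ scaling can be checked on a toy quadratic. The sketch below (the diagonal test problem, spectrum, and tolerance are illustrative assumptions, not from the text) runs fixed-step gradient descent on quadratics with condition numbers 10 and 100 and counts iterations to a fixed tolerance:

```python
import numpy as np

def gd_iters(kappa, tol=1e-6, maxit=200000):
    """Iterations for fixed-step gradient descent on the quadratic
    f(x) = 0.5 * x^T diag(d) x with spectrum spread over [1, kappa]."""
    d = np.linspace(1.0, kappa, 50)          # eigenvalues of the Hessian
    step = 2.0 / (d.min() + d.max())         # optimal fixed step size
    x = np.ones_like(d)
    for t in range(1, maxit + 1):
        x = x - step * d * x                 # gradient step on each mode
        if np.linalg.norm(x) <= tol * np.sqrt(d.size):
            return t
    return maxit

# Iteration count grows roughly linearly with the condition number:
print(gd_iters(10), gd_iters(100))  # the second count is roughly 10x the first
```

The ratio of the two counts tracks the ratio of condition numbers, which is exactly what preconditioning attacks.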

Definition: Preconditioner

A preconditioner for the normal equations $\mathbf{A}^{H}\mathbf{A}\mathbf{c} = \mathbf{A}^{H}\mathbf{y}$ is an invertible matrix $\mathbf{P}$ such that the preconditioned system

$$\mathbf{P}^{-1}\mathbf{A}^{H}\mathbf{A}\mathbf{c} = \mathbf{P}^{-1}\mathbf{A}^{H}\mathbf{y}$$

has a smaller effective condition number: $\kappa(\mathbf{P}^{-1}\mathbf{A}^{H}\mathbf{A}) \ll \kappa(\mathbf{A}^{H}\mathbf{A})$.

An ideal preconditioner approximates $\mathbf{A}^{H}\mathbf{A}$ while being cheap to apply: $\mathbf{P} \approx \mathbf{A}^{H}\mathbf{A}$, but computing $\mathbf{P}^{-1}\mathbf{v}$ costs only $O(N)$ or $O(N \log N)$.

Definition: Diagonal (Jacobi) Preconditioner

The simplest preconditioner is the diagonal (Jacobi) preconditioner:

$$\mathbf{P}_{\text{diag}} = \text{diag}(\mathbf{A}^{H}\mathbf{A}) = \text{diag}\bigl(\|\mathbf{A}_{1}\|^2, \ldots, \|\mathbf{A}_{N}\|^2\bigr)$$

where $\mathbf{A}_{q}$ is the $q$-th column of $\mathbf{A}$. This equalizes the column norms, correcting for the non-uniform illumination pattern.

For a Kronecker-structured matrix, the diagonal of $\mathbf{A}^{H}\mathbf{A}$ is itself a Kronecker product:

$$\text{diag}(\mathbf{A}^{H}\mathbf{A}) = \text{diag}(\mathbf{A}_{3}^{H}\mathbf{A}_{3}) \otimes \text{diag}(\mathbf{A}_{2}^{H}\mathbf{A}_{2}) \otimes \text{diag}(\mathbf{A}_{1}^{H}\mathbf{A}_{1})$$

requiring storage for only $n_1 + n_2 + n_3$ values.
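This identity is easy to verify numerically. A minimal sketch (the factor sizes below are made-up small values; the full operator is formed only to check the result):

```python
import numpy as np

rng = np.random.default_rng(0)
# Small hypothetical factor sizes, just to check the identity numerically
A1 = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
A2 = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
A3 = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))

A = np.kron(A3, np.kron(A2, A1))   # full 120 x 24 operator, formed only for the check

d_full = np.diag(A.conj().T @ A)   # diagonal of A^H A, the expensive way
d_kron = np.kron(np.diag(A3.conj().T @ A3),
                 np.kron(np.diag(A2.conj().T @ A2),
                         np.diag(A1.conj().T @ A1)))

print(np.allclose(d_full, d_kron))  # True: n1 + n2 + n3 stored values suffice
```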

In Caire's framework (Eq. 24--25), the backpropagation operator $\hat{\mathbf{c}}^{\text{BP}} = \mathbf{A}^{H}\mathbf{D}^{-1}\mathbf{y}$ uses a diagonal matrix $\mathbf{D}$ that accounts for path loss and antenna gain variations. This is precisely the diagonal preconditioner, applied in the measurement domain.

Theorem: Kronecker Preconditioning via Factor Inversion

For $\mathbf{A} = \mathbf{A}_{3} \otimes \mathbf{A}_{2} \otimes \mathbf{A}_{1}$, define the Kronecker preconditioner:

$$\mathbf{P}_{\text{Kron}} = (\mathbf{A}_{3}^{H}\mathbf{A}_{3} + \alpha_3 \mathbf{I}) \otimes (\mathbf{A}_{2}^{H}\mathbf{A}_{2} + \alpha_2 \mathbf{I}) \otimes (\mathbf{A}_{1}^{H}\mathbf{A}_{1} + \alpha_1 \mathbf{I})$$

where $\alpha_k > 0$ are regularization parameters. Then:

  1. $\mathbf{P}_{\text{Kron}}^{-1}$ can be applied by inverting each factor separately: forming the three factor inverses costs $O(n_1^3 + n_2^3 + n_3^3)$ once, after which each application is a cheap Kronecker matvec.

  2. The effective condition number is

    $$\kappa(\mathbf{P}_{\text{Kron}}^{-1}\mathbf{A}^{H}\mathbf{A}) = \prod_{k=1}^{3} \frac{\sigma_{\min}^2(\mathbf{A}_{k}) + \alpha_k}{\sigma_{\max}^2(\mathbf{A}_{k}) + \alpha_k} \cdot \frac{\sigma_{\max}^2(\mathbf{A}_{k})}{\sigma_{\min}^2(\mathbf{A}_{k})}.$$

  3. With $\alpha_k = 0$, $\mathbf{P}_{\text{Kron}}^{-1}\mathbf{A}^{H}\mathbf{A} = \mathbf{I}$ (perfect preconditioning), but this requires each factor to be invertible.

Since $\mathbf{A}^{H}\mathbf{A}$ factors as a Kronecker product, we can invert it factor by factor instead of inverting the full $N \times N$ matrix. The regularization parameters $\alpha_k$ handle the case when individual factors are rank-deficient, at the cost of imperfect preconditioning.
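Point 3 can be checked directly. A quick numerical sketch (square, well-conditioned factors are an assumption here, so that $\alpha_k = 0$ is admissible; two factors are used for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
# Square, near-identity factors: invertible and well conditioned
A1 = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
A2 = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
A = np.kron(A2, A1)

# Kronecker preconditioner with alpha = 0: invert each small Gram matrix
P_inv = np.kron(np.linalg.inv(A2.T @ A2), np.linalg.inv(A1.T @ A1))

M = P_inv @ (A.T @ A)               # preconditioned normal matrix
print(np.allclose(M, np.eye(12)))   # True: perfect preconditioning
```

Only the 3x3 and 4x4 Gram matrices are inverted; the 12x12 matrices appear here solely for verification.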

Example: Preconditioned CG for Tikhonov Reconstruction

A 2D imaging system with $\mathbf{A} = \mathbf{A}_{\text{ang}} \otimes \mathbf{A}_{f}$ has $\kappa(\mathbf{A}) = 200$. The Tikhonov solution

$$\hat{\mathbf{c}} = (\mathbf{A}^{H}\mathbf{A} + \lambda\mathbf{I})^{-1}\mathbf{A}^{H}\mathbf{y}$$

is computed by conjugate gradient (CG) on the normal equations.

(a) How many CG iterations are needed without preconditioning? (b) How many with the Kronecker preconditioner, given $\kappa(\mathbf{A}_{\text{ang}}) = 20$ and $\kappa(\mathbf{A}_{f}) = 10$?
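A sketch of the answer, under the assumption that $\lambda$ is small compared to the spectrum of $\mathbf{A}^{H}\mathbf{A}$:

(a) CG on the normal equations sees $\kappa(\mathbf{A}^{H}\mathbf{A}) = \kappa(\mathbf{A})^2 = 200^2 = 4 \times 10^4$, so the iteration count scales as

$$\sqrt{\kappa(\mathbf{A}^{H}\mathbf{A})}\,\log(1/\epsilon) = 200\,\log(1/\epsilon),$$

i.e. on the order of a few hundred iterations.

(b) The factor condition numbers are consistent with $\kappa(\mathbf{A}) = \kappa(\mathbf{A}_{\text{ang}})\,\kappa(\mathbf{A}_{f}) = 20 \times 10 = 200$. With invertible factors and $\alpha_k = 0$, the Kronecker preconditioner gives $\kappa_{\text{prec}} = 1$ and CG converges in a handful of iterations; with small $\alpha_k > 0$, $\kappa_{\text{prec}}$ stays close to 1 and roughly $\sqrt{\kappa_{\text{prec}}}\,\log(1/\epsilon)$ iterations suffice, i.e. tens at most.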

Preconditioned CG with Kronecker Structure

Complexity: each iteration costs $O(n_1^2 n_2 n_3 + n_1 n_2^2 n_3 + n_1 n_2 n_3^2)$ for the Kronecker matvec, plus the same order for the preconditioner application. Total iterations: $O(\sqrt{\kappa_{\text{prec}}}\,\log(1/\epsilon))$.
Input: factor matrices $\mathbf{A}_{1}, \mathbf{A}_{2}, \mathbf{A}_{3}$;
observations $\mathbf{y}$; Tikhonov parameter $\lambda$;
preconditioner regularization $\alpha_1, \alpha_2, \alpha_3$;
tolerance $\epsilon$.
Precompute:
$\mathbf{P}_k^{-1} = (\mathbf{A}_{k}^{H}\mathbf{A}_{k} + \alpha_k\mathbf{I})^{-1}$ for $k = 1,2,3$
(three small eigendecompositions).
Initialize:
$\mathbf{c}_{0} = \mathbf{0}$,
$\mathbf{r}_0 = \mathbf{A}^{H}\mathbf{y}$ (Kronecker matvec),
$\mathbf{z}_0 = (\mathbf{P}_3^{-1} \otimes \mathbf{P}_2^{-1} \otimes \mathbf{P}_1^{-1})\mathbf{r}_0$ (Kronecker matvec),
$\mathbf{d}_0 = \mathbf{z}_0$.
for $t = 0, 1, 2, \ldots$:
    $\mathbf{q}_t = (\mathbf{A}^{H}\mathbf{A} + \lambda\mathbf{I})\mathbf{d}_t$ (Kronecker matvec)
    $\beta_t = \mathbf{r}_t^H\mathbf{z}_t / \mathbf{d}_t^H\mathbf{q}_t$
    $\mathbf{c}_{t+1} = \mathbf{c}_{t} + \beta_t \mathbf{d}_t$
    $\mathbf{r}_{t+1} = \mathbf{r}_t - \beta_t \mathbf{q}_t$
    if $\|\mathbf{r}_{t+1}\| < \epsilon$: return $\mathbf{c}_{t+1}$
    $\mathbf{z}_{t+1} = \mathbf{P}^{-1}\mathbf{r}_{t+1}$ (Kronecker matvec)
    $\gamma_t = \mathbf{r}_{t+1}^H\mathbf{z}_{t+1} / \mathbf{r}_t^H\mathbf{z}_t$
    $\mathbf{d}_{t+1} = \mathbf{z}_{t+1} + \gamma_t\mathbf{d}_t$

The preconditioner application $\mathbf{P}^{-1}\mathbf{r}$ is itself a Kronecker matvec: reshape $\mathbf{r}$ into a tensor and multiply each mode by $\mathbf{P}_k^{-1}$. No large matrix is ever formed.

Preconditioning Effect on Convergence

Compare the convergence of unpreconditioned and Kronecker-preconditioned conjugate gradient for a Tikhonov reconstruction problem. Observe how the number of iterations drops dramatically with preconditioning, especially for ill-conditioned systems.

⚠️ Engineering Note

k-Space Density Compensation (Reweighting)

When the k-space sampling is non-uniform (typical for multi-static systems), the backprojection image $\mathbf{A}^{H}\mathbf{y}$ overemphasizes densely sampled regions. Density compensation reweights the measurements before backprojection:

$$\hat{\mathbf{c}}^{\text{DC}} = \mathbf{A}^{H}\mathbf{W}\mathbf{y}$$

where $\mathbf{W} = \text{diag}(w_1, \ldots, w_M)$ and $w_m$ is inversely proportional to the local k-space sampling density around the $m$-th measurement.

This is equivalent to a left preconditioner that approximately whitens the normal equations. For Kronecker-structured $\mathbf{A}$, the weights factor as $\mathbf{W} = \mathbf{W}_3 \otimes \mathbf{W}_2 \otimes \mathbf{W}_1$ when the sampling density is separable.
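A minimal sketch of the separable case (the per-axis densities and operator sizes below are made-up illustrative values):

```python
import numpy as np

# Hypothetical per-axis sampling densities (denser sampling -> smaller weight)
dens_f = np.array([4.0, 2.0, 1.0, 1.0])      # frequency axis
dens_ang = np.array([3.0, 1.0, 1.0])         # angle axis
W_f = np.diag(1.0 / dens_f)                  # w_m proportional to 1 / local density
W_ang = np.diag(1.0 / dens_ang)

rng = np.random.default_rng(3)
A_f = rng.standard_normal((4, 3))
A_ang = rng.standard_normal((3, 2))
A = np.kron(A_ang, A_f)                      # 12 x 6 forward operator
y = rng.standard_normal(12)

# Separable density => Kronecker-factored weights
W = np.kron(W_ang, W_f)
c_dc = A.T @ (W @ y)                         # density-compensated backprojection
print(c_dc.shape)  # (6,)
```

Because each $\mathbf{W}_k$ is diagonal, applying $\mathbf{W}$ reduces to an elementwise product with the Kronecker product of the per-axis weight vectors; the dense matrices above exist only for clarity.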

Quick Check

Which statement about Kronecker preconditioning is correct?

The preconditioner application costs $O(N^3)$, where $N$ is the number of voxels.

The Kronecker preconditioner inverts each factor separately, costing $O(n_1^3 + n_2^3 + n_3^3)$.

Preconditioning increases the condition number to improve convergence.

Diagonal preconditioning is always sufficient for RF imaging.

Key Takeaway

Preconditioning reduces the effective condition number and accelerates iterative reconstruction. The Kronecker structure enables a powerful preconditioner $\mathbf{P}_{\text{Kron}}$ that inverts each factor separately at cost $O(n_k^3)$ instead of $O(N^3)$. For a typical system, this reduces CG iterations from thousands to tens. k-Space density compensation provides an alternative view as measurement-domain reweighting.