Exercises

ex-ch04-01

Easy

Let $\mathbf{A}_{1} \in \mathbb{C}^{4 \times 8}$ and $\mathbf{A}_{2} \in \mathbb{C}^{6 \times 8}$.

(a) What are the dimensions of $\mathbf{A}_{1} \otimes \mathbf{A}_{2}$?

(b) How many complex entries does the full Kronecker product contain?

(c) How many complex entries do the two factors contain together?

(d) What is the ratio of (b) to (c)?
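
A quick NumPy sanity check for this kind of bookkeeping (a sketch using random complex factors of the stated sizes):

```python
import numpy as np

# Random complex factors with the stated shapes.
rng = np.random.default_rng(0)
A1 = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
A2 = rng.standard_normal((6, 8)) + 1j * rng.standard_normal((6, 8))

K = np.kron(A1, A2)
print(K.shape)            # (24, 64): row counts and column counts multiply
print(K.size)             # entries in the full product
print(A1.size + A2.size)  # entries in the two factors together
```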

ex-ch04-02

Easy

Verify the vec trick by hand for $2 \times 2$ matrices. Let $\mathbf{A}_{1} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $\mathbf{A}_{2} = \begin{pmatrix} e & f \\ g & h \end{pmatrix}$. Compute both sides of

$$(\mathbf{A}_{1} \otimes \mathbf{A}_{2})\,\text{vec}(\mathbf{X}) = \text{vec}(\mathbf{A}_{2} \mathbf{X} \mathbf{A}_{1}^{T})$$

for $\mathbf{X} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and verify that they are equal.
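
To check the hand computation numerically, note that the identity assumes column-major (Fortran-order) vectorization; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A1 = rng.standard_normal((2, 2))
A2 = rng.standard_normal((2, 2))
X = np.eye(2)

def vec(M):
    # Column-major (Fortran-order) vectorization, as the identity requires.
    return M.flatten(order="F")

lhs = np.kron(A1, A2) @ vec(X)
rhs = vec(A2 @ X @ A1.T)
assert np.allclose(lhs, rhs)
```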

ex-ch04-03

Medium

Show that the spectral norm of a Kronecker product satisfies $\|\mathbf{A}_{1} \otimes \mathbf{A}_{2}\| = \|\mathbf{A}_{1}\| \cdot \|\mathbf{A}_{2}\|$.
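
Before attempting the proof, it can help to confirm the claim numerically (a sketch; ord=2 gives the spectral norm, i.e. the largest singular value):

```python
import numpy as np

rng = np.random.default_rng(1)
A1 = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
A2 = rng.standard_normal((6, 8)) + 1j * rng.standard_normal((6, 8))

lhs = np.linalg.norm(np.kron(A1, A2), ord=2)
rhs = np.linalg.norm(A1, ord=2) * np.linalg.norm(A2, ord=2)
print(lhs, rhs)  # agree to machine precision
```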

ex-ch04-04

Medium

The Lipschitz constant of the gradient $\nabla f(\mathbf{c}) = \mathbf{A}^{H}(\mathbf{A}\mathbf{c} - \mathbf{y})$ is $L_f = \|\mathbf{A}^{H}\mathbf{A}\|$.

(a) When $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$, express $L_f$ in terms of the spectral norms of the factors.

(b) How does the ISTA step size $\mu = 1/L_f$ change when we add a second Kronecker factor?

ex-ch04-05

Medium

Implement the Kronecker-factored forward operator in Python. Given A1 (shape $M_1 \times N_1$) and A2 (shape $M_2 \times N_2$), write a function kron_forward(c, A1, A2) that computes $(\mathbf{A}_{1} \otimes \mathbf{A}_{2})\mathbf{c}$ using the vec trick. Verify by comparing against np.kron(A1, A2) @ c for random inputs.
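
One possible solution sketch, assuming column-major vectorization of c (compare against your own implementation):

```python
import numpy as np

def kron_forward(c, A1, A2):
    # Vec trick: (A1 kron A2) vec(X) = vec(A2 @ X @ A1.T), with
    # column-major vectorization, so X has shape (N2, N1).
    M1, N1 = A1.shape
    M2, N2 = A2.shape
    X = c.reshape(N2, N1, order="F")   # unvec
    Y = A2 @ X @ A1.T                  # shape (M2, M1)
    return Y.flatten(order="F")        # re-vec, length M1 * M2

# Verification against the dense Kronecker product.
rng = np.random.default_rng(2)
A1 = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
A2 = rng.standard_normal((6, 8)) + 1j * rng.standard_normal((6, 8))
c = rng.standard_normal(8 * 8) + 1j * rng.standard_normal(8 * 8)
assert np.allclose(kron_forward(c, A1, A2), np.kron(A1, A2) @ c)
```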

ex-ch04-06

Medium

For the NUFFT with oversampling factor $\sigma$ and interpolation kernel width $W$:

(a) What is the total memory required for the oversampled grid when the original grid has $N^2$ points?

(b) What is the computational cost of the FFT on the oversampled grid?

(c) For $N = 128$, $\sigma = 2$, $W = 12$, and $M = 2048$ non-uniform points, compare the total NUFFT cost against the direct evaluation cost.
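
A rough cost comparison for (c), under an assumed cost model whose constants depend on the implementation:

```python
import numpy as np

N, sigma, W, M = 128, 2, 12, 2048

# Assumed model: oversampled 2-D FFT ~ G log2(G) with G = (sigma N)^2,
# plus M W^2 for kernel interpolation; direct evaluation ~ M N^2.
G = (sigma * N) ** 2
nufft = G * np.log2(G) + M * W**2
direct = M * N**2
print(f"NUFFT ~ {nufft:.2e}, direct ~ {direct:.2e}, "
      f"ratio ~ {direct / nufft:.0f}x")
```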

ex-ch04-07

Hard

Prove that for the three-factor Kronecker product $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2} \otimes \mathbf{A}_{3}$, the factored computation via mode products has cost

$$O(N_1 N_2 N_3 (M_1 + M_2 + M_3))$$

compared to the naive $O(M_1 M_2 M_3 \cdot N_1 N_2 N_3)$. Determine the conditions under which the speedup exceeds $100\times$.

ex-ch04-08

Easy

A GPU has 4,096 CUDA cores, each capable of one complex multiply-add per clock cycle at 1.5 GHz.

(a) What is the theoretical peak throughput in complex GFLOP/s?

(b) How long does multiplying a $32 \times 128$ complex matrix by a $128 \times 128$ complex matrix take at peak throughput?

(c) Why is the actual time significantly longer than (b)?
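
A back-of-the-envelope sketch for (a) and (b), counting one complex multiply-add as the unit of work:

```python
cores, clock_hz = 4096, 1.5e9
peak = cores * clock_hz                  # complex MACs per second
print(f"peak: {peak / 1e9:.0f} complex GMAC/s")

# A (32 x 128) @ (128 x 128) complex product needs 32*128*128 complex MACs.
ops = 32 * 128 * 128
print(f"time at peak: {ops / peak * 1e9:.1f} ns")
```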

ex-ch04-09

Medium

Explain why CuPy is preferred over PyTorch for GPU-accelerating an existing NumPy-based RF imaging reconstruction pipeline, whereas PyTorch is preferred for training an unrolled network.

ex-ch04-10

Hard

For a function $f : \mathbb{R}^n \to \mathbb{R}^m$ with $n = 10{,}000$ and $m = 1$ (a scalar loss), compare the cost of computing the full gradient using:

(a) Forward-mode AD (one JVP per input dimension).

(b) Reverse-mode AD (one VJP with $u = 1$).

(c) Central finite differences.

Express costs in terms of $C_f$ (the cost of one forward evaluation of $f$).
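
The bookkeeping can be sketched directly; the per-JVP/VJP constant c below is an assumption (typical rules of thumb put it around 2-3 forward passes):

```python
n, c = 10_000, 3  # c: assumed cost of one JVP or VJP in units of C_f

forward_mode = n * c   # one JVP per input dimension
reverse_mode = 1 * c   # a single VJP with u = 1 yields the full gradient
central_fd = 2 * n     # two function evaluations per coordinate
print(forward_mode, reverse_mode, central_fd)  # all in units of C_f
```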

ex-ch04-11

Hard

An unrolled ISTA algorithm with $T = 50$ iterations processes images of size $Q = 128^2$.

(a) How much memory does standard reverse-mode AD require to store all intermediate iterates?

(b) If gradient checkpointing stores only every $\sqrt{T}$-th iterate, how much memory is needed?

(c) What is the computational overhead of checkpointing (expressed as a fraction of the original forward pass cost)?
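
A sketch of the memory arithmetic for (a) and (b), assuming complex64 iterates (8 bytes each; adjust for your dtype):

```python
import numpy as np

T, Q, bytes_per = 50, 128**2, 8  # complex64 assumed

full = T * Q * bytes_per                          # store every iterate
ckpt = int(np.ceil(np.sqrt(T))) * Q * bytes_per   # ~sqrt(T) checkpoints
print(f"full AD: {full / 1e6:.2f} MB, checkpointed: {ckpt / 1e6:.2f} MB")
# For (c): each segment between checkpoints is re-run once during the
# backward pass, roughly one extra forward pass in total.
```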

ex-ch04-12

Medium

Derive the ADMM primal and dual residuals for the $\ell_1$-regularized imaging problem

$$\min_{\mathbf{c}} \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{y}\|^2 + \lambda\|\mathbf{c}\|_1$$

with the splitting $\mathbf{z} = \mathbf{c}$.
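
As a starting point, recall the general residual definitions from the ADMM literature (Boyd et al.); for the constraint $\mathbf{c} - \mathbf{z} = \mathbf{0}$ with penalty parameter $\rho$ they specialize to

$$\mathbf{r}^{k+1} = \mathbf{c}^{k+1} - \mathbf{z}^{k+1}, \qquad \mathbf{s}^{k+1} = \rho\,(\mathbf{z}^{k} - \mathbf{z}^{k+1}).$$

The exercise is to derive these from the optimality conditions of the two subproblems.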

ex-ch04-13

Easy

For a noise vector $\mathbf{w} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I}_M)$ with $M = 2048$ and $\sigma = 0.05$:

(a) Compute the expected value of $\|\mathbf{w}\|^2$.

(b) Compute $\delta = \mathbb{E}[\|\mathbf{w}\|]$ (use the approximation $\mathbb{E}[\|\mathbf{w}\|] \approx \sigma\sqrt{M}$).

(c) What stopping threshold does the discrepancy principle with $\tau = 1.2$ give?
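
A Monte Carlo check of (a)-(c); note that for $\mathcal{CN}(0, \sigma^2)$ the real and imaginary parts each have variance $\sigma^2/2$:

```python
import numpy as np

M, sigma, tau = 2048, 0.05, 1.2
rng = np.random.default_rng(3)

w = (sigma / np.sqrt(2)) * (rng.standard_normal((2000, M))
                            + 1j * rng.standard_normal((2000, M)))
norms = np.linalg.norm(w, axis=1)
print(np.mean(norms**2), M * sigma**2)          # (a): ~5.12
print(np.mean(norms), sigma * np.sqrt(M))       # (b): ~2.26
print("threshold:", tau * sigma * np.sqrt(M))   # (c): ~2.72
```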

ex-ch04-14

Hard

The condition number of the Kronecker product $\mathbf{A}_{1} \otimes \mathbf{A}_{2}$ is $\kappa_1 \cdot \kappa_2$. For ISTA, the number of iterations to achieve $\varepsilon$-accuracy scales as $O(\kappa \log(1/\varepsilon))$.

(a) If $\kappa_1 = 50$ and $\kappa_2 = 100$, how many ISTA iterations are needed for $\varepsilon = 10^{-4}$? (Use the bound $T = \kappa \log(1/\varepsilon) / \log 2$.)

(b) What is the total cost (in FLOPs) if each iteration uses the Kronecker vec trick with $M_1 = M_2 = 32$, $N_1 = N_2 = 128$?

(c) Compare with the cost of forming the full Kronecker product once and using ISTA with dense matrix-vector products.
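
A sketch of the arithmetic, counting complex multiply-adds and using the bound given in (a):

```python
import numpy as np

kappa = 50 * 100                        # condition numbers multiply
T = kappa * np.log(1e4) / np.log(2)
print(f"T ~ {T:.0f} iterations")        # ~66,000

M1 = M2 = 32
N1 = N2 = 128
# Vec trick, forward plus adjoint per iteration: A2 @ X, then (.) @ A1.T.
per_iter = 2 * (M2 * N2 * N1 + M2 * N1 * M1)
dense_per_iter = 2 * (M1 * M2) * (N1 * N2)   # dense matvec, forward + adjoint
print(f"factored: {per_iter * T:.2e} MACs, dense: {dense_per_iter * T:.2e} MACs")
```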

ex-ch04-15

Challenge

The matched-filter image is $\hat{\mathbf{c}}^{\text{BP}} = \mathbf{D}^{-1} \mathbf{A}^{H} \mathbf{y}$, where $\mathbf{D} = \text{diag}(\mathbf{A}^{H} \mathbf{A})$.

(a) When $\mathbf{A} = \mathbf{A}_{1} \otimes \mathbf{A}_{2}$, show that the diagonal of $\mathbf{A}^{H}\mathbf{A}$ is the Kronecker product of the factors' diagonals: $\text{diag}(\mathbf{A}^{H}\mathbf{A}) = \text{diag}(\mathbf{A}_{1}^{H}\mathbf{A}_{1}) \otimes \text{diag}(\mathbf{A}_{2}^{H}\mathbf{A}_{2})$.

(b) Use this to compute the matched-filter image using only the Kronecker factors.

(c) For the RF imaging system with $M_1 = M_2 = 32$ and $N_1 = N_2 = 128$, what is the computational cost of computing $\hat{\mathbf{c}}^{\text{BP}}$ via the factored approach?
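
A quick numerical check of the identity in (a) before proving it:

```python
import numpy as np

rng = np.random.default_rng(4)
A1 = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
A2 = rng.standard_normal((5, 7)) + 1j * rng.standard_normal((5, 7))

K = np.kron(A1, A2)
d_full = np.diag(K.conj().T @ K)
d_fact = np.kron(np.diag(A1.conj().T @ A1), np.diag(A2.conj().T @ A2))
assert np.allclose(d_full, d_fact)
```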

ex-ch04-16

Medium

Implement a simple convergence monitor in Python that tracks: (i) the objective value, (ii) the fixed-point residual, and (iii) the data residual for ISTA applied to $\min_{\mathbf{c}} \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{y}\|^2 + \lambda\|\mathbf{c}\|_1$.

The monitor should print a warning if the objective increases at any iteration.
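
One possible skeleton (a sketch; in practice you would log every few iterations rather than every one):

```python
import numpy as np

def soft(x, t):
    # Complex soft-thresholding: prox of t * ||.||_1.
    return np.exp(1j * np.angle(x)) * np.maximum(np.abs(x) - t, 0)

def ista_monitored(A, y, lam, n_iter=200):
    mu = 1.0 / np.linalg.norm(A, ord=2) ** 2    # step size 1 / ||A^H A||
    c = np.zeros(A.shape[1], dtype=complex)
    f_prev = np.inf
    for k in range(n_iter):
        c_next = soft(c - mu * (A.conj().T @ (A @ c - y)), mu * lam)
        f = 0.5 * np.linalg.norm(A @ c_next - y) ** 2 + lam * np.abs(c_next).sum()
        fp_res = np.linalg.norm(c_next - c)     # fixed-point residual
        data_res = np.linalg.norm(A @ c_next - y)
        if f > f_prev:
            print(f"warning: objective increased at iteration {k}")
        print(f"{k:4d}  f={f:.6e}  fp={fp_res:.3e}  data={data_res:.3e}")
        c, f_prev = c_next, f
    return c
```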

ex-ch04-17

Hard

Show that for FISTA (accelerated proximal gradient), the objective value is not guaranteed to decrease monotonically, unlike ISTA. Construct a simple 2D example where the FISTA objective oscillates.
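
A starting sketch: run FISTA on an ill-conditioned diagonal problem and record where the objective goes up. For problems like the one below, the printed list is typically non-empty, which is exactly the behavior the exercise asks you to explain:

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)

A = np.diag([10.0, 0.1])     # condition number 100: momentum overshoots
y = np.array([10.0, -1.0])
lam = 1e-3
mu = 1.0 / np.linalg.norm(A, ord=2) ** 2

def obj(c):
    return 0.5 * np.linalg.norm(A @ c - y) ** 2 + lam * np.abs(c).sum()

c = np.zeros(2)
z = np.zeros(2)
t = 1.0
fs = []
for _ in range(1000):
    c_next = soft(z - mu * (A.T @ (A @ z - y)), mu * lam)
    t_next = (1 + np.sqrt(1 + 4 * t**2)) / 2
    z = c_next + ((t - 1) / t_next) * (c_next - c)
    c, t = c_next, t_next
    fs.append(obj(c))

ups = np.nonzero(np.diff(np.array(fs)) > 0)[0] + 1
print("objective increased at iterations:", ups[:20])
```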

ex-ch04-18

Challenge

Design a convergence diagnostic for Plug-and-Play ADMM in which the proximal step is replaced by a neural denoiser $\mathcal{D}_\sigma$. Since the algorithm does not minimize a well-defined objective, the standard objective-decrease criterion does not apply.

(a) What quantities can be monitored?

(b) Under what condition on $\mathcal{D}_\sigma$ does the fixed-point residual guarantee convergence?

(c) How would you estimate the spectral norm $\|\mathbf{J}_{\mathcal{D}}\|$ in practice?
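
For (c), a standard recipe is power iteration on $\mathbf{J}_{\mathcal{D}}^{T}\mathbf{J}_{\mathcal{D}}$ at the current iterate. A minimal sketch with JVP/VJP callables standing in for autodiff products (with a neural denoiser these would come from the framework's autodiff, e.g. torch.autograd.functional.jvp and vjp); the linear map at the end is only a sanity check:

```python
import numpy as np

def spectral_norm_est(jvp, vjp, shape, n_iter=50, seed=0):
    # Power iteration on J^T J: converges to the top right singular vector.
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(shape)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = vjp(jvp(v))              # one application of J^T J
        v = w / np.linalg.norm(w)
    return np.linalg.norm(jvp(v))    # ||J v|| ~ largest singular value

# Sanity check on a linear map with a known spectral norm.
W = np.array([[0.9, 0.4], [0.0, 0.5]])
est = spectral_norm_est(lambda v: W @ v, lambda u: W.T @ u, (2,))
print(est, np.linalg.norm(W, ord=2))
```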

ex-ch04-19

Medium

Explain why the near-field breaks the exact Kronecker structure of the sensing operator, and propose a strategy that retains computational efficiency while handling near-field effects.

ex-ch04-20

Easy

State whether each of the following diagnostics requires access to the ground-truth image:

(a) Fixed-point residual. (b) PSNR vs iteration. (c) Discrepancy principle. (d) Primal-dual gap. (e) Objective value.