Exercises

ch18-ex01-ista-to-lista

Easy

Write out the ISTA iteration for the LASSO problem $\min_{\mathbf{c}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{c}\|^2 + \lambda\|\mathbf{c}\|_1$ and identify which quantities become learnable parameters in LISTA. State the LISTA initialisation for each learnable parameter.
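
As a reference point, here is a minimal numpy sketch of the plain ISTA iteration, annotated with the quantities that LISTA makes learnable (matrix sizes and iteration count are illustrative):

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding S_tau(x) = sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(A, y, lam, n_iters=100):
    """Plain ISTA for min_c 0.5*||y - A c||^2 + lam*||c||_1.

    In LISTA, W and S become learnable per-layer matrices and tau a
    learnable threshold; the values below are the standard initialisation.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    W = np.eye(A.shape[1]) - A.T @ A / L   # learnable in LISTA
    S = A.T / L                            # learnable in LISTA
    tau = lam / L                          # learnable in LISTA
    c = np.zeros(A.shape[1])
    for _ in range(n_iters):
        c = soft_threshold(W @ c + S @ y, tau)
    return c
```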

ch18-ex02-admm-split

Easy

Write out the three ADMM updates for $\min_{\mathbf{c}} \frac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{c}\|^2 + \lambda\|\mathbf{c}\|_1$ with splitting variable $\mathbf{z} = \mathbf{c}$. Identify the closed-form solution for the $\mathbf{c}$-update.
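
A minimal numpy sketch of the three updates in scaled dual form (the penalty $\rho$ and iteration count are illustrative choices):

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def admm_lasso(A, y, lam, rho=1.0, n_iters=100):
    """ADMM for min_c 0.5*||y - A c||^2 + lam*||z||_1  s.t.  z = c."""
    M, N = A.shape
    Aty = A.T @ y
    # c-update has the closed form c = (A^T A + rho I)^{-1} (A^T y + rho (z - u));
    # the system matrix is fixed, so build it once outside the loop.
    Q = A.T @ A + rho * np.eye(N)
    z = np.zeros(N)
    u = np.zeros(N)   # scaled dual variable
    for _ in range(n_iters):
        c = np.linalg.solve(Q, Aty + rho * (z - u))   # c-update (closed form)
        z = soft_threshold(c + u, lam / rho)          # z-update (prox of lam*||.||_1)
        u = u + c - z                                 # dual update
    return c
```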

ch18-ex03-weight-tying

Easy

A 15-layer LISTA network for $M = 200$, $N = 800$ has weight matrices $\mathbf{W}_k \in \mathbb{R}^{800 \times 800}$ and $\mathbf{S}_k \in \mathbb{R}^{800 \times 200}$. Compute the parameter count with and without weight tying.
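
The count is straightforward arithmetic; a sketch counting only the two matrices per layer (per-layer thresholds excluded):

```python
K, M, N = 15, 200, 800
per_layer = N * N + N * M   # W_k (800x800) + S_k (800x200) = 800_000
untied = K * per_layer      # every layer has its own W_k, S_k
tied = per_layer            # one shared W, S across all layers
print(per_layer, untied, tied)
```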

ch18-ex04-oamp-unrolled-structure

Easy

Draw a block diagram of one layer of the unrolled OAMP-ProxNet architecture. Label the inputs, outputs, and learnable components. Identify which components use the sensing matrix $\mathbf{A}$ and which do not.

ch18-ex05-convergence-check

Easy

Given the state evolution recursion $\sigma_{t+1}^2 = \frac{1}{\delta}\left(\sigma^2 + \mathrm{mmse}(\sigma_t^2)\right)$ with $\delta = 0.5$, $\sigma^2 = 0.01$, and $\mathrm{mmse}(\sigma^2) = 0.1 \cdot \min(\sigma^2, 1)$ (Bernoulli-Gaussian signal with sparsity 0.1 and unit variance), compute the state evolution trajectory for $T = 5$ layers starting from $\sigma_0^2 = 2.0$.
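
The recursion is a one-line loop; a sketch that computes the requested trajectory and its fixed point:

```python
def mmse(s2):
    # Stylised MMSE curve for the Bernoulli-Gaussian prior (sparsity 0.1, unit variance)
    return 0.1 * min(s2, 1.0)

delta, sigma2 = 0.5, 0.01
s2 = 2.0                            # sigma_0^2
for t in range(5):
    s2 = (sigma2 + mmse(s2)) / delta
    print(t + 1, s2)                # 0.22, 0.064, 0.0328, 0.02656, 0.025312
# Fixed point: s2* = (0.01 + 0.1 * s2*) / 0.5  =>  s2* = 0.025
```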

ch18-ex06-gradient-flow

Medium

For a $K$-layer LISTA network, derive the gradient $\partial \mathcal{L}/\partial \tau_k$ of the MSE loss with respect to the threshold at layer $k$. Show that the Jacobian of soft-thresholding with respect to its input is a diagonal matrix with entries in $\{0, 1\}$.
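
A quick numerical check of the claimed Jacobian structure, valid away from the nonsmooth points $|x_i| = \tau$ (values are illustrative):

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

x = np.array([-2.0, -0.3, 0.1, 0.8, 1.5])
tau = 0.5
# Diagonal of the input Jacobian: 1 where |x_i| > tau, 0 where |x_i| < tau
diag = (np.abs(x) > tau).astype(float)
# Central finite-difference check
eps = 1e-6
fd = (soft_threshold(x + eps, tau) - soft_threshold(x - eps, tau)) / (2 * eps)
print(diag)           # [1. 0. 0. 1. 1.]
print(np.round(fd))   # matches away from |x_i| = tau
```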

ch18-ex07-oamp-orthogonality

Medium

Show that the OAMP linear estimator $\mathbf{W}_{\text{LE}} = \mathbf{A}^{H}(\mathbf{A}\mathbf{A}^{H} + v\mathbf{I})^{-1}$ satisfies $\frac{1}{N}\operatorname{tr}(\mathbf{W}_{\text{LE}}\mathbf{A}) = \frac{1}{N}\sum_{i} \frac{\sigma_i^2}{\sigma_i^2 + v}$, where $\{\sigma_i\}$ are the singular values of $\mathbf{A}$.
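
A numerical sanity check of the identity via the SVD (dimensions, seed, and $v$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, v = 20, 50, 0.3
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
W_le = A.conj().T @ np.linalg.inv(A @ A.conj().T + v * np.eye(M))
s = np.linalg.svd(A, compute_uv=False)    # singular values of A
lhs = np.trace(W_le @ A).real / N
rhs = np.sum(s**2 / (s**2 + v)) / N
print(np.isclose(lhs, rhs))               # True
```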

ch18-ex08-kronecker-complexity

Medium

For a Kronecker-structured sensing matrix $\mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2$ with $\mathbf{A}_1 \in \mathbb{C}^{M_1 \times N_1}$ and $\mathbf{A}_2 \in \mathbb{C}^{M_2 \times N_2}$, compare the cost of computing $\mathbf{A}\mathbf{c}$ and $\mathbf{A}^{H}\mathbf{y}$ with and without exploiting the Kronecker structure.
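
The comparison rests on the vec identity $(\mathbf{A}_1 \otimes \mathbf{A}_2)\,\mathrm{vec}(\mathbf{C}) = \mathrm{vec}(\mathbf{A}_2 \mathbf{C} \mathbf{A}_1^{T})$ (column-major vec); a sketch verifying it numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
M1, N1, M2, N2 = 8, 16, 6, 12
A1 = rng.standard_normal((M1, N1)) + 1j * rng.standard_normal((M1, N1))
A2 = rng.standard_normal((M2, N2)) + 1j * rng.standard_normal((M2, N2))
C = rng.standard_normal((N2, N1)) + 1j * rng.standard_normal((N2, N1))

# Direct: form the (M1*M2) x (N1*N2) matrix explicitly
direct = np.kron(A1, A2) @ C.flatten(order="F")

# Structured: two small matrix products, never forming the Kronecker product
structured = (A2 @ C @ A1.T).flatten(order="F")

print(np.allclose(direct, structured))   # True
```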

ch18-ex09-intermediate-supervision

Medium

In intermediate supervision, the loss includes terms at every layer: $\mathcal{L}_{\text{total}} = \sum_{k=1}^K w_k \mathcal{L}(\hat{\mathbf{c}}^{(k)}, \mathbf{c}^*)$. Argue that this helps with vanishing gradients and propose a weight schedule $\{w_k\}$.
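
One reasonable schedule, offered as an assumption rather than the chapter's prescription, is geometric with the final layer weighted most heavily:

```python
import numpy as np

def geometric_weights(K, gamma=0.8):
    """w_k = gamma^(K-k), normalised: the last layer gets the largest weight."""
    w = gamma ** np.arange(K - 1, -1, -1)   # k = 1..K
    return w / w.sum()

def total_loss(layer_outputs, c_star, weights):
    # layer_outputs: list of K arrays c_hat^(k); MSE supervised at every layer
    return sum(w * np.mean((c - c_star) ** 2)
               for w, c in zip(weights, layer_outputs))
```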

ch18-ex10-ista-vs-oamp-condition

Medium

A sensing matrix $\mathbf{A} \in \mathbb{C}^{M \times N}$ has singular values uniformly distributed in $[0.1, 10]$ with $M/N = 0.5$. Compute the ISTA step size $1/L$ and the ISTA contraction rate. Compare with the OAMP per-singular-value weights $w_i = \sigma_i/(\sigma_i^2 + v)$ and argue why OAMP converges faster.
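
The requested quantities, computed numerically ($v$ is an illustrative noise level, not given in the exercise):

```python
import numpy as np

sig_min, sig_max = 0.1, 10.0
L = sig_max**2                       # Lipschitz constant = sigma_max^2 = 100
step = 1.0 / L                       # ISTA step size = 0.01
# Per-mode ISTA contraction factor: |1 - sigma_i^2 / L|
rates = np.abs(1 - np.array([sig_min, sig_max]) ** 2 / L)
print(step, rates.max())             # weakest mode barely contracts: 0.9999

v = 0.1                              # illustrative regularisation level
sig = np.linspace(sig_min, sig_max, 5)
print(sig / (sig**2 + v))            # OAMP rebalances weak and strong modes
```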

ch18-ex11-hst-weighted-l1

Medium

Show that spatially-varying soft-thresholding with thresholds $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_N)$, $\tau_i \ge 0$, is the proximal operator of the weighted $\ell^1$ norm $R(\mathbf{c}) = \sum_i \tau_i |c_i|$.
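
A sketch of the operator in question (values are illustrative):

```python
import numpy as np

def prox_weighted_l1(r, tau):
    """prox_R(r) for R(c) = sum_i tau_i |c_i|: elementwise soft-thresholding
    with a per-coordinate threshold tau_i >= 0."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

r = np.array([1.2, -0.4, 0.05, -2.0])
tau = np.array([0.5, 0.5, 0.1, 1.0])
print(prox_weighted_l1(r, tau))   # [ 0.7  0.   0.  -1. ]
```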

ch18-ex12-lista-contraction

Hard

For a weight-tied LISTA network, prove that the map $\mathcal{T}(\mathbf{c}) = \mathcal{S}_\tau(\mathbf{W}\mathbf{c} + \mathbf{S}\mathbf{y})$ is contractive if $\|\mathbf{W}\| < 1$. Derive the fixed-point equation.

ch18-ex13-ladmm-convergence

Hard

Prove that if the learned proximal operator $\mathcal{G}_{\theta_k^z}$ in L-ADMM is firmly nonexpansive, then the L-ADMM iterates converge to a fixed point (assuming $\rho_k \geq \rho_{\min} > 0$).

ch18-ex14-spectral-normalisation

Hard

Design a training procedure for a convergent LISTA network that guarantees $\|\mathbf{W}_k\| < 1$ at every training step using spectral normalisation. Analyse the effect on convergence rate.
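
A sketch of the projection step via power iteration; the margin 0.99 is an assumed safety factor, not prescribed by the text:

```python
import numpy as np

def spectral_norm(W, n_iters=20):
    """Estimate ||W||_2 by power iteration on W^T W."""
    v = np.random.default_rng(0).standard_normal(W.shape[1])
    for _ in range(n_iters):
        v = W.T @ (W @ v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(W @ v)

def project_contractive(W, margin=0.99):
    """After each gradient step, rescale W so that ||W|| <= margin < 1."""
    s = spectral_norm(W)
    return W * (margin / s) if s > margin else W
```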

ch18-ex15-proxnet-divergence

Hard

In unrolled OAMP, the state evolution requires the divergence of the ProxNet denoiser: $\operatorname{div}(\mathcal{D}_\theta) = \frac{1}{N}\sum_i \frac{\partial [\mathcal{D}_\theta(\mathbf{r})]_i}{\partial r_i}$.

  1. Show that computing the exact divergence requires NN backward passes.
  2. Derive the Monte Carlo estimator using Stein's identity (see the sketch after this list).
  3. Prove unbiasedness and compute the variance.
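
For part 2, a numerical sketch of the Monte Carlo estimator, with a finite difference standing in for a backward pass and a toy $\tanh$ denoiser standing in for a ProxNet:

```python
import numpy as np

def denoiser(r):
    # Toy stand-in for a ProxNet denoiser
    return np.tanh(r)

def mc_divergence(D, r, n_probes=10, eps=1e-4, rng=None):
    """(1/N) tr(J_D(r)) estimated as E_b[ b^T (D(r + eps*b) - D(r)) / eps ] / N
    with Rademacher probes b; unbiased in the limit eps -> 0."""
    rng = rng or np.random.default_rng(0)
    N = r.size
    est = 0.0
    for _ in range(n_probes):
        b = rng.choice([-1.0, 1.0], size=N)
        est += b @ (D(r + eps * b) - D(r)) / eps
    return est / (n_probes * N)

r = np.linspace(-2, 2, 1000)
exact = np.mean(1 - np.tanh(r) ** 2)       # exact normalised divergence
print(exact, mc_divergence(denoiser, r))   # estimates agree closely
```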

ch18-ex16-generalisation-weight-tying

Hard

For a $K$-layer unrolled network with per-layer parameters $P_{\text{layer}}$, compare the generalisation bound with and without weight tying. Quantify the improvement for $K = 20$.

ch18-ex17-alista-derivation

Challenge

Derive the ALISTA optimal weight matrix. Starting from the LISTA update $\hat{\mathbf{c}}^{(k+1)} = \mathcal{S}_{\tau_k}(\mathbf{W}\hat{\mathbf{c}}^{(k)} + \mathbf{S}\mathbf{y})$, show that the optimal $\mathbf{W}$ minimising the worst-case contraction rate over all $s$-sparse signals is:

Wβˆ—=Iβˆ’AH(AAH+ΞΌβˆ—I)βˆ’1A\mathbf{W}^* = \mathbf{I} - \mathbf{A}^{H}(\mathbf{A}\mathbf{A}^{H} + \mu^*\mathbf{I})^{-1}\mathbf{A}

and find the equation determining $\mu^*$.

ch18-ex18-rip-bound

Challenge

Prove that for a sensing matrix $\mathbf{A}$ satisfying the RIP of order $2s$ with constant $\delta_{2s} < \sqrt{2} - 1$, the LISTA network with ALISTA weights achieves stable recovery:

βˆ₯c^(K)βˆ’cβˆ—βˆ₯≀ρKβˆ₯c^(0)βˆ’cβˆ—βˆ₯+Csβˆ₯cβˆ—βˆ’csβˆ—βˆ₯1+DΟƒ2\|\hat{\mathbf{c}}^{(K)} - \mathbf{c}^*\| \leq \rho^K \|\hat{\mathbf{c}}^{(0)} - \mathbf{c}^*\| + \frac{C}{\sqrt{s}}\|\mathbf{c}^* - \mathbf{c}^*_s\|_1 + D\sqrt{\sigma^2}

where $\rho = \delta_{2s}/(1-\delta_{2s}) < 1$ and $\mathbf{c}^*_s$ denotes the best $s$-term approximation of $\mathbf{c}^*$.

ch18-ex19-hst-group-lasso

Challenge

For the OTFS channel estimation problem with angular-delay-Doppler groups $\{\mathcal{G}_\ell\}_{\ell=1}^L$, derive the proximal operator of the group LASSO penalty $R(\mathbf{c}) = \sum_\ell \lambda_\ell \|\mathbf{c}_{\mathcal{G}_\ell}\|_2$. Show that it performs group-wise soft-thresholding.
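
A sketch of the resulting group-wise soft-thresholding (groups and weights are illustrative):

```python
import numpy as np

def prox_group_lasso(r, groups, lams):
    """prox of R(c) = sum_l lam_l * ||c_{G_l}||_2: each group is shrunk
    toward zero by the factor max(1 - lam_l / ||r_{G_l}||, 0)."""
    c = np.zeros_like(r)
    for idx, lam in zip(groups, lams):
        norm = np.linalg.norm(r[idx])
        if norm > lam:
            c[idx] = (1 - lam / norm) * r[idx]
    return c

r = np.array([3.0, 4.0, 0.1, 0.2])
groups = [np.array([0, 1]), np.array([2, 3])]
print(prox_group_lasso(r, groups, [1.0, 1.0]))
# group 1: norm 5 -> scaled by 0.8; group 2: norm < 1 -> zeroed
```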

ch18-ex20-end-to-end-training

Challenge

Design an end-to-end training procedure for a 10-layer unrolled OAMP network with ProxNet denoisers for a $128 \times 128$ RF imaging problem. Address:

(a) Training data generation (how many scenes, what distribution).
(b) Loss function (MSE, perceptual, or combined).
(c) Training schedule (layer-wise pre-training followed by end-to-end).
(d) Memory management (gradient checkpointing).
(e) Validation strategy (in-distribution and out-of-distribution).
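
A heavily hedged skeleton touching points (b) and (d): the toy layers below merely stand in for unrolled OAMP iterations with ProxNet denoisers, and `torch.utils.checkpoint` is one standard way to trade recomputation for activation memory:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Toy stand-ins: each "layer" plays the role of one unrolled OAMP iteration;
# the real modules would take (r, y, A) rather than a bare feature vector.
layers = torch.nn.ModuleList(
    [torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
     for _ in range(10)]
)
opt = torch.optim.Adam(layers.parameters(), lr=1e-4)

def forward(x):
    for layer in layers:
        # (d) Gradient checkpointing: activations are recomputed during the
        # backward pass instead of stored, cutting memory roughly K-fold.
        x = checkpoint(layer, x, use_reentrant=False)
    return x

x = torch.randn(8, 64, requires_grad=True)
target = torch.randn(8, 64)
loss = torch.nn.functional.mse_loss(forward(x), target)   # (b) plain MSE term
loss.backward()
opt.step()
```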