Exercises
ex-ch21-01
(Easy) Define the right-rotationally-invariant (RRI) ensemble in terms of the SVD $A = U \Sigma V^*$. Give two examples of matrices inside the RRI class and one outside.
RRI concerns the right singular basis $V$.
i.i.d. Gaussian is RRI; a diagonal matrix with structured diagonal entries is not.
Definition
$A = U \Sigma V^*$ is RRI iff $V$ is Haar-distributed on the unitary group, independently of $U$ and $\Sigma$.
Examples
Inside RRI: (i) i.i.d. Gaussian matrices; (ii) random sub-sampled DFT / Hadamard matrices. Outside RRI: a deterministic Vandermonde matrix with fixed nodes.
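A minimal NumPy sketch of sampling from an RRI ensemble (assuming a real-valued setting, so Haar orthogonal rather than unitary factors; all function names are illustrative):

```python
import numpy as np

def haar_orthogonal(n, rng):
    # QR of a Gaussian matrix, with a sign correction, gives a Haar-distributed orthogonal matrix.
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))

def sample_rri(m, n, spectrum, rng=None):
    # A = U @ diag(spectrum) @ V.T with Haar V: right-rotationally invariant
    # regardless of how U and the spectrum are chosen.
    rng = np.random.default_rng() if rng is None else rng
    U = haar_orthogonal(m, rng)
    V = haar_orthogonal(n, rng)
    S = np.zeros((m, n))
    k = min(m, n, len(spectrum))
    S[np.arange(k), np.arange(k)] = spectrum[:k]
    return U @ S @ V.T

A = sample_rri(64, 128, np.linspace(2.0, 0.5, 64))
```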
ex-ch21-02
(Easy) Write the OAMP iteration in one line and identify the three normalizations (the trace normalization of $W_t$, the subtraction of $\langle\eta_t'\rangle\, r_t$, and the rescaling by $1/(1-\langle\eta_t'\rangle)$). What does each normalization accomplish?
Linear step: $r_t = x_t + W_t\,(y - A x_t)$ with $\tfrac{1}{N}\mathrm{tr}(W_t A) = 1$.
Denoise: $x_{t+1} = \dfrac{\eta_t(r_t) - \langle\eta_t'\rangle\, r_t}{1 - \langle\eta_t'\rangle}$.
Normalizations
(1) The trace normalization $\tfrac{1}{N}\mathrm{tr}(W_t A) = 1$ makes the linear step unity-gain on $x$. (2) Subtracting $\langle\eta_t'\rangle\, r_t$ kills the input bias of the denoiser (divergence-free denoiser). (3) Rescaling by $1/(1-\langle\eta_t'\rangle)$ restores unit gain on $x$ after the subtraction.
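A minimal sketch of one OAMP iteration (NumPy). It uses a trace-normalized matched filter $W_t = c\,A^\top$ (one admissible choice, not the LMMSE-optimal filter) and a divergence-free soft-threshold denoiser; the threshold `thr` is an illustrative parameter:

```python
import numpy as np

def soft(r, thr):
    return np.sign(r) * np.maximum(np.abs(r) - thr, 0.0)

def oamp_iteration(x_t, y, A, thr):
    n = A.shape[1]
    # (1) Trace-normalized matched filter: W = c * A.T with tr(W A) / n = 1.
    c = n / np.trace(A.T @ A)
    r = x_t + c * (A.T @ (y - A @ x_t))
    # (2) Divergence-free denoiser: subtract the empirical divergence <eta'> times r.
    eta = soft(r, thr)
    div = np.mean(np.abs(r) > thr)          # <eta'> for soft thresholding
    x_df = eta - div * r
    # (3) Rescale to restore unit gain on x.
    return x_df / (1.0 - div + 1e-12)
```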
ex-ch21-03
(Easy) In GAMP, the input denoiser $g_{\mathrm{in}}$ is determined by the signal prior; what determines the output denoiser $g_{\mathrm{out}}$? Write $g_{\mathrm{out}}$ for the standard Gaussian likelihood $p(y \mid z) = \mathcal N(y;\, z,\, \sigma^2)$.
$g_{\mathrm{out}}$ depends on the measurement likelihood.
For Gaussian likelihoods the conditional mean of $z$ is available in closed form.
What determines $g_{\mathrm{out}}$
$g_{\mathrm{out}}$ is determined by the per-element measurement likelihood $p(y_a \mid z_a)$ and the current state $(\hat p_a, \tau_p)$: $g_{\mathrm{out}}(\hat p, y, \tau_p) = \big(\mathbb E[z \mid y, \hat p, \tau_p] - \hat p\big)/\tau_p$.
Gaussian case
$g_{\mathrm{out}}(\hat p, y, \tau_p) = \dfrac{y - \hat p}{\sigma^2 + \tau_p}$, a simple scaled residual.
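A one-function sketch of this Gaussian output denoiser and its derivative in $\hat p$ (NumPy; the function name and signature are illustrative):

```python
import numpy as np

def g_out_gaussian(p_hat, y, tau_p, sigma2):
    # Scaled residual: (E[z | y, p_hat, tau_p] - p_hat) / tau_p for p(y|z) = N(y; z, sigma2).
    g = (y - p_hat) / (sigma2 + tau_p)
    dg = -np.ones_like(p_hat) / (sigma2 + tau_p)   # d g_out / d p_hat
    return g, dg
```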
ex-ch21-04
(Easy) In LISTA the matrices $W_1, W_2$ and the threshold $\theta$ are learnable. What are the standard ISTA-based initial values, and why is initialization from these values crucial?
ISTA: $x_{k+1} = \mathrm{soft}_{\lambda/L}\!\big(x_k + \tfrac{1}{L} A^\top (y - A x_k)\big)$ with step $1/L$.
Random init of 100k+ parameters rarely finds a good basin.
ISTA init
$W_1 = \tfrac{1}{L} A^\top$, $W_2 = I - \tfrac{1}{L} A^\top A$, $\theta = \lambda/L$, where $L$ is the Lipschitz constant of the gradient, $L = \|A\|_2^2$.
Why
The ISTA init is provably convergent and gives the network a meaningful starting point. Random init typically fails to recover any convergent behaviour even after long training; the loss landscape has many bad basins around random points.
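A sketch of the ISTA-based initialization under these formulas (NumPy; in practice these arrays would be copied into the learnable layer parameters):

```python
import numpy as np

def lista_init(A, lam):
    # Lipschitz constant of the gradient of 0.5 * ||y - A x||^2 is L = ||A||_2^2.
    L = np.linalg.norm(A, ord=2) ** 2
    W1 = A.T / L                               # input matrix, (1/L) A^T
    W2 = np.eye(A.shape[1]) - (A.T @ A) / L    # lateral matrix, I - (1/L) A^T A
    theta = lam / L                            # soft-threshold level
    return W1, W2, theta
```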
ex-ch21-05
(Medium) Consider $A = S F$, where $F$ is the $N \times N$ unitary DFT and $S$ selects $M$ rows. Show that $A A^* = I_M$. Using this, compute the LMMSE filter explicitly and show it reduces to a scalar multiple of $A^*$.
DFT: $F F^* = F^* F = I_N$ (unitary normalization).
$S S^* = I_M$, while $S^* S$ is a diagonal projector onto the selected coordinates.
Verify the Gram
$A A^* = S F F^* S^* = S S^*$. Since the rows of $S$ are distinct standard basis vectors, $S S^* = I_M$.
LMMSE filter
$\hat W = v^2 A^* \big(v^2 A A^* + \sigma^2 I_M\big)^{-1}$, with $v^2$ the error variance of the current input estimate and $\sigma^2$ the measurement-noise variance.
Normalization
With $A A^* = I_M$, $\hat W = \tfrac{v^2}{v^2+\sigma^2}\, A^*$, so the filter is a scalar multiple of $A^*$, and it is then scaled to satisfy the trace constraint $\tfrac{1}{N}\mathrm{tr}(W A) = 1$.
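A quick numerical check of both claims (NumPy; uses the unitary-normalized DFT and a random row selection):

```python
import numpy as np

N, M = 64, 16
rng = np.random.default_rng(0)
F = np.fft.fft(np.eye(N), norm="ortho")          # unitary DFT matrix
rows = rng.choice(N, size=M, replace=False)
A = F[rows, :]                                   # A = S F

# Row orthogonality: A A^* = I_M.
assert np.allclose(A @ A.conj().T, np.eye(M))

# The LMMSE filter collapses to a scalar multiple of A^*.
v2, sigma2 = 1.0, 0.1
W = v2 * A.conj().T @ np.linalg.inv(v2 * A @ A.conj().T + sigma2 * np.eye(M))
assert np.allclose(W, (v2 / (v2 + sigma2)) * A.conj().T)
```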
ex-ch21-06
(Medium) Derive the Kronecker identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^\top)$ and use it to express the OAMP linear step for $H = A \otimes B$ as a matrix equation in the reshaped signal $X$.
$\mathrm{vec}(\cdot)$ stacks columns.
This identity is the reason Kronecker structure reduces complexity dramatically.
Identity
$(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^\top)$. One direction: the $(i,j)$ entry of $B X A^\top$ is $\sum_{k,\ell} B_{ik} X_{k\ell} A_{j\ell}$, which coincides with the corresponding entry of $(A \otimes B)\,\mathrm{vec}(X)$.
OAMP step
Reshape $x = \mathrm{vec}(X)$ and $y = \mathrm{vec}(Y)$. With a Kronecker-structured filter $W = W_A \otimes W_B$, the linear step $r = x + W(y - Hx)$ becomes $R = X + W_B\big(Y - B X A^\top\big) W_A^\top$ (reshaped). The Kronecker LMMSE exploits the diagonal structure in the per-factor SVD bases.
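A numerical check of the vec identity with column-stacking (NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # acts on the column index of X
B = rng.standard_normal((5, 2))   # acts on the row index of X
X = rng.standard_normal((2, 3))

# vec stacks columns: vec(X) = X.flatten(order="F").
lhs = np.kron(A, B) @ X.flatten(order="F")
rhs = (B @ X @ A.T).flatten(order="F")
assert np.allclose(lhs, rhs)
```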
ex-ch21-07
(Medium) The Hutchinson estimator computes $\mathrm{tr}(M) \approx \tfrac{1}{K}\sum_{k=1}^{K} z_k^\top M z_k$ with $z_k$ having i.i.d. zero-mean, unit-variance entries. Show that it is unbiased and compute its variance in terms of the entries of $M$.
$\mathbb E[z^\top M z] = \mathrm{tr}\big(M\,\mathbb E[z z^\top]\big) = \mathrm{tr}(M)$.
For the variance, compute $\mathbb E\big[(z^\top M z)^2\big]$ using the fourth-moment structure of the probe entries.
Unbiasedness
$\mathbb E[z^\top M z] = \sum_{i,j} M_{ij}\,\mathbb E[z_i z_j] = \sum_i M_{ii} = \mathrm{tr}(M)$.
Variance
For Rademacher $z$ and symmetric $M$, $\mathrm{Var}(z^\top M z) = 2\sum_{i\neq j} M_{ij}^2$. With $K$ independent probes, the estimator has variance $\tfrac{2}{K}\sum_{i\neq j} M_{ij}^2$.
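A minimal matrix-free implementation of the Hutchinson estimator (NumPy; the test matrix is only formed here to compare against the exact trace):

```python
import numpy as np

def hutchinson_trace(matvec, n, num_probes, rng=None):
    # Unbiased trace estimate: average of z^T M z over Rademacher probes z.
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        total += z @ matvec(z)
    return total / num_probes

# Example: estimate tr(M) using only matrix-vector products.
rng = np.random.default_rng(1)
M = rng.standard_normal((200, 200))
M = M @ M.T                                  # symmetric PSD test matrix
est = hutchinson_trace(lambda v: M @ v, 200, num_probes=500, rng=rng)
print(est, np.trace(M))
```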
ex-ch21-08
(Medium) Prove that for $A$ with i.i.d. Gaussian entries (suitably normalized variance), the AMP and OAMP matched-filter / LMMSE updates coincide asymptotically. (Hence OAMP inherits AMP's behaviour on this ensemble.)
Singular values of $A$ follow the Marchenko-Pastur law.
The LMMSE filter applied to the MP spectrum degenerates to a scaled adjoint.
Spectrum
The non-zero squared singular values of $A$ concentrate on the bounded Marchenko-Pastur support determined by the aspect ratio $\delta = M/N$. In the large-system limit the spectrum has a fixed shape.
LMMSE filter collapses
$\hat W = v^2 A^*\big(v^2 A A^* + \sigma^2 I\big)^{-1}$. Diagonalizing in the SVD basis, each squared singular value $s_i^2$ produces the per-mode gain $v^2 s_i/(v^2 s_i^2 + \sigma^2)$. For flat MP spectra this gain is nearly constant, yielding $\hat W \approx c\, A^*$ for a scalar $c$.
Match to AMP
Plugging this back and applying the trace-normalization gives exactly the AMP matched filter in the asymptotic limit.
ex-ch21-09
(Medium) For 1-bit GAMP with $y = \mathrm{sign}(z + w)$, $w \sim \mathcal N(0, \sigma^2)$, derive the output denoiser $g_{\mathrm{out}}$ from the truncated-Gaussian posterior.
Combine the sign likelihood with the Gaussian pseudo-prior $\mathcal N(z;\, \hat p, \tau_p)$ to get a truncated Gaussian.
Let $\phi$ and $\Phi$ denote the standard normal pdf and cdf.
Posterior
$p(z \mid y, \hat p) \propto \Phi\!\big(y z/\sigma\big)\,\mathcal N(z;\, \hat p, \tau_p)$; equivalently, $u = z + w$ given $y$ is $\mathcal N(\hat p,\, \tau_p + \sigma^2)$ truncated to $\{y u > 0\}$. This is a truncated Gaussian with mean $\hat p + y\,\sqrt{\tau_p+\sigma^2}\,\dfrac{\phi\big(\hat p/\sqrt{\tau_p+\sigma^2}\big)}{\Phi\big(y\hat p/\sqrt{\tau_p+\sigma^2}\big)}$.
Output denoiser
$g_{\mathrm{out}}(\hat p, y, \tau_p) = \dfrac{y\,\phi(\hat p/s)}{s\,\Phi(y\hat p/s)}$ with $s = \sqrt{\tau_p + \sigma^2}$. It stays bounded and decays to $0$ as $y\hat p \to +\infty$ (saturation), reflecting that a 1-bit measurement carries at most 1 bit of information.
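A sketch of this output denoiser using the standard normal pdf/cdf (NumPy/SciPy; the clipping of the cdf is a numerical guard, not part of the derivation):

```python
import numpy as np
from scipy.stats import norm

def g_out_one_bit(p_hat, y, tau_p, sigma2):
    # Probit output denoiser: (E[z | y, p_hat, tau_p] - p_hat) / tau_p
    # for y = sign(z + w), w ~ N(0, sigma2), y in {-1, +1}.
    s = np.sqrt(tau_p + sigma2)
    t = y * p_hat / s
    ratio = norm.pdf(t) / np.maximum(norm.cdf(t), 1e-300)   # inverse Mills ratio
    return y * ratio / s
```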
ex-ch21-10
(Medium) Write the expectation-consistency (EC) fixed-point equations that define VAMP and compare them term by term with the OAMP iteration.
EC matches first and second moments between two cavity distributions.
The two cavities correspond to the prior factor and the likelihood factor.
EC conditions
At a VAMP fixed point the marginal means and variances computed from the two factors agree: $\mathbb E_1[x] = \mathbb E_2[x]$ and $\mathrm{Var}_1[x] = \mathrm{Var}_2[x]$, with factor 1 carrying the prior and factor 2 the likelihood. The update recursions iterate toward this consistency.
Mapping to OAMP
The prior-factor update is the Onsager-free denoiser; the likelihood-factor update is the LMMSE step. Across the two factors the Onsager correction manifests as the extrinsic-information subtraction in message passing, identical to OAMP's divergence-free normalization.
ex-ch21-11
(Medium) In LAMP, the Onsager coefficient is trained rather than computed. Argue heuristically why this helps for non-i.i.d. $A$, and state what happens in the limit where the learned coefficient equals the AMP analytical value on i.i.d. Gaussian matrices.
AMP's analytical Onsager coefficient only holds for i.i.d. Gaussian $A$.
A learned coefficient can capture spectrum-dependent corrections.
Flexibility
For non-i.i.d. matrices the correct Onsager factor depends on the matrix spectrum in a complicated way. A learned scalar discovers the effective feedback gain from data, bypassing the analytical formula.
i.i.d. limit
Trained on i.i.d. Gaussian data, the learned coefficient converges to the AMP value and LAMP reduces to plain AMP with learned step sizes / denoiser parameters.
ex-ch21-12
(Hard) Prove that if $\eta$ is Lipschitz and acts componentwise, then the divergence-free denoiser $\eta_{\mathrm{df}}(r) = C\big(\eta(r) - \langle\eta'\rangle\, r\big)$ satisfies the orthogonality property $\mathbb E\,\langle \eta_{\mathrm{df}}(r),\, r - x\rangle = 0$ when $r = x + \sigma g$ with $g \sim \mathcal N(0, I_N)$.
Use Stein's lemma: $\mathbb E[g\, f(g)] = \mathbb E[f'(g)]$ for $g \sim \mathcal N(0,1)$ and Lipschitz $f$.
The subtraction coefficient $\langle\eta'\rangle$ is chosen precisely to ensure orthogonality.
Expand the inner product
$\mathbb E\,\langle \eta_{\mathrm{df}}(r),\, r - x\rangle = C\,\mathbb E\,\langle \eta(r) - \langle\eta'\rangle\, r,\ \sigma g\rangle = C\sigma\big(\mathbb E\langle \eta(r), g\rangle - \langle\eta'\rangle\,\mathbb E\langle r, g\rangle\big)$.
Apply Stein
Componentwise, Stein gives $\mathbb E[g_i\,\eta(x_i + \sigma g_i)] = \sigma\,\mathbb E[\eta'(r_i)]$. Similarly $\mathbb E[g_i\, r_i] = \sigma$. Summing and dividing by $N$ gives $\tfrac{1}{N}\mathbb E\langle \eta(r), g\rangle = \sigma\langle\eta'\rangle = \tfrac{\langle\eta'\rangle}{N}\,\mathbb E\langle r, g\rangle$, so the two terms cancel.
Conclusion
The subtraction of $\langle\eta'\rangle\, r$ is exactly what cancels the first-order Stein term. Orthogonality holds independently of the prefactor $C$ (which only affects gain).
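A Monte Carlo check of the orthogonality property with a soft-threshold denoiser (NumPy; $C = 1$ since the prefactor does not affect orthogonality):

```python
import numpy as np

def soft(r, thr):
    return np.sign(r) * np.maximum(np.abs(r) - thr, 0.0)

rng = np.random.default_rng(0)
n, sigma, thr, trials = 2000, 0.5, 0.3, 200
x = rng.standard_normal(n) * (rng.random(n) < 0.1)   # sparse test signal

vals = []
for _ in range(trials):
    g = rng.standard_normal(n)
    r = x + sigma * g
    div = np.mean(np.abs(r) > thr)                   # empirical <eta'> for soft threshold
    eta_df = soft(r, thr) - div * r                  # divergence-free denoiser, C = 1
    vals.append(np.dot(eta_df, r - x) / n)
print(np.mean(vals))   # close to 0, matching the Stein argument above
```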
ex-ch21-13
(Hard) For VAMP on an RRI matrix with squared-singular-value spectrum $\rho$, derive the scalar state-evolution update for the effective noise variance in terms of the $\eta$-transform of the spectrum.
The VAMP LMMSE step diagonalizes in the SVD basis.
Integrate the per-singular-value MSE against $\rho$.
LMMSE in SVD basis
Per coordinate, the posterior variance is $\big(1/\tau + s_i^2/\sigma^2\big)^{-1}$. Averaging over $\rho$ gives $\eta$-transform-like integrals.
State evolution
$\mathcal E(\tau) = \tau\,\eta_\rho(\tau/\sigma^2)$ with $\eta_\rho(z) = \int \frac{d\rho(\lambda)}{1+z\lambda}$, and the LMMSE-side extrinsic update is $\tau_{\mathrm{ext}} = \big(\mathcal E(\tau)^{-1} - \tau^{-1}\big)^{-1}$, closing the recursion with the denoiser MSE map on the prior side.
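A sketch of one LMMSE-side state-evolution step evaluated on an empirical spectrum (NumPy; the zero padding accounts for the nullspace when $M < N$, and the extrinsic-update form is the convention assumed above):

```python
import numpy as np

def eta_transform(z, lam):
    # Empirical eta-transform of a spectrum {lam_i}: mean of 1 / (1 + z * lam_i).
    return np.mean(1.0 / (1.0 + z * lam))

def lmmse_se_step(tau, sigma2, lam):
    # Average per-mode posterior variance E(tau) = tau * eta_rho(tau / sigma2),
    # followed by the extrinsic-variance update on the LMMSE side.
    E = tau * eta_transform(tau / sigma2, lam)
    tau_ext = 1.0 / (1.0 / E - 1.0 / tau)
    return E, tau_ext

# Example with the empirical spectrum of a Gaussian matrix (M < N: pad with
# zeros so the spectrum covers all N signal coordinates).
rng = np.random.default_rng(0)
M, N = 300, 600
A = rng.standard_normal((M, N)) / np.sqrt(N)
lam = np.concatenate([np.linalg.svd(A, compute_uv=False) ** 2, np.zeros(N - M)])
print(lmmse_se_step(tau=1.0, sigma2=0.01, lam=lam))
```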
ex-ch21-14
(Hard) Extend GAMP to the Poisson likelihood $p(y \mid z) = \mathrm{Poisson}\big(y;\, e^{z}\big)$. Derive the output denoiser and explain why damping is necessary for this likelihood.
For Poisson, $\mathbb E[z \mid y, \hat p, \tau_p]$ does not have a simple closed form; use a Laplace approximation.
The Poisson score grows exponentially in $z$, which is the source of instability.
Laplace approximation
Maximize $y z - e^{z} - \tfrac{(z - \hat p)^2}{2\tau_p}$ over $z$ to get $\hat z$ satisfying $y - e^{\hat z} - \tfrac{\hat z - \hat p}{\tau_p} = 0$. Solve iteratively (e.g. by Newton's method); the second derivative gives the posterior variance $\approx \big(e^{\hat z} + 1/\tau_p\big)^{-1}$.
Output denoiser
$g_{\mathrm{out}}(\hat p, y, \tau_p) \approx \dfrac{\hat z - \hat p}{\tau_p}$, with Jacobian obtained from implicit differentiation: $\partial \hat z / \partial \hat p = \big(1 + \tau_p e^{\hat z}\big)^{-1}$.
Damping necessity
The denoiser's gradient can explode when the predicted rate $e^{\hat z}$ is large. Damping the vector iterates with a factor $\beta \in (0,1)$ prevents the oscillations that would otherwise derail GAMP.
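A sketch of the Laplace-approximated Poisson output denoiser with a log link, solved by Newton's method (NumPy; the iteration count and initialization are illustrative choices):

```python
import numpy as np

def g_out_poisson_laplace(p_hat, y, tau_p, newton_iters=20):
    # Laplace approximation of the Poisson output denoiser with lambda(z) = exp(z):
    # find z_hat maximizing y*z - exp(z) - (z - p_hat)^2 / (2 * tau_p).
    z = np.log(np.maximum(y, 0.5))        # crude but safe starting point
    for _ in range(newton_iters):
        grad = y - np.exp(z) - (z - p_hat) / tau_p
        hess = -np.exp(z) - 1.0 / tau_p
        z = z - grad / hess
    g = (z - p_hat) / tau_p               # approximate (E[z | y] - p_hat) / tau_p
    dg = (1.0 / (1.0 + tau_p * np.exp(z)) - 1.0) / tau_p
    return g, dg
```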
ex-ch21-15
(Hard) Prove that LISTA with optimal layer-wise parameters achieves a linear convergence rate strictly better than ISTA under an RIP condition on $A$.
Construct LISTA parameters that reproduce scaled ISTA; this gives an upper bound on the LISTA optimum.
Optimize the per-layer step size using RIP bounds.
Scaled ISTA as LISTA special case
Set $W_1 = \gamma A^\top$, $W_2 = I - \gamma A^\top A$, $\theta = \gamma\lambda$. This is ISTA with step $\gamma$.
Layerwise optimum
Minimize the one-step contraction factor over the step $\gamma$ under the RIP bound; the layer-wise optimal step gives a contraction strictly below ISTA's fixed-step rate when the RIP constant is small enough.
LISTA beats this
The full LISTA parameterization contains per-layer parameters trained independently of one another, with many more degrees of freedom. Its training optimum thus dominates the scaled-ISTA bound, giving a strictly smaller contraction factor.
ex-ch21-16
(Challenge) Design an OAMP variant for the structured sensing matrix $A = F D$, where $D$ is diagonal with i.i.d. random entries and $F$ is the DFT. Specifically, (a) compute the Gram $A^* A$, (b) specify the LMMSE filter, and (c) analyze the per-iteration complexity.
$D^* D = \mathrm{diag}(|d_i|^2)$ for diagonals.
DFT multiplication costs $O(N \log N)$ via the FFT.
Gram
$A^* A = D^* F^* F D = D^* D = \mathrm{diag}(|d_i|^2)$; for unit-modulus entries this is the identity, so the operator is an orthogonal (unitary) matrix up to scale.
LMMSE filter
$\hat W = v^2 A^*\big(v^2 A A^* + \sigma^2 I\big)^{-1}$. But since $A A^* = F\,|D|^2 F^*$ is diagonalized by the DFT, $\hat W = v^2 D^*\big(v^2 |D|^2 + \sigma^2 I\big)^{-1} F^*$, i.e. one inverse FFT plus a diagonal scaling; with unit-modulus $D$ it collapses to $\tfrac{v^2}{v^2+\sigma^2}\, A^*$.
Complexity
Per iteration: $A x$ and $A^* r$ each cost $O(N \log N)$ (diagonal multiply + FFT). The divergence-free normalization is a scalar operation. Total: $O(N \log N)$ per iteration.
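A matrix-free sketch of the fast operator with a random-sign diagonal (NumPy; the unit-modulus assumption is what makes $A^* A = I$):

```python
import numpy as np

N = 1024
rng = np.random.default_rng(0)
d = rng.choice([-1.0, 1.0], size=N)          # i.i.d. unit-modulus diagonal (random signs)

def A_mv(x):
    # A x = F (D x): diagonal multiply then unitary FFT, O(N log N).
    return np.fft.fft(d * x, norm="ortho")

def AH_mv(r):
    # A^* r = D^* (F^* r): inverse unitary FFT then conjugate diagonal multiply.
    return d * np.fft.ifft(r, norm="ortho")

# Sanity check: A^* A = I for unit-modulus D, so the LMMSE filter is a scalar times A^*.
x = rng.standard_normal(N)
assert np.allclose(AH_mv(A_mv(x)).real, x)
```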
ex-ch21-17
(Challenge) Propose and analyze a hybrid estimator that runs LDVAMP for a fixed number of layers, then switches to analytical OAMP for a refinement pass. Argue why this hybrid could outperform either algorithm alone.
LDVAMP adapts to empirical distributions; OAMP inherits asymptotic guarantees.
Treat the LDVAMP output as a warm start for OAMP.
Pipeline
Run $T$ LDVAMP layers, producing an estimate $\hat x_T$ and a variance estimate $\hat\tau_T$. Initialize OAMP at this point and run analytical iterations.
Why this helps
LDVAMP quickly reaches a low-MSE regime exploiting empirical priors and structured correlations. Within this regime OAMP's scalar state-evolution description applies, so the refinement pass enjoys predictable guarantees and removes residual bias due to training-set idiosyncrasies. The hybrid combines the adaptivity of learning with the robustness of analytical analysis.
Analysis
One can bound the hybrid MSE by the smaller of the two standalone MSEs plus a warm-start error term, showing the hybrid is no worse than either component up to constants that depend on the warm-start quality.