Exercises
ex-ch21-01
(Easy) Define the right-rotationally-invariant (RRI) ensemble in terms of the SVD $A = U \Sigma V^*$. Give two examples of matrices inside the RRI class and one outside.
RRI concerns the right singular basis $V$.
i.i.d. Gaussian is RRI; a diagonal matrix with structured diagonal entries is not.
Definition
$A = U \Sigma V^*$ is RRI iff $V$ is Haar-distributed on the unitary group, independently of $U$ and $\Sigma$.
Examples
Inside RRI: (i) i.i.d. Gaussian matrices; (ii) random sub-sampled DFT / Hadamard matrices. Outside RRI: a deterministic Vandermonde matrix with fixed nodes.
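A minimal NumPy sketch of sampling from an RRI ensemble (assuming a real-valued setting, so Haar orthogonal rather than unitary factors; all function names are illustrative):

```python
import numpy as np

def haar_orthogonal(n, rng):
    # QR of a Gaussian matrix, with a sign correction, gives a Haar-distributed orthogonal matrix.
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))

def sample_rri(m, n, spectrum, rng=None):
    # A = U @ diag(spectrum) @ V.T with Haar V: right-rotationally invariant
    # regardless of how U and the spectrum are chosen.
    rng = np.random.default_rng() if rng is None else rng
    U = haar_orthogonal(m, rng)
    V = haar_orthogonal(n, rng)
    S = np.zeros((m, n))
    k = min(m, n, len(spectrum))
    S[np.arange(k), np.arange(k)] = spectrum[:k]
    return U @ S @ V.T

A = sample_rri(64, 128, np.linspace(2.0, 0.5, 64))
```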
ex-ch21-02
(Easy) Write the OAMP iteration in one line and identify the three normalizations (the trace normalization of $W_t$, the subtraction of $\langle\eta_t'\rangle\, r_t$, and the rescaling by $1/(1-\langle\eta_t'\rangle)$). What does each normalization accomplish?
Linear step: $r_t = x_t + W_t\,(y - A x_t)$ with $\tfrac{1}{N}\mathrm{tr}(W_t A) = 1$.
Denoise: $x_{t+1} = \dfrac{\eta_t(r_t) - \langle\eta_t'\rangle\, r_t}{1 - \langle\eta_t'\rangle}$.
Normalizations
(1) The trace normalization $\tfrac{1}{N}\mathrm{tr}(W_t A) = 1$ makes the linear step unity-gain on $x$. (2) Subtracting $\langle\eta_t'\rangle\, r_t$ kills the input bias of the denoiser (divergence-free denoiser). (3) Rescaling by $1/(1-\langle\eta_t'\rangle)$ restores unit gain on $x$ after the subtraction.
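A minimal sketch of one OAMP iteration (NumPy). It uses a trace-normalized matched filter $W_t = c\,A^\top$ (one admissible choice, not the LMMSE-optimal filter) and a divergence-free soft-threshold denoiser; the threshold `thr` is an illustrative parameter:

```python
import numpy as np

def soft(r, thr):
    return np.sign(r) * np.maximum(np.abs(r) - thr, 0.0)

def oamp_iteration(x_t, y, A, thr):
    n = A.shape[1]
    # (1) Trace-normalized matched filter: W = c * A.T with tr(W A) / n = 1.
    c = n / np.trace(A.T @ A)
    r = x_t + c * (A.T @ (y - A @ x_t))
    # (2) Divergence-free denoiser: subtract the empirical divergence <eta'> times r.
    eta = soft(r, thr)
    div = np.mean(np.abs(r) > thr)          # <eta'> for soft thresholding
    x_df = eta - div * r
    # (3) Rescale to restore unit gain on x.
    return x_df / (1.0 - div + 1e-12)
```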
ex-ch21-03
(Easy) In GAMP, the input denoiser $g_{\mathrm{in}}$ is determined by the signal prior; what determines the output denoiser $g_{\mathrm{out}}$? Write $g_{\mathrm{out}}$ for the standard Gaussian likelihood $p(y \mid z) = \mathcal N(y;\, z,\, \sigma^2)$.
$g_{\mathrm{out}}$ depends on the measurement likelihood.
For Gaussian likelihoods the conditional mean of $z$ is available in closed form.
What determines $g_{\mathrm{out}}$
$g_{\mathrm{out}}$ is determined by the per-element measurement likelihood $p(y_a \mid z_a)$ and the current state $(\hat p_a, \tau_p)$: $g_{\mathrm{out}}(\hat p, y, \tau_p) = \big(\mathbb E[z \mid y, \hat p, \tau_p] - \hat p\big)/\tau_p$.
Gaussian case
$g_{\mathrm{out}}(\hat p, y, \tau_p) = \dfrac{y - \hat p}{\sigma^2 + \tau_p}$, a simple scaled residual.
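A one-function sketch of this Gaussian output denoiser and its derivative in $\hat p$ (NumPy; the function name and signature are illustrative):

```python
import numpy as np

def g_out_gaussian(p_hat, y, tau_p, sigma2):
    # Scaled residual: (E[z | y, p_hat, tau_p] - p_hat) / tau_p for p(y|z) = N(y; z, sigma2).
    g = (y - p_hat) / (sigma2 + tau_p)
    dg = -np.ones_like(p_hat) / (sigma2 + tau_p)   # d g_out / d p_hat
    return g, dg
```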
ex-ch21-04
(Easy) In LISTA the matrices $W_1, W_2$ and the threshold $\theta$ are learnable. What are the standard ISTA-based initial values, and why is initialization from these values crucial?
ISTA: $x_{k+1} = \mathrm{soft}_{\lambda/L}\!\big(x_k + \tfrac{1}{L} A^\top (y - A x_k)\big)$ with step $1/L$.
Random init of 100k+ parameters rarely finds a good basin.
ISTA init
$W_1 = \tfrac{1}{L} A^\top$, $W_2 = I - \tfrac{1}{L} A^\top A$, $\theta = \lambda/L$, where $L$ is the Lipschitz constant of the gradient, $L = \|A\|_2^2$.
Why
The ISTA init is provably convergent and gives the network a meaningful starting point. Random init typically fails to recover any convergent behaviour even after long training; the loss landscape has many bad basins around random points.
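A sketch of the ISTA-based initialization under these formulas (NumPy; in practice these arrays would be copied into the learnable layer parameters):

```python
import numpy as np

def lista_init(A, lam):
    # Lipschitz constant of the gradient of 0.5 * ||y - A x||^2 is L = ||A||_2^2.
    L = np.linalg.norm(A, ord=2) ** 2
    W1 = A.T / L                               # input matrix, (1/L) A^T
    W2 = np.eye(A.shape[1]) - (A.T @ A) / L    # lateral matrix, I - (1/L) A^T A
    theta = lam / L                            # soft-threshold level
    return W1, W2, theta
```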
ex-ch21-05
(Medium) Consider $A = S F$, where $F$ is the $N \times N$ unitary DFT and $S$ selects $M$ rows. Show that $A A^* = I_M$. Using this, compute the LMMSE filter explicitly and show it reduces to a scalar multiple of $A^*$.
DFT: $F F^* = F^* F = I_N$ (unitary normalization).
$S S^* = I_M$, while $S^* S$ is a diagonal projector onto the selected coordinates.
Verify the Gram
$A A^* = S F F^* S^* = S S^*$. Since the rows of $S$ are distinct standard basis vectors, $S S^* = I_M$.
LMMSE filter
$\hat W = v^2 A^* \big(v^2 A A^* + \sigma^2 I_M\big)^{-1}$, with $v^2$ the error variance of the current input estimate and $\sigma^2$ the measurement-noise variance.
Normalization
With $A A^* = I_M$, $\hat W = \tfrac{v^2}{v^2+\sigma^2}\, A^*$, so the filter is a scalar multiple of $A^*$, and it is then scaled to satisfy the trace constraint $\tfrac{1}{N}\mathrm{tr}(W A) = 1$.
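A quick numerical check of both claims (NumPy; uses the unitary-normalized DFT and a random row selection):

```python
import numpy as np

N, M = 64, 16
rng = np.random.default_rng(0)
F = np.fft.fft(np.eye(N), norm="ortho")          # unitary DFT matrix
rows = rng.choice(N, size=M, replace=False)
A = F[rows, :]                                   # A = S F

# Row orthogonality: A A^* = I_M.
assert np.allclose(A @ A.conj().T, np.eye(M))

# The LMMSE filter collapses to a scalar multiple of A^*.
v2, sigma2 = 1.0, 0.1
W = v2 * A.conj().T @ np.linalg.inv(v2 * A @ A.conj().T + sigma2 * np.eye(M))
assert np.allclose(W, (v2 / (v2 + sigma2)) * A.conj().T)
```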
ex-ch21-06
(Medium) Derive the Kronecker identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^\top)$ and use it to express the OAMP linear step for $H = A \otimes B$ as a matrix equation in the reshaped signal $X$.
$\mathrm{vec}(\cdot)$ stacks columns.
This identity is the reason Kronecker structure reduces complexity dramatically.
Identity
$(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^\top)$. One direction: the $(i,j)$ entry of $B X A^\top$ is $\sum_{k,\ell} B_{ik} X_{k\ell} A_{j\ell}$, which coincides with the corresponding entry of $(A \otimes B)\,\mathrm{vec}(X)$.
OAMP step
Reshape $x = \mathrm{vec}(X)$ and $y = \mathrm{vec}(Y)$. With a Kronecker-structured filter $W = W_A \otimes W_B$, the linear step $r = x + W(y - Hx)$ becomes $R = X + W_B\big(Y - B X A^\top\big) W_A^\top$ (reshaped). The Kronecker LMMSE exploits the diagonal structure in the per-factor SVD bases.
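A numerical check of the vec identity with column-stacking (NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # acts on the column index of X
B = rng.standard_normal((5, 2))   # acts on the row index of X
X = rng.standard_normal((2, 3))

# vec stacks columns: vec(X) = X.flatten(order="F").
lhs = np.kron(A, B) @ X.flatten(order="F")
rhs = (B @ X @ A.T).flatten(order="F")
assert np.allclose(lhs, rhs)
```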
ex-ch21-07
(Medium) The Hutchinson estimator computes $\mathrm{tr}(M) \approx \tfrac{1}{K}\sum_{k=1}^{K} z_k^\top M z_k$ with $z_k$ having i.i.d. zero-mean, unit-variance entries. Show that it is unbiased and compute its variance in terms of the entries of $M$.
$\mathbb E[z^\top M z] = \mathrm{tr}\big(M\,\mathbb E[z z^\top]\big) = \mathrm{tr}(M)$.
For the variance, compute $\mathbb E\big[(z^\top M z)^2\big]$ using the fourth-moment structure of the probe entries.
Unbiasedness
$\mathbb E[z^\top M z] = \sum_{i,j} M_{ij}\,\mathbb E[z_i z_j] = \sum_i M_{ii} = \mathrm{tr}(M)$.
Variance
For Rademacher $z$ and symmetric $M$, $\mathrm{Var}(z^\top M z) = 2\sum_{i\neq j} M_{ij}^2$. With $K$ independent probes, the estimator has variance $\tfrac{2}{K}\sum_{i\neq j} M_{ij}^2$.
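A minimal matrix-free implementation of the Hutchinson estimator (NumPy; the test matrix is only formed here to compare against the exact trace):

```python
import numpy as np

def hutchinson_trace(matvec, n, num_probes, rng=None):
    # Unbiased trace estimate: average of z^T M z over Rademacher probes z.
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        total += z @ matvec(z)
    return total / num_probes

# Example: estimate tr(M) using only matrix-vector products.
rng = np.random.default_rng(1)
M = rng.standard_normal((200, 200))
M = M @ M.T                                  # symmetric PSD test matrix
est = hutchinson_trace(lambda v: M @ v, 200, num_probes=500, rng=rng)
print(est, np.trace(M))
```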
ex-ch21-08
(Medium) Prove that for $A$ with i.i.d. Gaussian entries (suitably normalized variance), the AMP and OAMP matched-filter / LMMSE updates coincide asymptotically. (Hence OAMP inherits AMP's behaviour on this ensemble.)
Singular values of $A$ follow the Marchenko-Pastur law.
The LMMSE filter applied to the MP spectrum degenerates to a scaled adjoint.
Spectrum
The non-zero squared singular values of $A$ concentrate on the bounded Marchenko-Pastur support determined by the aspect ratio $\delta = M/N$. In the large-system limit the spectrum has a fixed shape.
LMMSE filter collapses
$\hat W = v^2 A^*\big(v^2 A A^* + \sigma^2 I\big)^{-1}$. Diagonalizing in the SVD basis, each squared singular value $s_i^2$ produces the per-mode gain $v^2 s_i/(v^2 s_i^2 + \sigma^2)$. For flat MP spectra this gain is nearly constant, yielding $\hat W \approx c\, A^*$ for a scalar $c$.
Match to AMP
Plugging this back and applying the trace-normalization gives exactly the AMP matched filter in the asymptotic limit.
ex-ch21-09
(Medium) For 1-bit GAMP with $y = \mathrm{sign}(z + w)$, $w \sim \mathcal N(0, \sigma^2)$, derive the output denoiser $g_{\mathrm{out}}$ from the truncated-Gaussian posterior.
Combine the sign likelihood with the Gaussian pseudo-prior $\mathcal N(z;\, \hat p, \tau_p)$ to get a truncated Gaussian.
Let $\phi$ and $\Phi$ denote the standard normal pdf and cdf.
Posterior
$p(z \mid y, \hat p) \propto \Phi\!\big(y z/\sigma\big)\,\mathcal N(z;\, \hat p, \tau_p)$; equivalently, $u = z + w$ given $y$ is $\mathcal N(\hat p,\, \tau_p + \sigma^2)$ truncated to $\{y u > 0\}$. This is a truncated Gaussian with mean $\hat p + y\,\sqrt{\tau_p+\sigma^2}\,\dfrac{\phi\big(\hat p/\sqrt{\tau_p+\sigma^2}\big)}{\Phi\big(y\hat p/\sqrt{\tau_p+\sigma^2}\big)}$.
Output denoiser
$g_{\mathrm{out}}(\hat p, y, \tau_p) = \dfrac{y\,\phi(\hat p/s)}{s\,\Phi(y\hat p/s)}$ with $s = \sqrt{\tau_p + \sigma^2}$. It stays bounded and decays to $0$ as $y\hat p \to +\infty$ (saturation), reflecting that a 1-bit measurement carries at most 1 bit of information.
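A sketch of this output denoiser using the standard normal pdf/cdf (NumPy/SciPy; the clipping of the cdf is a numerical guard, not part of the derivation):

```python
import numpy as np
from scipy.stats import norm

def g_out_one_bit(p_hat, y, tau_p, sigma2):
    # Probit output denoiser: (E[z | y, p_hat, tau_p] - p_hat) / tau_p
    # for y = sign(z + w), w ~ N(0, sigma2), y in {-1, +1}.
    s = np.sqrt(tau_p + sigma2)
    t = y * p_hat / s
    ratio = norm.pdf(t) / np.maximum(norm.cdf(t), 1e-300)   # inverse Mills ratio
    return y * ratio / s
```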
ex-ch21-10
(Medium) Write the expectation-consistency (EC) fixed-point equations that define VAMP and compare them term by term with the OAMP iteration.
EC matches first and second moments between two cavity distributions.
The two cavities correspond to the prior factor and the likelihood factor.
EC conditions
At a VAMP fixed point the marginal means and variances computed from the two factors agree: $\mathbb E_1[x] = \mathbb E_2[x]$ and $\mathrm{Var}_1[x] = \mathrm{Var}_2[x]$, with factor 1 carrying the prior and factor 2 the likelihood. The update recursions iterate toward this consistency.
Mapping to OAMP
The prior-factor update is the Onsager-free denoiser; the likelihood-factor update is the LMMSE step. Across the two factors the Onsager correction manifests as the extrinsic-information subtraction in message passing, identical to OAMP's divergence-free normalization.
ex-ch21-11
(Medium) In LAMP, the Onsager coefficient is trained rather than computed. Argue heuristically why this helps for non-i.i.d. $A$, and state what happens in the limit where the learned coefficient equals the AMP analytical value on i.i.d. Gaussian matrices.
AMP's analytical Onsager coefficient only holds for i.i.d. Gaussian $A$.
A learned coefficient can capture spectrum-dependent corrections.
Flexibility
For non-i.i.d. matrices the correct Onsager factor depends on the matrix spectrum in a complicated way. A learned scalar discovers the effective feedback gain from data, bypassing the analytical formula.
i.i.d. limit
Trained on i.i.d. Gaussian data, the learned coefficient converges to the AMP value and LAMP reduces to plain AMP with learned step sizes / denoiser parameters.
ex-ch21-12
(Hard) Prove that if $\eta$ is Lipschitz and acts componentwise, then the divergence-free denoiser $\eta_{\mathrm{df}}(r) = C\big(\eta(r) - \langle\eta'\rangle\, r\big)$ satisfies the orthogonality property $\mathbb E\,\langle \eta_{\mathrm{df}}(r),\, r - x\rangle = 0$ when $r = x + \sigma g$ with $g \sim \mathcal N(0, I_N)$.
Use Stein's lemma: $\mathbb E[g\, f(g)] = \mathbb E[f'(g)]$ for $g \sim \mathcal N(0,1)$ and Lipschitz $f$.
The subtraction coefficient $\langle\eta'\rangle$ is chosen precisely to ensure orthogonality.
Expand the inner product
$\mathbb E\,\langle \eta_{\mathrm{df}}(r),\, r - x\rangle = C\,\mathbb E\,\langle \eta(r) - \langle\eta'\rangle\, r,\ \sigma g\rangle = C\sigma\big(\mathbb E\langle \eta(r), g\rangle - \langle\eta'\rangle\,\mathbb E\langle r, g\rangle\big)$.
Apply Stein
Componentwise, Stein gives $\mathbb E[g_i\,\eta(x_i + \sigma g_i)] = \sigma\,\mathbb E[\eta'(r_i)]$. Similarly $\mathbb E[g_i\, r_i] = \sigma$. Summing and dividing by $N$ gives $\tfrac{1}{N}\mathbb E\langle \eta(r), g\rangle = \sigma\langle\eta'\rangle = \tfrac{\langle\eta'\rangle}{N}\,\mathbb E\langle r, g\rangle$, so the two terms cancel.
Conclusion
The subtraction of $\langle\eta'\rangle\, r$ is exactly what cancels the first-order Stein term. Orthogonality holds independently of the prefactor $C$ (which only affects gain).
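A Monte Carlo check of the orthogonality property with a soft-threshold denoiser (NumPy; $C = 1$ since the prefactor does not affect orthogonality):

```python
import numpy as np

def soft(r, thr):
    return np.sign(r) * np.maximum(np.abs(r) - thr, 0.0)

rng = np.random.default_rng(0)
n, sigma, thr, trials = 2000, 0.5, 0.3, 200
x = rng.standard_normal(n) * (rng.random(n) < 0.1)   # sparse test signal

vals = []
for _ in range(trials):
    g = rng.standard_normal(n)
    r = x + sigma * g
    div = np.mean(np.abs(r) > thr)                   # empirical <eta'> for soft threshold
    eta_df = soft(r, thr) - div * r                  # divergence-free denoiser, C = 1
    vals.append(np.dot(eta_df, r - x) / n)
print(np.mean(vals))   # close to 0, matching the Stein argument above
```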
ex-ch21-13
(Hard) For VAMP on an RRI matrix with squared-singular-value spectrum $\rho$, derive the scalar state-evolution update for the effective noise variance in terms of the $\eta$-transform of the spectrum.
The VAMP LMMSE step diagonalizes in the SVD basis.
Integrate the per-singular-value MSE against $\rho$.
LMMSE in SVD basis
Per coordinate, the posterior variance is $\big(1/\tau + s_i^2/\sigma^2\big)^{-1}$. Averaging over $\rho$ gives $\eta$-transform-like integrals.
State evolution
$\mathcal E(\tau) = \tau\,\eta_\rho(\tau/\sigma^2)$ with $\eta_\rho(z) = \int \frac{d\rho(\lambda)}{1+z\lambda}$, and the LMMSE-side extrinsic update is $\tau_{\mathrm{ext}} = \big(\mathcal E(\tau)^{-1} - \tau^{-1}\big)^{-1}$, closing the recursion with the denoiser MSE map on the prior side.
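A sketch of one LMMSE-side state-evolution step evaluated on an empirical spectrum (NumPy; the zero padding accounts for the nullspace when $M < N$, and the extrinsic-update form is the convention assumed above):

```python
import numpy as np

def eta_transform(z, lam):
    # Empirical eta-transform of a spectrum {lam_i}: mean of 1 / (1 + z * lam_i).
    return np.mean(1.0 / (1.0 + z * lam))

def lmmse_se_step(tau, sigma2, lam):
    # Average per-mode posterior variance E(tau) = tau * eta_rho(tau / sigma2),
    # followed by the extrinsic-variance update on the LMMSE side.
    E = tau * eta_transform(tau / sigma2, lam)
    tau_ext = 1.0 / (1.0 / E - 1.0 / tau)
    return E, tau_ext

# Example with the empirical spectrum of a Gaussian matrix (M < N: pad with
# zeros so the spectrum covers all N signal coordinates).
rng = np.random.default_rng(0)
M, N = 300, 600
A = rng.standard_normal((M, N)) / np.sqrt(N)
lam = np.concatenate([np.linalg.svd(A, compute_uv=False) ** 2, np.zeros(N - M)])
print(lmmse_se_step(tau=1.0, sigma2=0.01, lam=lam))
```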
ex-ch21-14
(Hard) Extend GAMP to the Poisson likelihood $p(y \mid z) = \mathrm{Poisson}\big(y;\, e^{z}\big)$. Derive the output denoiser and explain why damping is necessary for this likelihood.
For Poisson, $\mathbb E[z \mid y, \hat p, \tau_p]$ does not have a simple closed form; use a Laplace approximation.
The Poisson score grows exponentially in $z$, which is the source of instability.
Laplace approximation
Maximize $y z - e^{z} - \tfrac{(z - \hat p)^2}{2\tau_p}$ over $z$ to get $\hat z$ satisfying $y - e^{\hat z} - \tfrac{\hat z - \hat p}{\tau_p} = 0$. Solve iteratively (e.g. by Newton's method); the second derivative gives the posterior variance $\approx \big(e^{\hat z} + 1/\tau_p\big)^{-1}$.
Output denoiser
$g_{\mathrm{out}}(\hat p, y, \tau_p) \approx \dfrac{\hat z - \hat p}{\tau_p}$, with Jacobian obtained from implicit differentiation: $\partial \hat z / \partial \hat p = \big(1 + \tau_p e^{\hat z}\big)^{-1}$.
Damping necessity
The denoiser's gradient can explode when the predicted rate $e^{\hat z}$ is large. Damping the vector iterates with a factor $\beta \in (0,1)$ prevents the oscillations that would otherwise derail GAMP.
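A sketch of the Laplace-approximated Poisson output denoiser with a log link, solved by Newton's method (NumPy; the iteration count and initialization are illustrative choices):

```python
import numpy as np

def g_out_poisson_laplace(p_hat, y, tau_p, newton_iters=20):
    # Laplace approximation of the Poisson output denoiser with lambda(z) = exp(z):
    # find z_hat maximizing y*z - exp(z) - (z - p_hat)^2 / (2 * tau_p).
    z = np.log(np.maximum(y, 0.5))        # crude but safe starting point
    for _ in range(newton_iters):
        grad = y - np.exp(z) - (z - p_hat) / tau_p
        hess = -np.exp(z) - 1.0 / tau_p
        z = z - grad / hess
    g = (z - p_hat) / tau_p               # approximate (E[z | y] - p_hat) / tau_p
    dg = (1.0 / (1.0 + tau_p * np.exp(z)) - 1.0) / tau_p
    return g, dg
```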
ex-ch21-15
(Hard) Prove that LISTA with optimal layer-wise parameters achieves a linear convergence rate strictly better than ISTA under an RIP condition on $A$.
Construct LISTA parameters that reproduce scaled ISTA; this gives an upper bound on the LISTA optimum.
Optimize the per-layer step size using RIP bounds.
Scaled ISTA as LISTA special case
Set $W_1 = \gamma A^\top$, $W_2 = I - \gamma A^\top A$, $\theta = \gamma\lambda$. This is ISTA with step $\gamma$.
Layerwise optimum
Minimize the one-step contraction factor over the step $\gamma$ under the RIP bound; the layer-wise optimal step gives a contraction strictly below ISTA's fixed-step rate when the RIP constant is small enough.
LISTA beats this
The full LISTA parameterization contains per-layer parameters trained independently of one another, with many more degrees of freedom. Its training optimum thus dominates the scaled-ISTA bound, giving a strictly smaller contraction factor.
ex-ch21-16
(Challenge) Design an OAMP variant for the structured sensing matrix $A = F D$, where $D$ is diagonal with i.i.d. random entries and $F$ is the DFT. Specifically, (a) compute the Gram $A^* A$, (b) specify the LMMSE filter, and (c) analyze the per-iteration complexity.
$D^* D = \mathrm{diag}(|d_i|^2)$ for diagonals.
DFT multiplication costs $O(N \log N)$ via the FFT.
Gram
$A^* A = D^* F^* F D = D^* D = \mathrm{diag}(|d_i|^2)$; for unit-modulus entries this is the identity, so the operator is an orthogonal (unitary) matrix up to scale.
LMMSE filter
$\hat W = v^2 A^*\big(v^2 A A^* + \sigma^2 I\big)^{-1}$. But since $A A^* = F\,|D|^2 F^*$ is diagonalized by the DFT, $\hat W = v^2 D^*\big(v^2 |D|^2 + \sigma^2 I\big)^{-1} F^*$, i.e. one inverse FFT plus a diagonal scaling; with unit-modulus $D$ it collapses to $\tfrac{v^2}{v^2+\sigma^2}\, A^*$.
Complexity
Per iteration: $A x$ and $A^* r$ each cost $O(N \log N)$ (diagonal multiply + FFT). The divergence-free normalization is a scalar operation. Total: $O(N \log N)$ per iteration.
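A matrix-free sketch of the fast operator with a random-sign diagonal (NumPy; the unit-modulus assumption is what makes $A^* A = I$):

```python
import numpy as np

N = 1024
rng = np.random.default_rng(0)
d = rng.choice([-1.0, 1.0], size=N)          # i.i.d. unit-modulus diagonal (random signs)

def A_mv(x):
    # A x = F (D x): diagonal multiply then unitary FFT, O(N log N).
    return np.fft.fft(d * x, norm="ortho")

def AH_mv(r):
    # A^* r = D^* (F^* r): inverse unitary FFT then conjugate diagonal multiply.
    return d * np.fft.ifft(r, norm="ortho")

# Sanity check: A^* A = I for unit-modulus D, so the LMMSE filter is a scalar times A^*.
x = rng.standard_normal(N)
assert np.allclose(AH_mv(A_mv(x)).real, x)
```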
ex-ch21-17
(Challenge) Propose and analyze a hybrid estimator that runs LDVAMP for a fixed number of layers, then switches to analytical OAMP for a refinement pass. Argue why this hybrid could outperform either algorithm alone.
LDVAMP adapts to empirical distributions; OAMP inherits asymptotic guarantees.
Treat the LDVAMP output as a warm start for OAMP.
Pipeline
Run $T$ LDVAMP layers, producing an estimate $\hat x_T$ and a variance estimate $\hat\tau_T$. Initialize OAMP at this point and run analytical iterations.
Why this helps
LDVAMP quickly reaches a low-MSE regime exploiting empirical priors and structured correlations. Within this regime OAMP's scalar state-evolution description applies, so the refinement pass enjoys predictable guarantees and removes residual bias due to training-set idiosyncrasies. The hybrid combines the adaptivity of learning with the robustness of analytical analysis.
Analysis
One can bound the hybrid MSE by the smaller of the two standalone MSEs plus a warm-start error term, showing the hybrid is no worse than either component up to constants that depend on the warm-start quality.