Exercises
ex-ch20-01
Easy. Write down the AMP iteration for soft-thresholding with threshold $\theta_t = \lambda\tau_t$ and undersampling ratio $\delta = M/N$. Identify the Onsager term explicitly.
Start from the matched-filter update $\hat{\mathbf{x}}^{t+1} = \eta(\hat{\mathbf{x}}^t + \mathbf{A}^\top\mathbf{r}^t;\,\theta_t)$ and add the corrected residual.
For soft-thresholding, $\eta'(u;\theta) = \mathbb{1}\{|u| > \theta\}$, so $\langle\eta'\rangle$ is the fraction of non-zero components of $\hat{\mathbf{x}}^{t+1}$.
Full iteration
$$\hat{\mathbf{x}}^{t+1} = \eta\big(\hat{\mathbf{x}}^t + \mathbf{A}^\top\mathbf{r}^t;\,\theta_t\big), \qquad \mathbf{r}^{t+1} = \mathbf{y} - \mathbf{A}\hat{\mathbf{x}}^{t+1} + \delta^{-1}\,\frac{\|\hat{\mathbf{x}}^{t+1}\|_0}{N}\,\mathbf{r}^{t},$$ where the term $\delta^{-1}\|\hat{\mathbf{x}}^{t+1}\|_0/N\cdot\mathbf{r}^t$ is the Onsager correction.
ex-ch20-02
Easy. Show that the derivative of the soft-threshold denoiser is the indicator of being above threshold: $\eta'(u;\theta) = \mathbb{1}\{|u| > \theta\}$ almost everywhere. Where is $\eta(\cdot\,;\theta)$ non-differentiable, and what does the SE analysis require about this?
Differentiate each piece of the piecewise-linear definition.
Non-differentiable points occur where $|u| = \theta$ exactly.
Piecewise derivative
$\eta(u;\theta) = u - \theta$ for $u > \theta$ (derivative $1$); $\eta(u;\theta) = 0$ for $|u| \le \theta$ (derivative $0$); $\eta(u;\theta) = u + \theta$ for $u < -\theta$ (derivative $1$). Thus $\eta'(u;\theta) = \mathbb{1}\{|u| > \theta\}$ for $|u| \neq \theta$.
SE requirement
SE requires the denoiser to be Lipschitz, not differentiable everywhere. Soft-thresholding is 1-Lipschitz, so SE applies. The measure-zero set $\{|u| = \theta\}$ does not affect the expectation because $X + \tau Z$ has a continuous density.
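A minimal numerical check of the two facts used above (the 1-Lipschitz property and the almost-everywhere derivative), sketched with illustrative names:

import numpy as np

def soft_threshold(u, theta):
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def soft_threshold_deriv(u, theta):
    return (np.abs(u) > theta).astype(float)   # valid almost everywhere

rng = np.random.default_rng(0)
u, v, theta = rng.normal(size=10_000), rng.normal(size=10_000), 0.7
lip = np.abs(soft_threshold(u, theta) - soft_threshold(v, theta)) / np.abs(u - v)
print(lip.max())                                # <= 1 up to rounding: 1-Lipschitz
eps = 1e-6                                      # finite-difference check away from |u| = theta
fd = (soft_threshold(u + eps, theta) - soft_threshold(u - eps, theta)) / (2 * eps)
mask = np.abs(np.abs(u) - theta) > 1e-3
print(np.allclose(fd[mask], soft_threshold_deriv(u, theta)[mask]))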
ex-ch20-03
Easy. For a Bernoulli--Gaussian prior $p(x) = (1-\rho)\,\delta_0(x) + \rho\,\mathcal{N}(x;0,1)$, write the state-evolution recursion for AMP with soft-thresholding.
Start from the general SE recursion $\tau_{t+1}^2 = \sigma^2 + \delta^{-1}\,\mathbb{E}\big[(\eta(X + \tau_t Z;\lambda\tau_t) - X)^2\big]$ and specialise the expectation to the Bernoulli--Gaussian prior.
Split the expectation over $X = 0$ (probability $1-\rho$) and $X \sim \mathcal{N}(0,1)$ (probability $\rho$).
SE recursion
$$\tau_{t+1}^2 = \sigma^2 + \frac{1}{\delta}\Big[(1-\rho)\,\mathbb{E}\big[\eta(\tau_t Z;\lambda\tau_t)^2\big] + \rho\,\mathbb{E}\big[(\eta(X + \tau_t Z;\lambda\tau_t) - X)^2\big]\Big],$$ where in the second expectation $X \sim \mathcal{N}(0,1)$ and $Z \sim \mathcal{N}(0,1)$ are independent.
Closed-form MSE
Each expectation reduces to Gaussian integrals involving the standard normal CDF $\Phi$ and PDF $\phi$. The non-zero term involves the marginal $X + \tau_t Z \sim \mathcal{N}(0, 1+\tau_t^2)$ and the threshold $\theta = \lambda\tau_t$, and can be fully expanded using standard Gaussian identities (see Donoho--Maleki--Montanari 2011).
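A short Monte-Carlo sketch of this recursion; the sample size and the values of $\lambda$, $\rho$, $\delta$, $\sigma^2$ are illustrative choices, not fixed by the exercise:

import numpy as np
rng = np.random.default_rng(1)

def se_step(tau2, lam, rho, delta, sigma2, n=200_000):
    x = rng.normal(size=n) * (rng.uniform(size=n) < rho)   # X ~ (1-rho) delta_0 + rho N(0,1)
    u = x + np.sqrt(tau2) * rng.normal(size=n)             # pseudo-data X + tau Z
    theta = lam * np.sqrt(tau2)                            # threshold lam * tau
    eta = np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)
    return sigma2 + np.mean((eta - x) ** 2) / delta

tau2 = 10.0
for t in range(30):
    tau2 = se_step(tau2, lam=1.5, rho=0.1, delta=0.5, sigma2=1e-4)
print(tau2)                                                # approximate SE fixed point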
ex-ch20-04
Easy. Verify that in the noiseless case $\sigma = 0$, the SE recursion for L1-AMP takes the form $\tau_{t+1}^2 = f(\lambda,\rho,\delta)\,\tau_t^2$ when the threshold is chosen as $\theta_t = \lambda\tau_t$, for some function $f$ independent of $\tau_t$. Under what condition does $\tau_t^2 \to 0$?
Change variables: rescale the pseudo-data by $\tau_t$ so that the threshold $\lambda\tau_t$ becomes $\lambda$.
The scaling $\theta_t = \lambda\tau_t$ is crucial for making the recursion homogeneous.
Scaling
With $\theta_t = \lambda\tau_t$ the MSE of soft-thresholding scales as $\tau_t^2$ because both the input noise and the threshold scale identically. Precisely, $\mathbb{E}\big[(\eta(X + \tau_t Z;\lambda\tau_t) - X)^2\big] = \tau_t^2\,\mathbb{E}\big[(\eta(X/\tau_t + Z;\lambda) - X/\tau_t)^2\big]$.
Convergence condition
The recursion is geometric with ratio $f(\lambda,\rho,\delta)$. It converges to zero iff $f(\lambda,\rho,\delta) < 1$. Optimising over $\lambda$ gives the region below the Donoho--Tanner curve, $\rho < \rho_{\mathrm{DT}}(\delta)$, as the recovery region.
ex-ch20-05
Medium. Derive the MMSE denoiser $\eta_{\mathrm{MMSE}}(u;\tau) = \mathbb{E}[X \mid X + \tau Z = u]$ for a Bernoulli--Gaussian prior observed in Gaussian noise of variance $\tau^2$. Show that $\eta_{\mathrm{MMSE}}$ is smooth (in contrast to the soft-threshold) and compute its derivative.
Apply Bayes' rule to obtain the posterior weight $\pi(u) = \mathbb{P}(X \neq 0 \mid U = u)$.
Conditional on $X \neq 0$, $U = X + \tau Z \sim \mathcal{N}(0, 1 + \tau^2)$.
Posterior weight
$$\pi(u) = \frac{\rho\,\mathcal{N}(u;0,1+\tau^2)}{\rho\,\mathcal{N}(u;0,1+\tau^2) + (1-\rho)\,\mathcal{N}(u;0,\tau^2)}.$$
Posterior mean
$$\eta_{\mathrm{MMSE}}(u;\tau) = \pi(u)\,\frac{u}{1+\tau^2}.$$
Derivative
By the Stein identity, $\eta'_{\mathrm{MMSE}}(u;\tau) = \mathrm{Var}(X \mid U = u)/\tau^2$. The posterior variance is $\mathrm{Var}(X \mid U = u) = \pi(u)\Big(\frac{\tau^2}{1+\tau^2} + \frac{u^2}{(1+\tau^2)^2}\Big) - \eta_{\mathrm{MMSE}}(u;\tau)^2$, which is positive, bounded, and continuous in $u$, giving a smooth, bounded derivative.
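A sketch of this denoiser in NumPy, returning both the posterior mean and the Stein-identity derivative; computing the posterior weight in the log domain is an implementation choice for numerical stability:

import numpy as np

def bg_mmse(u, tau, rho):
    s2 = 1.0 + tau**2                             # Var(U | X != 0)
    log_num = np.log(rho) - 0.5 * u**2 / s2 - 0.5 * np.log(s2)
    log_den = np.log(1.0 - rho) - 0.5 * u**2 / tau**2 - 0.5 * np.log(tau**2)
    pi = 1.0 / (1.0 + np.exp(log_den - log_num))  # posterior weight P(X != 0 | u)
    m, v = u / s2, tau**2 / s2                    # mean / variance of X given u, X != 0
    mean = pi * m                                 # posterior mean = denoiser output
    var = pi * (v + m**2) - mean**2               # posterior variance
    return mean, var / tau**2                     # derivative via the Stein identity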
ex-ch20-06
Medium. Prove the Stein identity: for a smooth bounded $f$ and $U = X + \tau Z$ with $Z \sim \mathcal{N}(0,1)$ independent of $X$, $\mathbb{E}[(U - X)\,f(U)] = \tau^2\,\mathbb{E}[f'(U)]$, and deduce that $\frac{\mathrm{d}}{\mathrm{d}u}\,\mathbb{E}[X \mid U = u] = \tau^{-2}\,\mathrm{Var}(X \mid U = u)$. (This is the identity that connects the divergence of the MMSE denoiser to its posterior variance.)
Use integration by parts, treating as the derivative of a Gaussian density.
For a standard Gaussian $Z$ and smooth bounded $f$, $\mathbb{E}[Z f(Z)] = \mathbb{E}[f'(Z)]$ (Stein's lemma).
Reduce to Stein's lemma
Write $U = X + \tau Z$, so conditional on $X$, $U \sim \mathcal{N}(X, \tau^2)$. Then $\mathbb{E}[(U - X)f(U) \mid X] = \tau^2\,\mathbb{E}[f'(U) \mid X]$ by Stein's lemma applied to the conditional Gaussian.
Unconditional form
Take the expectation over $X$: $\mathbb{E}[(U - X)f(U)] = \tau^2\,\mathbb{E}[f'(U)]$, i.e.\ $\mathbb{E}\big[(U - \mathbb{E}[X \mid U])\,f(U)\big] = \tau^2\,\mathbb{E}[f'(U)]$ by the tower property. Writing both sides as integrals against the marginal density of $U$ and integrating by parts once more, a direct computation shows the cross terms combine into $\mathbb{E}[X^2 \mid U = u] - \mathbb{E}[X \mid U = u]^2$, and rearrangement yields the claimed identity.
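A quick numerical sanity check of the identity, sketched for a simple two-point prior $X \in \{0, 1\}$ with $\mathbb{P}(X = 1) = p$ (all names illustrative); the finite-difference derivative of the posterior mean should match the posterior variance divided by $\tau^2$:

import numpy as np

def posterior_stats(u, tau, p):
    # P(X = 1 | U = u) for U = X + tau*Z and the two-point prior on {0, 1}
    log_r = np.log(p / (1 - p)) + (u**2 - (u - 1.0)**2) / (2 * tau**2)
    pi = 1.0 / (1.0 + np.exp(-log_r))
    return pi, pi * (1 - pi)          # E[X | u] and Var(X | u), since X is 0/1

tau, p = 0.5, 0.3
u = np.linspace(-2, 3, 2001)
mean, var = posterior_stats(u, tau, p)
print(np.max(np.abs(np.gradient(mean, u) - var / tau**2)))   # small (finite-difference error only)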
ex-ch20-07
Medium. Implement AMP with soft-thresholding in 20 lines of NumPy. For $N = 500$, $M = 250$, a sparse Bernoulli--Gaussian signal with $\rho = 0.1$, $A_{ij} \sim \mathcal{N}(0, 1/M)$, and $\mathbf{y} = \mathbf{A}\mathbf{x} + \sigma\mathbf{w}$ where $\sigma = 10^{-2}$, run 20 iterations and report the normalised MSE.
Use $A_{ij} \sim \mathcal{N}(0, 1/M)$ for the matrix entries.
Initialise $\hat{\mathbf{x}}^0 = \mathbf{0}$, $\mathbf{r}^0 = \mathbf{y}$.
Don't forget the Onsager term — use np.mean(np.abs(u) > lam) as the empirical divergence.
Core loop
import numpy as np
rng = np.random.default_rng(0)
N, M, rho, sigma = 500, 250, 0.1, 1e-2
A = rng.normal(0, 1/np.sqrt(M), size=(M, N))
x = rng.normal(size=N) * (rng.uniform(size=N) < rho)
y = A @ x + sigma * rng.normal(size=M)
xh = np.zeros(N); r = y.copy()
for t in range(20):
    tau = np.linalg.norm(r) / np.sqrt(M)      # effective noise level estimate
    lam = 1.5 * tau                           # threshold theta_t = 1.5 * tau_t
    u = A.T @ r + xh                          # pseudo-data
    xh = np.sign(u) * np.maximum(np.abs(u) - lam, 0)
    b = np.mean(np.abs(u) > lam) * (N/M)      # Onsager coefficient (1/delta) * <eta'>
    r = y - A @ xh + b * r                    # residual with Onsager correction
mse = np.linalg.norm(xh - x)**2 / np.linalg.norm(x)**2
Expected output
A typical run converges within 10--15 iterations. If the Onsager term is removed (set $b = 0$ in the loop above), the MSE plateaus at a much larger value, the ISTA residual.
ex-ch20-08
Medium. Explain the calibration equation linking L1-AMP's threshold $\theta = \lambda\tau$ to the effective LASSO regulariser $\lambda_{\mathrm{LASSO}}$. What does it mean for the effective penalty to be smaller than the AMP threshold?
Start from the AMP fixed-point equations and compare to LASSO's KKT conditions.
Note that the Onsager coefficient $\tfrac{1}{\delta}\langle\eta'\rangle = \|\hat{\mathbf{x}}\|_0/M$ is a fraction in $[0,1)$ when $\|\hat{\mathbf{x}}\|_0 < M$.
Fixed-point equation
At an AMP fixed point: $\mathbf{r} = \mathbf{y} - \mathbf{A}\hat{\mathbf{x}} + \tfrac{1}{\delta}\langle\eta'\rangle\,\mathbf{r}$, so $\mathbf{r} = (\mathbf{y} - \mathbf{A}\hat{\mathbf{x}})/\big(1 - \tfrac{1}{\delta}\langle\eta'\rangle\big)$. Denote the scaling factor by $c = 1 - \tfrac{1}{\delta}\langle\eta'\rangle$.
Matching LASSO KKT
LASSO KKT: $\mathbf{A}^\top(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}) = \lambda_{\mathrm{LASSO}}\,\mathbf{s}$ with $\mathbf{s} \in \partial\|\hat{\mathbf{x}}\|_1$. Substituting AMP's fixed point gives $\mathbf{A}^\top(\mathbf{y} - \mathbf{A}\hat{\mathbf{x}}) = c\,\mathbf{A}^\top\mathbf{r}$, while AMP's denoising step says $\mathbf{A}^\top\mathbf{r} = \theta\,\mathbf{s}$. Comparing: $\lambda_{\mathrm{LASSO}} = \theta\,c = \theta\big(1 - \tfrac{1}{\delta}\langle\eta'\rangle\big)$.
Interpretation
The effective LASSO penalty is a fraction of the AMP threshold because the Onsager term absorbs part of the residual. Practically: if you want LASSO at penalty $\lambda_{\mathrm{LASSO}}$, set AMP's threshold to $\theta = \lambda_{\mathrm{LASSO}}/\big(1 - \tfrac{1}{\delta}\langle\eta'\rangle\big)$, where the denominator is recomputed from the current Onsager coefficient.
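A one-function sketch of this calibration using the empirical Onsager coefficient; the names are illustrative, with `u` standing for the pseudo-data vector at a fixed point of the Exercise 7 loop:

import numpy as np

def effective_lasso_penalty(theta, u, delta):
    # effective LASSO regulariser = AMP threshold shrunk by (1 - Onsager coefficient)
    onsager = np.mean(np.abs(u) > theta) / delta    # (1/delta) * <eta'> for soft-thresholding
    return theta * (1.0 - onsager)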
ex-ch20-09
Medium. Let $\Psi(\tau^2) = \sigma^2 + \tfrac{1}{\delta}\,\mathbb{E}\big[(\eta(X + \tau Z;\lambda\tau) - X)^2\big]$. Show that $\Psi$ is monotone increasing and continuous in $\tau^2$. Conclude that the AMP state-evolution recursion $\tau_{t+1}^2 = \Psi(\tau_t^2)$ is monotone in $t$.
MSE of any fixed denoiser is monotone in the input noise variance.
Monotone bounded sequences converge.
Monotonicity of scalar MSE
For a Lipschitz denoiser, the input distribution becomes more spread out as $\tau$ increases, so the best achievable MSE cannot decrease. A formal proof couples the two observation channels $X + \tau_1 Z$ and $X + \tau_2 Z$ with $\tau_1 < \tau_2$, realising the noisier one as a further degradation of the less noisy one, and uses the data-processing inequality.
Monotone iteration
Since $\Psi$ is monotone non-decreasing, $\tau_{t+1}^2 = \Psi(\tau_t^2)$ is a monotone dynamical system: either $\tau_1^2 \ge \tau_0^2$ (and the sequence is non-decreasing) or $\tau_1^2 \le \tau_0^2$ (and the sequence is non-increasing). In either case, by monotone convergence, $\tau_t^2$ converges to a fixed point of $\Psi$ whenever the sequence is bounded.
ex-ch20-10
Medium. Construct a sensing matrix that makes AMP diverge and provide a numerical demonstration. Use a sub-sampled DFT matrix with $N$ and $M$ as in Exercise 7.
Take $\mathbf{A} = \sqrt{N/M}\,\mathbf{S}\mathbf{F}$, where $\mathbf{F}$ is the DFT matrix and $\mathbf{S}$ is a random row subsampling.
Run AMP for 30 iterations and plot $\|\mathbf{r}^t\|_2$ against $t$.
Setup
F = np.fft.fft(np.eye(N)) / np.sqrt(N)
idx = rng.choice(N, M, replace=False)
A = np.sqrt(N/M) * F[idx] # random rows of DFT
Apply the same AMP loop as in Exercise 7.
Observation
The residual norm typically grows super-linearly after a few iterations and overflows. Damping the updates with a factor less than one stabilises the iteration, but the MSE trajectory no longer matches the SE prediction (a sketch follows).
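A sketch of the demonstration with an optional damping factor `gamma` (an illustrative choice); it reuses `A` from the setup above and `x`, `sigma`, `rng`, `N`, `M` from the Exercise 7 script, and uses a complex-valued soft threshold since the sub-sampled DFT is complex:

y = A @ x + sigma * rng.normal(size=M)              # complex measurements
xh = np.zeros(N, dtype=complex); r = y.copy()
gamma = 1.0                                         # 1.0 = undamped; try 0.5 to stabilise
for t in range(30):
    tau = np.linalg.norm(r) / np.sqrt(M)
    u = A.conj().T @ r + xh
    shrink = np.maximum(np.abs(u) - 1.5 * tau, 0)   # shrink the modulus, keep the phase
    xh_new = shrink * np.exp(1j * np.angle(u))
    b = np.mean(np.abs(u) > 1.5 * tau) * (N / M)    # Onsager coefficient
    r_new = y - A @ xh_new + b * r
    xh = (1 - gamma) * xh + gamma * xh_new
    r = (1 - gamma) * r + gamma * r_new
    print(t, np.linalg.norm(r))                     # watch for blow-up when gamma = 1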
ex-ch20-11
Medium. Compute (numerically) the Donoho--Tanner phase-transition curve by scanning the $(\delta, \rho)$ plane and checking whether the L1-AMP SE recursion with optimal $\lambda$ converges to zero.
For each $(\delta, \rho)$, sweep $\lambda$ and find the one minimising the noiseless SE slope $f(\lambda,\rho,\delta)$ from Exercise 4.
The boundary is where $\min_\lambda f(\lambda,\rho,\delta) = 1$.
SE slope computation
For each $(\delta, \rho, \lambda)$, compute the slope $f(\lambda,\rho,\delta)$ numerically by Monte Carlo or Gaussian quadrature. Minimise over $\lambda$. The curve $\rho_{\mathrm{DT}}(\delta)$ is the largest $\rho$ such that the minimum equals $1$.
Expected curve
The resulting curve is monotone increasing, starting at $\rho = 0$ as $\delta \to 0$ and approaching $\rho = 1$ as $\delta \to 1$, in agreement with the Donoho--Tanner combinatorial-geometry predictions.
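A sketch of the scan using the closed-form noiseless slope in the least-favourable (large non-zero entries) limit; the grids and the use of `scipy.stats.norm` are implementation choices:

import numpy as np
from scipy.stats import norm

def rho_dt(delta, lams=np.linspace(0.01, 5.0, 500), rhos=np.linspace(1e-3, 0.999, 400)):
    # noiseless slope: f(lam, rho, delta) = [(1-rho) E(|Z|-lam)_+^2 + rho (1+lam^2)] / delta
    mse_zero = 2 * ((1 + lams**2) * norm.cdf(-lams) - lams * norm.pdf(lams))
    mse_nonzero = 1 + lams**2
    ok = [rho for rho in rhos
          if np.min((1 - rho) * mse_zero + rho * mse_nonzero) < delta]   # min over lam of slope < 1
    return max(ok) if ok else 0.0

for delta in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(delta, round(rho_dt(delta), 3))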
ex-ch20-12
Hard. Derive the Taylor-expansion cancellation that gives the Onsager term. Concretely, show that without the Onsager term the conditional distribution of the residual $\mathbf{r}^t$ given the history acquires a bias proportional to $\mathbf{r}^{t-1}$, and compute the proportionality constant.
Use the cavity decomposition that isolates a single column $\mathbf{a}_i$ of $\mathbf{A}$.
Expand $\eta$ around the leave-one-out (cavity) value of its argument.
Cavity decomposition
Isolate column $i$: $\mathbf{y} = \mathbf{a}_i x_i + \sum_{j \neq i} \mathbf{a}_j x_j$. The pseudo-data at index $i$ is $u_i = \hat{x}_i^t + \mathbf{a}_i^\top\mathbf{r}^t$. Linearising $\eta$ around the cavity value introduces a self-feedback term that is individually small but $O(1)$ once summed over the $M$ measurements.
Correction factor
Tracking the Taylor expansion to first order, one finds that the bias in the denoiser input due to self-feedback is $\tfrac{1}{\delta}\langle\eta'\rangle$ times the previous residual contribution at each coordinate. Aggregating over coordinates: the residual at step $t$ should have $\tfrac{1}{\delta}\langle\eta'\rangle\,\mathbf{r}^{t-1}$ added to cancel this bias, precisely the Onsager term.
ex-ch20-13
Hard. Show that damped AMP with damping parameter $\theta \in (0,1]$ can be viewed as a single AMP step with an augmented denoiser. Use this to explain qualitatively why damping breaks state evolution (except in the limit $\theta \to 0$).
Define an effective denoiser that depends on the previous estimate.
Recall that SE requires the denoiser to act coordinate-wise on pseudo-data that behaves, at each iteration, like the signal plus i.i.d. Gaussian noise.
Rewriting as augmented denoiser
The damped update $\hat{\mathbf{x}}^{t+1} = (1-\theta)\,\hat{\mathbf{x}}^t + \theta\,\eta(\hat{\mathbf{x}}^t + \mathbf{A}^\top\mathbf{r}^t;\lambda\tau_t)$ is an affine combination of the old iterate and the raw denoiser output. If we include $\hat{\mathbf{x}}^t$ as a side input to the denoiser, the iteration looks like a single AMP step with a history-dependent denoiser $\tilde\eta_t$.
Breakdown of SE
SE demands that the denoiser output at iteration $t$ depend only on the current pseudo-data $\hat{\mathbf{x}}^t + \mathbf{A}^\top\mathbf{r}^t$ (a Gaussian perturbation of the signal conditionally on the history). Damped AMP's effective denoiser explicitly depends on $\hat{\mathbf{x}}^t$ as well, introducing correlation across iterations that changes the covariance structure of the pseudo-data and breaks the scalar recursion.
Limit $\theta\to 0$
In the continuous-time limit (many damped steps per ``true'' step) the iterate traces a gradient flow on the AMP fixed-point equation. SE re-emerges in this limit but with a different time-scaling.
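A minimal sketch of the augmented (history-dependent) denoiser view, with illustrative names; the explicit `x_prev` argument is exactly what the scalar SE argument cannot accommodate:

import numpy as np

def soft_threshold(u, thresh):
    return np.sign(u) * np.maximum(np.abs(u) - thresh, 0.0)

def damped_denoiser(u, x_prev, thresh, theta):
    # damped AMP's effective denoiser: affine combination of the previous
    # iterate and the raw denoiser output (theta = 1 recovers undamped AMP)
    return (1.0 - theta) * x_prev + theta * soft_threshold(u, thresh)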
ex-ch20-14
Hard. For a non-negative sparse signal with prior $p(x) = (1-\rho)\,\delta_0(x) + \rho\,\lambda e^{-\lambda x}\,\mathbb{1}\{x \ge 0\}$, design a denoiser that is Bayes-optimal and verify via simulation that the AMP MSE converges to the state-evolution fixed point predicted by this denoiser.
Use the posterior: an exponential slab times a Gaussian likelihood gives a truncated Gaussian, so the posterior is a point-mass/truncated-Gaussian mixture.
The resulting denoiser is non-negative, monotone, and smooth on $\mathbb{R}$.
Posterior
Compute $p(x \mid u)$ for $u = x + \tau z$: the posterior is a mixture of a point mass at $0$ and a Gaussian truncated to $[0, \infty)$. The posterior mean is available in closed form via the error function.
Implementation
Evaluate the posterior mean in closed form; its derivative is (posterior variance)$/\tau^2$ by the Stein identity. Replace the soft-threshold in the AMP loop with this denoiser (sketched below) and compute the Onsager coefficient as the empirical mean of the derivative.
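A sketch of this denoiser, assuming the spike-plus-exponential prior written above (with rate $\lambda$); posterior weights are computed in the log domain and `scipy.stats.norm` supplies $\phi$ and $\Phi$:

import numpy as np
from scipy.stats import norm

def nonneg_mmse(u, tau, rho, lam):
    m = u - lam * tau**2                                 # location of the truncated Gaussian
    a = m / tau
    h = norm.pdf(a) / np.maximum(norm.cdf(a), 1e-300)    # inverse Mills ratio
    mean_tg = m + tau * h                                # E[X | u, X > 0]
    var_tg = tau**2 * (1.0 - a * h - h**2)               # Var(X | u, X > 0)
    log_ev1 = np.log(lam) + 0.5 * lam**2 * tau**2 - lam * u + norm.logcdf(a)
    log_ev0 = norm.logpdf(u / tau) - np.log(tau)
    pi = 1.0 / (1.0 + np.exp(np.log(1 - rho) + log_ev0 - np.log(rho) - log_ev1))
    mean = pi * mean_tg                                  # posterior mean = denoiser output
    var = pi * (var_tg + mean_tg**2) - mean**2           # posterior variance
    return mean, var / tau**2                            # derivative via the Stein identity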
Empirical check
The MSE trajectory should closely match the scalar state-evolution prediction $\tau_t^2$. This Bayes-optimal MSE is typically 2--3 dB below the L1-AMP MSE.
ex-ch20-15
Hard. Show that AMP with the identity denoiser reduces to pseudo-inverse iteration and is unstable for $\delta < 1$. Compute the state-evolution recursion for this degenerate case and identify when it has a fixed point.
$\eta(u) = u$, so $\eta' \equiv 1$ and the Onsager coefficient is $1/\delta$.
The recursion becomes $\tau_{t+1}^2 = \sigma^2 + \tau_t^2/\delta$ (the identity denoiser has MSE $\tau^2$; the fixed point is finite if $\delta > 1$, else infinite).
SE recursion
With $\eta = \mathrm{id}$: $\mathbb{E}[(\eta(X + \tau Z) - X)^2] = \tau^2$. So $\tau_{t+1}^2 = \sigma^2 + \tau_t^2/\delta$.
Stability
The fixed point is $\tau_\star^2 = \sigma^2\delta/(\delta - 1)$ when $\delta > 1$. For $\delta < 1$ the recursion diverges: the identity denoiser does not exploit any prior information, so AMP reduces to an underdetermined pseudo-inverse iteration which cannot average down the noise.
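A tiny numerical illustration of this recursion and its divergence for $\delta < 1$ (parameter values are illustrative):

sigma2 = 1e-4
for delta in (2.0, 0.5):                  # delta > 1 converges, delta < 1 diverges
    tau2 = 1.0
    for _ in range(50):
        tau2 = sigma2 + tau2 / delta      # identity-denoiser SE recursion
    print(delta, tau2)                    # ~ sigma2*delta/(delta-1) vs. blow-up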
ex-ch20-16
Challenge. Prove that, in the large-$N$ limit, running AMP on $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{w}$ with i.i.d.\ Gaussian entries of $\mathbf{A}$ achieves the replica-symmetric MMSE when the denoiser $\eta$ is matched to the prior. Contrast with LASSO's MSE computed from the same state-evolution machinery.
Use Bayati--Montanari SE convergence + the fact that the MMSE denoiser minimises the scalar MSE at every noise level.
Connection: replica-symmetric MMSE = the MSE at the Bayes-AMP state-evolution fixed point.
Optimality of MMSE at each step
For any $\tau$, $\mathbb{E}[(\eta(X + \tau Z) - X)^2]$ is minimised over all measurable $\eta$ by the posterior mean $\eta_{\mathrm{MMSE}}(u) = \mathbb{E}[X \mid X + \tau Z = u]$.
Fixed-point comparison
Denote by $\tau_{\mathrm{Bayes}}^2$ the fixed point achieved with $\eta_{\mathrm{MMSE}}$ and by $\tau_{\mathrm{L1}}^2$ the fixed point with optimally tuned soft-thresholding. Since at every $\tau$ the MMSE curve lies below the soft-thresholding curve, the fixed points satisfy $\tau_{\mathrm{Bayes}}^2 \le \tau_{\mathrm{L1}}^2$, with strict inequality whenever the prior is non-Gaussian.
Replica-symmetric interpretation
The replica-symmetric cavity calculation yields an equation identical to the Bayes-AMP state-evolution fixed point (Reeves--Pfister 2019). Thus the AMP MSE equals the conjectured (and, in the i.i.d.\ Gaussian case, rigorously proved) information-theoretic MMSE limit.
ex-ch20-17
Challenge. Design a D-AMP variant using a 3-layer neural denoiser trained on Bernoulli--Gaussian signals plus AWGN at various noise levels. Compare its MSE trajectory to L1-AMP and Bayes-AMP. Explain theoretically why learned denoisers can in principle match Bayes-AMP and why, in practice, they often fall short.
Train a small MLP to map (u, tau) to the posterior mean estimate.
Estimate the divergence via Monte Carlo: $\mathrm{div}\,\eta(\mathbf{u}) \approx \mathbf{n}^\top\big(\eta(\mathbf{u} + \epsilon\mathbf{n}) - \eta(\mathbf{u})\big)/\epsilon$ for $\mathbf{n} \sim \mathcal{N}(0, \mathbf{I})$ and small $\epsilon$.
Architecture
MLP with input $(u, \tau)$, two hidden layers of 64 units each, ReLU activations, scalar output. Train with MSE loss on synthetic pairs $(u, x)$ with $u = x + \tau z$.
D-AMP loop
Use the D-AMP pseudo-code: apply the network to the pseudo-data vector, estimate divergence by the Monte-Carlo rule, update the residual with the estimated divergence.
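A sketch of the Monte-Carlo divergence step; `denoiser` stands for the trained network (a hypothetical callable mapping `(u, tau)` to an estimate of the same shape as `u`), and the single-probe estimator and `eps` value are illustrative:

import numpy as np

def mc_divergence(denoiser, u, tau, eps=1e-3, rng=np.random.default_rng(0)):
    # single-probe Monte-Carlo estimate of the average derivative <eta'>
    n = rng.normal(size=u.shape)
    return np.dot(n, denoiser(u + eps * n, tau) - denoiser(u, tau)) / (eps * u.size)

# the residual update in the D-AMP loop would then read (with delta = M/N):
#   r = y - A @ xh + (1/delta) * mc_divergence(denoiser, u, tau) * r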
In-principle optimality
A universal approximator of the posterior mean would make D-AMP equivalent to Bayes-AMP. In practice, (i) the network is trained over a finite set of noise levels, introducing bias for intermediate $\tau$; (ii) the Monte-Carlo divergence estimate has variance; (iii) training imbalance favours large signal components. Each introduces a fixed-point gap.