Denoiser Design for AMP
Why Denoiser Design Matters
State evolution tells us that AMP's terminal MSE is the fixed point of the scalar map $\tau^2 \mapsto \sigma^2 + \frac{1}{\delta}\,\mathbb{E}\big[(\eta(X + \tau Z;\theta) - X)^2\big]$. Therefore the denoiser $\eta$ is the algorithmic knob that controls AMP's behaviour. A poor denoiser creates a worse fixed point; a good one pushes AMP toward the information-theoretic limit.
This section surveys the three canonical denoiser families and shows how each one realises a classical estimator: soft-thresholding (LASSO), posterior mean (MMSE/Bayes-optimal), and learned networks (D-AMP), all within the same AMP scaffolding.
Theorem: L1-AMP Fixed Point = LASSO Solution
Let AMP be run with the soft-threshold denoiser $\eta(x;\theta) = \operatorname{sign}(x)\,(|x| - \theta)_+$ and the threshold schedule $\theta_t = \alpha\tau_t$. Let the SE recursion have a unique stable fixed point $\tau_*^2$ with $\theta_* = \alpha\tau_*$. Then the AMP fixed point coincides with the LASSO solution for an effective regulariser $\lambda = \theta_*\big(1 - \tfrac{1}{\delta}\,\mathbb{E}[\eta'(X + \tau_* Z;\theta_*)]\big)$.
LASSO is the minimiser of a convex functional; AMP with soft-thresholding is an efficient iterative solver whose fixed point coincides with that minimiser. The relation $\lambda = \theta_*\big(1 - \tfrac{1}{\delta}\,\mathbb{E}[\eta']\big)$ is known as the calibration equation and is how the AMP threshold $\theta_*$ maps to the conventional LASSO regulariser $\lambda$.
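To make the calibration concrete, here is a minimal numerical sketch: it iterates the SE map to its fixed point $\tau_*^2$ and then evaluates the calibration equation. The Bernoulli--Gaussian prior and the values of $\delta$, $\epsilon$, $\sigma^2$, and $\alpha$ are illustrative choices, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem parameters (assumptions for this sketch)
delta, eps, sigma2, sigma_x, alpha = 0.5, 0.1, 0.01, 1.0, 1.5

def soft(x, t):
    """Soft-threshold denoiser eta(x; t)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Monte Carlo samples from the Bernoulli-Gaussian prior X
n_mc = 1_000_000
X = (rng.random(n_mc) < eps) * rng.normal(0.0, sigma_x, n_mc)
Z = rng.normal(0.0, 1.0, n_mc)

# Iterate the SE map tau^2 -> sigma^2 + E[(eta(X + tau Z) - X)^2] / delta
tau2 = sigma2 + np.mean(X ** 2) / delta       # standard initialisation
for _ in range(200):
    tau = np.sqrt(tau2)
    mse = np.mean((soft(X + tau * Z, alpha * tau) - X) ** 2)
    tau2 = sigma2 + mse / delta

# Calibration: lambda = theta_* (1 - E[eta']/delta), with
# eta'(y) = 1{|y| > alpha tau} for the soft threshold
tau = np.sqrt(tau2)
avg_deriv = np.mean(np.abs(X + tau * Z) > alpha * tau)
lam = alpha * tau * (1.0 - avg_deriv / delta)
print(f"tau*^2 = {tau2:.4f}, effective LASSO lambda = {lam:.4f}")
```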
Stationarity of AMP
At a fixed point we have $\hat\beta = \eta(\hat\beta + A^\top z;\,\theta_*)$ and $z = y - A\hat\beta + \tfrac{1}{\delta}\,z\,\langle\eta'\rangle$, i.e.\ $y - A\hat\beta = \big(1 - \tfrac{\langle\eta'\rangle}{\delta}\big)\,z$.
Subgradient condition for LASSO
The LASSO KKT conditions state that $A^\top(y - A\hat\beta) = \lambda v$ for some $v \in \partial\|\hat\beta\|_1$. Using the soft-threshold identity $u = \eta(u + w;\,\theta)$ iff $w \in \theta\,\partial\|u\|_1$ (componentwise), the AMP fixed-point equation above rearranges exactly into the LASSO KKT conditions with $\lambda = \theta_*\big(1 - \tfrac{\langle\eta'\rangle}{\delta}\big)$.
Uniqueness
When the LASSO has a unique minimiser (the generic case for i.i.d.\ Gaussian $A$ with $\lambda > 0$), the AMP fixed point must coincide with it.
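As a finite-size sanity check (a sketch, not an experiment prescribed by the text), one can run AMP on a synthetic instance and verify the LASSO KKT conditions at its fixed point, with $\lambda$ obtained from the calibration equation. All sizes and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative finite-size instance: n rows, N columns, delta = n/N
n, N = 500, 1000
delta, eps, sigma, alpha = n / N, 0.1, 0.1, 1.5

A = rng.normal(0.0, 1.0 / np.sqrt(n), (n, N))   # ~unit-norm columns
beta0 = (rng.random(N) < eps) * rng.normal(0.0, 1.0, N)
y = A @ beta0 + sigma * rng.normal(0.0, 1.0, n)

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# AMP with threshold theta_t = alpha * tau_t
beta, z = np.zeros(N), y.copy()
for _ in range(100):
    tau = np.sqrt(np.mean(z ** 2))       # empirical effective noise level
    pseudo = beta + A.T @ z              # pseudo-data, approx beta0 + tau * Z
    beta = soft(pseudo, alpha * tau)
    onsager = np.mean(np.abs(pseudo) > alpha * tau) / delta
    z = y - A @ beta + onsager * z

# At convergence, check the LASSO KKT conditions with the calibrated lambda
lam = alpha * tau * (1.0 - np.mean(np.abs(pseudo) > alpha * tau) / delta)
g = A.T @ (y - A @ beta)
on_support = beta != 0
print("max |g - lam*sign(beta)| on support:",
      np.max(np.abs(g[on_support] - lam * np.sign(beta[on_support]))))
print("max |g| off support (should be <= lam):", np.max(np.abs(g[~on_support])))
```

The checks hold only approximately at finite $n$; the residuals shrink as the dimensions grow.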
Analytic LASSO Risk via AMP
The L1-AMP/LASSO fixed-point theorem above has a non-obvious corollary: because AMP's terminal MSE equals the LASSO MSE, and because AMP's MSE is predicted exactly by the scalar state-evolution fixed point, we obtain a closed-form prediction for the high-dimensional LASSO risk, parameterised by the undersampling ratio $\delta$ and the regulariser $\lambda$.
Before this connection was made (Bayati--Montanari 2012, Donoho--Montanari 2016), sharp asymptotic predictions for LASSO in the proportional regime were beyond reach. AMP is thus both an algorithm and an analytical tool.
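In symbols, rearranging the SE fixed-point relation and pairing it with the calibration equation gives the prediction (using the notation above):

```latex
% SE fixed point: tau_*^2 = sigma^2 + MSE / delta, rearranged:
\mathrm{MSE}_{\mathrm{LASSO}}(\lambda)
  \;=\; \mathbb{E}\big[\big(\eta(X + \tau_* Z;\,\alpha\tau_*) - X\big)^2\big]
  \;=\; \delta\,\big(\tau_*^2 - \sigma^2\big),
\qquad
\lambda \;=\; \alpha\tau_*\Big(1 - \tfrac{1}{\delta}\,
  \mathbb{E}\big[\eta'(X + \tau_* Z;\,\alpha\tau_*)\big]\Big).
```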
Definition: MMSE Denoiser
For a prior $X \sim p_X$ and a Gaussian observation $Y = X + \tau Z$, $Z \sim \mathcal{N}(0,1)$, the MMSE denoiser is the posterior mean $\eta_{\mathrm{MMSE}}(y;\tau) = \mathbb{E}[X \mid Y = y] = \frac{\int x\,\varphi_{\tau^2}(y - x)\,p_X(\mathrm{d}x)}{\int \varphi_{\tau^2}(y - x)\,p_X(\mathrm{d}x)}$, where $\varphi_{\tau^2}$ is the zero-mean Gaussian density with variance $\tau^2$. Its derivative satisfies the Tweedie / Stein identity $\eta'_{\mathrm{MMSE}}(y;\tau) = \operatorname{Var}(X \mid Y = y)/\tau^2$.
The Stein identity means that for the MMSE denoiser the Onsager coefficient has a beautiful interpretation: it is the posterior variance divided by the effective noise variance, i.e., the fraction of the input variance that the denoiser cannot remove.
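The identity is easy to verify numerically. The sketch below uses an arbitrary three-point prior (chosen purely for illustration) and compares a finite-difference derivative of the posterior mean against $\operatorname{Var}(X \mid Y = y)/\tau^2$.

```python
import numpy as np

# Check eta'(y) = Var(X | Y=y) / tau^2 for a discrete prior:
# X in {-1, 0, +1} with P(0) = 0.8, P(+-1) = 0.1 (illustrative choice)
xs = np.array([-1.0, 0.0, 1.0])
ps = np.array([0.1, 0.8, 0.1])
tau = 0.5

def posterior_weights(y):
    w = ps * np.exp(-(y - xs) ** 2 / (2 * tau ** 2))
    return w / w.sum()

def eta(y):
    return posterior_weights(y) @ xs            # posterior mean E[X | Y=y]

y = 0.7
w = posterior_weights(y)
post_var = w @ xs ** 2 - (w @ xs) ** 2          # Var(X | Y=y)
h = 1e-5
deriv_fd = (eta(y + h) - eta(y - h)) / (2 * h)  # finite-difference eta'
print(deriv_fd, post_var / tau ** 2)            # the two numbers agree
```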
Example: MMSE for Bernoulli--Gaussian Prior
Let $X \sim (1-\epsilon)\,\delta_0 + \epsilon\,\mathcal{N}(0,\sigma_x^2)$. Derive $\eta_{\mathrm{MMSE}}(y;\tau)$ in closed form.
Posterior over the mixture component
The posterior probability of the "non-zero" component given $Y = y$ is $\pi(y) = \dfrac{\epsilon\,\varphi_{\sigma_x^2+\tau^2}(y)}{\epsilon\,\varphi_{\sigma_x^2+\tau^2}(y) + (1-\epsilon)\,\varphi_{\tau^2}(y)}.$
Posterior mean
Conditioning on the non-zero component gives a Gaussian--Gaussian update with mean $\frac{\sigma_x^2}{\sigma_x^2+\tau^2}\,y$. Combining, $\eta_{\mathrm{MMSE}}(y;\tau) = \pi(y)\,\frac{\sigma_x^2}{\sigma_x^2+\tau^2}\,y.$ This is a shrinker (small $y$ is pulled harder to zero than under soft-thresholding) that is smooth, strictly monotone, and matched to the true prior.
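A direct implementation of this closed form (a sketch; computing $\pi(y)$ in the log domain is a numerical-stability choice, not part of the derivation):

```python
import numpy as np

def eta_mmse_bg(y, tau, eps, sigma_x):
    """MMSE denoiser for the Bernoulli-Gaussian prior
    X ~ (1-eps) delta_0 + eps N(0, sigma_x^2), observed as Y = X + tau Z."""
    s2 = sigma_x ** 2 + tau ** 2
    # log pi(y)/(1 - pi(y)): ratio of the two mixture-component likelihoods
    log_ratio = (np.log(eps) - np.log(1 - eps)
                 + 0.5 * np.log(tau ** 2 / s2)
                 + 0.5 * y ** 2 * (1 / tau ** 2 - 1 / s2))
    pi = 1.0 / (1.0 + np.exp(-log_ratio))
    # Gaussian-Gaussian shrinkage weighted by the mixture responsibility
    return pi * (sigma_x ** 2 / s2) * y
```

For example, `eta_mmse_bg(np.linspace(-3, 3, 7), tau=0.5, eps=0.1, sigma_x=1.0)` traces out the smooth shrinkage curve described above.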
Interpretation
In Bayes-optimal AMP this denoiser replaces soft-thresholding. The state-evolution fixed point is the conjectured (and, for many priors, proved) Bayes MMSE, typically strictly smaller than the LASSO MSE, especially near the phase-transition boundary.
Minimax Denoisers and Parameter-Free AMP
What if the prior $p_X$ is unknown? Donoho--Maleki--Montanari (2011) construct a minimax soft-threshold level $\alpha^*(\epsilon)$ that minimises the worst-case MSE over all $\epsilon$-sparse priors. The resulting $\alpha^*(\epsilon)$ is a universal constant: a single number per sparsity level $\epsilon$. Parameter-free AMP uses this threshold and requires no prior knowledge of signal amplitudes or the exact sparsity level.
Minimax AMP is 1--2 dB worse than oracle-Bayes AMP, but only requires knowing $\epsilon$. It is the right default for sparse-recovery problems where the prior is poorly specified.
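The minimax level can be found numerically from the classical worst-case risk of soft thresholding (Donoho--Johnstone), whose least-favourable $\epsilon$-sparse prior places its non-zero mass at $\pm\infty$. A sketch with an illustrative $\epsilon$:

```python
import numpy as np
from scipy.stats import norm

def worst_case_risk(alpha, eps):
    """Worst-case MSE of soft thresholding at level alpha (unit noise)
    over eps-sparse priors: eps*(1+alpha^2) from the mass at +-infinity,
    plus (1-eps)*E[eta(Z; alpha)^2] from the mass at zero."""
    r0 = 2 * ((1 + alpha ** 2) * norm.cdf(-alpha) - alpha * norm.pdf(alpha))
    return eps * (1 + alpha ** 2) + (1 - eps) * r0

eps = 0.1                                   # illustrative sparsity level
alphas = np.linspace(0.5, 3.0, 2501)
risks = worst_case_risk(alphas, eps)
i = np.argmin(risks)
print(f"minimax alpha ~= {alphas[i]:.3f}, worst-case MSE ~= {risks[i]:.4f}")
```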
Definition: D-AMP: Denoising-based AMP
D-AMP (Denoising-based AMP) replaces the scalar denoiser with a general (possibly neural, possibly non-local) image denoiser $D_{\hat\sigma}$ that takes an estimate $\hat\sigma_t$ of the noise level and produces a denoised output: $x^{t+1} = D_{\hat\sigma_t}\!\big(x^t + A^\top z^t\big).$ The scalar Onsager coefficient is replaced by the divergence $\operatorname{div} D_{\hat\sigma_t}$, estimated via Monte Carlo (Ramani et al., 2008; Metzler et al., 2016).
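A minimal sketch of the Monte Carlo divergence estimator (the probe size `h` and the single-probe choice are illustrative; averaging several probes reduces variance):

```python
import numpy as np

def mc_divergence(denoiser, x, sigma_hat, rng, h=None):
    """Monte Carlo divergence estimate (Ramani et al., 2008):
    div D(x) ~= b^T (D(x + h b) - D(x)) / h  with probe b ~ N(0, I)."""
    if h is None:
        h = sigma_hat / 1000 + 1e-7
    b = rng.normal(0.0, 1.0, x.size)
    return b @ (denoiser(x + h * b, sigma_hat) - denoiser(x, sigma_hat)) / h

# Sanity check against a denoiser with a known divergence: soft-thresholding,
# whose exact divergence is the number of surviving coordinates.
rng = np.random.default_rng(2)
soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
x = rng.normal(0.0, 1.0, 10_000)
est = mc_divergence(soft, x, 0.5, rng)
exact = np.sum(np.abs(x) > 0.5)
print(est, exact)   # close for large n; average a few probes to tighten
```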
Learned Denoisers and Deep Unfolding
D-AMP opens the door to learned denoisers: train a CNN (e.g., DnCNN) to denoise Gaussian noise at a given level $\hat\sigma$, then plug it into the AMP scaffold. Because the effective input noise is guaranteed Gaussian by the Onsager machinery, the denoiser's training distribution matches the deployment distribution at every iteration.
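Putting the pieces together, a minimal D-AMP skeleton might look as follows. Here `denoise` is a stand-in for any Lipschitz denoiser, e.g. a DnCNN trained at noise level $\hat\sigma$; the function name and interface are assumptions for illustration.

```python
import numpy as np

def damp(A, y, denoise, n_iter=30, seed=0):
    """Minimal D-AMP sketch. `denoise(v, sigma_hat)` can be any (Lipschitz)
    denoiser trained/designed for Gaussian noise at level sigma_hat."""
    rng = np.random.default_rng(seed)
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(n_iter):
        sigma_hat = np.sqrt(np.mean(z ** 2))   # effective noise estimate
        v = x + A.T @ z                        # pseudo-data ~ x0 + sigma_hat * noise
        x = denoise(v, sigma_hat)
        # Onsager term from a one-probe Monte Carlo divergence estimate
        h = sigma_hat / 1000 + 1e-7
        b = rng.normal(0.0, 1.0, N)
        div = b @ (denoise(v + h * b, sigma_hat) - denoise(v, sigma_hat)) / h
        z = y - A @ x + (div / n) * z
    return x
```

Passing the Bernoulli--Gaussian MMSE denoiser from the example above, e.g. `damp(A, y, lambda v, s: eta_mmse_bg(v, s, 0.1, 1.0))`, recovers Bayes-optimal AMP for that prior.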
Taking this one step further: unfold the AMP iteration into a $T$-layer feed-forward network where each layer's denoiser weights are trained end-to-end. This is LAMP / Learned AMP (Borgerding--Schniter 2017), the subject of Chapter 21.5 and central to deep-unfolding architectures for RF imaging (Book 2, Chapter 27.4).
Denoiser Choices for AMP
| Denoiser | Equivalent Estimator | Prior Info Needed | MSE vs Bayes Limit | Notes |
|---|---|---|---|---|
| Soft-threshold | LASSO ($\lambda$ via calibration) | Threshold schedule only | Loose (gap depends on the prior) | Piecewise linear, convex equivalent |
| Minimax soft-threshold | Minimax LASSO | Sparsity level $\epsilon$ only | 1--2 dB from Bayes, no tuning | Parameter-free AMP |
| MMSE | Posterior mean | Full prior | Matches replica prediction | Requires known prior |
| Hard-threshold | IHT-like | Threshold only | Loose; SE invalid (discontinuous) | Not Lipschitz |
| Neural denoiser (D-AMP) | Learned prior | Training data | Can match/beat MMSE | Divergence estimated via MC |
Denoiser MSE Curves
Plot scalar denoising MSE as a function of the effective noise level $\tau$ for soft-threshold (LASSO), MMSE (Bayes), and naive identity denoisers on a Bernoulli--Gaussian prior. The steeper the descent of the MSE curve, the better the state-evolution fixed point.
[Interactive figure controls: std. of non-zero entries, soft-threshold factor $\lambda = \alpha\tau$]
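For readers without the interactive figure, the curves can be reproduced with a short Monte Carlo script (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
eps, sigma_x, alpha = 0.1, 1.0, 1.5        # illustrative parameter values

n_mc = 200_000
X = (rng.random(n_mc) < eps) * rng.normal(0.0, sigma_x, n_mc)
Z = rng.normal(0.0, 1.0, n_mc)

def mse_identity(tau):
    return tau ** 2                        # eta(y) = y keeps all the noise

def mse_soft(tau):
    y = X + tau * Z
    est = np.sign(y) * np.maximum(np.abs(y) - alpha * tau, 0.0)
    return np.mean((est - X) ** 2)

def mse_mmse(tau):
    y = X + tau * Z
    s2 = sigma_x ** 2 + tau ** 2
    log_r = (np.log(eps / (1 - eps)) + 0.5 * np.log(tau ** 2 / s2)
             + 0.5 * y ** 2 * (1 / tau ** 2 - 1 / s2))
    pi = 1.0 / (1.0 + np.exp(-log_r))
    return np.mean((pi * (sigma_x ** 2 / s2) * y - X) ** 2)

for tau in [0.05, 0.1, 0.2, 0.5, 1.0]:
    print(f"tau={tau:4.2f}  identity={mse_identity(tau):.4f}  "
          f"soft={mse_soft(tau):.4f}  mmse={mse_mmse(tau):.4f}")
```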
Learned Denoisers for Structured Compressed Sensing
The CommIT group at TU Berlin has investigated learned-denoiser AMP variants for large-scale communication problems where priors are only implicitly specified (training data) or vary across realisations (dynamic spectrum, user activity detection). The focus is on wrapping denoiser networks around the OAMP/VAMP scaffolding of Chapter 21 so that the Gaussianity-of-input property can be preserved under structured matrices typical of wireless channels.
Quick Check
If we run AMP with soft-thresholding on a noiseless problem ($\sigma = 0$) and the state-evolution fixed point $\tau_*^2$ is strictly positive, what does that tell us about the problem?
AMP has a bug
The pair $(\delta, \rho)$ lies above the Donoho--Tanner curve, so recovery cannot drive MSE to zero at this sparsity level
The noise variance is too small
Use hard-thresholding instead
Correct. A positive stable fixed point means that with the given regularisation, the SE recursion does not collapse to zero: there is a non-vanishing residual error dictated by the geometry of the problem.
Quick Check
Which denoiser choice makes AMP asymptotically Bayes-optimal (in the proportional regime)?
Soft-threshold with the minimax $\alpha$
Posterior mean matched to the true prior
Hard-threshold with a tuned threshold
Identity: $\eta(x) = x$
When $\eta$ is the Bayes-optimal scalar denoiser for the prior, the SE fixed point coincides with the replica-symmetric prediction of the Bayes MMSE; this is optimal in the proportional asymptotic regime.
Common Mistake: Non-Lipschitz Denoisers Break State Evolution
Mistake:
Using hard-thresholding, rank-truncation with a hard cutoff, or any discontinuous denoiser in an AMP-like framework and applying the scalar state-evolution formula.
Correction:
The Bayati--Montanari state-evolution theorem requires the denoiser to be Lipschitz (or at least pseudo-Lipschitz of finite order) in its first argument. Discontinuous denoisers violate this and yield non-Gaussian pseudo-data. If you must use hard-thresholding, smooth it (e.g., replace the hard cutoff by a steep sigmoid) and monitor divergence carefully.
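A minimal sketch of such a smoothing (the gate steepness `beta` is a tuning choice; the map is Lipschitz for any finite `beta` and approaches hard thresholding as `beta` grows):

```python
import numpy as np

def smoothed_hard_threshold(x, theta, beta=50.0):
    """Hard threshold smoothed by a steep logistic gate: Lipschitz for any
    finite beta, recovering x * 1{|x| > theta} in the limit beta -> inf."""
    gate = 1.0 / (1.0 + np.exp(-beta * (np.abs(x) - theta)))
    return gate * x
```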
Key Takeaway
The choice of denoiser is the principal design lever in AMP. Soft-thresholding recovers LASSO; matched posterior means realise Bayes-optimal inference; learned neural denoisers extend the reach to unknown priors. In every case, the Onsager correction (or its divergence-based analogue) keeps the pseudo-data Gaussian so that the denoiser operates under the conditions for which it is designed.