Denoiser Design for AMP

Why Denoiser Design Matters

State evolution tells us that AMP's terminal MSE is determined by the fixed point of $\Psi(\tau^2) = \sigma^2 + \delta^{-1}\,\mathbb{E}\big[(\eta(X+\tau Z;\theta)-X)^2\big]$. The denoiser $\eta$ is therefore the algorithmic knob that controls AMP's behaviour. A poor denoiser creates a worse fixed point; a good one pushes AMP toward the information-theoretic limit.
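The fixed point can be located numerically by iterating the map $\tau^2 \mapsto \Psi(\tau^2)$. Below is a minimal sketch, assuming numpy; the function names (`psi`, `se_fixed_point`), the Monte Carlo sample size, and the Bernoulli-Gaussian example at the end are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def psi(tau2, sigma2, delta, denoiser, sample_prior, n_mc=200_000, seed=None):
    """Monte Carlo evaluation of Psi(tau^2) = sigma^2 + (1/delta) E[(eta(X + tau Z) - X)^2]."""
    rng = np.random.default_rng(seed)
    x = sample_prior(n_mc, rng)
    z = rng.standard_normal(n_mc)
    u = x + np.sqrt(tau2) * z
    return sigma2 + np.mean((denoiser(u, np.sqrt(tau2)) - x) ** 2) / delta

def se_fixed_point(sigma2, delta, denoiser, sample_prior, tau2_init=10.0, n_iter=50):
    """Iterate tau^2 <- Psi(tau^2) from a large initial value; returns the terminal tau^2."""
    tau2 = tau2_init
    for _ in range(n_iter):
        tau2 = psi(tau2, sigma2, delta, denoiser, sample_prior)
    return tau2

# Example: soft-threshold denoiser with lambda = alpha * tau, Bernoulli-Gaussian prior.
soft = lambda u, tau, alpha=1.5: np.sign(u) * np.maximum(np.abs(u) - alpha * tau, 0.0)
bg = lambda n, rng, rho=0.15: rng.standard_normal(n) * (rng.random(n) < rho)

tau2_star = se_fixed_point(sigma2=0.01, delta=0.5, denoiser=soft, sample_prior=bg)
print(f"SE fixed point tau*^2 β‰ˆ {tau2_star:.4f}")
```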

This section surveys the three canonical denoiser families and shows how each one realises a classical estimator, all within the same AMP scaffolding: soft-thresholding $\leftrightarrow$ LASSO, posterior mean $\leftrightarrow$ MMSE/Bayes-optimal, and learned networks $\leftrightarrow$ D-AMP.

Theorem: L1-AMP Fixed Point = LASSO Solution

Let AMP be run with the soft-threshold denoiser $\eta_{\mathrm{st}}(u;\lambda) = \mathrm{sign}(u)(|u|-\lambda)_+$ and the threshold schedule $\lambda_t = \alpha \tau_t$ with fixed $\alpha$. Suppose the state-evolution recursion has a unique stable fixed point $\tau_\star$, and write $\lambda_\star = \alpha\tau_\star$. Then the AMP fixed point $\hat{\mathbf{x}}^\infty$ coincides with the LASSO solution $\hat{\mathbf{x}}_{\mathrm{LASSO}} = \arg\min_{\mathbf{z}} \tfrac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{z}\|^2 + \lambda_{\mathrm{eff}}\|\mathbf{z}\|_1$ for the effective regulariser $\lambda_{\mathrm{eff}} = \lambda_\star\big(1 - \delta^{-1}\langle\eta'\rangle_\star\big)$.

LASSO is the minimiser of a convex functional; AMP with soft-thresholding is an efficient iterative solver whose fixed point coincides with that minimiser. The relation $\lambda_{\mathrm{eff}} = \lambda_\star(1-\delta^{-1}\langle\eta'\rangle_\star)$ is known as the calibration equation and is how the AMP threshold maps to the conventional LASSO regulariser.
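As a sketch of how the calibration equation can be evaluated in practice (assuming numpy; the helper name `lasso_calibration`, the Bernoulli-Gaussian sampler, and the numerical value of $\tau_\star$ in the example are illustrative):

```python
import numpy as np

def lasso_calibration(tau_star, alpha, delta, sample_prior, n_mc=200_000, seed=0):
    """Calibration equation: map the AMP threshold lambda* = alpha * tau* to the
    effective LASSO regulariser lambda_eff = lambda* (1 - <eta'>_* / delta).
    For soft-thresholding, <eta'>_* = P(|X + tau* Z| > lambda*)."""
    rng = np.random.default_rng(seed)
    x = sample_prior(n_mc, rng)
    z = rng.standard_normal(n_mc)
    lam_star = alpha * tau_star
    eta_prime = np.mean(np.abs(x + tau_star * z) > lam_star)
    return lam_star * (1.0 - eta_prime / delta)

# Example: Bernoulli-Gaussian prior (rho = 0.15, unit-variance non-zeros) and a
# hypothetical SE fixed point tau* (e.g. obtained from the sketch above).
bg = lambda n, rng, rho=0.15: rng.standard_normal(n) * (rng.random(n) < rho)
print(lasso_calibration(tau_star=0.2, alpha=1.5, delta=0.5, sample_prior=bg))
```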


Analytic LASSO Risk via AMP

The theorem above (L1-AMP fixed point = LASSO solution) has a non-obvious corollary: because AMP's terminal MSE equals the LASSO MSE, and because AMP's MSE is predicted exactly by the scalar state-evolution fixed point, we obtain a closed-form prediction for the high-dimensional LASSO risk, parameterised by $(\delta, \rho, \sigma^2, \lambda_{\mathrm{eff}})$.

Before this connection was made (Bayati--Montanari 2012, Donoho--Montanari 2016), sharp asymptotic predictions for LASSO in the proportional regime were beyond reach. AMP is thus both an algorithm and an analytical tool.

Definition: MMSE Denoiser

For a prior $p_X$ and Gaussian observation $U = X + \tau Z$, $Z \sim \mathcal{N}(0,1)$, the MMSE denoiser is the posterior mean
$$\eta_{\mathrm{mmse}}(u;\tau^2) = \mathbb{E}[X \mid U=u] = \frac{\int x\, p_X(x)\, \varphi_\tau(u-x)\,\mathrm{d}x}{\int p_X(x)\, \varphi_\tau(u-x)\,\mathrm{d}x},$$
where $\varphi_\tau$ is the zero-mean Gaussian density with variance $\tau^2$. Its derivative satisfies the Tweedie/Stein identity $\eta_{\mathrm{mmse}}'(u;\tau^2) = \mathrm{Var}(X \mid U=u)/\tau^2$.

The Stein identity means that for the MMSE denoiser the Onsager coefficient has a beautiful interpretation: it is the posterior variance divided by the effective noise variance β€” i.e., the fraction of the input variance that the denoiser cannot remove.

Example: MMSE for Bernoulli--Gaussian Prior

Let $X \sim (1-\rho)\,\delta_0 + \rho\,\mathcal{N}(0,\sigma_x^2)$. Derive $\eta_{\mathrm{mmse}}(u;\tau^2)$ in closed form.
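One way the derivation can go (a sketch; $s^2$ and $\pi(u)$ are shorthand introduced here). Conditioning on the mixture component and applying Bayes' rule, with $s^2 := \sigma_x^2 + \tau^2$:
$$\mathbb{E}[X \mid U = u, X \neq 0] = \frac{\sigma_x^2}{\sigma_x^2 + \tau^2}\, u, \qquad U \mid \{X \neq 0\} \sim \mathcal{N}(0, s^2), \qquad U \mid \{X = 0\} \sim \mathcal{N}(0, \tau^2),$$
$$\pi(u) := \Pr[X \neq 0 \mid U = u] = \left[\,1 + \frac{1-\rho}{\rho}\,\frac{s}{\tau}\, \exp\!\left(-\frac{\sigma_x^2\, u^2}{2\,\tau^2 s^2}\right)\right]^{-1},$$
$$\eta_{\mathrm{mmse}}(u;\tau^2) = \pi(u)\,\frac{\sigma_x^2}{\sigma_x^2 + \tau^2}\, u.$$
The posterior mean is thus the linear MMSE shrinkage of the Gaussian component, gated by the posterior probability that the coordinate is active.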

Minimax Denoisers and Parameter-Free AMP

What if $p_X$ is unknown? Donoho--Maleki--Montanari (2011) construct a minimax soft threshold that minimises the worst-case MSE over all $\rho$-sparse priors. The resulting $\alpha^\star(\delta)$ is a universal constant: a single number per $\delta$. Parameter-free AMP uses this $\alpha^\star$ and requires no prior knowledge of signal amplitudes or of the exact sparsity level.

Minimax AMP is 1--2 dB worse than oracle-Bayes AMP, but requires knowing only $\delta$. It is the right default for sparse-recovery problems where the prior is poorly specified.

Definition: D-AMP (Denoising-based AMP)

D-AMP replaces the scalar denoiser $\eta$ with a general (possibly neural, possibly non-local) image denoiser $D_\tau: \mathbb{R}^N \to \mathbb{R}^N$ that takes an estimate of the noise level $\tau$ and produces a denoised output:
$$\hat{\mathbf{x}}^{t+1} = D_{\hat{\tau}_t}\!\big(\mathbf{A}^{H}\mathbf{r}^t + \hat{\mathbf{x}}^t\big), \qquad \mathbf{r}^{t+1} = \mathbf{y} - \mathbf{A}\hat{\mathbf{x}}^{t+1} + \frac{1}{\delta}\,\mathrm{div}(D_{\hat{\tau}_t})\,\mathbf{r}^t.$$
The scalar Onsager coefficient $\langle\eta'\rangle$ is replaced by the normalised divergence $\mathrm{div}(D_\tau) = \frac{1}{N}\sum_i \partial D_{\tau,i}/\partial u_i$, estimated via Monte Carlo (Ramani et al., 2008; Metzler et al., 2016).
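A minimal sketch of one D-AMP iteration for the real-valued case, assuming numpy and a black-box denoiser callable `denoise(u, tau)`; the residual-based noise estimate and the probe size `eps` are illustrative choices.

```python
import numpy as np

def damp_iteration(y, A, x_hat, r, denoise, delta, eps=1e-3, seed=None):
    """One D-AMP step: denoise the pseudo-data, then form the Onsager-corrected residual.
    The normalised divergence (1/N) sum_i dD_i/du_i is estimated with a single
    Gaussian probe (Ramani et al. 2008): n^T [D(u + eps n) - D(u)] / (eps N)."""
    rng = np.random.default_rng(seed)
    N = x_hat.size
    # Pseudo-data and effective noise level estimated from the residual.
    u = A.T @ r + x_hat
    tau_hat = np.linalg.norm(r) / np.sqrt(len(y))
    # Denoise.
    x_new = denoise(u, tau_hat)
    # Monte Carlo estimate of the normalised divergence.
    n = rng.standard_normal(N)
    div = n @ (denoise(u + eps * n, tau_hat) - x_new) / (eps * N)
    # Onsager-corrected residual.
    r_new = y - A @ x_new + (div / delta) * r
    return x_new, r_new
```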

Learned Denoisers and Deep Unfolding

D-AMP opens the door to learned denoisers: train a CNN (e.g., DnCNN) to remove Gaussian noise at a given level $\tau$, then plug it into the AMP scaffold. Because the effective input noise is kept (asymptotically) Gaussian by the Onsager machinery, the denoiser's training distribution matches the deployment distribution at every iteration.

Taking this one step further: unfold the AMP iteration into a $T$-layer feed-forward network whose per-layer denoiser weights are trained end-to-end. This is LAMP, or Learned AMP (Borgerding--Schniter 2017), the subject of Chapter 21.5 and central to deep-unfolding architectures for RF imaging (Book 2, Chapter 27.4).
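A minimal sketch of one unfolded layer in this spirit, assuming PyTorch; the learnable per-layer matrix and threshold scale loosely follow the LAMP parameterisation of Borgerding--Schniter, but the class name, tensor shapes, and initialisation here are illustrative rather than a faithful reproduction.

```python
import torch
import torch.nn as nn

class LAMPLayer(nn.Module):
    """One unfolded AMP layer with a learnable matrix B_t and threshold scale alpha_t.
    Denoiser: soft-thresholding with lambda_t = alpha_t * ||v_t|| / sqrt(M)."""
    def __init__(self, A):
        super().__init__()
        M, N = A.shape
        self.register_buffer("A", A)
        self.B = nn.Parameter(A.t().clone())          # initialised at A^T, then trained
        self.alpha = nn.Parameter(torch.tensor(1.5))  # threshold scale, trained
        self.M = M

    def forward(self, y, x, v, onsager):
        # Residual with the Onsager factor carried over from the previous layer.
        v_new = y - x @ self.A.t() + onsager.unsqueeze(-1) * v
        lam = self.alpha * v_new.norm(dim=-1, keepdim=True) / self.M ** 0.5
        u = x + v_new @ self.B.t()
        x_new = torch.sign(u) * torch.relu(u.abs() - lam)
        # For soft-thresholding, the Onsager factor is ||x||_0 / M (survivor count over M).
        onsager_new = (x_new != 0).sum(dim=-1).float() / self.M
        return x_new, v_new, onsager_new
```

Layers are stacked as `x, v, b = layer(y, x, v, b)`, starting from zero initial values, and the whole chain is trained end-to-end on pairs of measurements and ground-truth signals.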

Denoiser Choices for AMP

| Denoiser | Equivalent estimator | Prior info needed | MSE vs. Bayes limit | Notes |
|---|---|---|---|---|
| Soft-threshold $\eta_{\mathrm{st}}$ | LASSO ($\ell_1$) | Threshold schedule only | Loose (gap depends on $\rho,\delta$) | Piecewise linear; convex equivalent |
| Minimax soft-threshold | Minimax LASSO | $\delta$ only | 1--2 dB from Bayes, no tuning | Parameter-free AMP |
| MMSE $\eta_{\mathrm{mmse}}$ | Posterior mean | Full prior $p_X$ | Matches replica prediction | Requires known prior |
| Hard-threshold | IHT-like | Threshold only | Loose; discontinuous $\Rightarrow$ SE fails | Not Lipschitz |
| Neural denoiser (D-AMP) | Learned prior | Training data | Can match/beat MMSE | Divergence estimated via MC |

Denoiser MSE Curves

Plot the scalar denoising MSE $\mathbb{E}\big[(\eta(X+\tau Z;\theta)-X)^2\big]$ as a function of the effective noise $\tau^2$ for the soft-threshold (LASSO), MMSE (Bayes), and naive identity denoisers on a Bernoulli--Gaussian prior. The steeper the descent of the MSE curve, the better the state-evolution fixed point (see the code sketch below).

[Interactive plot. Default parameters: sparsity $\rho = 0.15$; std. of non-zero entries $\sigma_x = 1$; soft-threshold factor $\alpha = 1.5$ in $\lambda = \alpha\tau$.]
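A sketch of how such curves can be generated, assuming numpy/matplotlib and the widget's default parameters; the $\tau^2$ grid and sample size are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rho, sigma_x, alpha = 0.15, 1.0, 1.5
rng = np.random.default_rng(0)
x = rng.standard_normal(500_000) * sigma_x * (rng.random(500_000) < rho)  # Bernoulli-Gaussian samples
z = rng.standard_normal(x.size)

def eta_soft(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - alpha * tau, 0.0)

def eta_mmse(u, tau):
    s2 = sigma_x**2 + tau**2
    pi = 1.0 / (1.0 + (1 - rho) / rho * np.sqrt(s2) / tau
                * np.exp(-sigma_x**2 * u**2 / (2 * tau**2 * s2)))
    return pi * (sigma_x**2 / s2) * u

tau2_grid = np.logspace(-3, 1, 60)
for name, eta in [("soft-threshold", eta_soft), ("MMSE", eta_mmse), ("identity", lambda u, t: u)]:
    mse = [np.mean((eta(x + np.sqrt(t2) * z, np.sqrt(t2)) - x) ** 2) for t2 in tau2_grid]
    plt.loglog(tau2_grid, mse, label=name)
plt.xlabel(r"$\tau^2$"); plt.ylabel("denoising MSE"); plt.legend(); plt.show()
```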

CommIT Contribution (2023)

Learned Denoisers for Structured Compressed Sensing

G. Caire β€” CommIT group research line (TU Berlin)

The CommIT group at TU Berlin has investigated learned-denoiser AMP variants for large-scale communication problems where priors are only implicitly specified (training data) or vary across realisations (dynamic spectrum, user activity detection). The focus is on wrapping denoiser networks around the OAMP/VAMP scaffolding of Chapter 21 so that the Gaussianity-of-input property can be preserved under the structured $\mathbf{A}$ matrices typical of wireless channels.


Quick Check

If we run AMP with soft-thresholding and the state-evolution fixed point remains strictly positive even as $\sigma^2 \to 0$, what does that tell us about the problem?

AMP has a bug

The $(\delta,\rho)$ pair lies above the Donoho--Tanner curve, so $\ell_1$ recovery cannot drive the MSE to zero at this sparsity level

The noise variance $\sigma^2$ is too small

Use hard-thresholding instead

Quick Check

Which denoiser choice makes AMP asymptotically Bayes-optimal (in the proportional regime)?

Soft-threshold with minimax $\alpha^\star$

Posterior mean $\eta_{\mathrm{mmse}}(u;\tau^2)$ matched to the true prior $p_X$

Hard-threshold with threshold $\lambda = \sigma_x$

Identity: $\eta(u)=u$

Common Mistake: Non-Lipschitz Denoisers Break State Evolution

Mistake:

Using hard-thresholding, rank-truncation with a hard cutoff, or any discontinuous denoiser in an AMP-like framework and applying the scalar state-evolution formula.

Correction:

The Bayati--Montanari state-evolution theorem requires the denoiser to be Lipschitz (or at least pseudo-Lipschitz of finite order) in its first argument. Discontinuous denoisers violate this and yield non-Gaussian pseudo-data. If you must use hard-thresholding, smooth it (e.g., replace it with a steep sigmoid gate) and monitor the iterates for divergence.
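One possible smoothing, as a sketch (the sigmoid gate and the sharpness parameter `beta` are illustrative):

```python
import numpy as np

def eta_hard_smoothed(u, lam, beta=25.0):
    """Sigmoid-gated approximation to hard thresholding: for large beta it approaches
    u * 1{|u| > lam} but stays Lipschitz (with constant growing in beta),
    unlike the hard threshold itself."""
    gate = 1.0 / (1.0 + np.exp(-beta * (np.abs(u) - lam)))
    return gate * u
```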

Key Takeaway

The choice of denoiser is the principal design lever in AMP. Soft-thresholding recovers LASSO; matched posterior means realise Bayes-optimal inference; learned neural denoisers extend the reach to unknown priors. In every case, the Onsager correction (or its divergence-based analogue) keeps the pseudo-data Gaussian so that the denoiser operates under the conditions for which it is designed.