Denoiser Design for AMP

Why Denoiser Design Matters

State evolution tells us that AMP's terminal MSE is determined by the fixed point of $\Psi(\tau^2) = \sigma^2 + \delta^{-1}\,\mathbb{E}\big[(\eta(X+\tau Z;\theta)-X)^2\big]$. The denoiser $\eta$ is therefore the algorithmic knob that controls AMP's behaviour. A poor denoiser creates a worse fixed point; a good one pushes AMP toward the information-theoretic limit.
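The fixed point can be located numerically by iterating the map $\tau^2 \mapsto \Psi(\tau^2)$. Below is a minimal sketch, assuming numpy; the function names (`psi`, `se_fixed_point`), the Monte Carlo sample size, and the Bernoulli-Gaussian example at the end are illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def psi(tau2, sigma2, delta, denoiser, sample_prior, n_mc=200_000, seed=None):
    """Monte Carlo evaluation of Psi(tau^2) = sigma^2 + (1/delta) E[(eta(X + tau Z) - X)^2]."""
    rng = np.random.default_rng(seed)
    x = sample_prior(n_mc, rng)
    z = rng.standard_normal(n_mc)
    u = x + np.sqrt(tau2) * z
    return sigma2 + np.mean((denoiser(u, np.sqrt(tau2)) - x) ** 2) / delta

def se_fixed_point(sigma2, delta, denoiser, sample_prior, tau2_init=10.0, n_iter=50):
    """Iterate tau^2 <- Psi(tau^2) from a large initial value; returns the terminal tau^2."""
    tau2 = tau2_init
    for _ in range(n_iter):
        tau2 = psi(tau2, sigma2, delta, denoiser, sample_prior)
    return tau2

# Example: soft-threshold denoiser with lambda = alpha * tau, Bernoulli-Gaussian prior.
soft = lambda u, tau, alpha=1.5: np.sign(u) * np.maximum(np.abs(u) - alpha * tau, 0.0)
bg = lambda n, rng, rho=0.15: rng.standard_normal(n) * (rng.random(n) < rho)

tau2_star = se_fixed_point(sigma2=0.01, delta=0.5, denoiser=soft, sample_prior=bg)
print(f"SE fixed point tau*^2 β‰ˆ {tau2_star:.4f}")
```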

This section surveys the three canonical denoiser families and shows how each one realises a classical estimator, all within the same AMP scaffolding: soft-thresholding $\leftrightarrow$ LASSO, posterior mean $\leftrightarrow$ MMSE/Bayes-optimal, and learned networks $\leftrightarrow$ D-AMP.

Theorem: L1-AMP Fixed Point = LASSO Solution

Let AMP be run with the soft-threshold denoiser $\eta_{\mathrm{st}}(u;\lambda) = \mathrm{sign}(u)(|u|-\lambda)_+$ and the threshold schedule $\lambda_t = \alpha \tau_t$ with fixed $\alpha$. Suppose the state-evolution recursion has a unique stable fixed point $\tau_\star$, and write $\lambda_\star = \alpha\tau_\star$. Then the AMP fixed point $\hat{\mathbf{x}}^\infty$ coincides with the LASSO solution $\hat{\mathbf{x}}_{\mathrm{LASSO}} = \arg\min_{\mathbf{z}} \tfrac{1}{2}\|\mathbf{y}-\mathbf{A}\mathbf{z}\|^2 + \lambda_{\mathrm{eff}}\|\mathbf{z}\|_1$ for the effective regulariser $\lambda_{\mathrm{eff}} = \lambda_\star\big(1 - \delta^{-1}\langle\eta'\rangle_\star\big)$.

LASSO is the minimiser of a convex functional; AMP with soft-thresholding is an efficient iterative solver whose fixed point coincides with that minimiser. The relation $\lambda_{\mathrm{eff}} = \lambda_\star(1-\delta^{-1}\langle\eta'\rangle_\star)$ is known as the calibration equation and is how the AMP threshold maps to the conventional LASSO regulariser.
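As a sketch of how the calibration equation can be evaluated in practice (assuming numpy; the helper name `lasso_calibration`, the Bernoulli-Gaussian sampler, and the numerical value of $\tau_\star$ in the example are illustrative):

```python
import numpy as np

def lasso_calibration(tau_star, alpha, delta, sample_prior, n_mc=200_000, seed=0):
    """Calibration equation: map the AMP threshold lambda* = alpha * tau* to the
    effective LASSO regulariser lambda_eff = lambda* (1 - <eta'>_* / delta).
    For soft-thresholding, <eta'>_* = P(|X + tau* Z| > lambda*)."""
    rng = np.random.default_rng(seed)
    x = sample_prior(n_mc, rng)
    z = rng.standard_normal(n_mc)
    lam_star = alpha * tau_star
    eta_prime = np.mean(np.abs(x + tau_star * z) > lam_star)
    return lam_star * (1.0 - eta_prime / delta)

# Example: Bernoulli-Gaussian prior (rho = 0.15, unit-variance non-zeros) and a
# hypothetical SE fixed point tau* (e.g. obtained from the sketch above).
bg = lambda n, rng, rho=0.15: rng.standard_normal(n) * (rng.random(n) < rho)
print(lasso_calibration(tau_star=0.2, alpha=1.5, delta=0.5, sample_prior=bg))
```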


Analytic LASSO Risk via AMP

The theorem above (L1-AMP fixed point = LASSO solution) has a non-obvious corollary: because AMP's terminal MSE equals the LASSO MSE, and because AMP's MSE is predicted exactly by the scalar state-evolution fixed point, we obtain a closed-form prediction for the high-dimensional LASSO risk, parameterised by $(\delta, \rho, \sigma^2, \lambda_{\mathrm{eff}})$.

Before this connection was made (Bayati--Montanari 2012, Donoho--Montanari 2016), sharp asymptotic predictions for LASSO in the proportional regime were beyond reach. AMP is thus both an algorithm and an analytical tool.

Definition: MMSE Denoiser

For a prior $p_X$ and Gaussian observation $U = X + \tau Z$, $Z \sim \mathcal{N}(0,1)$, the MMSE denoiser is the posterior mean
$$\eta_{\mathrm{mmse}}(u;\tau^2) = \mathbb{E}[X \mid U=u] = \frac{\int x\, p_X(x)\, \varphi_\tau(u-x)\,\mathrm{d}x}{\int p_X(x)\, \varphi_\tau(u-x)\,\mathrm{d}x},$$
where $\varphi_\tau$ is the zero-mean Gaussian density with variance $\tau^2$. Its derivative satisfies the Tweedie/Stein identity $\eta_{\mathrm{mmse}}'(u;\tau^2) = \mathrm{Var}(X \mid U=u)/\tau^2$.

The Stein identity means that for the MMSE denoiser the Onsager coefficient has a beautiful interpretation: it is the posterior variance divided by the effective noise variance β€” i.e., the fraction of the input variance that the denoiser cannot remove.

Example: MMSE for Bernoulli--Gaussian Prior

Let $X \sim (1-\rho)\,\delta_0 + \rho\,\mathcal{N}(0,\sigma_x^2)$. Derive $\eta_{\mathrm{mmse}}(u;\tau^2)$ in closed form.
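One way the derivation can go (a sketch; $s^2$ and $\pi(u)$ are shorthand introduced here). Conditioning on the mixture component and applying Bayes' rule, with $s^2 := \sigma_x^2 + \tau^2$:
$$\mathbb{E}[X \mid U = u, X \neq 0] = \frac{\sigma_x^2}{\sigma_x^2 + \tau^2}\, u, \qquad U \mid \{X \neq 0\} \sim \mathcal{N}(0, s^2), \qquad U \mid \{X = 0\} \sim \mathcal{N}(0, \tau^2),$$
$$\pi(u) := \Pr[X \neq 0 \mid U = u] = \left[\,1 + \frac{1-\rho}{\rho}\,\frac{s}{\tau}\, \exp\!\left(-\frac{\sigma_x^2\, u^2}{2\,\tau^2 s^2}\right)\right]^{-1},$$
$$\eta_{\mathrm{mmse}}(u;\tau^2) = \pi(u)\,\frac{\sigma_x^2}{\sigma_x^2 + \tau^2}\, u.$$
The posterior mean is thus the linear MMSE shrinkage of the Gaussian component, gated by the posterior probability that the coordinate is active.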

Minimax Denoisers and Parameter-Free AMP

What if $p_X$ is unknown? Donoho--Maleki--Montanari (2011) construct a minimax soft threshold that minimises the worst-case MSE over all $\rho$-sparse priors. The resulting $\alpha^\star(\delta)$ is a universal constant: a single number per $\delta$. Parameter-free AMP uses this $\alpha^\star$ and requires no prior knowledge of signal amplitudes or of the exact sparsity level.

Minimax AMP is 1--2 dB worse than oracle-Bayes AMP, but requires knowing only $\delta$. It is the right default for sparse-recovery problems where the prior is poorly specified.

Definition: D-AMP (Denoising-based AMP)

D-AMP replaces the scalar denoiser $\eta$ with a general (possibly neural, possibly non-local) image denoiser $D_\tau: \mathbb{R}^N \to \mathbb{R}^N$ that takes an estimate of the noise level $\tau$ and produces a denoised output:
$$\hat{\mathbf{x}}^{t+1} = D_{\hat{\tau}_t}\!\big(\mathbf{A}^{H}\mathbf{r}^t + \hat{\mathbf{x}}^t\big), \qquad \mathbf{r}^{t+1} = \mathbf{y} - \mathbf{A}\hat{\mathbf{x}}^{t+1} + \frac{1}{\delta}\,\mathrm{div}(D_{\hat{\tau}_t})\,\mathbf{r}^t.$$
The scalar Onsager coefficient $\langle\eta'\rangle$ is replaced by the normalised divergence $\mathrm{div}(D_\tau) = \frac{1}{N}\sum_i \partial D_{\tau,i}/\partial u_i$, estimated via Monte Carlo (Ramani et al., 2008; Metzler et al., 2016).
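A minimal sketch of one D-AMP iteration for the real-valued case, assuming numpy and a black-box denoiser callable `denoise(u, tau)`; the residual-based noise estimate and the probe size `eps` are illustrative choices.

```python
import numpy as np

def damp_iteration(y, A, x_hat, r, denoise, delta, eps=1e-3, seed=None):
    """One D-AMP step: denoise the pseudo-data, then form the Onsager-corrected residual.
    The normalised divergence (1/N) sum_i dD_i/du_i is estimated with a single
    Gaussian probe (Ramani et al. 2008): n^T [D(u + eps n) - D(u)] / (eps N)."""
    rng = np.random.default_rng(seed)
    N = x_hat.size
    # Pseudo-data and effective noise level estimated from the residual.
    u = A.T @ r + x_hat
    tau_hat = np.linalg.norm(r) / np.sqrt(len(y))
    # Denoise.
    x_new = denoise(u, tau_hat)
    # Monte Carlo estimate of the normalised divergence.
    n = rng.standard_normal(N)
    div = n @ (denoise(u + eps * n, tau_hat) - x_new) / (eps * N)
    # Onsager-corrected residual.
    r_new = y - A @ x_new + (div / delta) * r
    return x_new, r_new
```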

Learned Denoisers and Deep Unfolding

D-AMP opens the door to learned denoisers: train a CNN (e.g., DnCNN) to remove Gaussian noise at a given level $\tau$, then plug it into the AMP scaffold. Because the effective input noise is kept (asymptotically) Gaussian by the Onsager machinery, the denoiser's training distribution matches the deployment distribution at every iteration.

Taking this one step further: unfold the AMP iteration into a $T$-layer feed-forward network whose per-layer denoiser weights are trained end-to-end. This is LAMP, or Learned AMP (Borgerding--Schniter 2017), the subject of Chapter 21.5 and central to deep-unfolding architectures for RF imaging (Book 2, Chapter 27.4).
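A minimal sketch of one unfolded layer in this spirit, assuming PyTorch; the learnable per-layer matrix and threshold scale loosely follow the LAMP parameterisation of Borgerding--Schniter, but the class name, tensor shapes, and initialisation here are illustrative rather than a faithful reproduction.

```python
import torch
import torch.nn as nn

class LAMPLayer(nn.Module):
    """One unfolded AMP layer with a learnable matrix B_t and threshold scale alpha_t.
    Denoiser: soft-thresholding with lambda_t = alpha_t * ||v_t|| / sqrt(M)."""
    def __init__(self, A):
        super().__init__()
        M, N = A.shape
        self.register_buffer("A", A)
        self.B = nn.Parameter(A.t().clone())          # initialised at A^T, then trained
        self.alpha = nn.Parameter(torch.tensor(1.5))  # threshold scale, trained
        self.M = M

    def forward(self, y, x, v, onsager):
        # Residual with the Onsager factor carried over from the previous layer.
        v_new = y - x @ self.A.t() + onsager.unsqueeze(-1) * v
        lam = self.alpha * v_new.norm(dim=-1, keepdim=True) / self.M ** 0.5
        u = x + v_new @ self.B.t()
        x_new = torch.sign(u) * torch.relu(u.abs() - lam)
        # For soft-thresholding, the Onsager factor is ||x||_0 / M (survivor count over M).
        onsager_new = (x_new != 0).sum(dim=-1).float() / self.M
        return x_new, v_new, onsager_new
```

Layers are stacked as `x, v, b = layer(y, x, v, b)`, starting from zero initial values, and the whole chain is trained end-to-end on pairs of measurements and ground-truth signals.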

Denoiser Choices for AMP

| Denoiser | Equivalent estimator | Prior info needed | MSE vs. Bayes limit | Notes |
|---|---|---|---|---|
| Soft-threshold $\eta_{\mathrm{st}}$ | LASSO ($\ell_1$) | Threshold schedule only | Loose (gap depends on $\rho,\delta$) | Piecewise linear; convex equivalent |
| Minimax soft-threshold | Minimax LASSO | $\delta$ only | 1--2 dB from Bayes, no tuning | Parameter-free AMP |
| MMSE $\eta_{\mathrm{mmse}}$ | Posterior mean | Full prior $p_X$ | Matches replica prediction | Requires known prior |
| Hard-threshold | IHT-like | Threshold only | Loose; discontinuous $\Rightarrow$ SE fails | Not Lipschitz |
| Neural denoiser (D-AMP) | Learned prior | Training data | Can match/beat MMSE | Divergence estimated via MC |

Denoiser MSE Curves

Plot the scalar denoising MSE $\mathbb{E}\big[(\eta(X+\tau Z;\theta)-X)^2\big]$ as a function of the effective noise $\tau^2$ for the soft-threshold (LASSO), MMSE (Bayes), and naive identity denoisers on a Bernoulli--Gaussian prior. The steeper the descent of the MSE curve, the better the state-evolution fixed point (see the code sketch below).

[Interactive plot. Default parameters: sparsity $\rho = 0.15$; std. of non-zero entries $\sigma_x = 1$; soft-threshold factor $\alpha = 1.5$ in $\lambda = \alpha\tau$.]
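A sketch of how such curves can be generated, assuming numpy/matplotlib and the widget's default parameters; the $\tau^2$ grid and sample size are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rho, sigma_x, alpha = 0.15, 1.0, 1.5
rng = np.random.default_rng(0)
x = rng.standard_normal(500_000) * sigma_x * (rng.random(500_000) < rho)  # Bernoulli-Gaussian samples
z = rng.standard_normal(x.size)

def eta_soft(u, tau):
    return np.sign(u) * np.maximum(np.abs(u) - alpha * tau, 0.0)

def eta_mmse(u, tau):
    s2 = sigma_x**2 + tau**2
    pi = 1.0 / (1.0 + (1 - rho) / rho * np.sqrt(s2) / tau
                * np.exp(-sigma_x**2 * u**2 / (2 * tau**2 * s2)))
    return pi * (sigma_x**2 / s2) * u

tau2_grid = np.logspace(-3, 1, 60)
for name, eta in [("soft-threshold", eta_soft), ("MMSE", eta_mmse), ("identity", lambda u, t: u)]:
    mse = [np.mean((eta(x + np.sqrt(t2) * z, np.sqrt(t2)) - x) ** 2) for t2 in tau2_grid]
    plt.loglog(tau2_grid, mse, label=name)
plt.xlabel(r"$\tau^2$"); plt.ylabel("denoising MSE"); plt.legend(); plt.show()
```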

CommIT Contribution (2023)

Learned Denoisers for Structured Compressed Sensing

G. Caire β€” CommIT group research line (TU Berlin)

The CommIT group at TU Berlin has investigated learned-denoiser AMP variants for large-scale communication problems where priors are only implicitly specified (training data) or vary across realisations (dynamic spectrum, user activity detection). The focus is on wrapping denoiser networks around the OAMP/VAMP scaffolding of Chapter 21 so that the Gaussianity-of-input property can be preserved under the structured $\mathbf{A}$ matrices typical of wireless channels.


Quick Check

If we run AMP with soft-thresholding and the state-evolution fixed point remains strictly positive even as $\sigma^2 \to 0$, what does that tell us about the problem?

AMP has a bug

The $(\delta,\rho)$ pair lies above the Donoho--Tanner curve, so $\ell_1$ recovery cannot drive the MSE to zero at this sparsity level

The noise variance $\sigma^2$ is too small

Use hard-thresholding instead

Quick Check

Which denoiser choice makes AMP asymptotically Bayes-optimal (in the proportional regime)?

Soft-threshold with minimax $\alpha^\star$

Posterior mean $\eta_{\mathrm{mmse}}(u;\tau^2)$ matched to the true prior $p_X$

Hard-threshold with threshold $\lambda = \sigma_x$

Identity: $\eta(u)=u$

Common Mistake: Non-Lipschitz Denoisers Break State Evolution

Mistake:

Using hard-thresholding, rank-truncation with a hard cutoff, or any discontinuous denoiser in an AMP-like framework and applying the scalar state-evolution formula.

Correction:

The Bayati--Montanari state-evolution theorem requires the denoiser to be Lipschitz (or at least pseudo-Lipschitz of finite order) in its first argument. Discontinuous denoisers violate this and yield non-Gaussian pseudo-data. If you must use hard-thresholding, smooth it (e.g., replace it with a steep sigmoid gate) and monitor the iterates for divergence.
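One possible smoothing, as a sketch (the sigmoid gate and the sharpness parameter `beta` are illustrative):

```python
import numpy as np

def eta_hard_smoothed(u, lam, beta=25.0):
    """Sigmoid-gated approximation to hard thresholding: for large beta it approaches
    u * 1{|u| > lam} but stays Lipschitz (with constant growing in beta),
    unlike the hard threshold itself."""
    gate = 1.0 / (1.0 + np.exp(-beta * (np.abs(u) - lam)))
    return gate * u
```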

Key Takeaway

The choice of denoiser is the principal design lever in AMP. Soft-thresholding recovers LASSO; matched posterior means realise Bayes-optimal inference; learned neural denoisers extend the reach to unknown priors. In every case, the Onsager correction (or its divergence-based analogue) keeps the pseudo-data Gaussian so that the denoiser operates under the conditions for which it is designed.