Learned Message Passing and Deep Unfolding
From Hand-Designed to Learned Message Passing
AMP, OAMP, VAMP, and GAMP all share a common template: a linear step, a nonlinear denoiser, and an Onsager-style correction. The parameters of each stage (thresholds, damping factors, denoiser shape, even the linear operator itself) are derived from a statistical model (sparsity prior, noise level, matrix spectrum). When the assumed model matches reality, performance is Bayes-optimal; when it does not, mismatch erodes the gains.
Deep unfolding turns this limitation into a feature. Each iteration of the hand-designed algorithm is reinterpreted as a layer in a neural network, the per-layer parameters are declared free, and the whole unrolled network is trained end-to-end on representative signal-measurement pairs. The result is an algorithm that retains the interpretability of message passing but adapts its coefficients to the empirical signal and matrix distribution, escaping the restrictive assumptions of analytical state evolution.
Definition: LISTA (Learned ISTA)
Given $T$ unrolled layers, LISTA replaces the fixed ISTA iteration with the learned recursion
$$
x^{t+1} = \eta_{\theta_t}\!\left(W_t\, y + S_t\, x^{t}\right), \qquad t = 0, \dots, T-1,
$$
where $\eta_{\theta}$ is the soft-thresholding operator and $\{W_t, S_t, \theta_t\}_{t=0}^{T-1}$ are learnable parameters, typically initialized from the ISTA values $W_t = \tfrac{1}{L} A^{\top}$, $S_t = I - \tfrac{1}{L} A^{\top} A$, $\theta_t = \lambda / L$ with $L = \|A\|_2^2$. Training minimizes the reconstruction loss $\sum_i \|\hat{x}(y_i) - x_i\|_2^2$ over a dataset of signal-measurement pairs $\{(x_i, y_i)\}$.
Empirically, LISTA reaches ISTA's converged MSE in 10-20 learned layers, roughly a 50x speed-up. Weight tying ($W_t = W$, $S_t = S$ for all $t$) gives a lighter model that still beats vanilla ISTA.
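To make the recursion concrete, here is a minimal PyTorch sketch of an unrolled LISTA network, with per-layer $W_t$, $S_t$, $\theta_t$ initialized from the ISTA values above. The class and variable names are illustrative, not taken from any particular library.

```python
import torch
import torch.nn as nn


class LISTA(nn.Module):
    """Unrolled ISTA with per-layer learnable W_t, S_t, theta_t."""

    def __init__(self, A: torch.Tensor, n_layers: int, lam: float):
        super().__init__()
        M, N = A.shape
        L = torch.linalg.norm(A, ord=2).item() ** 2    # Lipschitz constant ||A||_2^2
        W0 = A.t() / L                                  # ISTA init: (1/L) A^T
        S0 = torch.eye(N) - A.t() @ A / L               # ISTA init: I - (1/L) A^T A
        self.W = nn.ParameterList([nn.Parameter(W0.clone()) for _ in range(n_layers)])
        self.S = nn.ParameterList([nn.Parameter(S0.clone()) for _ in range(n_layers)])
        self.theta = nn.ParameterList(
            [nn.Parameter(torch.tensor(lam / L)) for _ in range(n_layers)]
        )

    @staticmethod
    def soft(x, theta):
        # elementwise soft-thresholding eta_theta(x)
        return torch.sign(x) * torch.clamp(x.abs() - theta, min=0.0)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, M) -> x: (batch, N)
        x = torch.zeros(y.shape[0], self.W[0].shape[0], device=y.device)
        for W, S, theta in zip(self.W, self.S, self.theta):
            x = self.soft(y @ W.t() + x @ S.t(), theta)   # x^{t+1} = eta(W_t y + S_t x^t)
        return x
```

Training then amounts to minimizing `((model(y) - x) ** 2).mean()` over mini-batches of signal-measurement pairs with a standard optimizer.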
Definition: LAMP (Learned AMP)
LAMP unrolls the AMP iteration. Each layer reads $(x^{t}, z^{t})$ and produces
$$
x^{t+1} = \eta_{\theta_t}\!\left(x^{t} + B_t z^{t}\right), \qquad
z^{t+1} = y - A x^{t+1} + b_t z^{t},
$$
where $B_t$ is a learned feedback matrix (initialized at $A^{\top}$), $\eta_{\theta_t}$ is a parameterized denoiser (soft-threshold, scaled soft-threshold, or a small MLP), and $b_t$ is the learned Onsager coefficient. All are trained jointly by back-propagation.
LAMP preserves the Onsager-style correction, but instead of computing it analytically from the denoiser's average derivative, it learns the scalar $b_t$ directly. This is what makes LAMP robust to mismatched matrix ensembles: the network finds the right correction for the actual operator at hand.
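A minimal PyTorch sketch of the LAMP layer above, assuming a fixed sensing matrix and a plain soft-threshold denoiser with one learned threshold and one learned Onsager scalar per layer (a simplification of the parameterizations mentioned in the definition):

```python
import torch
import torch.nn as nn


class LAMP(nn.Module):
    """Unrolled AMP with learnable feedback matrices B_t, thresholds theta_t,
    and Onsager scalars b_t (notation as in the definition above)."""

    def __init__(self, A: torch.Tensor, n_layers: int):
        super().__init__()
        self.register_buffer("A", A)  # fixed sensing matrix of shape (M, N)
        self.B = nn.ParameterList(
            [nn.Parameter(A.t().clone()) for _ in range(n_layers)]   # init at A^T
        )
        # In practice, initialize thresholds and Onsager scalars from the
        # analytical AMP values (see the training tips below); constants here.
        self.theta = nn.ParameterList(
            [nn.Parameter(torch.tensor(0.1)) for _ in range(n_layers)]
        )
        self.b = nn.ParameterList(
            [nn.Parameter(torch.tensor(0.0)) for _ in range(n_layers)]
        )

    @staticmethod
    def soft(x, theta):
        return torch.sign(x) * torch.clamp(x.abs() - theta, min=0.0)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, M) -> x: (batch, N)
        x = torch.zeros(y.shape[0], self.A.shape[1], device=y.device)
        z = y.clone()  # initial residual z^0 = y - A x^0 with x^0 = 0
        for B, theta, b in zip(self.B, self.theta, self.b):
            x = self.soft(x + z @ B.t(), theta)   # x^{t+1} = eta(x^t + B_t z^t)
            z = y - x @ self.A.t() + b * z        # z^{t+1} = y - A x^{t+1} + b_t z^t
        return x
```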
Definition: LDVAMP (Learned Denoising VAMP)
LDVAMP unrolls the VAMP recursion, replacing each scalar state-evolution update with a learned function and each denoiser with a neural network. Per layer:
$$
\hat{x}_1^{t} = g_1\!\left(r_1^{t}, \gamma_1^{t}\right), \quad
r_2^{t} = \frac{\hat{x}_1^{t} - \alpha_1^{t}\, r_1^{t}}{1 - \alpha_1^{t}}, \qquad
\hat{x}_2^{t} = g_2\!\left(r_2^{t}, \gamma_2^{t}\right), \quad
r_1^{t+1} = \frac{\hat{x}_2^{t} - \alpha_2^{t}\, r_2^{t}}{1 - \alpha_2^{t}},
$$
where the message precisions $\gamma_1^{t}, \gamma_2^{t}$ and extrinsic scalars $\alpha_1^{t}, \alpha_2^{t}$ come from learned per-layer updates rather than the analytical state evolution. Here $g_1$ is the LMMSE step (its parameters learned rather than matched to the noise level and matrix spectrum) and $g_2$ is a learned prior denoiser (a CNN for images, a parameterized shrinkage for sparse signals).
LDVAMP inherits VAMP's robustness to ill-conditioned matrices while dropping the need to specify the signal prior or matrix spectrum analytically. It is the state-of-the-art unrolled network for structured compressed sensing.
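The sketch below illustrates one possible LDVAMP-style layer in PyTorch: an LMMSE stage built from the known operator with learned precisions, a small learned elementwise denoiser, and learned extrinsic scalars in place of the analytical state-evolution updates. The architectural choices (MLP denoiser, scalar $\alpha$'s, precision parameterization) are assumptions for illustration, not the canonical LDVAMP design.

```python
import torch
import torch.nn as nn


class LDVAMPLayer(nn.Module):
    """One unrolled VAMP-style layer: LMMSE stage from the known A with
    learned precisions, then a small learned denoiser, with the extrinsic
    subtraction between stages driven by learned scalars."""

    def __init__(self, A: torch.Tensor, hidden: int = 64):
        super().__init__()
        _, N = A.shape
        self.register_buffer("A", A)
        self.register_buffer("AtA", A.t() @ A)
        self.register_buffer("I", torch.eye(N))
        self.log_gamma_w = nn.Parameter(torch.zeros(()))   # learned noise precision
        self.log_gamma_1 = nn.Parameter(torch.zeros(()))   # learned message precision
        self.alpha1 = nn.Parameter(torch.tensor(0.5))       # learned extrinsic scalar, stage 1
        self.alpha2 = nn.Parameter(torch.tensor(0.5))       # learned extrinsic scalar, stage 2
        # small elementwise prior denoiser g_2 (illustrative architecture)
        self.denoiser = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, y: torch.Tensor, r1: torch.Tensor):
        # Stage 1: LMMSE estimate using the known dictionary A
        gw, g1 = self.log_gamma_w.exp(), self.log_gamma_1.exp()
        H = gw * self.AtA + g1 * self.I                      # (N, N)
        rhs = gw * (y @ self.A) + g1 * r1                    # batched A^T y + gamma_1 r_1
        x1 = torch.linalg.solve(H, rhs.t()).t()              # (batch, N)
        r2 = (x1 - self.alpha1 * r1) / (1 - self.alpha1)     # extrinsic message to stage 2

        # Stage 2: learned prior denoiser, applied elementwise
        x2 = self.denoiser(r2.unsqueeze(-1)).squeeze(-1)
        r1_next = (x2 - self.alpha2 * r2) / (1 - self.alpha2)  # extrinsic message back
        return x2, r1_next
```

A full network would stack $T$ such layers (e.g. `nn.ModuleList([LDVAMPLayer(A) for _ in range(T)])`) and train them end-to-end on reconstruction MSE, exactly as for LISTA and LAMP.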
Theorem: Linear Convergence Rate of LISTA
Assume $A$ satisfies a restricted isometry property with constant $\delta_{2k}$ and the signal $x^{\star}$ is $k$-sparse. Then the optimal LISTA parameters achieve the linear convergence rate
$$
\|x^{t} - x^{\star}\|_2 \;\le\; C\, \rho_{\mathrm{LISTA}}^{\,t}\, \|x^{0} - x^{\star}\|_2,
$$
with contraction factor $\rho_{\mathrm{LISTA}} < 1$ strictly smaller than the ISTA contraction factor at the same regularization level.
LISTA beats ISTA's linear rate by adapting each layer's operator to the current sparsity pattern. Early layers can be aggressive (large thresholds) to commit to the strongest components; later layers can be gentle to refine small components. ISTA uses the same operator for every iteration, sacrificing this adaptivity.
Coupling LISTA layers to ISTA iterates
Set $W_t = \alpha A^{\top}$, $S_t = I - \alpha A^{\top} A$, $\theta_t = \alpha \lambda$. Substituting into the LISTA update gives ISTA with step size $\alpha$. Thus any ISTA contraction is achievable by LISTA, so the LISTA optimum is no worse.
Layerwise optimal $\alpha_t$
For each layer, choose $\alpha_t$ to minimize the one-step error bound under the RIP. Since the restricted eigenvalues of $A^{\top} A$ lie in $[1-\delta_{2k},\, 1+\delta_{2k}]$, this gives $\alpha_t = 1$ and a layer contraction $\rho_t \le \delta_{2k} < 1$ whenever $\delta_{2k} < 1$.
Extending beyond ISTA
LISTA's full parameterization has more degrees of freedom than the scaled-ISTA family. The end-to-end training optimum thus dominates the scaled-ISTA bound, giving $\rho_{\mathrm{LISTA}} \le \rho_{\mathrm{ISTA}}$.
Example: LAMP for Bernoulli-Gaussian Recovery
Design a 10-layer LAMP network for recovering Bernoulli-Gaussian signals (a small fraction of active components, each with unit variance) from noisy linear measurements $y = A x + w$ with an i.i.d. Gaussian $A \in \mathbb{R}^{M \times N}$, $M < N$, at a fixed SNR (in dB). Describe the trainable parameters, loss, and expected behaviour.
Parameter layout
Each layer has a feedback matrix $B_t \in \mathbb{R}^{N \times M}$ (here 125k parameters), a soft-threshold $\theta_t$, and an Onsager scalar $b_t$. Total: roughly 1.25M parameters across the 10 layers, still lightweight vs. typical DNNs.
Loss
End-to-end MSE, $\frac{1}{|\mathcal{D}|}\sum_{(x,y)\in\mathcal{D}} \|\hat{x}(y) - x\|_2^2$ with $\hat{x}(y)$ the 10-layer output, computed over a training set $\mathcal{D}$ of $(x, y)$ pairs sampled from the Bernoulli-Gaussian prior with a fixed $A$.
Expected outcome
After training (a few thousand SGD steps), the learned $b_t$ values track the analytical AMP Onsager coefficients closely in the first few layers and diverge later, effectively cranking up the Onsager feedback as the estimate gets refined. The network reaches AMP's fixed-point MSE in 10 layers, vs. 40+ iterations for vanilla AMP.
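A sketch of the corresponding training loop, reusing the `LAMP` module from the earlier sketch. The problem sizes, activity probability, and SNR below are placeholder assumptions, chosen so that each $B_t$ has the 125k entries mentioned above.

```python
import torch

# Hypothetical problem sizes and sparsity level; substitute the values of the
# actual experiment here (M * N = 125k matches the per-layer count above).
M, N, N_LAYERS = 250, 500, 10
P_ACTIVE, SNR_DB = 0.1, 20.0

torch.manual_seed(0)
A = torch.randn(M, N) / M ** 0.5            # fixed i.i.d. Gaussian sensing matrix
model = LAMP(A, n_layers=N_LAYERS)           # LAMP module from the sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)


def sample_batch(batch_size: int):
    """Draw Bernoulli-Gaussian signals and noisy measurements y = A x + w."""
    support = (torch.rand(batch_size, N) < P_ACTIVE).float()
    x = support * torch.randn(batch_size, N)          # unit-variance active entries
    y_clean = x @ A.t()
    sigma = (y_clean.pow(2).mean() * 10 ** (-SNR_DB / 10)).sqrt()
    return x, y_clean + sigma * torch.randn_like(y_clean)


for step in range(5000):                     # "a few thousand SGD steps"
    x, y = sample_batch(128)
    loss = (model(y) - x).pow(2).mean()      # end-to-end MSE on the final-layer output
    opt.zero_grad()
    loss.backward()
    opt.step()
```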
Training Tips for Unrolled Networks
Training unrolled message-passing networks is straightforward in principle but has a handful of recurring pitfalls:
- Layer-wise greedy warm-start. Train layer 0, freeze it, then add and train layer 1, and so on (see the sketch after this list). This avoids the vanishing-gradient problem that plagues end-to-end training of deep unrolled networks.
- Tied vs. untied weights. Weight tying ($W_t = W$, $S_t = S$ across layers) cuts the parameter count by roughly the number of layers and generalizes better when data is scarce; untied weights win on large datasets.
- Matrix-specific vs. matrix-agnostic training. If $A$ is fixed (e.g., a physical MRI encoder), train on that single matrix. If $A$ varies per sample (e.g., random masks), train over the ensemble. The two regimes give different learned parameters.
- Loss curriculum. Start with a soft loss (per-layer MSE averaged over all layers) and anneal to the final-layer loss; this stabilizes early training.
- Initialisation from analytical parameters. Always initialize $W_t$, $S_t$, $\theta_t$ to their ISTA/AMP values and $b_t$ to the analytical Onsager coefficient. Random init often fails to recover convergence even after training.
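As referenced in the first tip, here is a sketch of layer-wise greedy warm-start, using the parameter names from the LISTA sketch above; the staging schedule, `forward_partial` helper, and `sample_batch` callable are illustrative choices.

```python
import itertools
import torch


def forward_partial(model, y, depth):
    """Run only the first `depth` layers of the LISTA sketch above."""
    x = torch.zeros(y.shape[0], model.W[0].shape[0], device=y.device)
    for W, S, theta in itertools.islice(zip(model.W, model.S, model.theta), depth):
        x = model.soft(y @ W.t() + x @ S.t(), theta)
    return x


def train_greedy(model, sample_batch, steps_per_stage=1000, lr=1e-3):
    """Stage t trains only layer t's parameters on the loss measured after
    t+1 layers; earlier layers are left untouched. sample_batch(batch_size)
    returns (x, y) pairs, e.g. the generator from the LAMP example above."""
    for t in range(len(model.W)):
        opt = torch.optim.Adam([model.W[t], model.S[t], model.theta[t]], lr=lr)
        for _ in range(steps_per_stage):
            x, y = sample_batch(128)
            loss = (forward_partial(model, y, depth=t + 1) - x).pow(2).mean()
            model.zero_grad()
            loss.backward()
            opt.step()
    # optional: finish with a short end-to-end fine-tune of all layers
    opt = torch.optim.Adam(model.parameters(), lr=lr / 10)
    for _ in range(steps_per_stage):
        x, y = sample_batch(128)
        loss = (model(y) - x).pow(2).mean()
        model.zero_grad()
        loss.backward()
        opt.step()
```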
LISTA vs ISTA Convergence
Compare reconstruction MSE as a function of the number of (un)rolled iterations for ISTA with the optimal fixed step size and LISTA with learned per-layer parameters. The learned network reaches ISTA's asymptotic MSE in a small fraction of the layers.
LAMP: MSE vs Layer Count
Visualize the final MSE achieved by LAMP as the number of unrolled layers grows. Compare against fixed-parameter AMP at the same iteration count. Notice how LAMP saturates faster and to a lower floor, especially for structured sensing matrices where AMP struggles.
Common Mistake: Overfitting to a Single Sensing Matrix
Mistake:
Training an unrolled LAMP/LDVAMP network with a single realization of $A$ drawn from the distribution of interest, and then deploying it on different realizations from the same distribution. The learned weights encode the idiosyncrasies of the training matrix and collapse on novel ones.
Correction:
Decide upfront whether the sensing matrix is fixed (e.g., a calibrated imaging system, a trained sparse code) or random per sample (e.g., random masks, fresh pilot realizations). In the fixed case, training with the single matrix is correct. In the random case, resample every mini-batch during training so that the learned parameters generalize over the ensemble. Mismatch between training and deployment is a leading cause of disappointing unrolled-network results in practice.
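A sketch of the matrix-agnostic (random-ensemble) regime: a fresh $A$ is drawn for every mini-batch, and the unrolled model is assumed to accept the current operator as a forward-pass input; the `model(y, A)` interface, noise level, and operator distribution here are hypothetical placeholders.

```python
import torch

# Matrix-agnostic training: resample the sensing matrix every mini-batch so the
# learned parameters must work for the whole ensemble, not one realization.
# `model(y, A)` is assumed to be an unrolled network whose forward pass takes
# the current operator explicitly (e.g. a LAMP variant that learns only
# thresholds and Onsager scalars, not a matrix tied to one A).

def sample_operator(M, N):
    return torch.randn(M, N) / M ** 0.5          # fresh i.i.d. Gaussian operator


def train_matrix_agnostic(model, sample_signal, M, N, steps=5000, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        A = sample_operator(M, N)                 # new realization per mini-batch
        x = sample_signal(128, N)                 # (batch, N) ground-truth signals
        y = x @ A.t() + 0.01 * torch.randn(128, M)   # placeholder noise level
        loss = (model(y, A) - x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```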
LISTA
Learned ISTA: an unrolled neural network with the ISTA iteration as its layer template. Parameters are trained end-to-end to minimize reconstruction MSE. Achieves ISTA's asymptotic accuracy in far fewer layers than ISTA needs iterations.
Related: LAMP, Deep unfolding (algorithm unrolling)
LAMP
Learned AMP: an unrolled AMP iteration with learnable feedback matrices $B_t$, denoiser parameters $\theta_t$, and Onsager scalars $b_t$. Retains AMP's interpretability while adapting to empirical signal and matrix distributions.
Related: LISTA, Deep unfolding (algorithm unrolling)
Deep unfolding (algorithm unrolling)
A design paradigm that converts an iterative algorithm into a deep neural network by (i) unrolling a fixed number of iterations into layers, (ii) declaring per-layer parameters as trainable, and (iii) fitting them by end-to-end back-propagation. Combines the inductive bias of classical algorithms with the adaptivity of learned models.
Quick Check
What is the principal advantage of LISTA over ISTA when both are run for the same number of iterations/layers?
LISTA has a strictly convex loss function, while ISTA does not.
LISTA learns per-layer parameters by end-to-end training, so it reaches low MSE in far fewer layers.
LISTA does not require a sparsity assumption, while ISTA does.
LISTA provably recovers the exact LASSO solution, while ISTA only approximates it.
Correct. LISTA breaks ISTA's "one operator for all iterations" constraint. The per-layer parameters adapt to the empirical signal-measurement distribution.
Why This Matters: Unrolled VAMP for Wireless Channel Estimation
Pilot-based channel estimation in OFDM and massive-MIMO uplinks often reduces to a structured compressed-sensing problem: a sparse delay-Doppler-angular channel observed through a partial DFT or Kronecker dictionary. LDVAMP is a natural fit: the LMMSE step uses the known dictionary, while the learned prior denoiser captures dataset-specific channel statistics (clustering of multipath components, angular selectivity, Doppler coherence) that an analytical prior would miss.
The CommIT group has explored unrolled-VAMP pipelines for RF imaging and joint channel-activity estimation in unsourced random access, where the mix of known structure (sensing operator) and unknown data-driven priors (channel clusters) is exactly where unrolled networks outperform both hand-designed message passing and generic deep learning.
See full treatment in Chapter 27, Section sec-lista-imaging