Unrolled OAMP with ProxNet

From Iterations to Layers

Every iterative reconstruction algorithm computes a sequence of updates $\mathbf{x}^{(k+1)} = \mathcal{T}_k(\mathbf{x}^{(k)};\, \mathbf{y}, \mathbf{A})$. The key observation of algorithm unrolling (also called deep unfolding) is that a fixed number $T$ of such iterations can be viewed as a feedforward neural network with $T$ layers. Each layer performs one iteration, and the algorithm parameters (step sizes, thresholds, penalty weights, denoiser functions) become learnable parameters $\theta_t$ optimised via backpropagation through the entire $T$-layer graph.

This paradigm inherits the interpretability of iterative algorithms (each layer has a known function) while gaining the adaptability of neural networks (parameters are tuned from data). For RF imaging, unrolled OAMP is the natural choice: OAMP already handles the Kronecker-structured $\mathbf{A}$ from Ch 17, and unrolling lets us learn the denoiser and step sizes end-to-end.

Definition:

Algorithm Unrolling

Algorithm unrolling (or deep unfolding) converts a $T$-iteration algorithm into a $T$-layer neural network by:

  1. Truncating the iteration to $T$ steps.
  2. Parameterising each iteration's operators with learnable parameters $\theta_t$.
  3. Training the parameters end-to-end by minimising

$$\min_{\theta_1, \ldots, \theta_T} \; \mathbb{E}_{(\mathbf{c}, \mathbf{y})}\bigl[ \mathcal{L}\bigl(\hat{\mathbf{c}}^{(T)}(\theta),\; \mathbf{c}\bigr) \bigr]$$

where $\hat{\mathbf{c}}^{(T)}$ is the output of the $T$-th layer and $\mathcal{L}$ is a task loss (e.g., MSE).

Unlike generic deep networks, unrolled networks have strong inductive bias: the architecture encodes the structure of the forward model $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$. This dramatically reduces the number of learnable parameters compared to a generic U-Net and improves sample efficiency.
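To make the recipe concrete, here is a minimal PyTorch sketch of unrolling applied to a simple ISTA-style iteration (not yet OAMP): the step size and threshold of each of the $T$ layers become learnable parameters. The class name `UnrolledNet` and the hyperparameter values are illustrative, not from the source.

```python
import torch
import torch.nn as nn

class UnrolledNet(nn.Module):
    """T iterations of gradient step + learnable soft threshold,
    unrolled into T layers (an ISTA-style toy; names illustrative)."""

    def __init__(self, A: torch.Tensor, T: int = 10):
        super().__init__()
        self.A, self.T = A, T
        # per-layer learnable parameters theta_t = (step size, threshold)
        self.step = nn.Parameter(torch.full((T,), 0.1))
        self.thresh = nn.Parameter(torch.full((T,), 0.01))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        x = torch.zeros(self.A.shape[1], device=y.device)
        for t in range(self.T):              # one loop body = one layer
            r = x - self.step[t] * self.A.T @ (self.A @ x - y)
            x = torch.sign(r) * torch.clamp(r.abs() - self.thresh[t], min=0.0)
        return x

# end-to-end training backpropagates through all T layers, e.g.:
# loss = ((net(y) - c_true) ** 2).mean(); loss.backward()
```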

Definition:

OAMP Iteration (Review)

The Orthogonal AMP (OAMP) algorithm from Ch 17 iterates two steps:

Linear Estimation (LE): $\mathbf{r}^{(t)} = \hat{\mathbf{c}}^{(t)} + \mathbf{W}_{\text{LE}}\bigl(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(t)}\bigr)$

Nonlinear Estimation (NLE): $\hat{\mathbf{c}}^{(t+1)} = \eta_t\bigl(\mathbf{r}^{(t)}\bigr)$

where $\mathbf{W}_{\text{LE}}$ satisfies the divergence constraint $\frac{1}{N}\operatorname{tr}(\mathbf{W}_{\text{LE}} \mathbf{A}) = 1$. This ensures $\mathbf{r}^{(t)} = \mathbf{c} + {}$(approximately Gaussian noise) in the large-system limit, regardless of the structure of $\mathbf{A}$.

The orthogonalisation step is what makes OAMP suitable for RF imaging: standard AMP diverges for Kronecker-structured or partial-DFT sensing matrices, while OAMP's linear estimator works for any right-unitarily invariant matrix.
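Before unrolling, it helps to see one classical OAMP iteration in code. The following NumPy sketch assumes a real-valued model and a soft-threshold denoiser as $\eta_t$; the LMMSE construction and the trace rescaling that enforces $\frac{1}{N}\operatorname{tr}(\mathbf{W}\mathbf{A}) = 1$ are the essential steps.

```python
import numpy as np

def oamp_iteration(c_hat, y, A, sigma2, tau2, lam=0.1):
    """One OAMP iteration (minimal real-valued sketch).
    tau2: current error variance of c_hat; lam: threshold."""
    M, N = A.shape
    # LMMSE matrix for the current signal/noise levels
    W_hat = tau2 * A.T @ np.linalg.inv(tau2 * (A @ A.T) + sigma2 * np.eye(M))
    # rescale so that tr(W @ A) = N, i.e. the divergence constraint
    W = (N / np.trace(W_hat @ A)) * W_hat
    # Linear Estimation (LE): pseudo-data r = c + ~Gaussian noise
    r = c_hat + W @ (y - A @ c_hat)
    # Nonlinear Estimation (NLE): soft threshold as an example eta_t
    c_next = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    return r, c_next
```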

Definition:

Learned OAMP (Unrolled OAMP with ProxNet)

Learned OAMP unrolls $T$ iterations of OAMP into a feedforward neural network.

Layer $t$ ($t = 1, \ldots, T$):

  1. Linear module (LMMSE): Compute the linear estimate using the known $\mathbf{A}$ and the current noise estimate $\sigma_t^2$: $\mathbf{r}^{(t)} = \hat{\mathbf{c}}^{(t)} + \mathbf{W}_{\text{LE}}^{(t)}\bigl(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(t)}\bigr)$.
  2. Orthogonalisation (fixed): The analytical orthogonalisation formula from Ch 17.3 ensures the divergence-free condition.
  3. ProxNet denoiser: Replace the hand-designed $\eta_t$ with a neural network $\mathcal{D}_{\theta_t}(\mathbf{r}^{(t)};\, \sigma_t^2)$, typically a small CNN or DnCNN with parameters $\theta_t$.

Training: Minimise the end-to-end loss:

$$\mathcal{L}(\theta_1, \ldots, \theta_T) = \mathbb{E}\bigl[\|\hat{\mathbf{c}}^{(T)} - \mathbf{c}\|^2\bigr]$$

where the expectation is over training data $\{(\mathbf{y}_j, \mathbf{c}_j)\}$.

Key advantages over pure deep learning:

  • Physics-informed: The linear module and orthogonalisation encode the forward model $\mathbf{A}$ exactly.
  • Fewer parameters: Only the denoiser is learned ($\sim 10^4$--$10^5$ parameters per layer, vs. $10^6$--$10^8$ for end-to-end networks).
  • Interpretable: Each layer has a clear role (LMMSE + denoiser), and the state evolution provides a theoretical performance prediction.
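A possible skeleton of the unrolled network in PyTorch, assuming image-shaped signals of shape (batch, 1, H, W) and a constant noise-level channel as the conditioning mechanism (one common choice; the source does not fix these details). The `lmmse_step` and `state_evolution` callables stand in for the fixed physics modules described above.

```python
import torch
import torch.nn as nn

class ProxNet(nn.Module):
    """Small CNN denoiser conditioned on sigma_t^2, passed in as an
    extra constant input channel (one common conditioning choice)."""

    def __init__(self, channels: int = 32, depth: int = 5):
        super().__init__()
        layers = [nn.Conv2d(2, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, r: torch.Tensor, sigma2) -> torch.Tensor:
        # broadcast the scalar noise level to a constant image channel
        noise_map = torch.as_tensor(sigma2).sqrt().expand_as(r)
        return r - self.net(torch.cat([r, noise_map], dim=1))  # residual skip

class LearnedOAMP(nn.Module):
    """T unrolled OAMP layers; only the ProxNet denoisers hold parameters."""

    def __init__(self, T: int = 10):
        super().__init__()
        self.prox = nn.ModuleList(ProxNet() for _ in range(T))

    def forward(self, c0, y, lmmse_step, state_evolution):
        c = c0
        for t, prox in enumerate(self.prox):
            r = lmmse_step(c, y, t)           # LE + orthogonalisation (fixed)
            sigma2_t = state_evolution(r, y, t)
            c = prox(r, sigma2_t)             # learned NLE
        return c
```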

Definition:

ProxNet --- Learned Denoiser for OAMP

ProxNet replaces the fixed denoiser $\eta_t$ in OAMP with a learned neural network $\mathcal{D}_{\theta_t}$:

$$\hat{\mathbf{c}}^{(t+1)} = \mathcal{D}_{\theta_t}\bigl(\mathbf{r}^{(t)};\; \sigma_t^2\bigr)$$

where $\sigma_t^2$ is the estimated noise variance at layer $t$ (provided by the state evolution). The network $\mathcal{D}_{\theta_t}$ is typically a small U-Net or DnCNN that takes $\mathbf{r}^{(t)}$ as input and produces a denoised estimate.

The divergence of $\mathcal{D}_{\theta_t}$ must be computed for the state evolution update:

$$\operatorname{div}(\mathcal{D}_{\theta_t}) = \frac{1}{N}\sum_{i=1}^N \frac{\partial [\mathcal{D}_{\theta_t}(\mathbf{r})]_i}{\partial r_i}.$$

In practice, Monte Carlo divergence estimation (Hutchinson's trace estimator) provides an efficient single-backward-pass approximation using a random probe vector $\mathbf{b}$ with i.i.d. zero-mean, unit-variance entries: $\operatorname{div}(\mathcal{D}) \approx \frac{1}{N}\,\mathbf{b}^T \bigl(\nabla_{\mathbf{r}}\mathcal{D}\bigr)\mathbf{b}$.
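A minimal PyTorch implementation of this estimator, assuming `denoiser` is a differentiable callable of `r` alone (e.g., a ProxNet with its noise level already bound):

```python
import torch

def hutchinson_divergence(denoiser, r: torch.Tensor) -> torch.Tensor:
    """Estimate div(D) = tr(J_D)/N with one Rademacher probe and a
    single backward pass (Hutchinson's trace estimator)."""
    r = r.detach().requires_grad_(True)
    b = torch.randint(0, 2, r.shape).to(r.dtype) * 2 - 1   # +/-1 probe
    out = denoiser(r)
    # autograd returns the vector-Jacobian product J_D^T b in one pass;
    # then b . (J_D^T b) = b^T J_D b, an unbiased estimate of tr(J_D)
    (vjp,) = torch.autograd.grad(out, r, grad_outputs=b)
    return (b * vjp).sum() / r.numel()   # divide by N: normalised divergence
```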

Definition:

Kronecker-Structured OAMP for RF Imaging

When the sensing matrix has Kronecker structure $\mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2$ (common in 2D imaging with separable Tx/Rx arrays and frequency grids), the OAMP linear estimator decomposes as:

$$\mathbf{W}_{\text{LE}} = \mathbf{W}_1 \otimes \mathbf{W}_2$$

where $\mathbf{W}_i$ is the LMMSE estimator for $\mathbf{A}_i$: $\mathbf{W}_i = \mathbf{A}_i^H(\mathbf{A}_i\mathbf{A}_i^H + \sigma_t^2 \mathbf{I})^{-1}$.

This reduces the computational cost from $O(N^2)$ to $O(N^{3/2})$ (for square images with $N = n^2$), and from $O(N^2)$ to $O(N\log N)$ when the $\mathbf{A}_i$ are partial DFT matrices (computed via FFT).
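A NumPy sketch of the Kronecker trick: the identity $(\mathbf{A}_1 \otimes \mathbf{A}_2)\operatorname{vec}(\mathbf{X}) = \operatorname{vec}(\mathbf{A}_2 \mathbf{X} \mathbf{A}_1^T)$ (column-major vec) lets every matrix-vector product in the LMMSE step use only the small factors. Function names are illustrative.

```python
import numpy as np

def kron_matvec(A1, A2, x):
    """Compute (A1 kron A2) @ x without forming the Kronecker product,
    via (A1 ⊗ A2) vec(X) = vec(A2 @ X @ A1.T) with column-major vec."""
    X = x.reshape(A2.shape[1], A1.shape[1], order="F")
    return (A2 @ X @ A1.T).reshape(-1, order="F")

def kron_lmmse_factors(A1, A2, sigma2):
    """Per-factor LMMSE matrices W_i = A_i^H (A_i A_i^H + sigma2 I)^{-1}."""
    def w(Ai):
        M = Ai.shape[0]
        return Ai.conj().T @ np.linalg.inv(Ai @ Ai.conj().T + sigma2 * np.eye(M))
    return w(A1), w(A2)

# the full LMMSE apply is then another kron_matvec with the factors:
# W1, W2 = kron_lmmse_factors(A1, A2, sigma2)
# r = c_hat + kron_matvec(W1, W2, y - kron_matvec(A1, A2, c_hat))
```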

The state evolution for Kronecker-structured OAMP involves the joint singular value distribution of $\mathbf{A}_1$ and $\mathbf{A}_2$.

Kronecker structure arises naturally in phased-array imaging (azimuth $\times$ elevation), MIMO radar (transmit $\times$ receive), and separable OFDM frequency grids. Exploiting this structure is essential for scaling unrolled OAMP to practical problem sizes.

Unrolled OAMP with ProxNet

Complexity: Per layer, $O(N\log N)$ for the Kronecker-FFT LMMSE plus $O(N)$ for the CNN denoiser. Total: $O(TN\log N)$.

Input: Measurements $\mathbf{y}$, sensing matrix $\mathbf{A}$, noise variance $\sigma^2$, trained parameters $\{\theta_t\}_{t=1}^T$
Initialise: $\hat{\mathbf{c}}^{(0)} = \mathbf{A}^{H} \mathbf{y}$ (matched-filter initialisation)
For $t = 1, \ldots, T$:
  1. Linear Estimation (LMMSE): $\mathbf{r}^{(t)} = \hat{\mathbf{c}}^{(t-1)} + \mathbf{W}_{\text{LE}}^{(t)}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(t-1)})$
  2. State Evolution: Compute $\sigma_t^2$ from the effective noise variance at layer $t$
  3. ProxNet Denoiser: $\hat{\mathbf{c}}^{(t)} = \mathcal{D}_{\theta_t}(\mathbf{r}^{(t)};\, \sigma_t^2)$
  4. Divergence Estimation: $d_t = \operatorname{div}(\mathcal{D}_{\theta_t})$ (via Hutchinson's trace estimator)
Output: $\hat{\mathbf{c}}^{(T)}$

The Kronecker LMMSE step dominates the per-layer cost. For a $256 \times 256$ image with $T = 10$ layers, the total cost is comparable to 10 FFTs --- orders of magnitude faster than generic matrix inversions.

Theorem: State Evolution for Unrolled OAMP

In the large-system limit ($M, N \to \infty$ with $M/N \to \delta$), the effective noise at each OAMP layer is characterised by a scalar state evolution:

$$\sigma_{t+1}^2 = \frac{1}{\delta}\bigl(\sigma^2 + \operatorname{mmse}(\sigma_t^2)\bigr)$$

where $\operatorname{mmse}(\sigma_t^2) = \mathbb{E}\bigl[\|\mathcal{D}_{\theta_t}(\mathbf{c} + \sigma_t\mathbf{z}) - \mathbf{c}\|^2/N\bigr]$ is the per-component MSE of the denoiser at noise level $\sigma_t^2$ (with $\mathbf{z}$ standard Gaussian), and $\sigma^2$ is the measurement noise variance.

The state evolution is exact for right-unitarily invariant matrices (which include Haar-distributed unitary, partial DFT, and Kronecker products of such matrices).

The orthogonalisation step in OAMP "Gaussianises" the residual, so the denoiser always sees signal-plus-Gaussian-noise. The state evolution tracks the noise variance through the layers, and the denoiser's performance at each noise level determines the next layer's noise level. This creates a virtuous cycle: better denoisers produce lower noise, which makes the next denoiser's job easier.
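The recursion is easy to evaluate numerically. In the sketch below, `mmse_fn` would in practice be estimated by Monte Carlo: denoise $\mathbf{c} + \sigma_t\mathbf{z}$ for held-out signals and average the squared error; the toy curve in the usage comment is illustrative only.

```python
def state_evolution_trajectory(mmse_fn, sigma2_w, delta, T=10, sigma2_init=1.0):
    """Predicted effective noise variance per layer:
    sigma_{t+1}^2 = (sigma_w^2 + mmse(sigma_t^2)) / delta."""
    sig2 = sigma2_init
    trajectory = [sig2]
    for _ in range(T):
        sig2 = (sigma2_w + mmse_fn(sig2)) / delta
        trajectory.append(sig2)
    return trajectory

# toy example: a denoiser that removes 80% of the noise energy
# traj = state_evolution_trajectory(lambda s2: 0.2 * s2, sigma2_w=0.01, delta=0.5)
```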

Layer-Wise vs End-to-End Training

Two training strategies for unrolled OAMP:

Layer-wise training: Train each layer $t$ independently to minimise $\|\hat{\mathbf{c}}^{(t)} - \mathbf{c}\|^2$. Advantage: each sub-problem is small and converges quickly. Disadvantage: layers do not cooperate; early layers cannot anticipate later layers' needs.

End-to-end training: Train all layers jointly to minimise the final output loss $\|\hat{\mathbf{c}}^{(T)} - \mathbf{c}\|^2$. Advantage: globally optimal parameters; layers specialise (e.g., early layers do coarse recovery, late layers refine). Disadvantage: vanishing gradients for large $T$; higher memory cost.

Practical recommendation: Initialise with layer-wise pre-training (3--5 epochs per layer), then fine-tune end-to-end. Use gradient checkpointing for $T > 10$ to manage memory, as sketched below.
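For the checkpointing step, `torch.utils.checkpoint` trades compute for memory; a sketch, where the `layer(c, y)` signature is an assumption for illustration:

```python
from torch.utils.checkpoint import checkpoint

def checkpointed_forward(layers, c0, y):
    """Forward pass that avoids storing intermediate activations: each
    layer is recomputed during the backward pass instead."""
    c = c0
    for layer in layers:                     # e.g. the per-layer OAMP modules
        c = checkpoint(layer, c, y, use_reentrant=False)
    return c
```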

Unrolled OAMP vs Classical OAMP

Compare the reconstruction performance of unrolled OAMP-ProxNet with classical OAMP using soft-thresholding and BM3D denoisers. The plot shows NMSE (dB) versus layer/iteration index.

Adjust the SNR and number of layers to see how the learned denoiser provides consistent gains, especially at moderate SNR where the prior mismatch of hand-designed denoisers is most costly.


Example: Learned OAMP vs Hand-Tuned OAMP for RF Imaging

Setup: MIMO RF imaging with $N_t = 8$ transmit antennas, $N_r = 16$ receive antennas, $N_f = 32$ subcarriers. Scene: mixture of point scatterers and extended targets. SNR = 20 dB. Training set: 10,000 synthetic scenes. All methods use $T = 10$ iterations/layers.

Compare the reconstruction quality.

Backpropagation Through OAMP Layers

Gradients flow back through each layer in the standard manner:

  • Through the ProxNet denoiser $\mathcal{D}_{\theta_t}$: standard backpropagation through the CNN.
  • Through the orthogonalisation: $\partial\mathbf{r}^{(t)} / \partial\hat{\mathbf{c}}^{(t-1)}$ involves $\alpha^t$ and the LMMSE weight --- both differentiable.
  • Through the LMMSE step: $\partial\hat{\mathbf{c}}_{\text{LMMSE}} / \partial\hat{\mathbf{c}}^{(t-1)} = \mathbf{I} - \mathbf{W}^{(t)}\mathbf{A}$.

If $\mathbf{A}$ is imperfectly known (calibration errors), gradients can flow through the LMMSE step to refine $\mathbf{A}$, enabling joint calibration and reconstruction, as sketched below.
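One way this can look in PyTorch: registering the nominal $\mathbf{A}$ as an `nn.Parameter` lets the same backward pass that trains the denoisers also nudge the calibration. This is an illustrative pattern under that assumption, not the source's exact implementation.

```python
import torch
import torch.nn as nn

class CalibratedForwardModel(nn.Module):
    """Register the nominal A as a learnable parameter so that the
    reconstruction loss also refines the calibration (a sketch)."""

    def __init__(self, A_nominal: torch.Tensor):
        super().__init__()
        self.A = nn.Parameter(A_nominal.clone())   # initialised at nominal A

    def residual(self, c_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # y - A c_hat; gradients of the training loss flow into A as well
        return y - self.A @ c_hat
```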

Memory optimisation: For large $T$, use gradient checkpointing --- store only every $k$-th intermediate result and recompute the rest during backpropagation.

🎓 CommIT Contribution (2024)

Unrolled OAMP with ProxNet for RF Imaging

G. Caire, CommIT Group, TU Berlin

The CommIT group developed the unrolled OAMP-ProxNet architecture specifically for RF imaging with Kronecker-structured sensing matrices. The key innovations are:

  1. Kronecker-LMMSE integration: exploiting the separable structure $\mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2$ to reduce the per-layer LMMSE cost from $O(N^2)$ to $O(N\log N)$.
  2. Noise-level-aware ProxNet: the denoiser receives $\sigma_t^2$ from the state evolution as a conditioning input, enabling a single network to handle the decreasing noise schedule across layers.
  3. State-evolution-guided training: the state evolution prediction is used as an auxiliary loss to regularise the learned noise variances, improving stability for small training sets.

The architecture achieves state-of-the-art performance on simulated MIMO-OFDM imaging scenes with 3--6 dB improvement over hand-tuned OAMP while using 10--100$\times$ fewer parameters than pure deep learning approaches.

Tags: unrolled OAMP, ProxNet, Kronecker structure, RF imaging, deep unfolding

ProxNet Layer Architecture

Visualise the structure of a single ProxNet layer within the unrolled OAMP network. The diagram shows how data flows from the LMMSE linear estimate through the CNN denoiser, with the noise level $\sigma_t^2$ from state evolution conditioning the denoiser.

Adjust the number of convolutional channels and layers in the ProxNet to see how parameter count and receptive field change.


Unrolled OAMP Processing an RF Image

Watch how the unrolled OAMP-ProxNet network progressively refines an RF image across $T = 10$ layers. Each frame shows the intermediate reconstruction after one layer, illustrating the transition from a coarse matched-filter initialisation to a refined, denoised image.

Layer-by-layer reconstruction of an RF image using unrolled OAMP with ProxNet. The state evolution noise $\sigma_t^2$ decreases monotonically, and each ProxNet denoiser is adapted to its layer's noise level.

Common Mistake: Learned OAMP May Not Generalise to Unseen Matrix Structures

Mistake:

Training learned OAMP on a single sensing matrix $\mathbf{A}$ and expecting it to work on a different matrix at test time.

Correction:

Learned OAMP is typically trained for a specific measurement matrix $\mathbf{A}$ (or family of matrices). If the test-time matrix differs:

  • Different SNR: Moderate degradation; mitigated by training with a range of SNR values.
  • Different $\mathbf{A}$: Performance can degrade significantly if the singular value distribution changes.
  • Different scene statistics: The learned denoiser may perform poorly on out-of-distribution scenes.

Best practices: Train on diverse scenes, include $\sigma_t^2$ as a denoiser input (noise-adaptive), validate on held-out data with different matrix realisations.

Quick Check

What is the primary advantage of unrolled OAMP over running classical OAMP for $T$ iterations with a fixed denoiser?

  • Unrolled OAMP uses fewer total FLOPs per iteration
  • Layer-wise learnable parameters and ProxNet enable faster per-layer convergence
  • Unrolled OAMP does not require a forward model
  • Unrolled OAMP always converges to the global minimum

Algorithm Unrolling

Converting a $T$-iteration algorithm into a $T$-layer feedforward neural network where the algorithm parameters become learnable, trained end-to-end via backpropagation.

Related: Deep Unfolding, ProxNet

Deep Unfolding

Synonym for algorithm unrolling. The term emphasises the "unfolding" of a recursive iteration into a feedforward graph.

Related: Algorithm Unrolling

ProxNet

A learned neural network denoiser that replaces the fixed proximal operator in an unrolled optimisation algorithm, typically a small CNN conditioned on the current noise level.

Related: Algorithm Unrolling

Key Takeaway

Unrolled OAMP with ProxNet converts $T$ OAMP iterations into a trainable neural network where the LMMSE step encodes $\mathbf{A}$ exactly and the learned CNN denoiser adapts to the signal prior. Kronecker structure reduces the per-layer cost to $O(N\log N)$. The state evolution provides both a noise schedule for the denoiser and theoretical performance predictions.