Unrolled OAMP with ProxNet

From Iterations to Layers

Every iterative reconstruction algorithm computes a sequence of updates $\mathbf{x}^{(k+1)} = \mathcal{T}_k(\mathbf{x}^{(k)};\, \mathbf{y}, \mathbf{A})$. The key observation of algorithm unrolling (also called deep unfolding) is that a fixed number $T$ of such iterations can be viewed as a feedforward neural network with $T$ layers. Each layer performs one iteration, and the algorithm parameters (step sizes, thresholds, penalty weights, denoiser functions) become learnable parameters $\theta_t$ optimised via backpropagation through the entire $T$-layer graph.

This paradigm inherits the interpretability of iterative algorithms (each layer has a known function) while gaining the adaptability of neural networks (parameters are tuned from data). For RF imaging, unrolled OAMP is the natural choice: OAMP already handles the Kronecker-structured $\mathbf{A}$ from Ch 17, and unrolling lets us learn the denoiser and step sizes end-to-end.

Definition:

Algorithm Unrolling

Algorithm unrolling (or deep unfolding) converts a $T$-iteration algorithm into a $T$-layer neural network by:

  1. Truncating the iteration to $T$ steps.
  2. Parameterising each iteration's operators with learnable parameters $\theta_t$.
  3. Training the parameters end-to-end by minimising

$$\min_{\theta_1, \ldots, \theta_T} \; \mathbb{E}_{(\mathbf{c}, \mathbf{y})}\bigl[ \mathcal{L}\bigl(\hat{\mathbf{c}}^{(T)}(\theta),\; \mathbf{c}\bigr) \bigr]$$

where $\hat{\mathbf{c}}^{(T)}$ is the output of the $T$-th layer and $\mathcal{L}$ is a task loss (e.g., MSE).

Unlike generic deep networks, unrolled networks have strong inductive bias: the architecture encodes the structure of the forward model $\mathbf{y} = \mathbf{A}\mathbf{c} + \mathbf{w}$. This dramatically reduces the number of learnable parameters compared to a generic U-Net and improves sample efficiency.
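To make the recipe concrete, here is a minimal PyTorch sketch of unrolling applied to a simple ISTA-style iteration (not yet OAMP): the step size and threshold of each of the $T$ layers become learnable parameters. The class name `UnrolledNet` and the hyperparameter values are illustrative, not from the source.

```python
import torch
import torch.nn as nn

class UnrolledNet(nn.Module):
    """T iterations of gradient step + learnable soft threshold,
    unrolled into T layers (an ISTA-style toy; names illustrative)."""

    def __init__(self, A: torch.Tensor, T: int = 10):
        super().__init__()
        self.A, self.T = A, T
        # per-layer learnable parameters theta_t = (step size, threshold)
        self.step = nn.Parameter(torch.full((T,), 0.1))
        self.thresh = nn.Parameter(torch.full((T,), 0.01))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        x = torch.zeros(self.A.shape[1], device=y.device)
        for t in range(self.T):              # one loop body = one layer
            r = x - self.step[t] * self.A.T @ (self.A @ x - y)
            x = torch.sign(r) * torch.clamp(r.abs() - self.thresh[t], min=0.0)
        return x

# end-to-end training backpropagates through all T layers, e.g.:
# loss = ((net(y) - c_true) ** 2).mean(); loss.backward()
```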

Definition:

OAMP Iteration (Review)

The Orthogonal AMP (OAMP) algorithm from Ch 17 iterates two steps:

Linear Estimation (LE): $\mathbf{r}^{(t)} = \hat{\mathbf{c}}^{(t)} + \mathbf{W}_{\text{LE}}\bigl(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(t)}\bigr)$

Nonlinear Estimation (NLE): $\hat{\mathbf{c}}^{(t+1)} = \eta_t\bigl(\mathbf{r}^{(t)}\bigr)$

where $\mathbf{W}_{\text{LE}}$ satisfies the divergence constraint $\frac{1}{N}\operatorname{tr}(\mathbf{W}_{\text{LE}} \mathbf{A}) = 1$. This ensures $\mathbf{r}^{(t)} = \mathbf{c} + {}$(approximately Gaussian noise) in the large-system limit, regardless of the structure of $\mathbf{A}$.

The orthogonalisation step is what makes OAMP suitable for RF imaging: standard AMP diverges for Kronecker-structured or partial-DFT sensing matrices, while OAMP's linear estimator works for any right-unitarily invariant matrix.
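Before unrolling, it helps to see one classical OAMP iteration in code. The following NumPy sketch assumes a real-valued model and a soft-threshold denoiser as $\eta_t$; the LMMSE construction and the trace rescaling that enforces $\frac{1}{N}\operatorname{tr}(\mathbf{W}\mathbf{A}) = 1$ are the essential steps.

```python
import numpy as np

def oamp_iteration(c_hat, y, A, sigma2, tau2, lam=0.1):
    """One OAMP iteration (minimal real-valued sketch).
    tau2: current error variance of c_hat; lam: threshold."""
    M, N = A.shape
    # LMMSE matrix for the current signal/noise levels
    W_hat = tau2 * A.T @ np.linalg.inv(tau2 * (A @ A.T) + sigma2 * np.eye(M))
    # rescale so that tr(W @ A) = N, i.e. the divergence constraint
    W = (N / np.trace(W_hat @ A)) * W_hat
    # Linear Estimation (LE): pseudo-data r = c + ~Gaussian noise
    r = c_hat + W @ (y - A @ c_hat)
    # Nonlinear Estimation (NLE): soft threshold as an example eta_t
    c_next = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    return r, c_next
```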

Definition:

Learned OAMP (Unrolled OAMP with ProxNet)

Learned OAMP unrolls $T$ iterations of OAMP into a feedforward neural network.

Layer $t$ ($t = 1, \ldots, T$):

  1. Linear module (LMMSE): Compute the linear estimate using the known $\mathbf{A}$ and the current noise estimate $\sigma_t^2$: $\mathbf{r}^{(t)} = \hat{\mathbf{c}}^{(t)} + \mathbf{W}_{\text{LE}}^{(t)}\bigl(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(t)}\bigr)$.
  2. Orthogonalisation (fixed): The analytical orthogonalisation formula from Ch 17.3 ensures the divergence-free condition.
  3. ProxNet denoiser: Replace the hand-designed $\eta_t$ with a neural network $\mathcal{D}_{\theta_t}(\mathbf{r}^{(t)};\, \sigma_t^2)$, typically a small CNN or DnCNN with parameters $\theta_t$.

Training: Minimise the end-to-end loss:

$$\mathcal{L}(\theta_1, \ldots, \theta_T) = \mathbb{E}\bigl[\|\hat{\mathbf{c}}^{(T)} - \mathbf{c}\|^2\bigr]$$

where the expectation is over training data $\{(\mathbf{y}_j, \mathbf{c}_j)\}$.

Key advantages over pure deep learning:

  • Physics-informed: The linear module and orthogonalisation encode the forward model $\mathbf{A}$ exactly.
  • Fewer parameters: Only the denoiser is learned ($\sim 10^4$--$10^5$ parameters per layer, vs. $10^6$--$10^8$ for end-to-end networks).
  • Interpretable: Each layer has a clear role (LMMSE + denoiser), and the state evolution provides a theoretical performance prediction.
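A possible skeleton of the unrolled network in PyTorch, assuming image-shaped signals of shape (batch, 1, H, W) and a constant noise-level channel as the conditioning mechanism (one common choice; the source does not fix these details). The `lmmse_step` and `state_evolution` callables stand in for the fixed physics modules described above.

```python
import torch
import torch.nn as nn

class ProxNet(nn.Module):
    """Small CNN denoiser conditioned on sigma_t^2, passed in as an
    extra constant input channel (one common conditioning choice)."""

    def __init__(self, channels: int = 32, depth: int = 5):
        super().__init__()
        layers = [nn.Conv2d(2, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, r: torch.Tensor, sigma2) -> torch.Tensor:
        # broadcast the scalar noise level to a constant image channel
        noise_map = torch.as_tensor(sigma2).sqrt().expand_as(r)
        return r - self.net(torch.cat([r, noise_map], dim=1))  # residual skip

class LearnedOAMP(nn.Module):
    """T unrolled OAMP layers; only the ProxNet denoisers hold parameters."""

    def __init__(self, T: int = 10):
        super().__init__()
        self.prox = nn.ModuleList(ProxNet() for _ in range(T))

    def forward(self, c0, y, lmmse_step, state_evolution):
        c = c0
        for t, prox in enumerate(self.prox):
            r = lmmse_step(c, y, t)           # LE + orthogonalisation (fixed)
            sigma2_t = state_evolution(r, y, t)
            c = prox(r, sigma2_t)             # learned NLE
        return c
```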

Definition:

ProxNet --- Learned Denoiser for OAMP

ProxNet replaces the fixed denoiser $\eta_t$ in OAMP with a learned neural network $\mathcal{D}_{\theta_t}$:

$$\hat{\mathbf{c}}^{(t+1)} = \mathcal{D}_{\theta_t}\bigl(\mathbf{r}^{(t)};\; \sigma_t^2\bigr)$$

where $\sigma_t^2$ is the estimated noise variance at layer $t$ (provided by the state evolution). The network $\mathcal{D}_{\theta_t}$ is typically a small U-Net or DnCNN that takes $\mathbf{r}^{(t)}$ as input and produces a denoised estimate.

The divergence of $\mathcal{D}_{\theta_t}$ must be computed for the state evolution update:

$$\operatorname{div}(\mathcal{D}_{\theta_t}) = \frac{1}{N}\sum_{i=1}^N \frac{\partial [\mathcal{D}_{\theta_t}(\mathbf{r})]_i}{\partial r_i}.$$

In practice, Monte Carlo divergence estimation (Hutchinson's trace estimator) provides an efficient single-backward-pass approximation using a random probe vector $\mathbf{b}$ with i.i.d. zero-mean, unit-variance entries: $\operatorname{div}(\mathcal{D}) \approx \frac{1}{N}\,\mathbf{b}^T \bigl(\nabla_{\mathbf{r}}\mathcal{D}\bigr)\mathbf{b}$.
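A minimal PyTorch implementation of this estimator, assuming `denoiser` is a differentiable callable of `r` alone (e.g., a ProxNet with its noise level already bound):

```python
import torch

def hutchinson_divergence(denoiser, r: torch.Tensor) -> torch.Tensor:
    """Estimate div(D) = tr(J_D)/N with one Rademacher probe and a
    single backward pass (Hutchinson's trace estimator)."""
    r = r.detach().requires_grad_(True)
    b = torch.randint(0, 2, r.shape).to(r.dtype) * 2 - 1   # +/-1 probe
    out = denoiser(r)
    # autograd returns the vector-Jacobian product J_D^T b in one pass;
    # then b . (J_D^T b) = b^T J_D b, an unbiased estimate of tr(J_D)
    (vjp,) = torch.autograd.grad(out, r, grad_outputs=b)
    return (b * vjp).sum() / r.numel()   # divide by N: normalised divergence
```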

Definition:

Kronecker-Structured OAMP for RF Imaging

When the sensing matrix has Kronecker structure $\mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2$ (common in 2D imaging with separable Tx/Rx arrays and frequency grids), the OAMP linear estimator decomposes as:

$$\mathbf{W}_{\text{LE}} = \mathbf{W}_1 \otimes \mathbf{W}_2$$

where $\mathbf{W}_i$ is the LMMSE estimator for $\mathbf{A}_i$: $\mathbf{W}_i = \mathbf{A}_i^H(\mathbf{A}_i\mathbf{A}_i^H + \sigma_t^2 \mathbf{I})^{-1}$.

This reduces the computational cost from $O(N^2)$ to $O(N^{3/2})$ (for square images with $N = n^2$), and from $O(N^2)$ to $O(N\log N)$ when the $\mathbf{A}_i$ are partial DFT matrices (computed via FFT).
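A NumPy sketch of the Kronecker trick: the identity $(\mathbf{A}_1 \otimes \mathbf{A}_2)\operatorname{vec}(\mathbf{X}) = \operatorname{vec}(\mathbf{A}_2 \mathbf{X} \mathbf{A}_1^T)$ (column-major vec) lets every matrix-vector product in the LMMSE step use only the small factors. Function names are illustrative.

```python
import numpy as np

def kron_matvec(A1, A2, x):
    """Compute (A1 kron A2) @ x without forming the Kronecker product,
    via (A1 ⊗ A2) vec(X) = vec(A2 @ X @ A1.T) with column-major vec."""
    X = x.reshape(A2.shape[1], A1.shape[1], order="F")
    return (A2 @ X @ A1.T).reshape(-1, order="F")

def kron_lmmse_factors(A1, A2, sigma2):
    """Per-factor LMMSE matrices W_i = A_i^H (A_i A_i^H + sigma2 I)^{-1}."""
    def w(Ai):
        M = Ai.shape[0]
        return Ai.conj().T @ np.linalg.inv(Ai @ Ai.conj().T + sigma2 * np.eye(M))
    return w(A1), w(A2)

# the full LMMSE apply is then another kron_matvec with the factors:
# W1, W2 = kron_lmmse_factors(A1, A2, sigma2)
# r = c_hat + kron_matvec(W1, W2, y - kron_matvec(A1, A2, c_hat))
```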

The state evolution for Kronecker-structured OAMP involves the joint singular value distribution of $\mathbf{A}_1$ and $\mathbf{A}_2$.

Kronecker structure arises naturally in phased-array imaging (azimuth $\times$ elevation), MIMO radar (transmit $\times$ receive), and separable OFDM frequency grids. Exploiting this structure is essential for scaling unrolled OAMP to practical problem sizes.

Unrolled OAMP with ProxNet

Complexity: Per layer, $O(N\log N)$ for the Kronecker-FFT LMMSE plus $O(N)$ for the CNN denoiser. Total: $O(TN\log N)$.

Input: Measurements $\mathbf{y}$, sensing matrix $\mathbf{A}$, noise variance $\sigma^2$, trained parameters $\{\theta_t\}_{t=1}^T$
Initialise: $\hat{\mathbf{c}}^{(0)} = \mathbf{A}^{H} \mathbf{y}$ (matched-filter initialisation)
For $t = 1, \ldots, T$:
  1. Linear Estimation (LMMSE): $\mathbf{r}^{(t)} = \hat{\mathbf{c}}^{(t-1)} + \mathbf{W}_{\text{LE}}^{(t)}(\mathbf{y} - \mathbf{A}\hat{\mathbf{c}}^{(t-1)})$
  2. State Evolution: Compute $\sigma_t^2$ from the effective noise variance at layer $t$
  3. ProxNet Denoiser: $\hat{\mathbf{c}}^{(t)} = \mathcal{D}_{\theta_t}(\mathbf{r}^{(t)};\, \sigma_t^2)$
  4. Divergence Estimation: $d_t = \operatorname{div}(\mathcal{D}_{\theta_t})$ (via Hutchinson's trace estimator)
Output: $\hat{\mathbf{c}}^{(T)}$

The Kronecker LMMSE step dominates the per-layer cost. For a $256 \times 256$ image with $T = 10$ layers, the total cost is comparable to 10 FFTs --- orders of magnitude faster than generic matrix inversions.

Theorem: State Evolution for Unrolled OAMP

In the large-system limit ($M, N \to \infty$ with $M/N \to \delta$), the effective noise at each OAMP layer is characterised by a scalar state evolution:

$$\sigma_{t+1}^2 = \frac{1}{\delta}\bigl(\sigma^2 + \operatorname{mmse}(\sigma_t^2)\bigr)$$

where $\operatorname{mmse}(\sigma_t^2) = \mathbb{E}\bigl[\|\mathcal{D}_{\theta_t}(\mathbf{c} + \sigma_t\mathbf{z}) - \mathbf{c}\|^2/N\bigr]$ is the per-component MSE of the denoiser at noise level $\sigma_t^2$ (with $\mathbf{z}$ standard Gaussian), and $\sigma^2$ is the measurement noise variance.

The state evolution is exact for right-unitarily invariant matrices (which include Haar-distributed unitary, partial DFT, and Kronecker products of such matrices).

The orthogonalisation step in OAMP "Gaussianises" the residual, so the denoiser always sees signal-plus-Gaussian-noise. The state evolution tracks the noise variance through the layers, and the denoiser's performance at each noise level determines the next layer's noise level. This creates a virtuous cycle: better denoisers produce lower noise, which makes the next denoiser's job easier.
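The recursion is easy to evaluate numerically. In the sketch below, `mmse_fn` would in practice be estimated by Monte Carlo: denoise $\mathbf{c} + \sigma_t\mathbf{z}$ for held-out signals and average the squared error; the toy curve in the usage comment is illustrative only.

```python
def state_evolution_trajectory(mmse_fn, sigma2_w, delta, T=10, sigma2_init=1.0):
    """Predicted effective noise variance per layer:
    sigma_{t+1}^2 = (sigma_w^2 + mmse(sigma_t^2)) / delta."""
    sig2 = sigma2_init
    trajectory = [sig2]
    for _ in range(T):
        sig2 = (sigma2_w + mmse_fn(sig2)) / delta
        trajectory.append(sig2)
    return trajectory

# toy example: a denoiser that removes 80% of the noise energy
# traj = state_evolution_trajectory(lambda s2: 0.2 * s2, sigma2_w=0.01, delta=0.5)
```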

Layer-Wise vs End-to-End Training

Two training strategies for unrolled OAMP:

Layer-wise training: Train each layer $t$ independently to minimise $\|\hat{\mathbf{c}}^{(t)} - \mathbf{c}\|^2$. Advantage: each sub-problem is small and converges quickly. Disadvantage: layers do not cooperate; early layers cannot anticipate later layers' needs.

End-to-end training: Train all layers jointly to minimise the final output loss $\|\hat{\mathbf{c}}^{(T)} - \mathbf{c}\|^2$. Advantage: globally optimal parameters; layers specialise (e.g., early layers do coarse recovery, late layers refine). Disadvantage: vanishing gradients for large $T$; higher memory cost.

Practical recommendation: Initialise with layer-wise pre-training (3--5 epochs per layer), then fine-tune end-to-end. Use gradient checkpointing for $T > 10$ to manage memory, as sketched below.
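For the checkpointing step, `torch.utils.checkpoint` trades compute for memory; a sketch, where the `layer(c, y)` signature is an assumption for illustration:

```python
from torch.utils.checkpoint import checkpoint

def checkpointed_forward(layers, c0, y):
    """Forward pass that avoids storing intermediate activations: each
    layer is recomputed during the backward pass instead."""
    c = c0
    for layer in layers:                     # e.g. the per-layer OAMP modules
        c = checkpoint(layer, c, y, use_reentrant=False)
    return c
```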

Unrolled OAMP vs Classical OAMP

Compare the reconstruction performance of unrolled OAMP-ProxNet with classical OAMP using soft-thresholding and BM3D denoisers. The plot shows NMSE (dB) versus layer/iteration index.

Adjust the SNR and number of layers to see how the learned denoiser provides consistent gains, especially at moderate SNR where the prior mismatch of hand-designed denoisers is most costly.


Example: Learned OAMP vs Hand-Tuned OAMP for RF Imaging

Setup: MIMO RF imaging with $N_t = 8$ transmit antennas, $N_r = 16$ receive antennas, $N_f = 32$ subcarriers. Scene: mixture of point scatterers and extended targets. SNR = 20 dB. Training set: 10,000 synthetic scenes. All methods use $T = 10$ iterations/layers.

Compare the reconstruction quality.

Backpropagation Through OAMP Layers

Gradients flow back through each layer in the standard manner:

  • Through the ProxNet denoiser $\mathcal{D}_{\theta_t}$: standard backpropagation through the CNN.
  • Through the orthogonalisation: $\partial\mathbf{r}^{(t)} / \partial\hat{\mathbf{c}}^{(t-1)}$ involves $\alpha^t$ and the LMMSE weight --- both differentiable.
  • Through the LMMSE step: $\partial\hat{\mathbf{c}}_{\text{LMMSE}} / \partial\hat{\mathbf{c}}^{(t-1)} = \mathbf{I} - \mathbf{W}^{(t)}\mathbf{A}$.

If $\mathbf{A}$ is imperfectly known (calibration errors), gradients can flow through the LMMSE step to refine $\mathbf{A}$, enabling joint calibration and reconstruction, as sketched below.
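One way this can look in PyTorch: registering the nominal $\mathbf{A}$ as an `nn.Parameter` lets the same backward pass that trains the denoisers also nudge the calibration. This is an illustrative pattern under that assumption, not the source's exact implementation.

```python
import torch
import torch.nn as nn

class CalibratedForwardModel(nn.Module):
    """Register the nominal A as a learnable parameter so that the
    reconstruction loss also refines the calibration (a sketch)."""

    def __init__(self, A_nominal: torch.Tensor):
        super().__init__()
        self.A = nn.Parameter(A_nominal.clone())   # initialised at nominal A

    def residual(self, c_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # y - A c_hat; gradients of the training loss flow into A as well
        return y - self.A @ c_hat
```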

Memory optimisation: For large $T$, use gradient checkpointing --- store only every $k$-th intermediate result and recompute the rest during backpropagation.

🎓 CommIT Contribution (2024)

Unrolled OAMP with ProxNet for RF Imaging

G. Caire, CommIT Group, TU Berlin

The CommIT group developed the unrolled OAMP-ProxNet architecture specifically for RF imaging with Kronecker-structured sensing matrices. The key innovations are:

  1. Kronecker-LMMSE integration: exploiting the separable structure $\mathbf{A} = \mathbf{A}_1 \otimes \mathbf{A}_2$ to reduce the per-layer LMMSE cost from $O(N^2)$ to $O(N\log N)$.
  2. Noise-level-aware ProxNet: the denoiser receives $\sigma_t^2$ from the state evolution as a conditioning input, enabling a single network to handle the decreasing noise schedule across layers.
  3. State-evolution-guided training: the state evolution prediction is used as an auxiliary loss to regularise the learned noise variances, improving stability for small training sets.

The architecture achieves state-of-the-art performance on simulated MIMO-OFDM imaging scenes with 3--6 dB improvement over hand-tuned OAMP while using 10--100$\times$ fewer parameters than pure deep learning approaches.

Tags: unrolled OAMP, ProxNet, Kronecker structure, RF imaging, deep unfolding

ProxNet Layer Architecture

Visualise the structure of a single ProxNet layer within the unrolled OAMP network. The diagram shows how data flows from the LMMSE linear estimate through the CNN denoiser, with the noise level $\sigma_t^2$ from state evolution conditioning the denoiser.

Adjust the number of convolutional channels and layers in the ProxNet to see how parameter count and receptive field change.


Unrolled OAMP Processing an RF Image

Watch how the unrolled OAMP-ProxNet network progressively refines an RF image across $T = 10$ layers. Each frame shows the intermediate reconstruction after one layer, illustrating the transition from a coarse matched-filter initialisation to a refined, denoised image.

Layer-by-layer reconstruction of an RF image using unrolled OAMP with ProxNet. The state evolution noise $\sigma_t^2$ decreases monotonically, and each ProxNet denoiser is adapted to its layer's noise level.

Common Mistake: Learned OAMP May Not Generalise to Unseen Matrix Structures

Mistake:

Training learned OAMP on a single sensing matrix $\mathbf{A}$ and expecting it to work on a different matrix at test time.

Correction:

Learned OAMP is typically trained for a specific measurement matrix $\mathbf{A}$ (or family of matrices). If the test-time matrix differs:

  • Different SNR: Moderate degradation; mitigated by training with a range of SNR values.
  • Different $\mathbf{A}$: Performance can degrade significantly if the singular value distribution changes.
  • Different scene statistics: The learned denoiser may perform poorly on out-of-distribution scenes.

Best practices: Train on diverse scenes, include $\sigma_t^2$ as a denoiser input (noise-adaptive), validate on held-out data with different matrix realisations.

Quick Check

What is the primary advantage of unrolled OAMP over running classical OAMP for $T$ iterations with a fixed denoiser?

  • Unrolled OAMP uses fewer total FLOPs per iteration
  • Layer-wise learnable parameters and ProxNet enable faster per-layer convergence
  • Unrolled OAMP does not require a forward model
  • Unrolled OAMP always converges to the global minimum

Algorithm Unrolling

Converting a $T$-iteration algorithm into a $T$-layer feedforward neural network where the algorithm parameters become learnable, trained end-to-end via backpropagation.

Related: Deep Unfolding, ProxNet

Deep Unfolding

Synonym for algorithm unrolling. The term emphasises the "unfolding" of a recursive iteration into a feedforward graph.

Related: Algorithm Unrolling

ProxNet

A learned neural network denoiser that replaces the fixed proximal operator in an unrolled optimisation algorithm, typically a small CNN conditioned on the current noise level.

Related: Algorithm Unrolling

Key Takeaway

Unrolled OAMP with ProxNet converts $T$ OAMP iterations into a trainable neural network where the LMMSE step encodes $\mathbf{A}$ exactly and the learned CNN denoiser adapts to the signal prior. Kronecker structure reduces the per-layer cost to $O(N\log N)$. The state evolution provides both a noise schedule for the denoiser and theoretical performance predictions.