Exercises
ex23-01-dip-loss
Easy: Write the DIP loss function for a linear inverse problem $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with generator network $f_\theta$ and fixed input $\mathbf{z}$. What is being optimised: the input, the weights, or both?
The DIP loss measures the discrepancy between the measurements and the forward-modelled reconstruction.
Only $\theta$ is optimised; $\mathbf{z}$ is fixed after initial sampling.
DIP loss
$$\mathcal{L}(\theta) = \|\mathbf{y} - \mathbf{A}\, f_\theta(\mathbf{z})\|_2^2$$
Only the weights $\theta$ are optimised. The input $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ is sampled once and then held fixed, and the reconstruction is $\hat{\mathbf{x}} = f_{\theta^*}(\mathbf{z})$. $\square$
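A minimal sketch of this optimisation loop in PyTorch; the measurement matrix, network shape, and iteration budget below are illustrative assumptions, not part of the exercise:

```python
import torch

# Hypothetical setup: A is an (m, n) measurement matrix, y the measurements.
n = 64 * 64
A = torch.randn(2000, n) / n ** 0.5
y = A @ torch.rand(n)                    # stand-in measurements

# A small dense net stands in for the usual U-Net generator.
f = torch.nn.Sequential(torch.nn.Linear(128, 512), torch.nn.ReLU(),
                        torch.nn.Linear(512, n))
z = torch.randn(128)                     # sampled once, then frozen
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for it in range(2000):                   # early stopping matters (see ex23-06)
    opt.zero_grad()
    x_hat = f(z)                         # reconstruction from the fixed code z
    loss = ((A @ x_hat - y) ** 2).sum()  # DIP loss: only theta moves
    loss.backward()
    opt.step()
```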
ex23-02-n2n-proof
Easy: Prove that the Noise2Noise loss has the same minimiser as the supervised loss when $\mathbf{n}_1$ and $\mathbf{n}_2$ are independent zero-mean noise.
Write $\mathbf{y}_2 = \mathbf{x} + \mathbf{n}_2$ and expand the squared norm.
The cross-term vanishes due to independence and zero mean.
Expand the Noise2Noise loss
$$\mathbb{E}\,\|f_\theta(\mathbf{y}_1) - \mathbf{y}_2\|^2 = \mathbb{E}\,\|f_\theta(\mathbf{y}_1) - \mathbf{x}\|^2 - 2\,\mathbb{E}\,\langle f_\theta(\mathbf{y}_1) - \mathbf{x},\, \mathbf{n}_2\rangle + \mathbb{E}\,\|\mathbf{n}_2\|^2$$
Cross-term vanishes
Since $\mathbf{n}_2$ is independent of $\mathbf{n}_1$ (and hence of $\mathbf{y}_1$ and $f_\theta(\mathbf{y}_1)$) with $\mathbb{E}[\mathbf{n}_2] = \mathbf{0}$:
$$\mathbb{E}\,\langle f_\theta(\mathbf{y}_1) - \mathbf{x},\, \mathbf{n}_2\rangle = \langle \mathbb{E}[f_\theta(\mathbf{y}_1) - \mathbf{x}],\, \mathbb{E}[\mathbf{n}_2]\rangle = 0.$$
The N2N loss equals the supervised loss plus a constant $\mathbb{E}\,\|\mathbf{n}_2\|^2$. The minimiser over $\theta$ is the same. $\square$
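A quick numerical check of this identity (the signal model, the fixed estimator `f`, and the noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)              # "clean" signal samples
n1 = rng.normal(scale=0.5, size=x.shape)  # input noise
n2 = rng.normal(scale=0.5, size=x.shape)  # independent target noise

f = lambda y: 0.8 * y                     # any fixed estimator
n2n = np.mean((f(x + n1) - (x + n2)) ** 2)
sup = np.mean((f(x + n1) - x) ** 2)

# N2N loss ~= supervised loss + E||n2||^2 (here 0.5^2 = 0.25), for any f
print(n2n, sup + 0.25)
```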
ex23-03-sure-linear
Easy: For a linear denoiser $f(\mathbf{y}) = \mathbf{W}\mathbf{y}$, compute the divergence and write the SURE loss in closed form.
For a linear map, $\operatorname{div}_{\mathbf{y}} f(\mathbf{y}) = \operatorname{tr}(\mathbf{W})$.
Divergence of a linear map
$\partial f_i/\partial y_j = W_{ij}$. Therefore $\operatorname{div}_{\mathbf{y}} f(\mathbf{y}) = \sum_i W_{ii} = \operatorname{tr}(\mathbf{W})$, and the divergence is constant in $\mathbf{y}$, so no Monte Carlo estimate is needed.
SURE loss
$$\text{SURE}(\mathbf{W}) = \|\mathbf{W}\mathbf{y} - \mathbf{y}\|^2 + 2\sigma^2\operatorname{tr}(\mathbf{W}) - n\sigma^2$$
Minimising in expectation over a signal class with covariance $\mathbf{C}_x$ yields the Wiener filter $\mathbf{W}^* = \mathbf{C}_x(\mathbf{C}_x + \sigma^2\mathbf{I})^{-1}$. $\square$
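A numerical sanity check that this SURE expression is unbiased for the true MSE; the dimension, denoiser, and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 50, 0.3
W = 0.9 * np.eye(n) + 0.01 * rng.normal(size=(n, n))  # arbitrary linear denoiser

x = rng.normal(size=n)
mse, sure = [], []
for _ in range(20_000):
    y = x + sigma * rng.normal(size=n)
    mse.append(np.sum((W @ y - x) ** 2))
    sure.append(np.sum((W @ y - y) ** 2)
                + 2 * sigma ** 2 * np.trace(W) - n * sigma ** 2)

print(np.mean(mse), np.mean(sure))   # agree up to Monte Carlo error
```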
ex23-04-ei-definition
Easy: Write the equivariant imaging loss for a reconstruction network $f_\theta$, forward operator $\mathbf{A}$, and a group $\mathcal{G} = \{T_g\}$ of discrete transformations. Explain the role of each term.
There are two terms: data consistency and equivariance.
Data consistency loss
$\mathcal{L}_{\text{DC}} = \|\mathbf{A}\, f_\theta(\mathbf{y}) - \mathbf{y}\|^2$ enforces agreement with the measurements; it constrains only the range-space component of the reconstruction (the part reachable via $\mathbf{A}^H$).
Equivariance loss
$\mathcal{L}_{\text{EI}} = \sum_{g \in \mathcal{G}} \|f_\theta(\mathbf{A}\, T_g \hat{\mathbf{x}}) - T_g \hat{\mathbf{x}}\|^2$ with $\hat{\mathbf{x}} = f_\theta(\mathbf{y})$ enforces equivariance: transformed reconstructions must be correctly recovered from their virtual measurements, which injects information about the null space of $\mathbf{A}$.
Total loss
$\mathcal{L} = \mathcal{L}_{\text{DC}} + \lambda\, \mathcal{L}_{\text{EI}}$, where $\lambda$ balances the two objectives.
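A sketch of how the two terms combine in code; the network, operator, and transform list are placeholders:

```python
import torch

def ei_loss(f, A, y, transforms, lam=1.0):
    """Equivariant-imaging objective: data consistency + equivariance.

    f          : reconstruction network, measurements -> image
    A          : callable forward operator, image -> measurements
    transforms : list of callables T_g acting on images
                 (e.g. lambda x: torch.roll(x, s, dims=-1) for shifts)
    """
    x_hat = f(y)
    loss_dc = ((A(x_hat) - y) ** 2).sum()        # range-space consistency
    loss_ei = 0.0
    for T in transforms:                          # virtual measurements
        x_t = T(x_hat)
        loss_ei = loss_ei + ((f(A(x_t)) - x_t) ** 2).sum()
    return loss_dc + lam * loss_ei
```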
ex23-05-foundation-gap
Easy: List three statistical differences between natural images and RF reflectivity maps that create a domain gap for foundation models. For each, suggest a mitigation strategy.
Think about dynamic range, complex values, and texture statistics.
Domain gap factors
- Dynamic range: natural images are 8-bit (0--255); RF reflectivity spans 40+ dB of dynamic range, often log-scaled. Mitigation: normalise to log-magnitude before feeding the foundation model.
- Complex values: natural images are real-valued (RGB); RF signals are complex (amplitude + phase). Mitigation: use a 2-channel (real/imaginary) representation, or separate magnitude/phase processing.
- Texture statistics: natural images have smooth textures and sharp edges; RF images have speckle (multiplicative noise), sidelobes, and grating lobes. Mitigation: fine-tune on simulated RF data, or use LoRA adaptation with a small RF dataset.
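A sketch of the first two mitigations in NumPy; the helper name and the 60 dB window are illustrative choices, not prescribed by the exercise:

```python
import numpy as np

def rf_to_net_input(x_complex, dyn_range_db=60.0):
    """Map complex RF reflectivity to network-friendly channels.

    Channel 0: log-magnitude, clipped to a fixed dynamic range and
               rescaled to [0, 1]. Channels 1-2: real/imaginary parts.
    """
    mag_db = 20 * np.log10(np.abs(x_complex) + 1e-12)
    mag_db = np.clip(mag_db - mag_db.max(), -dyn_range_db, 0.0)
    log_mag = mag_db / dyn_range_db + 1.0        # -> [0, 1]
    return np.stack([log_mag, x_complex.real, x_complex.imag])
```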
ex23-06-dip-overfitting
Medium: A DIP reconstruction of a $128 \times 128$ image uses a U-Net with 1.5 million parameters. The image has $16{,}384$ pixels. Explain why the network can overfit and estimate the number of iterations before overfitting begins. Compare with a Deep Decoder having 50K parameters.
The network has more parameters than pixels.
Overfitting begins when the network starts memorising the noise pattern.
Over-parameterisation
With 1.5M parameters and 16K pixels, the network is over-parameterised by roughly $90\times$. It has sufficient capacity to fit any target image, including pure noise.
Overfitting timeline
Empirically, DIP overfitting begins after a few thousand iterations. The spectral bias delays it: the low-frequency signal (which carries most of the energy) is fitted first, mid-frequency details next, and high-frequency noise last. The optimal stopping point depends on SNR: the noisier the data, the earlier the stop.
Deep Decoder comparison
A Deep Decoder with 50K parameters ($30\times$ fewer than the U-Net) lacks the capacity to memorise the noise; its restricted architecture acts as implicit regularisation. It converges without overfitting but may miss fine details. The U-Net DIP achieves higher peak PSNR (at the optimal stopping point) but requires careful early stopping.
ex23-07-sure-soft-threshold
Medium: Compute SURE for the soft-thresholding denoiser $f_\lambda(\mathbf{y})_i = \operatorname{sign}(y_i)\max(|y_i| - \lambda,\, 0)$. Find the optimal threshold $\lambda^*$ as a function of $\sigma$ and the signal statistics.
$\partial f_\lambda(\mathbf{y})_i/\partial y_i = \mathbb{1}\{|y_i| > \lambda\}$.
The optimal $\lambda$ balances bias (from thresholding signal) and variance (from noise leakage).
SURE for soft thresholding
The divergence is $\operatorname{div}_{\mathbf{y}} f_\lambda(\mathbf{y}) = \#\{i : |y_i| > \lambda\}$, the number of surviving coefficients. Substituting into SURE:
$$\text{SURE}(\lambda) = \sum_i \min(y_i^2, \lambda^2) - n\sigma^2 + 2\sigma^2\,\#\{i : |y_i| > \lambda\}.$$
Optimal threshold
For a sparse signal in Gaussian noise, the optimal threshold scales as $\lambda^* \approx \sigma\sqrt{2\ln n}$ (the universal threshold); the exact value depends on the sparsity fraction.
In practice, minimise SURE numerically: evaluate $\text{SURE}(\lambda)$ on a grid of $\lambda$ values and choose the minimiser. This requires no knowledge of $\mathbf{x}$.
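A minimal implementation of this grid search, using the SURE expression derived above; the sparse test signal and grid are arbitrary:

```python
import numpy as np

def sure_soft(y, lam, sigma):
    """SURE(lambda) for soft thresholding, as derived above."""
    n = y.size
    return (np.sum(np.minimum(y ** 2, lam ** 2))
            - n * sigma ** 2
            + 2 * sigma ** 2 * np.sum(np.abs(y) > lam))

rng = np.random.default_rng(2)
x = np.zeros(1000)
x[:50] = rng.normal(scale=3.0, size=50)       # sparse signal
sigma = 1.0
y = x + sigma * rng.normal(size=x.size)

grid = np.linspace(0.0, 4 * sigma, 200)
lam_star = grid[np.argmin([sure_soft(y, l, sigma) for l in grid])]
print(lam_star)                                # chosen with no access to x
```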
ex23-08-ei-shift-fourier
Medium: For a partial Fourier sensing matrix $\mathbf{A} = \mathbf{M}\mathbf{F}$ (with $\mathbf{M}$ a row-selection mask), show that a spatial shift by $s$ pixels corresponds to a phase rotation in Fourier space. Explain why this is useful for equivariant imaging.
Shift theorem: $\mathcal{F}\{x[n - s]\}[k] = e^{-2\pi i k s/N}\, X[k]$.
Fourier shift theorem
A circular shift by $s$ pixels: $(T_s\mathbf{x})[n] = x[(n - s) \bmod N]$. In the Fourier domain: $\widehat{T_s\mathbf{x}}[k] = e^{-2\pi i k s/N}\, \hat{x}[k]$.
Effect on measurements
$\mathbf{A}\, T_s\mathbf{x} = \mathbf{M}\mathbf{F}\, T_s\mathbf{x} = \mathbf{M}\,\mathbf{D}_s\,\mathbf{F}\mathbf{x}$, where $\mathbf{D}_s = \operatorname{diag}\!\big(e^{-2\pi i k s/N}\big)$.
The phase rotation does not change which frequencies are measured, but it modulates them differently. Different shifts create a system of equations at each measured frequency.
EI benefit
The EI constraint for multiple shifts forces the network to produce consistent reconstructions under different phase modulations. This implicitly constrains the unmeasured frequencies through the network's learned inductive bias.
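The shift theorem itself can be verified in a few lines of NumPy (signal length and shift are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, s = 128, 5
x = rng.normal(size=N)

lhs = np.fft.fft(np.roll(x, s))                        # F{shifted signal}
k = np.arange(N)
rhs = np.exp(-2j * np.pi * k * s / N) * np.fft.fft(x)  # phase-rotated spectrum
print(np.allclose(lhs, rhs))                           # True
```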
ex23-09-gsure-derivation
Medium: Derive GSURE for the inverse problem $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{I})$. Show that it estimates the projected MSE and explain why it cannot constrain the null space.
Apply standard SURE to the composite map $\mathbf{y} \mapsto \mathbf{P}\, f_\theta(\mathbf{y})$ with $\mathbf{P} = \mathbf{A}^\dagger\mathbf{A}$.
Apply SURE to the projected estimate
Define $\mathbf{P} = \mathbf{A}^\dagger\mathbf{A}$, the projector onto the row space of $\mathbf{A}$. Applying standard SURE to the projected estimate $\mathbf{P}\, f_\theta(\mathbf{y})$:
$$\text{GSURE}(\theta) = \|\mathbf{P}\, f_\theta(\mathbf{y})\|^2 - 2\,\langle f_\theta(\mathbf{y}),\, \mathbf{A}^\dagger\mathbf{y}\rangle + 2\sigma^2\operatorname{tr}\!\big(\mathbf{A}^{\dagger T}\mathbf{J}_{f_\theta}(\mathbf{y})\big) + c,$$
where $c$ collects $\theta$-independent terms.
Divergence
where $\mathbf{J}_{f_\theta}(\mathbf{y})$ is the Jacobian of $f_\theta$. MC estimate: $\operatorname{tr}\!\big(\mathbf{A}^{\dagger T}\mathbf{J}_{f_\theta}\big) \approx (\mathbf{A}^\dagger\mathbf{b})^T \mathbf{J}_{f_\theta}\mathbf{b}$ with $\mathbf{b} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ (one Jacobian-vector product per probe).
Null-space blindness
GSURE estimates $\mathbb{E}\,\|\mathbf{P}(f_\theta(\mathbf{y}) - \mathbf{x})\|^2$, which depends only on the range-space error. If $\hat{\mathbf{x}}_2 = \hat{\mathbf{x}}_1 + \mathbf{v}$ with $\mathbf{v} \in \operatorname{null}(\mathbf{A})$, then $\mathbf{P}\hat{\mathbf{x}}_2 = \mathbf{P}\hat{\mathbf{x}}_1$, so GSURE is identical for $\hat{\mathbf{x}}_1$ and $\hat{\mathbf{x}}_2$. Additional regularisation (TV, EI, a learned prior) is needed to fix the null-space component.
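A sketch of the one-probe Monte Carlo trace term in PyTorch; the reconstruction map `f` and pseudoinverse matrix `A_pinv` are placeholders, and several probes can be averaged to reduce variance:

```python
import torch

def gsure_trace(f, A_pinv, y):
    """One-probe Hutchinson estimate of tr(A^{+T} J_f(y)),
    i.e. (A^+ b)^T (J_f(y) b) with b ~ N(0, I).

    f      : reconstruction map, measurements (m,) -> image (n,)
    A_pinv : (n, m) pseudoinverse of the forward operator
    """
    b = torch.randn_like(y)
    # Jacobian-vector product: directional derivative of f at y along b.
    # Pass create_graph=True if this term is differentiated during training.
    _, jvp = torch.autograd.functional.jvp(f, (y,), (b,))
    return (A_pinv @ b) @ jvp
```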
ex23-11-n2n-gradient-variance
Medium: Compare the gradient variance of Noise2Noise and supervised training. Show that N2N has higher gradient variance and explain the practical implications for training.
The N2N gradient has an extra noise term from the noisy target.
Gradient comparison
Supervised gradient: $\mathbf{g}_{\text{sup}} = 2\,\mathbf{J}_\theta^T\big(f_\theta(\mathbf{y}_1) - \mathbf{x}\big)$, with $\mathbf{J}_\theta = \partial f_\theta(\mathbf{y}_1)/\partial\theta$.
N2N gradient: $\mathbf{g}_{\text{N2N}} = 2\,\mathbf{J}_\theta^T\big(f_\theta(\mathbf{y}_1) - \mathbf{x} - \mathbf{n}_2\big)$.
The N2N gradient has an additional term $-2\,\mathbf{J}_\theta^T\mathbf{n}_2$.
Variance analysis
$\operatorname{Var}[\mathbf{g}_{\text{N2N}}] = \operatorname{Var}[\mathbf{g}_{\text{sup}}] + 4\sigma^2\,\mathbb{E}\,\|\mathbf{J}_\theta\|_F^2$.
The extra variance is proportional to $\sigma^2$ and to the squared Frobenius norm of the Jacobian.
Practical implications
Higher gradient variance means N2N requires: (1) a smaller learning rate, (2) a larger batch size, or (3) more training iterations to reach the same convergence. N2N typically needs noticeably more iterations than supervised training (the gap grows with the noise level), but the per-iteration cost is the same.
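A small numerical illustration with a scalar-parameter linear "network" (all sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma, trials = 100, 0.5, 20_000
x = rng.normal(size=n)
w = 0.7                                  # scalar "network": f_w(y) = w * y

g_sup, g_n2n = [], []
for _ in range(trials):
    y1 = x + sigma * rng.normal(size=n)
    n2 = sigma * rng.normal(size=n)
    # d/dw ||w*y1 - target||^2 = 2 * y1^T (w*y1 - target)
    g_sup.append(2 * y1 @ (w * y1 - x))
    g_n2n.append(2 * y1 @ (w * y1 - (x + n2)))

print(np.var(g_sup), np.var(g_n2n))      # N2N variance is strictly larger
```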
ex23-12-dip-tv
Hard: Combine DIP with total variation regularisation to create a more robust reconstruction. Write the modified loss, explain how TV interacts with DIP's spectral bias, and analyse whether early stopping is still needed.
Modified loss: $\mathcal{L}(\theta) = \|\mathbf{y} - \mathbf{A}\, f_\theta(\mathbf{z})\|^2 + \mu\,\text{TV}(f_\theta(\mathbf{z}))$.
TV penalises high-frequency content, reinforcing the spectral bias.
Modified DIP loss
$$\mathcal{L}(\theta) = \|\mathbf{y} - \mathbf{A}\, f_\theta(\mathbf{z})\|_2^2 + \mu \sum_{i,j}\sqrt{(\nabla_h\hat{\mathbf{x}})_{ij}^2 + (\nabla_v\hat{\mathbf{x}})_{ij}^2}, \qquad \hat{\mathbf{x}} = f_\theta(\mathbf{z})$$
Interaction with spectral bias
DIP's spectral bias learns low frequencies first. TV penalises high-frequency gradients, reinforcing this tendency. The combined effect:
- Early iterations: DIP learns low-frequency structure; TV is inactive (low gradient values).
- Mid iterations: DIP starts fitting mid-frequency details; TV selectively preserves edges while suppressing oscillations.
- Late iterations: TV prevents the network from fitting high-frequency noise, extending the useful training window.
Early stopping analysis
TV significantly reduces the need for early stopping: the PSNR curve exhibits a long plateau rather than the sharp peak seen without TV. However, over very long runs the network can still overfit in directions TV does not penalise (constant regions). In practice, DIP+TV is much more robust to the stopping criterion than vanilla DIP.
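A minimal isotropic TV term for the modified loss, in PyTorch; the image is a 2-D tensor, and the small epsilon is a standard numerical convenience that keeps the gradient defined at zero:

```python
import torch

def tv_iso(x, eps=1e-8):
    """Isotropic total variation of a 2-D image tensor."""
    dh = x[:, 1:] - x[:, :-1]      # horizontal differences, (H, W-1)
    dv = x[1:, :] - x[:-1, :]      # vertical differences,   (H-1, W)
    dh = dh[:-1, :]                # crop both to the common (H-1, W-1) grid
    dv = dv[:, :-1]
    return torch.sqrt(dh ** 2 + dv ** 2 + eps).sum()

# DIP+TV objective: ((A @ f(z) - y)**2).sum() + mu * tv_iso(f(z))
```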
ex23-13-ei-recovery-proof
Hard: Prove that for a partial Fourier matrix $\mathbf{A} = \mathbf{M}\mathbf{F}$ with $m < N$ measured frequencies, equivariant imaging with all $N$ circular shifts recovers the full signal (assuming a shift-invariant signal class).
Each shift creates measurements with different phase modulations at the same frequency locations.
Show that the system of equations is over-determined.
Shift creates virtual measurements
For shift $s$: $\mathbf{A}\, T_s\mathbf{x} = \mathbf{M}\,\mathbf{D}_s\,\mathbf{F}\mathbf{x}$, with $\mathbf{D}_s = \operatorname{diag}(e^{-2\pi i k s/N})$. The EI constraint requires correct reconstruction from these modulated measurements.
Full system
With shifts $s = 0, \dots, N-1$, the EI constraints create a system of $mN$ equations. For each measured frequency $k$, the shifts provide modulated observations with phases $e^{-2\pi i k s/N}$ for $s = 0, \dots, N-1$.
The reconstruction must be consistent with all these modulated views, which constrains not just the measured coefficients but also the unmeasured ones through the network.
Over-determined system
The system has $mN$ equations for $N$ unknowns (the Fourier coefficients). Since $mN \geq N$, this is over-determined, and the unique solution (for a perfect network) is the true signal. The condition for exact recovery is that the shifts generate enough "virtual diversity" to span all frequencies through the reconstruction network.
ex23-14-sure-nonGaussian
Hard: Extend SURE to Poisson noise. For $y_i \sim \text{Poisson}(x_i)$, derive an unbiased risk estimate analogous to SURE for Gaussian noise.
For Poisson: $\mathbb{E}[x\, h(y)] = \mathbb{E}[y\, h(y-1)]$ (Stein-type identity for Poisson).
Poisson Stein identity
For $y \sim \text{Poisson}(x)$: $\mathbb{E}[x\, h(y)] = \mathbb{E}[y\, h(y-1)]$ for any $h$ with $\mathbb{E}\,|y\, h(y-1)| < \infty$.
Equivalently: $\mathbb{E}[y\, h(y)] = \mathbb{E}[x\, h(y+1)]$.
Poisson SURE (PURE)
For a denoiser $f$ applied to Poisson data:
$$\text{PURE}(f) = \|f(\mathbf{y})\|^2 - 2\sum_i y_i\, f_i(\mathbf{y} - \mathbf{e}_i) + \sum_i y_i(y_i - 1),$$
where $\mathbf{e}_i$ is the $i$-th canonical vector and the last term is $f$-independent (an unbiased estimate of $\|\mathbf{x}\|^2$).
The difference term $\sum_i y_i\, f_i(\mathbf{y} - \mathbf{e}_i)$ plays the role of the Gaussian divergence: the derivative is replaced by a unit downward shift of each coordinate.
Practical limitation
PURE requires $n$ extra forward passes (one per pixel perturbation), which is much more expensive than MC-SURE (a single extra Jacobian-vector product). Efficient approximations evaluate the difference term on random subsets of pixels.
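A sketch of PURE with the optional subset approximation; the function name is illustrative, `f` is any denoiser on integer counts, and the rescaling makes the subsampled sum an unbiased stand-in for the full one:

```python
import numpy as np

def pure_loss(f, y, subset=None):
    """PURE for integer Poisson counts y, up to the f-independent
    constant sum_i y_i*(y_i - 1).  `subset` optionally restricts the
    expensive difference term to a random set of pixel indices."""
    n = y.size
    idx = range(n) if subset is None else subset
    term = 0.0
    for i in idx:
        if y[i] > 0:                    # y_i = 0 contributes nothing
            y_minus = y.copy()
            y_minus[i] -= 1             # unit downward shift, not a derivative
            term += y[i] * f(y_minus)[i]
    if subset is not None:
        term *= n / len(subset)         # unbiased rescaling of the subsampled sum
    return np.sum(f(y) ** 2) - 2 * term
```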
ex23-15-dip-complex
Hard: Design a DIP reconstruction for complex-valued SAR images that handles: (1) complex signals, (2) multiplicative speckle noise, and (3) phase preservation. Compare with standard real-valued DIP.
Use a 2-channel output for real and imaginary parts.
For speckle: work in log-domain to convert multiplicative noise to additive.
Complex DIP architecture
The network outputs 2 channels: $(\mathbf{u}, \mathbf{v}) = f_\theta(\mathbf{z})$. The complex image is $\hat{\mathbf{x}} = \mathbf{u} + i\mathbf{v}$.
Speckle-aware loss
For multiplicative speckle under Goodman's fully developed model (observed intensity $I_i = |\hat{x}_i|^2 s_i$ with unit-mean exponential $s_i$), combine the negative log-likelihood $\sum_i \big(\log|\hat{x}_i|^2 + I_i/|\hat{x}_i|^2\big)$ with a data-fidelity term on the raw measurements.
Phase preservation
Add a phase consistency loss on the measured frequencies, penalising the mismatch between $\angle(\mathbf{F}\hat{\mathbf{x}})$ and $\angle\mathbf{y}$. The complex DIP naturally preserves phase through the 2-channel representation.
Comparison
Real-valued DIP applied to the magnitude recovers no phase at all. Complex DIP attains comparable or better magnitude PSNR while keeping the phase error small on strong scatterers; the improvement comes from jointly optimising magnitude and phase.
ex23-16-ei-measurement-splitting
Hard: Combine equivariant imaging with measurement splitting for RF imaging with a partial Fourier forward model. Write the combined loss and analyse how each component contributes to null-space recovery.
Split the measured frequencies $\Omega = \Omega_1 \cup \Omega_2$: $\mathbf{y}_1 = \mathbf{A}_1\mathbf{x} + \mathbf{n}_1$ (training split), $\mathbf{y}_2 = \mathbf{A}_2\mathbf{x} + \mathbf{n}_2$ (held-out split).
Combined loss
$$\mathcal{L} = \underbrace{\|\mathbf{A}_1 f_\theta(\mathbf{y}_1) - \mathbf{y}_1\|^2}_{\text{data consistency}} + \underbrace{\|\mathbf{A}_2 f_\theta(\mathbf{y}_1) - \mathbf{y}_2\|^2}_{\text{measurement splitting}} + \lambda\sum_{g}\|f_\theta(\mathbf{A}\, T_g\hat{\mathbf{x}}) - T_g\hat{\mathbf{x}}\|^2, \qquad \hat{\mathbf{x}} = f_\theta(\mathbf{y}_1),$$
where $\mathbf{A}_i = \mathbf{M}_{\Omega_i}\mathbf{F}$.
Complementary contributions
- Measurement splitting constrains frequencies in $\Omega_2$ (cross-validates on held-out measurements).
- Data consistency constrains frequencies in $\Omega_1$.
- Equivariance constrains frequencies outside $\Omega$ (the null space) via symmetry-induced virtual measurements.
Together, they provide supervision for all frequency components.
Advantage
The combination is more robust than either method alone: measurement splitting handles non-symmetric scenes, while EI handles scenes where the measurement split is too sparse.
ex23-17-ram-conditioning
Challenge: Design a conditioning mechanism for a RAM-style foundation model that adapts to different RF imaging forward operators. The model should handle partial Fourier, diffraction tomography, and MIMO radar sensing matrices without retraining.
The forward operator can be encoded via its SVD, its PSF, or a learned embedding.
Consider both explicit conditioning (operator as input) and implicit conditioning (via the data consistency loss).
SVD-based conditioning
Encode $\mathbf{A}$ via its truncated SVD: $\mathbf{A} \approx \mathbf{U}_r\boldsymbol{\Sigma}_r\mathbf{V}_r^H$. Feed the singular values and a low-dimensional representation of $\mathbf{V}_r$ to the network as side information.
The network uses cross-attention or FiLM conditioning to modulate its features based on the operator embedding.
PSF conditioning
For shift-invariant operators, encode via its point spread function (PSF). The PSF is a compact 2D/3D function that captures the operator's spatial characteristics. This is natural for RF imaging where the PSF depends on array geometry and frequency.
Learned operator embedding
Train an operator encoder $E: \mathbf{A} \mapsto \mathbf{e} \in \mathbb{R}^d$ that maps any linear operator to a fixed-dimensional embedding. During RAM pretraining, the encoder learns to extract operator-relevant features (rank, condition number, spatial-frequency coverage).
Test-time adaptation
For new operators not seen during training: (1) compute the conditioning vector, (2) run the foundation model for an initial reconstruction, (3) fine-tune using SURE or EI losses for 50--100 iterations. This combines the foundation model's broad prior with operator-specific adaptation.
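A sketch of the FiLM variant in PyTorch; the module name and shapes are illustrative, and the embedding could come from singular values, a PSF code, or a learned encoder:

```python
import torch

class FiLM(torch.nn.Module):
    """Feature-wise linear modulation from an operator embedding."""

    def __init__(self, embed_dim, n_channels):
        super().__init__()
        # One linear map produces per-channel scale (gamma) and shift (beta).
        self.to_scale_shift = torch.nn.Linear(embed_dim, 2 * n_channels)

    def forward(self, feat, e):
        # feat: (B, C, H, W) feature map; e: (B, embed_dim) operator code
        gamma, beta = self.to_scale_shift(e).chunk(2, dim=-1)
        return gamma[..., None, None] * feat + beta[..., None, None]
```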
ex23-18-self-supervised-comparison
Challenge: Design a comprehensive comparison experiment for self-supervised methods in RF imaging. Compare DIP, Noise2Noise, SURE+PnP, equivariant imaging, and foundation model transfer on the same test set. Define evaluation metrics, data requirements, and predict which method wins in each regime.
Each method has different data requirements: DIP (1 measurement), N2N (paired noisy), SURE+PnP (noisy images), EI (unpaired measurements), FM (pretrained model).
Consider both low-SNR/high-SNR and sparse/extended scene regimes.
Experimental setup
Test set: 100 simulated RF scenes (50 sparse point-scatterer scenes, 50 extended). Forward model: partial Fourier at a fixed compression ratio. SNR: 10, 20, 30 dB.
Data requirements
| Method | Training data | Test-time data | Test-time compute |
|---|---|---|---|
| DIP | None | 1 meas. + optimisation | minutes/image |
| N2N | 1000 noisy pairs | 1 measurement | seconds |
| SURE+PnP | 1000 noisy images | 1 measurement | seconds |
| EI | 1000 unpaired meas. | 1 measurement | seconds |
| FM (LoRA) | ImageNet + 100 RF | 1 measurement | seconds |
Predicted winners
- Low SNR (10 dB), sparse: DIP (strong architectural bias for sparse signals)
- Low SNR (10 dB), extended: EI (symmetries provide null-space info)
- High SNR (30 dB), sparse: SURE+PnP (denoiser quality dominates)
- High SNR (30 dB), extended: N2N (strongest supervision among self-supervised)
- Domain shift at test time: FM with LoRA (robust to distribution changes)
Metrics
Primary: PSNR (magnitude), SSIM, phase error (degrees). Secondary: computational time, memory, hyperparameter sensitivity.
ex23-19-sure-pnp
Challenge: Design a SURE-based training procedure for the plug-and-play denoiser in a PnP-ADMM algorithm. The denoiser is trained end-to-end through the ADMM iterations using a SURE loss (no clean ground truth). Analyse convergence and the interaction between SURE training and ADMM convergence.
Unroll ADMM iterations and compute SURE on the final output.
The divergence must be computed through the entire unrolled ADMM.
Unrolled PnP-ADMM
Unroll $K$ ADMM iterations, applying the denoiser $D_\theta$ at each proximal step. The end-to-end map is $F_\theta: \mathbf{y} \mapsto \hat{\mathbf{x}}^{(K)}$.
SURE on the unrolled output
Apply GSURE to the end-to-end reconstruction:
$$\text{GSURE}(\theta) = \|\mathbf{P}\, F_\theta(\mathbf{y})\|^2 - 2\,\langle F_\theta(\mathbf{y}),\, \mathbf{A}^\dagger\mathbf{y}\rangle + 2\sigma^2\operatorname{tr}\!\big(\mathbf{A}^{\dagger T}\mathbf{J}_{F_\theta}(\mathbf{y})\big) + c.$$
The trace term is computed via backpropagation through the entire unrolled ADMM (MC estimate with one probe vector).
Convergence analysis
Two nested optimisation loops interact:
- Outer loop: gradient descent on to minimise GSURE.
- Inner loop: ADMM iterations for reconstruction.
For convergence: (1) the ADMM must converge for each fixed $\theta$ (this requires the denoiser to be firmly non-expansive), (2) the GSURE gradient must be unbiased (this requires Gaussian noise). In practice, a small fixed number of unrolled iterations is used, and the denoiser is regularised towards non-expansiveness via spectral normalisation.
ex23-20-ei-rf-multiview
Challenge: For a multi-static RF imaging system with $N_t$ transmitters and $N_r$ receivers, design an equivariant imaging framework that exploits both spatial symmetries and measurement redundancy. Analyse the null-space recovery guarantee as a function of the array geometry and the symmetry group.
The multi-static sensing matrix has a block structure: $\mathbf{A} = [\mathbf{A}_{1,1}^T\ \cdots\ \mathbf{A}_{N_t,N_r}^T]^T$, one block per Tx-Rx pair.
Consider both scene symmetries and array symmetries.
Multi-static forward model
Each Tx-Rx pair $(i, j)$ provides $\mathbf{y}_{ij} = \mathbf{A}_{ij}\mathbf{x} + \mathbf{n}_{ij}$, where $\mathbf{A}_{ij}$ depends on the transmit and receive steering vectors $\mathbf{a}_t(i)$ and $\mathbf{a}_r(j)$. The combined system has $N_t N_r$ measurement channels for $N$ voxels.
Scene symmetries
For scenes invariant under rotations by angle $2\pi/K$: the rotation $T_k$ permutes the voxel grid, and the EI loss enforces $f_\theta(\mathbf{A}\, T_k\hat{\mathbf{x}}) = T_k\hat{\mathbf{x}}$.
For the rotation to mix range and null spaces, the array must not have the same rotational symmetry as the scene (otherwise the measurements are invariant under rotation, providing no information).
Array-induced symmetries
If the array has $K$-fold symmetry (e.g., a uniform planar array with $K = 4$), the measurements of a rotated scene can be computed from the measurements of the original scene by permuting Tx-Rx indices. This provides "free" data augmentation that does not require re-measurement.
Recovery guarantee
Full recovery requires the group action to be transitive on the null space. For a uniform planar array with $N_t N_r$ elements at wavelength $\lambda$ and scene diameter $D$, the null space has dimension $d_{\text{null}} = N - \operatorname{rank}(\mathbf{A})$. The number of independent constraints contributed by the $|\mathcal{G}|$ group elements is at most $|\mathcal{G}|\operatorname{rank}(\mathbf{A})$ minus an overlap term that measures redundancy between transformed measurements. Recovery requires this count to exceed $d_{\text{null}}$.