Exercises
ex20-01-mf-decomposition
Easy. Given a sensing matrix $\mathbf{A} \in \mathbb{C}^{M \times N}$ and measurements $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$, write down the matched-filter image $\hat{\mathbf{x}}_{\mathrm{MF}} = \mathbf{A}^H \mathbf{y}$ and show that it decomposes into a signal term and a noise term. Identify the Gram matrix and characterise the covariance of the noise term.
Apply $\mathbf{A}^H$ to both sides of $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$.
Identify the Gram matrix $\mathbf{G} = \mathbf{A}^H\mathbf{A}$.
Compute $\operatorname{Cov}(\tilde{\mathbf{n}})$ where $\tilde{\mathbf{n}} = \mathbf{A}^H\mathbf{n}$.
Compute the matched-filter image
$$\hat{\mathbf{x}}_{\mathrm{MF}} = \mathbf{A}^H \mathbf{y} = \mathbf{A}^H\mathbf{A}\,\mathbf{x} + \mathbf{A}^H\mathbf{n} = \mathbf{G}\mathbf{x} + \tilde{\mathbf{n}}.$$
Characterise the noise covariance
For white noise $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2\mathbf{I})$: $\operatorname{Cov}(\tilde{\mathbf{n}}) = \mathbb{E}[\mathbf{A}^H\mathbf{n}\mathbf{n}^H\mathbf{A}] = \sigma^2\,\mathbf{A}^H\mathbf{A} = \sigma^2\mathbf{G}$.
The back-projected noise is coloured with the same correlation structure as the PSF (Gram matrix). This is the key difficulty: noise and sidelobe artefacts share the same spatial pattern.
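The decomposition $\hat{\mathbf{x}}_{\mathrm{MF}} = \mathbf{G}\mathbf{x} + \mathbf{A}^H\mathbf{n}$ and the $\sigma^2\mathbf{G}$ noise colouring can be checked numerically; a minimal sketch with a random real-valued sensing matrix (all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, sigma = 64, 32, 0.5

A = rng.standard_normal((M, N)) / np.sqrt(M)   # random sensing matrix
G = A.T @ A                                    # Gram matrix G = A^T A
x = rng.standard_normal(N)

# One realisation: the MF image decomposes exactly as G x + A^T n
n = sigma * rng.standard_normal(M)
x_mf = A.T @ (A @ x + n)
assert np.allclose(x_mf, G @ x + A.T @ n)

# Monte-Carlo covariance of the back-projected noise: should approach sigma^2 G
T = 20000
noise = sigma * rng.standard_normal((T, M))
n_bp = noise @ A                               # each row is (A^T n)^T
cov_emp = n_bp.T @ n_bp / T
rel_err = np.linalg.norm(cov_emp - sigma**2 * G) / np.linalg.norm(sigma**2 * G)
print(f"relative covariance error: {rel_err:.3f}")
```

The empirical covariance of the back-projected noise matches $\sigma^2\mathbf{G}$ to within Monte-Carlo error, confirming that the noise inherits the PSF structure.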
ex20-02-dc-idempotent
Easy. Verify that the hard data-consistency layer $\mathrm{DC}(\mathbf{x}) = \mathbf{x} + \mathbf{A}^H(\mathbf{y} - \mathbf{A}\mathbf{x})$ is idempotent when $\mathbf{A}\mathbf{A}^H = \mathbf{I}$, i.e., $\mathrm{DC}(\mathrm{DC}(\mathbf{x})) = \mathrm{DC}(\mathbf{x})$.
Let $\mathbf{x}' = \mathrm{DC}(\mathbf{x})$ and show $\mathbf{A}\mathbf{x}' = \mathbf{y}$.
Then compute $\mathrm{DC}(\mathbf{x}')$ using the result above.
Show measurement consistency
Apply $\mathbf{A}$ to the DC output: $\mathbf{A}\,\mathrm{DC}(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{A}\mathbf{A}^H(\mathbf{y} - \mathbf{A}\mathbf{x}) = \mathbf{A}\mathbf{x} + (\mathbf{y} - \mathbf{A}\mathbf{x}) = \mathbf{y}$.
Apply DC a second time
Let $\mathbf{x}' = \mathrm{DC}(\mathbf{x})$. We showed $\mathbf{A}\mathbf{x}' = \mathbf{y}$. Therefore: $\mathrm{DC}(\mathbf{x}') = \mathbf{x}' + \mathbf{A}^H(\mathbf{y} - \mathbf{A}\mathbf{x}') = \mathbf{x}' + \mathbf{A}^H\mathbf{0} = \mathbf{x}'$.
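Both steps can be verified numerically. A minimal sketch using rows of the unitary DFT, so that $\mathbf{A}\mathbf{A}^H = \mathbf{I}$ holds exactly (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 64, 24

# A = S F: M rows of the unitary DFT, so A A^H = I
F = np.fft.fft(np.eye(N), norm="ortho")
rows = rng.choice(N, size=M, replace=False)
A = F[rows]

x_true = rng.standard_normal(N)
y = A @ x_true                      # noiseless measurements

def dc(x):
    """Hard data-consistency layer: x + A^H (y - A x)."""
    return x + A.conj().T @ (y - A @ x)

x0 = rng.standard_normal(N).astype(complex)
x1 = dc(x0)
assert np.allclose(A @ x1, y)       # measurement consistency
assert np.allclose(dc(x1), x1)      # idempotency
```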
ex20-03-loss-estimator
Easy. A post-processing network $f_{\boldsymbol{\theta}}$ is trained with the MSE loss $\mathcal{L}(\boldsymbol{\theta}) = \mathbb{E}\big[(f_{\boldsymbol{\theta}}(y) - c)^2\big]$.
(a) State the optimal network $f^\star$ in the limit of infinite data and network capacity.
(b) For a 1D binary scene $c \in \{-1, +1\}$ with equal probability and measurement $y = c + n$ with $n \sim \mathcal{N}(0, \sigma^2)$, compute $f^\star(y)$.
MSE is minimised pointwise by the conditional mean.
Use Bayes' theorem: $P(c \mid y) \propto p(y \mid c)\,P(c)$.
Optimal MSE network is the posterior mean
For any fixed $y$, the MSE decomposes as $\mathbb{E}\big[(f(y) - c)^2 \mid y\big] = \big(f(y) - \mathbb{E}[c \mid y]\big)^2 + \operatorname{Var}(c \mid y)$. The second term is irreducible, so $f^\star(y) = \mathbb{E}[c \mid y]$.
Compute the posterior mean for binary scene
$f^\star(y) = \mathbb{E}[c \mid y] = P(c{=}1 \mid y) - P(c{=}{-}1 \mid y) = \tanh(y/\sigma^2)$. For $|y| \ll \sigma^2$, $f^\star(y) \approx 0$, demonstrating the blurring effect: the network outputs a value that never occurs in the true scene.
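The $\tanh(y/\sigma^2)$ form can be confirmed by Monte-Carlo estimation of the conditional mean (a sketch; bin edges and sample count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0
T = 400_000

c = rng.choice([-1.0, 1.0], size=T)        # binary scene, equal probability
y = c + sigma * rng.standard_normal(T)     # measurement y = c + n

# Empirical E[c | y] in narrow bins of y, compared against tanh(y / sigma^2)
bins = np.linspace(-2, 2, 21)
centers = 0.5 * (bins[:-1] + bins[1:])
idx = np.digitize(y, bins) - 1             # values outside [-2, 2] fall outside 0..19
emp = np.array([c[idx == k].mean() for k in range(len(centers))])
max_dev = np.max(np.abs(emp - np.tanh(centers / sigma**2)))
print(f"max deviation from tanh: {max_dev:.3f}")
```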
ex20-04-gram-random
Medium. Let $\mathbf{A} \in \mathbb{R}^{M \times N}$ have i.i.d. entries $A_{mi} \sim \mathcal{N}(0, 1/M)$. Show that:
(a) $\mathbb{E}[G_{ii}] = 1$ for all $i$. (b) $\mathbb{E}[G_{ij}] = 0$ for $i \neq j$. (c) $\operatorname{Var}(G_{ij}) = 1/M$ for $i \neq j$.
Conclude that $\mathbf{G} \approx \mathbf{I}$ for large $M$ and state the implication for MF→U-Net performance.
$G_{ij} = \sum_{m=1}^{M} A_{mi} A_{mj}$. Use independence of the entries.
For $i \neq j$, each summand $A_{mi}A_{mj}$ has zero mean by independence.
Diagonal entries
$G_{ii} = \sum_{m} A_{mi}^2$. Each $A_{mi}^2$ has mean $\operatorname{Var}(A_{mi}) = 1/M$ for every $m$. By linearity, $\mathbb{E}[G_{ii}] = M \cdot (1/M) = 1$.
Off-diagonal mean
For $i \neq j$: $\mathbb{E}[G_{ij}] = \sum_{m} \mathbb{E}[A_{mi}A_{mj}]$. Since $A_{mi}$ and $A_{mj}$ are independent for $i \neq j$: $\mathbb{E}[A_{mi}A_{mj}] = \mathbb{E}[A_{mi}]\,\mathbb{E}[A_{mj}] = 0$. Hence $\mathbb{E}[G_{ij}] = 0$.
Off-diagonal variance
$\operatorname{Var}(G_{ij}) = \sum_{m} \operatorname{Var}(A_{mi}A_{mj}) = M \cdot \mathbb{E}[A_{mi}^2]\,\mathbb{E}[A_{mj}^2] = M \cdot (1/M)^2 = 1/M$.
As $M \to \infty$: off-diagonal entries $\to 0$ in probability, so $\mathbf{G} \to \mathbf{I}$. The MF image satisfies $\hat{\mathbf{x}}_{\mathrm{MF}} = \mathbf{G}\mathbf{x} + \tilde{\mathbf{n}} \approx \mathbf{x} + \tilde{\mathbf{n}}$: the U-Net task reduces to simple denoising.
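These concentration claims are easy to confirm by simulation (a sketch; `gram_stats` is an ad-hoc helper):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50

def gram_stats(M):
    """Mean diagonal and off-diagonal variance of G = A^T A for A ~ N(0, 1/M)."""
    A = rng.standard_normal((M, N)) / np.sqrt(M)   # i.i.d. entries ~ N(0, 1/M)
    G = A.T @ A
    mask = ~np.eye(N, dtype=bool)
    return np.diag(G).mean(), G[mask].var()

for M in (100, 1000, 10000):
    d, v = gram_stats(M)
    print(f"M={M:5d}  mean diag={d:.3f}  off-diag var * M={v * M:.3f}")
```

The mean diagonal stays at 1 while the off-diagonal variance scales as $1/M$, exactly as derived.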
ex20-05-dc-mri
Medium. In MRI, the sensing operator is $\mathbf{A} = \mathbf{S}\mathbf{F}$ where $\mathbf{F}$ is the DFT matrix and $\mathbf{S} \in \{0,1\}^{M \times N}$ selects $M$ rows.
(a) Show that $\mathbf{A}\mathbf{A}^H = \mathbf{I}_M$ (orthonormal rows). (b) Write the explicit form of the hard DC layer for this operator. (c) Interpret the DC layer as a k-space replacement operation.
Use the fact that $\mathbf{F}\mathbf{F}^H = \mathbf{I}$ for the DFT matrix, with appropriate normalisation.
The DC layer replaces acquired k-space samples while preserving the network prediction at unacquired locations.
Verify orthonormal rows
Normalise so that $\mathbf{F}\mathbf{F}^H = \mathbf{I}$ (unitary DFT). Then: $\mathbf{A}\mathbf{A}^H = \mathbf{S}\mathbf{F}\mathbf{F}^H\mathbf{S}^H = \mathbf{S}\mathbf{S}^H = \mathbf{I}_M$, since each acquired row is selected exactly once.
Write the DC layer
$$\mathrm{DC}(\mathbf{x}) = \mathbf{x} + \mathbf{A}^H(\mathbf{y} - \mathbf{A}\mathbf{x}) = \mathbf{x} + \mathbf{F}^H\mathbf{S}^H(\mathbf{y} - \mathbf{S}\mathbf{F}\mathbf{x}).$$
k-space interpretation
Let $\hat{\mathbf{k}} = \mathbf{F}\mathbf{x}$ (DFT of the network estimate). The DC layer sets $[\mathbf{F}\,\mathrm{DC}(\mathbf{x})]_m = y_m$ at acquired locations and $\hat{k}_m$ elsewhere. Acquired k-space samples are replaced by the measurements; unacquired locations are filled by the network. The inverse DFT then gives the final image.
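The operator form and the replacement interpretation coincide, as a small FFT experiment confirms (names illustrative; the index set `acquired` plays the role of $\mathbf{S}$):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 64, 20
acquired = np.sort(rng.choice(N, size=M, replace=False))   # sampled k-space rows

x_net = rng.standard_normal(N).astype(complex)             # network estimate (stand-in)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # acquired k-space data

k = np.fft.fft(x_net, norm="ortho")                        # unitary DFT of the estimate

# Operator form: DC(x) = x + F^H S^H (y - S F x)
r = np.zeros(N, dtype=complex)
r[acquired] = y - k[acquired]
dc_operator = x_net + np.fft.ifft(r, norm="ortho")

# Replacement form: overwrite acquired k-space samples, keep the rest
k_replaced = k.copy()
k_replaced[acquired] = y
dc_replace = np.fft.ifft(k_replaced, norm="ortho")

assert np.allclose(dc_operator, dc_replace)
```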
ex20-06-modl-cg
Medium. The MoDL data-consistency step solves $(\mathbf{A}^H\mathbf{A} + \lambda\mathbf{I})\,\mathbf{x}_{k+1} = \mathbf{A}^H\mathbf{y} + \lambda\mathbf{z}_k$.
(a) Show that this is the solution to the regularised least-squares problem $\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \lambda\|\mathbf{x} - \mathbf{z}_k\|^2$.
(b) For $\lambda \to 0$, what does the solution approach? For $\lambda \to \infty$, what does it approach?
Differentiate the objective with respect to $\mathbf{x}$ and set the gradient to zero.
For $\lambda \to 0$: data fidelity dominates. For $\lambda \to \infty$: CNN output dominates.
Derive the normal equations
The objective is $J(\mathbf{x}) = \|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \lambda\|\mathbf{x} - \mathbf{z}_k\|^2$. Taking the gradient and setting it to zero: $-2\mathbf{A}^H(\mathbf{y} - \mathbf{A}\mathbf{x}) + 2\lambda(\mathbf{x} - \mathbf{z}_k) = \mathbf{0}$. Solving: $\mathbf{x}_{k+1} = (\mathbf{A}^H\mathbf{A} + \lambda\mathbf{I})^{-1}(\mathbf{A}^H\mathbf{y} + \lambda\mathbf{z}_k)$.
Limiting cases
$\lambda \to 0$: $\mathbf{x}_{k+1} \to (\mathbf{A}^H\mathbf{A})^{-1}\mathbf{A}^H\mathbf{y} = \mathbf{A}^+\mathbf{y}$ when $\mathbf{A}^H\mathbf{A}$ is invertible (pseudoinverse: pure data fidelity, ignores the denoiser).
$\lambda \to \infty$: divide numerator and denominator by $\lambda$ to see $\mathbf{x}_{k+1} \to \mathbf{z}_k$ (pure denoiser output, ignores the measurements).
The weight $\lambda$ controls the data-vs-prior balance.
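Both limits can be checked by solving the normal equations directly (a minimal sketch; the denoiser output `z` is a random stand-in):

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 40, 20
A = rng.standard_normal((M, N)) / np.sqrt(M)
y = rng.standard_normal(M)
z = rng.standard_normal(N)                    # denoiser output z_k (stand-in)

def modl_dc(lam):
    """Solve (A^T A + lam*I) x = A^T y + lam*z."""
    return np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y + lam * z)

print(np.linalg.norm(modl_dc(1e-8) - np.linalg.pinv(A) @ y))  # ~0: pseudoinverse limit
print(np.linalg.norm(modl_dc(1e8) - z))                        # ~0: denoiser limit
```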
ex20-07-perceptual-pseudo-metric
Medium. Show that the perceptual loss $\mathcal{L}_{\mathrm{perc}}(\mathbf{x}_1, \mathbf{x}_2) = \|\phi(\mathbf{x}_1) - \phi(\mathbf{x}_2)\|^2$, with $\phi$ a VGG feature extractor, is not a true metric on the image space by providing a counterexample where $\mathcal{L}_{\mathrm{perc}}(\mathbf{x}_1, \mathbf{x}_2) = 0$ but $\mathbf{x}_1 \neq \mathbf{x}_2$. Explain the practical implication for RF imaging.
Consider the null space of the VGG feature extractor $\phi$.
Deep networks are not injective: different inputs can produce identical features.
Identify the null space
The VGG feature extractor $\phi$ maps images to feature maps via convolutions and ReLU activations. This mapping is not injective. Add a high-frequency perturbation $\epsilon\mathbf{v}$ with spatial frequency above VGG's sensitivity (e.g., an alternating checkerboard). For small $\epsilon$, the pooling layers in VGG average out this pattern: $\phi(\mathbf{x} + \epsilon\mathbf{v}) \approx \phi(\mathbf{x})$ at all selected layers.
Counterexample
Set $\mathbf{x}_2 = \mathbf{x}_1 + \epsilon\mathbf{v}$, where $\mathbf{v}$ is a high-frequency checkerboard pattern invisible to VGG. Then $\mathcal{L}_{\mathrm{perc}}(\mathbf{x}_1, \mathbf{x}_2) \approx 0$ but $\mathbf{x}_1 \neq \mathbf{x}_2$. The perceptual loss is a pseudo-metric (it violates the identity of indiscernibles).
Implication for RF imaging
This means perceptual loss alone cannot guarantee pixel-level fidelity. High-frequency target signatures (e.g., the precise location of a point reflector to sub-pixel accuracy) may be corrupted without any perceptual-loss penalty. In RF imaging, always combine perceptual loss with a pixel-wise term (MSE or $\ell_1$).
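The argument can be made exact with a toy extractor whose pooling is explicit; here $\phi$ is 2×2 average pooling, standing in for VGG's pooling stages (illustrative only):

```python
import numpy as np

def phi(x):
    """Toy 'perceptual' feature extractor: 2x2 average pooling."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(6)
x1 = rng.standard_normal((8, 8))

# High-frequency checkerboard: zero mean inside every 2x2 pooling window
v = np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0
x2 = x1 + 0.5 * v                             # visibly different image

perc = np.sum((phi(x1) - phi(x2)) ** 2)       # 'perceptual' loss: ~0
mse = np.mean((x1 - x2) ** 2)                 # pixel loss: clearly nonzero
print(perc, mse)
```

The checkerboard lies in the null space of the pooling, so the perceptual loss vanishes while the pixel MSE does not.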
ex20-08-transfer-bound
Medium. A U-Net $f$ is trained on data from sensing matrix $\mathbf{A}_1$ with Gram matrix $\mathbf{G}_1$. At deployment, the sensing matrix is $\mathbf{A}_2$ with Gram matrix $\mathbf{G}_2$. Derive an upper bound on the deployment reconstruction error $\|f(\hat{\mathbf{x}}_2) - \mathbf{x}\|$ in terms of the training error and $\|\mathbf{G}_1 - \mathbf{G}_2\|$. Assume $f$ is Lipschitz with constant $L_f$.
The MF image from $\mathbf{A}_2$ is $\hat{\mathbf{x}}_2 = \mathbf{G}_2\mathbf{x} + \mathbf{A}_2^H\mathbf{n}$.
Use the triangle inequality to split into transfer error + training error.
Apply the Lipschitz property to bound the transfer error.
Split the error
$$\|f(\hat{\mathbf{x}}_2) - \mathbf{x}\| \le \underbrace{\|f(\hat{\mathbf{x}}_2) - f(\hat{\mathbf{x}}_1)\|}_{\text{transfer error}} + \underbrace{\|f(\hat{\mathbf{x}}_1) - \mathbf{x}\|}_{\text{training error}}.$$
Bound the transfer error
By Lipschitz continuity of $f$: $\|f(\hat{\mathbf{x}}_2) - f(\hat{\mathbf{x}}_1)\| \le L_f\,\|\hat{\mathbf{x}}_2 - \hat{\mathbf{x}}_1\| = L_f\,\|(\mathbf{G}_2 - \mathbf{G}_1)\mathbf{x} + (\mathbf{A}_2 - \mathbf{A}_1)^H\mathbf{n}\|$.
Final bound
$$\|f(\hat{\mathbf{x}}_2) - \mathbf{x}\| \le L_f\big(\|\mathbf{G}_1 - \mathbf{G}_2\|\,\|\mathbf{x}\| + \|\mathbf{A}_1 - \mathbf{A}_2\|\,\|\mathbf{n}\|\big) + \varepsilon_{\mathrm{train}}.$$ The deployment error grows linearly with $\|\mathbf{G}_1 - \mathbf{G}_2\|$: a network trained where $\mathbf{G} \approx \mathbf{I}$ carries no guarantee on a deployment operator with a structured Gram matrix. $\square$
ex20-09-sidelobe-correlation
Hard. Consider a scene with a single strong reflector at position $i$: $\mathbf{x} = \alpha\,\mathbf{e}_i$ with amplitude $\alpha$. The matched-filter image is $\hat{\mathbf{x}}_{\mathrm{MF}} = \alpha\,\mathbf{G}\mathbf{e}_i + \mathbf{A}^H\mathbf{n}$.
(a) Write the expression for the sidelobe at pixel $j \neq i$ in $\hat{\mathbf{x}}_{\mathrm{MF}}$.
(b) Show that the covariance between the back-projected noise at pixel $j$ and at pixel $i$ is $\sigma^2 G_{ji}$, i.e., the noise correlation follows the same pattern as the sidelobes.
(c) Explain why this correlation makes it impossible for the U-Net to distinguish sidelobes from real features at pixel $j$ using $\hat{\mathbf{x}}_{\mathrm{MF}}$ alone.
The sidelobe at pixel $j$ is $\alpha G_{ji}$. The back-projected noise at pixel $j$ is $\tilde{n}_j = \mathbf{a}_j^H\mathbf{n}$, where $\mathbf{a}_j$ is the $j$-th column of $\mathbf{A}$.
Compute $\mathbb{E}[\tilde{n}_j \tilde{n}_i^*]$ using $\mathbb{E}[\mathbf{n}\mathbf{n}^H] = \sigma^2\mathbf{I}$.
A classifier using only $[\hat{\mathbf{x}}_{\mathrm{MF}}]_j$ cannot determine whether the signal at $j$ is a sidelobe or a real target.
Sidelobe at pixel j
The pixel value is $[\hat{\mathbf{x}}_{\mathrm{MF}}]_j = \alpha G_{ji} + \tilde{n}_j$; the sidelobe term $\alpha G_{ji}$ exists even for $\mathbf{n} = \mathbf{0}$.
Covariance between sidelobe and noise
The back-projected noise has covariance $\operatorname{Cov}(\mathbf{A}^H\mathbf{n}) = \sigma^2\mathbf{G}$. The signal at $j$ due to the point target has energy $\alpha^2 G_{ji}^2$. The covariance between the noise at pixel $j$ and the noise at pixel $i$ is $\mathbb{E}[\tilde{n}_j\tilde{n}_i^*] = \sigma^2\,\mathbf{a}_j^H\mathbf{a}_i = \sigma^2 G_{ji}$, which is nonzero whenever $G_{ji} \neq 0$: exactly where the sidelobe $\alpha G_{ji}$ is nonzero.
More precisely, the noise at pixel $j$ is $\tilde{n}_j = \mathbf{a}_j^H\mathbf{n}$. The sidelobe contribution from target $i$ to pixel $j$ is $\alpha G_{ji} = \alpha\,\mathbf{a}_j^H\mathbf{a}_i$. Both are inner products against the same column $\mathbf{a}_j$, so the sidelobe pattern and the noise correlation pattern share one spatial signature.
Why the U-Net cannot distinguish them
Given only $\hat{\mathbf{x}}_{\mathrm{MF}}$, the U-Net observes the sum $\alpha G_{ji} + \tilde{n}_j$ at pixel $j$. These two terms share the same spatial structure because both involve $\mathbf{a}_j$. A feature at pixel $j$ could be: (a) a real target with its own amplitude, or (b) a sidelobe of the strong target at $i$, or correlated noise that merely looks like a target. Without additional information about the sensing geometry and the true scene at $i$, the U-Net cannot differentiate these cases from $\hat{\mathbf{x}}_{\mathrm{MF}}$ alone.
This is the sidelobe corruption problem: structured sensing operators create correlated noise and signal components that are statistically inseparable from a single MF image.
ex20-10-modl-convergence
Hard. Consider the MoDL iteration with a fixed denoiser $D = \operatorname{prox}_{\mu R}$ for a convex regulariser $R$. Show that the MoDL iteration:
$\mathbf{x}_{k+1} = (\mathbf{A}^H\mathbf{A} + \lambda\mathbf{I})^{-1}\big(\mathbf{A}^H\mathbf{y} + \lambda D(\mathbf{x}_k)\big)$
is equivalent to a proximal gradient step, and give conditions under which the iteration converges to the minimiser of the composite objective $\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \lambda\mu\,R(\mathbf{x})$.
The CG solve is the proximal operator of the (scaled) data-fidelity term $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$.
Rewrite the iteration as a proximal-proximal splitting step.
Convergence requires $R$ to be convex and the step size to be chosen appropriately.
Identify the proximal operator
The CG step solves $\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \lambda\|\mathbf{x} - \mathbf{z}_k\|^2$ where $\mathbf{z}_k = D(\mathbf{x}_k)$.
This is equivalent to: $\mathbf{x}_{k+1} = \operatorname{prox}_{f/\lambda}\big(\operatorname{prox}_{\mu R}(\mathbf{x}_k)\big)$ with $f(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2$, since $\operatorname{prox}_{f/\lambda}(\mathbf{z}) = \arg\min_{\mathbf{x}} \tfrac{1}{\lambda}f(\mathbf{x}) + \tfrac{1}{2}\|\mathbf{x} - \mathbf{z}\|^2 = \arg\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{A}\mathbf{x}\|^2 + \lambda\|\mathbf{x} - \mathbf{z}\|^2$.
Convergence conditions
This is a proximal-proximal splitting (alternating proximal algorithm). Convergence to the minimiser of the composite objective holds when:
- $R$ is convex and lower semicontinuous, so that $\operatorname{prox}_{\mu R}$ is well defined and firmly nonexpansive.
- The regularisation weight $\lambda$ satisfies $\lambda \ge L$ where $L = \|\mathbf{A}\|^2$ (spectral norm squared), so the implicit data step is not too aggressive.
- The step size $\mu$ is bounded above by twice the smallest nonzero eigenvalue of $\mathbf{A}^H\mathbf{A}$ in the relevant subspace.
ex20-11-unet-receptive-field
Hard. A U-Net for post-processing has $L$ encoder levels, each performing $2\times$ downsampling followed by two $3\times 3$ convolutions. Derive the receptive-field diameter $r_L$. For $L = 4$, what is the maximum sidelobe range that can be suppressed by this network, and what does this imply for long-range sidelobes in SAR?
At level $\ell$, one pixel covers $2^\ell$ pixels at the original resolution.
Each $3\times 3$ convolution at level $\ell$ adds $2 \cdot 2^\ell$ pixels to the receptive field.
Solve the recursion with $r_0 = 5$.
Base case ($\ell = 0$)
At the input resolution, two $3\times 3$ convolutions give $r_0 = 1 + 2 + 2 = 5$ pixels.
Recursive formula
At level $\ell$ (after $\ell$ downsampling steps), each pixel covers $2^\ell$ pixels at the original scale. Two $3\times 3$ convolutions at level $\ell$ therefore add $4 \cdot 2^\ell$ pixels to the receptive field: $r_\ell = r_{\ell-1} + 4 \cdot 2^\ell$.
Solve the recursion
Unrolling: $r_L = 5 + 4\sum_{\ell=1}^{L} 2^\ell = 5 + 4(2^{L+1} - 2) = 2^{L+3} - 3$. For $L = 4$: $r_4 = 4 \cdot 32 - 3 = 125$ pixels. Sidelobes further than about $r_4/2 \approx 62$ pixels from their mainlobe lie outside the receptive field and cannot be suppressed. In SAR, sidelobes can extend across the entire image, so a finite receptive field alone cannot remove them. $\square$
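The recursion is easy to tabulate and check against the closed form (a small sketch):

```python
def receptive_field(L):
    """Receptive field of two 3x3 convs per level with 2x downsampling between levels."""
    r = 5                      # base case: two 3x3 convs at input resolution
    for level in range(1, L + 1):
        r += 4 * 2**level      # two 3x3 convs operating at stride 2**level
    return r

for L in range(5):
    print(L, receptive_field(L))   # matches the closed form 2**(L+3) - 3
```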
ex20-12-physics-channel-benefit
Hard. Consider a linear Gaussian model: $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\hat{\mathbf{x}}_{\mathrm{MF}} = \mathbf{G}\mathbf{x} + \tilde{\mathbf{n}}$ with $\tilde{\mathbf{n}} \sim \mathcal{N}(\mathbf{0}, \sigma^2\mathbf{G})$.
(a) Compute the MMSE estimator and its MSE when only $\hat{\mathbf{x}}_{\mathrm{MF}}$ is known.
(b) Compute the MMSE estimator and its MSE when both $\hat{\mathbf{x}}_{\mathrm{MF}}$ and $\mathbf{G}$ are given.
(c) Show that the informed estimator achieves lower MSE whenever $\mathbf{G} \neq c\,\mathbf{I}$ for any scalar $c$.
For the Gaussian linear model, the MMSE estimator is the Wiener filter.
The conditional covariance given $\mathbf{G}$ uses the Gram-matrix structure.
When the diagonal of $\mathbf{G}$ varies spatially (non-constant diagonal), conditioning on it provides per-pixel SNR information.
MMSE without PSF knowledge
Without knowledge of $\mathbf{G}$, the estimator must assume an average PSF, $\mathbf{G} \approx \mathbf{I}$, giving the same scalar Wiener weight at every pixel: $\hat{x}_i = \hat{x}_{\mathrm{MF},i}/(1 + \sigma^2)$.
MSE is $N\sigma^2/(1 + \sigma^2)$ when $\mathbf{G} = \mathbf{I}$ holds exactly; under mismatch the uniform weight incurs additional error.
MMSE with PSF knowledge
When $\mathbf{G}$ is provided, the per-pixel effective SNR $g_i/\sigma^2$ is known. For a diagonal $\mathbf{G} = \operatorname{diag}(g_1, \ldots, g_N)$ (shift-invariant approximation), the informed Wiener filter decomposes per pixel: $\hat{x}_i = \dfrac{g_i}{g_i^2 + \sigma^2 g_i}\,\hat{x}_{\mathrm{MF},i} = \dfrac{\hat{x}_{\mathrm{MF},i}}{g_i + \sigma^2}$. MSE is $\sum_i \dfrac{\sigma^2}{g_i + \sigma^2}$.
Comparison
For the blind estimator, the Wiener weights are the same for all pixels (assuming a uniform gain). For the informed estimator, pixels with large $g_i$ (high SNR) are relied upon more; pixels with small $g_i$ (low SNR, i.e., the dark side of the beam) are trusted less. When $\mathbf{G}$ has a non-constant diagonal (physically structured operators), the informed estimator achieves strictly lower MSE.
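A per-pixel simulation with a diagonal $\mathbf{G}$ makes the gap concrete (a sketch; the gain range is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
N, sigma, T = 200, 0.5, 20000

g = rng.uniform(0.2, 2.0, size=N)            # non-constant diagonal of G (per-pixel gain)

x = rng.standard_normal((T, N))              # x ~ N(0, I)
n = np.sqrt(sigma**2 * g) * rng.standard_normal((T, N))
x_mf = g * x + n                             # per-pixel MF model with noise var sigma^2 g_i

blind = x_mf / (1 + sigma**2)                # uniform weight: assumes G = I
informed = x_mf / (g + sigma**2)             # per-pixel Wiener weight using g_i

mse_blind = np.mean((blind - x) ** 2)
mse_informed = np.mean((informed - x) ** 2)
print(f"blind: {mse_blind:.3f}  informed: {mse_informed:.3f}")
```

The informed MSE matches the theoretical $\frac{1}{N}\sum_i \sigma^2/(g_i + \sigma^2)$ and is strictly below the blind estimator's error.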
ex20-13-modl-optimal-lambda
Challenge. In MoDL with a linear denoiser $D(\mathbf{x}) = \mathbf{W}\mathbf{x}$ (e.g., a linear Wiener denoiser), find the optimal step size $\lambda^\star$ that minimises the one-step reconstruction MSE $\mathbb{E}\,\|\mathbf{x}_1 - \mathbf{x}\|^2$,
where $\mathbf{x}_0 = \mathbf{x} + \mathbf{e}$ is a noisy initial estimate with $\mathbf{e} \sim \mathcal{N}(\mathbf{0}, \sigma_e^2\mathbf{I})$. Express $\lambda^\star$ in terms of the eigenvalues of $\mathbf{G}$ and $\mathbf{W}$.
Work in the eigenbasis of $\mathbf{G}$.
For each eigenvalue $\gamma_i$ of $\mathbf{G}$, the one-step MoDL update is scalar.
Minimise the per-eigenmode MSE to find the optimal $\lambda_i^\star$, then argue for a single global $\lambda^\star$.
Diagonalise in the eigenbasis of G
Let $\mathbf{G} = \mathbf{U}\boldsymbol{\Gamma}\mathbf{U}^H$ with eigenvalues $\gamma_i$, and assume $\mathbf{W}$ diagonalises in the same basis. In this basis, the MoDL update is scalar per mode $i$: $\tilde{x}_{1,i} = \dfrac{\gamma_i \tilde{x}_i + \tilde{n}_i + \lambda w_i(\tilde{x}_i + \tilde{e}_i)}{\gamma_i + \lambda}$, where $\tilde{x}_i = [\mathbf{U}^H\mathbf{x}]_i$, $w_i$ is the $i$-th diagonal of $\mathbf{U}^H\mathbf{W}\mathbf{U}$, and $\tilde{n}_i, \tilde{e}_i$ are noise components with variances $\sigma^2\gamma_i$ and $\sigma_e^2$.
Per-mode MSE
The MSE for mode $i$ is: $\mathrm{MSE}_i(\lambda) = \dfrac{\lambda^2 (w_i - 1)^2\,|\tilde{x}_i|^2 + \sigma^2\gamma_i + \lambda^2 w_i^2 \sigma_e^2}{(\gamma_i + \lambda)^2}$.
Optimal lambda
Differentiating with respect to $\lambda$ and setting the derivative to zero gives a mode-specific optimal $\lambda_i^\star$. For a single global $\lambda^\star$, minimise the total MSE $\sum_i \mathrm{MSE}_i(\lambda)$: a scalar optimisation problem that depends on the spectrum $\{\gamma_i\}$ of $\mathbf{G}$, the denoiser gains $\{w_i\}$, and the noise levels $\sigma^2$ and $\sigma_e^2$.
The result confirms that learning $\lambda$ is strictly better than fixing it: the optimal value depends on the spectral content of the current estimate, which changes across MoDL iterations.
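The global $\lambda^\star$ can be found by a one-dimensional search. A sketch assuming the per-mode error model $\mathrm{MSE}_i(\lambda) = \big[\lambda^2(w_i-1)^2|\tilde{x}_i|^2 + \sigma^2\gamma_i + \lambda^2 w_i^2\sigma_e^2\big]/(\gamma_i+\lambda)^2$, with an assumed random spectrum and constant denoiser gains (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)
N, sigma, sigma_e = 64, 0.3, 0.5

gamma = rng.uniform(0.1, 2.0, size=N)   # eigenvalues of G (assumed spectrum)
w = np.full(N, 0.8)                     # denoiser gains w_i (assumed)
x2 = np.ones(N)                         # per-mode signal power E|x_i|^2

def total_mse(lam):
    num = lam**2 * (w - 1) ** 2 * x2 + sigma**2 * gamma + lam**2 * w**2 * sigma_e**2
    return float(np.sum(num / (gamma + lam) ** 2))

lams = np.linspace(1e-3, 5.0, 2000)
mses = [total_mse(l) for l in lams]
lam_star = lams[int(np.argmin(mses))]
print(f"lambda* = {lam_star:.3f}, total MSE = {min(mses):.3f}")
```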
ex20-14-gan-hallucination
Challenge. Construct a formal example showing that a GAN-trained reconstruction network can produce a hallucinated target. Consider a two-pixel scene $\mathbf{x} \in \{(1, 0), (0, 1)\}$ with equal probability and a single measurement $y = x_1 + x_2 + n$ with $n \sim \mathcal{N}(0, \sigma^2)$.
(a) Show that the measurement provides no information about which pixel is active. (b) Compare the MSE-trained and GAN-trained network outputs. (c) Explain which is more dangerous for radar target detection and why.
Both scenes produce the same measurement distribution: $y \sim \mathcal{N}(1, \sigma^2)$.
MSE estimator: posterior mean over both modes. GAN: samples one mode at random.
Consider the false-alarm rate of each estimator.
Measurement is uninformative
For both $\mathbf{x} = (1, 0)$ and $\mathbf{x} = (0, 1)$: $y = 1 + n$. The likelihood $p(y \mid \mathbf{x})$ is identical for both scenes. By Bayes' theorem, $P(\mathbf{x} \mid y) = 1/2$ for each.
MSE vs. GAN outputs
- MSE: $f^\star(y) = \mathbb{E}[\mathbf{x} \mid y] = (0.5,\ 0.5)$.
- GAN: samples from $P(\mathbf{x} \mid y)$, outputting $(1, 0)$ or $(0, 1)$ with probability $1/2$ each.
Safety analysis for radar
The GAN output always places a target at exactly one pixel, but it picks the wrong pixel 50% of the time. This constitutes a false positive in target localisation: not a false detection overall, but a false position assignment that could cause a tracker to follow a hallucinated trajectory.
The MSE output is less decisive: a detection threshold above $0.5$ would declare no target at either pixel (the conservative call given the ambiguous data), whereas the GAN would always trigger a detection at one of the two locations (50% wrong).
For radar systems operating under the Neyman-Pearson framework, the MSE estimator with threshold control is safer than the GAN estimator, whose false-alarm rate is uncontrolled.
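A direct simulation of the two estimators makes the asymmetry concrete (a sketch; the threshold value is illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 100_000

# Scene: exactly one of two pixels is active (amplitude 1), with equal probability.
active = rng.integers(0, 2, size=T)

# The measurement y = x1 + x2 + n = 1 + n is identical for both scenes,
# so the posterior over scenes is (1/2, 1/2) regardless of y.

# MSE estimator: outputs the posterior mean (0.5, 0.5) at every pixel.
threshold = 0.9                               # detection threshold (illustrative)
mse_fires = 0.5 >= threshold                  # never triggers a detection

# GAN-style estimator: samples one mode, a sharp target at a random pixel.
gan_pick = rng.integers(0, 2, size=T)
gan_wrong_rate = np.mean(gan_pick != active)  # hallucinated-position rate

print(mse_fires, round(float(gan_wrong_rate), 3))
```

The GAN-style estimator assigns the target to the wrong pixel about half the time, while the thresholded MSE output never raises a confident detection from the ambiguous data.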
ex20-15-geometry-generalisation
Challenge. A physics-informed U-Net takes inputs $(\hat{\mathbf{x}}_{\mathrm{MF}}, \mathbf{G})$ and is trained on a family of MIMO sensing operators $\{\mathbf{A}(\boldsymbol{\alpha}_k)\}_{k=1}^{K}$ parameterised by array geometry $\boldsymbol{\alpha}$ (e.g., antenna positions). Using PAC-Bayes theory, derive a generalisation bound on the expected reconstruction error for a new geometry $\boldsymbol{\alpha}^\ast$ not seen during training.
Express the bound in terms of: the training error, the number of training geometries $K$, the network complexity (number of parameters $P$), and the scene dimension $N$.
Apply the union bound over the $K$ training geometries to extend the single-geometry generalisation bound.
The PAC-Bayes complexity term (the KL divergence between the weight posterior and prior) scales with the number of parameters $P$.
The geometry-specific component of the error depends on $\|\mathbf{G}(\boldsymbol{\alpha}^\ast) - \mathbf{G}(\boldsymbol{\alpha}_{k^\ast})\|$ for the nearest training geometry $\boldsymbol{\alpha}_{k^\ast}$.
Decompose into training and geometry-transfer error
For a new geometry $\boldsymbol{\alpha}^\ast$, find the nearest training geometry $\boldsymbol{\alpha}_{k^\ast} = \arg\min_k \|\mathbf{G}(\boldsymbol{\alpha}^\ast) - \mathbf{G}(\boldsymbol{\alpha}_k)\|$. By the triangle inequality, the deployment error splits into the training error at $\boldsymbol{\alpha}_{k^\ast}$ plus a geometry-transfer term.
PAC-Bayes bound for training error
By the PAC-Bayes theorem, with probability at least $1 - \delta$ over the $n$ training samples per geometry, simultaneously for every training geometry $\boldsymbol{\alpha}_k$ (union bound): $\mathcal{E}(\boldsymbol{\alpha}_k) \le \hat{\mathcal{E}}(\boldsymbol{\alpha}_k) + \sqrt{\dfrac{\mathrm{KL}(Q\,\|\,\Pi) + \ln(2K\sqrt{n}/\delta)}{2n}}$, with $\mathrm{KL}(Q\,\|\,\Pi) = O(P)$.
Combined generalisation bound
Combining the two steps, with $\delta_G = \|\mathbf{G}(\boldsymbol{\alpha}^\ast) - \mathbf{G}(\boldsymbol{\alpha}_{k^\ast})\|$: $\mathcal{E}(\boldsymbol{\alpha}^\ast) \le \hat{\mathcal{E}}(\boldsymbol{\alpha}_{k^\ast}) + \sqrt{\dfrac{O(P) + \ln(2K\sqrt{n}/\delta)}{2n}} + L_f\,\delta_G\,\|\mathbf{x}\|$. Denser sampling of the geometry space $\boldsymbol{\alpha}$ (larger $K$) reduces $\delta_G$ at the cost of a logarithmic increase in the union-bound term, so the bound is governed by the trade-off between geometry coverage ($\delta_G$) and complexity. $\square$