Exercises
ch16-ex01
Easy(Phase Swap Experiment) (a) Load two images (e.g., a circle and a star). Compute their 2D DFTs.
(b) Create hybrid images by swapping amplitude and phase spectra: $H_1 = \mathcal{F}^{-1}\{|F_1| e^{i\angle F_2}\}$ and $H_2 = \mathcal{F}^{-1}\{|F_2| e^{i\angle F_1}\}$.
(c) Display all four images. Which original does each hybrid resemble?
(d) Compute the SSIM between each hybrid and each original. Verify that the hybrid carrying image $I_k$'s phase has higher SSIM to image $I_k$.
Use numpy.fft.fft2 and separate amplitude with np.abs, phase with np.angle.
The hybrid image will look like the image whose phase spectrum it uses.
DFT decomposition
$F_k = \mathcal{F}\{I_k\}$ with amplitude $|F_k|$ and phase $\angle F_k$, for $k = 1, 2$.
Hybrid construction
$H_1 = \mathcal{F}^{-1}\{|F_1| e^{i\angle F_2}\}$ uses $I_1$'s amplitude and $I_2$'s phase. The result looks like $I_2$ (edges and structure from $I_2$'s phase) with $I_1$'s contrast profile.
Quantitative verification
SSIM(hybrid, phase source) > SSIM(hybrid, amplitude source) confirms phase dominance. The exact values depend on the images, but the SSIM to the phase source is typically several times the SSIM to the amplitude source.
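The whole experiment can be sketched in NumPy. The disk/cross test shapes and the plain normalized correlation (a dependency-free stand-in for SSIM, which would normally come from `skimage.metrics.structural_similarity`) are illustrative choices, not from the text:

```python
import numpy as np

def make_hybrids(img1, img2):
    """Swap spectra: h12 carries img1's amplitude and img2's phase,
    h21 carries img2's amplitude and img1's phase."""
    F1, F2 = np.fft.fft2(img1), np.fft.fft2(img2)
    h12 = np.fft.ifft2(np.abs(F1) * np.exp(1j * np.angle(F2))).real
    h21 = np.fft.ifft2(np.abs(F2) * np.exp(1j * np.angle(F1))).real
    return h12, h21

def corr(a, b):
    """Normalized correlation: a crude stand-in for SSIM."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Synthetic stand-ins for the circle and star: a centered disk and a cross.
n = 64
yy, xx = np.mgrid[:n, :n]
disk = (((xx - n / 2) ** 2 + (yy - n / 2) ** 2) < (n / 4) ** 2).astype(float)
cross = ((np.abs(xx - n / 2) < 4) | (np.abs(yy - n / 2) < 4)).astype(float)

h12, h21 = make_hybrids(disk, cross)   # h12 should resemble the cross
```

Swapping an image's spectra with itself is a useful sanity check: `make_hybrids(disk, disk)` must reproduce the disk exactly, since amplitude and phase together determine the image.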
ch16-ex02
Easy(Spectral Initialization Quality) (a) Generate $x \in \mathbb{C}^N$ with i.i.d. $\mathcal{CN}(0,1)$ entries. Create Gaussian measurements $y_i = |a_i^* x|^2$ with $a_i \sim \mathcal{CN}(0, I_N)$, $i = 1, \dots, M$.
(b) Form $Y = \frac{1}{M}\sum_{i=1}^{M} y_i\, a_i a_i^*$ and compute its leading eigenvector $v_1$.
(c) Align the global phase with $\phi^\star = \angle(\hat{x}_0^* x)$, where $\hat{x}_0 = \sqrt{\tfrac{1}{M}\sum_i y_i}\; v_1$. Report the relative initialization error $\min_\phi \|e^{i\phi}\hat{x}_0 - x\| / \|x\|$.
(d) Plot the initialization error vs. $M/N$ over a range of oversampling ratios.
The optimal phase alignment is $\phi^\star = \angle(\hat{x}_0^* x)$.
Expect the error to decrease roughly as $\sqrt{N/M}$.
Matrix formation
$Y = \frac{1}{M}\sum_i y_i\, a_i a_i^*$. Its expectation is $\mathbb{E}[Y] = \|x\|^2 I + x x^*$, whose leading eigenvector is parallel to $x$.
Eigenvector extraction
The leading eigenvector $v_1$ of $Y$ estimates the direction of $x$; scale it as $\hat{x}_0 = \sqrt{\tfrac{1}{M}\sum_i y_i}\; v_1$, since $\tfrac{1}{M}\sum_i y_i \approx \|x\|^2$. After phase alignment, the relative error is $\min_\phi \|e^{i\phi}\hat{x}_0 - x\|/\|x\|$.
Expected scaling
The error decreases with $M/N$ but does not vanish at any finite oversampling; Wirtinger flow iterations are needed to drive it toward zero.
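A minimal sketch of the spectral initializer, using the convention $y_i = |a_i^\top x|^2$ with $a_i$ the rows of `A` (the dimensions, seed, and oversampling ratio are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_init(A, y):
    """Leading eigenvector of Y = (1/M) sum_i y_i conj(a_i) a_i^T,
    scaled by the norm estimate ||x||^2 ~ mean(y)."""
    M, N = A.shape
    Y = (A.conj().T * y) @ A / M          # = A^H diag(y) A / M
    w, V = np.linalg.eigh(Y)              # eigenvalues in ascending order
    return np.sqrt(y.mean()) * V[:, -1]   # scale the top eigenvector

def aligned_error(xhat, x):
    """Relative error after optimal global-phase alignment."""
    phase = np.exp(1j * np.angle(np.vdot(xhat, x)))
    return np.linalg.norm(phase * xhat - x) / np.linalg.norm(x)

N, M = 64, 20 * 64
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2
x0 = spectral_init(A, y)
err = aligned_error(x0, x)                # typically well below 1 at M = 20N
```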
ch16-ex03
Easy(Basic Wirtinger Flow) (a) Implement Wirtinger flow for the intensity model $y_i = |a_i^* x|^2$ with spectral initialization.
(b) Use Gaussian measurements, noiseless.
(c) Plot the phase-aligned relative error $\mathrm{dist}(x_t, x)/\|x\|$ vs. iteration $t$. Verify linear convergence on a semilog plot.
(d) Vary the step size $\mu$. What is the optimal range?
Compute all inner products as a single matrix-vector product.
For phase alignment at each iteration, use $\phi_t = \angle(x_t^* x)$.
Vectorized gradient computation
$b = A x_t$ gives all inner products $b_i = a_i^* x_t$ at once. Residuals: $r_i = |b_i|^2 - y_i$. Gradient: $\nabla f(x_t) = \frac{1}{M} A^* (r \odot b)$.
Convergence verification
On a semilog plot, the error decreases linearly (straight line), confirming the linear convergence rate. Typical convergence: several orders of magnitude of error reduction within roughly 100 iterations at moderate oversampling.
Step size sensitivity
Too small $\mu$: slow convergence. Intermediate $\mu$: optimal range. Too large $\mu$: oscillation or divergence.
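The full loop can be sketched as follows. Normalizing the step by the energy estimate `y.mean()` is a common heuristic (an assumption here, not the only valid choice), and the sizes, seed, and `mu=0.2` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def wirtinger_flow(A, y, x0, mu=0.2, iters=800):
    """Gradient descent on f(z) = (1/2M) sum_i (|a_i^T z|^2 - y_i)^2."""
    M = len(y)
    z = x0.copy()
    scale = y.mean()                        # ~ ||x||^2, normalizes the step
    for _ in range(iters):
        b = A @ z                           # all inner products at once
        grad = A.conj().T @ ((np.abs(b) ** 2 - y) * b) / M
        z = z - (mu / scale) * grad
    return z

def aligned_error(xhat, x):
    phase = np.exp(1j * np.angle(np.vdot(xhat, x)))
    return np.linalg.norm(phase * xhat - x) / np.linalg.norm(x)

N, M = 32, 10 * 32
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2

# Spectral initialization, then refinement.
Y = (A.conj().T * y) @ A / M
x0 = np.sqrt(y.mean()) * np.linalg.eigh(Y)[1][:, -1]
xhat = wirtinger_flow(A, y, x0)
err = aligned_error(xhat, x)
```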
ch16-ex04
Easy(Phase Alignment) (a) Show that the optimal global phase alignment is $\phi^\star = \angle(\hat{x}^* x)$, i.e., $\mathrm{dist}(\hat{x}, x) = \min_\phi \|e^{i\phi}\hat{x} - x\| = \|e^{i\phi^\star}\hat{x} - x\|$.
(b) Prove that the phase-aligned distance satisfies $\mathrm{dist}(\hat{x}, x)^2 = \|\hat{x}\|^2 + \|x\|^2 - 2|\hat{x}^* x|$.
(c) Why is phase alignment necessary when evaluating phase retrieval algorithms?
Expand $\|e^{i\phi}\hat{x} - x\|^2$ and minimize over $\phi$.
Use the fact that $\mathrm{Re}(e^{-i\phi} z) \le |z|$, with equality when $\phi = \angle z$.
Expansion
$\|e^{i\phi}\hat{x} - x\|^2 = \|\hat{x}\|^2 + \|x\|^2 - 2\,\mathrm{Re}\big(e^{-i\phi}\,\hat{x}^* x\big)$.
Minimization
Maximizing $\mathrm{Re}\big(e^{-i\phi}\,\hat{x}^* x\big)$ over $\phi$: set $\phi = \angle(\hat{x}^* x)$, giving $\mathrm{dist}(\hat{x}, x)^2 = \|\hat{x}\|^2 + \|x\|^2 - 2|\hat{x}^* x|$.
Necessity
Since $e^{i\theta}x$ is an equally valid solution for any $\theta$ (the measurements $|a_i^* x|^2$ are invariant to a global phase), comparing $\hat{x}$ to $x$ directly would include a spurious phase error. Phase alignment removes this trivial ambiguity.
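The closed form can be checked numerically against a brute-force search over a fine phase grid (vector sizes and the grid resolution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
xhat = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Closed form: dist^2 = ||xhat||^2 + ||x||^2 - 2 |<xhat, x>|
inner = np.vdot(xhat, x)                      # xhat^H x
closed = np.linalg.norm(xhat) ** 2 + np.linalg.norm(x) ** 2 - 2 * np.abs(inner)

# Brute force over a fine grid of global phases
phis = np.linspace(0.0, 2 * np.pi, 20001)
grid = min(np.linalg.norm(np.exp(1j * p) * xhat - x) ** 2 for p in phis)

# Optimal phase from the derivation
phi_star = np.angle(inner)
opt = np.linalg.norm(np.exp(1j * phi_star) * xhat - x) ** 2
```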
ch16-ex05
Medium(PhaseLift via SDP) (a) Implement PhaseLift for $N = 16$ with Gaussian measurements using CVXPY.
(b) Solve the SDP. Plot the eigenvalues of the solution $X^\star$. Verify the solution is approximately rank 1.
(c) Extract $\hat{x} = \sqrt{\lambda_1}\, v_1$ from the leading eigenpair of $X^\star$. Compute the relative error after phase alignment.
(d) Increase $N$ to 32, 64. Plot computation time vs. $N$ and verify the steep polynomial scaling.
In CVXPY: X = cp.Variable((N,N), hermitian=True), constraint X >> 0.
The ratio $\lambda_1/\lambda_2$ should be several orders of magnitude for a clean rank-1 solution.
CVXPY formulation
X = cp.Variable((N, N), hermitian=True)
constraints = [X >> 0]
constraints += [cp.real(cp.trace(A_ops[i] @ X)) == y[i] for i in range(M)]
prob = cp.Problem(cp.Minimize(cp.real(cp.trace(X))), constraints)
prob.solve(solver=cp.MOSEK)
Eigenvalue analysis
Eigenvalues of $X^\star$: one dominant eigenvalue $\lambda_1 \approx \|x\|^2$ and the rest near zero. A large ratio $\lambda_1/\lambda_2$ confirms rank-1 recovery.
Scaling verification
| $N$ | Time (s) |
|---|---|
| 16 | 0.8 |
| 32 | 12 |
| 64 | 450 |
The log-log slope is roughly 4.5 for the timings above, reflecting the steep polynomial cost of interior-point SDP solvers in $N$.
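The rank-1 extraction step of part (c) can be sketched independently of the solver; here the solver output is replaced by the ideal rank-1 solution $x x^*$ so the snippet stays self-contained (an assumption, not the CVXPY output):

```python
import numpy as np

def extract_rank1(X):
    """Extract xhat = sqrt(lambda_1) v_1 from a (near) rank-1 PSD matrix."""
    w, V = np.linalg.eigh(X)                 # eigenvalues ascending
    lam, v = w[-1], V[:, -1]
    return np.sqrt(max(lam, 0.0)) * v, w

rng = np.random.default_rng(3)
N = 8
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.outer(x, x.conj())                    # ideal PhaseLift solution x x^*
xhat, w = extract_rank1(X)

phase = np.exp(1j * np.angle(np.vdot(xhat, x)))
err = np.linalg.norm(phase * xhat - x) / np.linalg.norm(x)
```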
ch16-ex06
Medium(Truncated Wirtinger Flow) (a) Implement truncated Wirtinger flow: at each iteration, discard the measurements with the largest residuals $\big|\,|a_i^* x_t|^2 - y_i\,\big|$ from the gradient, with a truncation fraction of 5%.
(b) Setup: $M = 4N$ Gaussian measurements, SNR = 25 dB.
(c) Compare convergence of standard WF and truncated WF.
(d) Reduce the measurement budget to $M = 2N$. Does standard WF converge? Does truncated WF?
(e) Plot success rate (50 trials) vs. $M/N$ for both methods.
The truncation threshold removes the top 5% of measurements by residual magnitude.
Define success as relative error below a fixed small tolerance.
Truncation implementation
At each iteration, compute residuals $r_i = \big|\,|a_i^* x_t|^2 - y_i\,\big|$ and let $\mathcal{T}_t$ be the set of indices with the smallest 95% of residuals. Use only indices in $\mathcal{T}_t$ for the gradient sum.
Convergence comparison at $M = 4N$
Both methods converge, but truncated WF is about 2× faster (fewer iterations to the same accuracy) due to removal of outlier gradient contributions.
Phase transition at $M = 2N$
Standard WF fails about 60% of the time at $M = 2N$; truncated WF succeeds about 85% of the time. The phase transition (50% success) occurs at a smaller $M/N$ for truncated WF than for standard WF.
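The truncated gradient itself is a small modification of the standard one. In this sketch a single grossly corrupted measurement is planted so the effect is visible; the sizes, seed, and corruption are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def truncated_gradient(A, y, z, keep=0.95):
    """Wirtinger-flow gradient using only the keep-fraction of measurements
    with the smallest residuals; the rest (top 5% by default) are dropped."""
    b = A @ z
    r = np.abs(np.abs(b) ** 2 - y)
    k = int(np.ceil(keep * len(y)))
    idx = np.argsort(r)[:k]               # indices with smallest residuals
    return A[idx].conj().T @ ((np.abs(b[idx]) ** 2 - y[idx]) * b[idx]) / k

N, M = 16, 128
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2
y[0] += 1e3                               # one grossly corrupted measurement

b = A @ x
g_full = A.conj().T @ ((np.abs(b) ** 2 - y) * b) / M   # standard WF gradient
g_trunc = truncated_gradient(A, y, x)
```

At the true solution the standard gradient is dominated by the outlier, while the truncated gradient drops it and vanishes.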
ch16-ex07
Medium(Coded Diffraction Patterns) (a) Generate a complex image. Simulate Fourier magnitude measurements: $y = |\mathcal{F}\{x\}|^2$.
(b) Attempt phase retrieval with a single Fourier magnitude pattern using alternating projections (500 iterations, 10 random restarts). Report the best relative error.
(c) Add coded diffraction patterns: $L$ random phase masks. Run Wirtinger flow for each $L$.
(d) Plot relative error vs. $L$. At what $L$ does the error plateau?
Random phase masks: diagonal matrices $D_\ell$ with entries $e^{i\theta_k}$, $\theta_k$ uniform on $[0, 2\pi)$.
Expect a sharp improvement between $L = 2$ and $L = 3$, then diminishing returns.
Single Fourier pattern
With a single Fourier magnitude pattern, alternating projections typically stagnates at relative error 0.3--0.5. A single unmasked Fourier measurement lacks the diversity needed for reliable recovery.
Coded patterns
| $L$ | Relative error | Success rate |
|---|---|---|
| 1 | 0.42 | 10% |
| 2 | 0.15 | 50% |
| 3 | 0.008 | 95% |
| 5 | 0.003 | 100% |
Plateau analysis
The error plateaus at a few times $10^{-3}$ once $L \ge 3$. Theory predicts that a small number of coded patterns (constant, or logarithmic in $N$) suffices for generic recovery. Additional masks reduce noise sensitivity but give diminishing returns for noiseless recovery.
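Generating the coded-diffraction measurements is a one-liner per mask; a 1-D sketch (the 2-D case replaces `fft` with `fft2`), with sizes and the function name `cdp_measurements` as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def cdp_measurements(x, L, rng):
    """Coded diffraction patterns: y[l, k] = |FFT(d_l * x)[k]|^2 with
    unit-modulus masks d_l carrying uniform random phases."""
    N = len(x)
    masks = np.exp(1j * rng.uniform(0.0, 2 * np.pi, size=(L, N)))
    Y = np.abs(np.fft.fft(masks * x, axis=1)) ** 2
    return masks, Y

x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
masks, Y = cdp_measurements(x, 3, rng)
```

A quick consistency check: since the masks have unit modulus, Parseval gives $\sum_k y_{\ell,k} = N\|x\|^2$ for every mask $\ell$ (NumPy's unnormalized FFT convention).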
ch16-ex08
Medium(Wirtinger Gradient Derivation) Derive the Wirtinger gradient of the amplitude loss $g(z) = \frac{1}{2M}\sum_{i=1}^{M}\big(|a_i^* z| - \sqrt{y_i}\big)^2$.
Show that $\nabla g(z) = \frac{1}{M}\sum_{i=1}^{M}\big(|a_i^* z| - \sqrt{y_i}\big)\,\frac{a_i^* z}{|a_i^* z|}\, a_i$.
Where is this gradient undefined? How does truncation help?
Use $\frac{\partial |b|}{\partial \bar{z}} = \frac{b}{2|b|}\, a$ for $b = a^* z$.
The gradient is singular when $a_i^* z = 0$.
Chain rule
Let $b_i = a_i^* z$. Then $\frac{\partial}{\partial \bar{z}}\,\tfrac{1}{2}\big(|b_i| - \sqrt{y_i}\big)^2 = \big(|b_i| - \sqrt{y_i}\big)\,\frac{b_i}{2|b_i|}\, a_i$.
Assembling the gradient
$\nabla g(z) = \frac{1}{M}\sum_{i=1}^{M}\big(|b_i| - \sqrt{y_i}\big)\,\frac{b_i}{|b_i|}\, a_i$, using the convention $\nabla = 2\,\partial/\partial\bar{z}$.
Singularity and truncation
When $a_i^* z = 0$, the factor $b_i/|b_i|$ is undefined. Truncation removes indices where $|a_i^* z|$ is very small, avoiding numerical instability.
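The derived gradient can be sanity-checked numerically via a descent test: stepping against it must reduce the loss. An `eps` guard stands in for truncation at the singular points (the guard, sizes, and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def amp_loss(A, sqrt_y, z):
    """Amplitude loss (1/2M) sum_i (|a_i^T z| - sqrt(y_i))^2."""
    return 0.5 * np.mean((np.abs(A @ z) - sqrt_y) ** 2)

def amp_grad(A, sqrt_y, z, eps=1e-12):
    """Wirtinger gradient (convention 2 d/d z-bar); eps guards b/|b| at b = 0,
    which truncation would drop entirely in practice."""
    b = A @ z
    return A.conj().T @ ((np.abs(b) - sqrt_y) * b / (np.abs(b) + eps)) / len(sqrt_y)

N, M = 16, 96
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sqrt_y = np.abs(A @ x)

z = x + 0.3 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
g = amp_grad(A, sqrt_y, z)   # small step against g should reduce the loss
```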
ch16-ex09
Medium(Lifting Trick Verification) (a) For given $a \in \mathbb{C}^2$ and $x \in \mathbb{C}^2$, verify that $|a^* x|^2 = \operatorname{tr}(a a^* X)$ with $X = x x^*$.
(b) Compute $X = x x^*$ explicitly. What are its eigenvalues? Verify it is rank-1 PSD.
(c) Show that the set of rank-1 PSD matrices is non-convex by finding two rank-1 PSD matrices whose average is rank 2.
$|a^* x|^2 = x^* a\, a^* x = \operatorname{tr}(a a^*\, x x^*)$.
For part (c), try $x_1 = e_1$ and $x_2 = e_2$, the standard basis vectors.
Direct computation
$X = x x^*$, so $\operatorname{tr}(a a^* X) = \operatorname{tr}(a a^*\, x x^*) = (a^* x)(x^* a) = |a^* x|^2$.
Rank-1 verification
$X = x x^*$ has eigenvalues $\lambda_1 = \|x\|^2$ (eigenvector $x/\|x\|$) and $\lambda_2 = 0$. Rank 1, PSD (both eigenvalues $\ge 0$).
Non-convexity
$X_1 = e_1 e_1^*$, $X_2 = e_2 e_2^*$. Average: $\tfrac{1}{2}(X_1 + X_2) = \tfrac{1}{2} I$, which has rank 2. Hence the rank-1 PSD set is non-convex.
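All three parts can be checked in a few lines; the concrete vectors `a` and `x` are arbitrary illustrative choices:

```python
import numpy as np

a = np.array([1.0 + 1.0j, 2.0 - 1.0j])   # example measurement vector (assumed)
x = np.array([1.0j, 1.0])                # example signal (assumed)

# (a) lifting identity: |a^* x|^2 = tr(a a^* X) with X = x x^*
X = np.outer(x, x.conj())
lhs = abs(np.vdot(a, x)) ** 2
rhs = np.real(np.trace(np.outer(a, a.conj()) @ X))

# (b) eigenvalues of X: one equals ||x||^2, the other is zero
wX = np.linalg.eigvalsh(X)

# (c) non-convexity: average of two rank-1 PSD matrices can be rank 2
E1 = np.outer([1.0, 0.0], [1.0, 0.0])
E2 = np.outer([0.0, 1.0], [0.0, 1.0])
avg = 0.5 * (E1 + E2)
```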
ch16-ex10
Medium(Noise Robustness) Setup: Gaussian measurements at moderate oversampling.
(a) Vary SNR from 5 to 40 dB in 5 dB steps. For each SNR, run Wirtinger flow, truncated WF, and amplitude flow.
(b) Plot relative error vs. SNR for all three methods.
(c) Compute the "phase retrieval SNR tax": for each method, what additional SNR is needed compared to linear recovery (if phase were available)?
The amplitude loss is better conditioned near the solution but non-smooth at zero.
Expect a 3--6 dB phase retrieval tax at moderate SNR.
Error vs. SNR curves
All three methods show error decreasing in proportion to the noise level at moderate-to-high SNR. Truncated WF and amplitude flow are slightly more robust at low SNR (roughly 10 dB and below).
Phase retrieval tax
To achieve the same relative error as linear recovery at a given SNR, phase retrieval needs roughly 3--6 dB more SNR at moderate operating points; this gap is the phase retrieval tax.
Low vs. high SNR behavior
At low SNR (roughly 10 dB and below): amplitude flow is most robust (the amplitude loss is less sensitive to large noise). At high SNR (roughly 30 dB and above): all three methods converge to similar error floors determined by the measurement noise.
ch16-ex11
Hard(Sparse Phase Retrieval for RF Imaging) (a) Create a sparse RF imaging scene: an $N$-pixel grid with $K$ point scatterers.
(b) Construct a MIMO radar forward model (transmitter, receiver, and frequency counts as specified in the setup). Generate phaseless measurements with phase masks and SNR = 25 dB.
(c) Implement sparse Wirtinger flow (a gradient step followed by hard thresholding at sparsity $K$).
(d) Compare with standard (non-sparse) Wirtinger flow. Plot NMSE vs. iteration for both.
(e) Vary the sparsity $K$ from 5 to 50. At what sparsity does sparse WF fail?
Hard thresholding after each gradient step: keep the $K$ entries with largest magnitude.
Use a structured initialization that also exploits sparsity.
Sparse WF implementation
After the gradient step $\tilde{x} = x_t - \mu\nabla f(x_t)$, apply the hard-thresholding operator $x_{t+1} = \mathcal{H}_K(\tilde{x})$, which keeps the $K$ largest-magnitude entries and zeros the rest.
Comparison
Sparse WF converges about 2× faster and achieves about 3 dB lower NMSE than standard WF on sparse scenes. The improvement comes from the sparsity constraint preventing noise amplification in the zero components.
Sparsity limit
Sparse WF fails (NMSE rising above a usable level) once $K$ grows too large for the given measurement budget, consistent with sample-complexity requirements of the form $M \gtrsim K\log(N/K)$ for sparse phase retrieval.
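The hard-thresholding operator $\mathcal{H}_K$ is the only new ingredient relative to standard WF; a minimal sketch:

```python
import numpy as np

def hard_threshold(z, K):
    """H_K: keep the K largest-magnitude entries of z, zero the rest."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-K:]    # indices of the K largest magnitudes
    out[idx] = z[idx]
    return out

z = np.array([3.0, -1.0, 0.5, 4.0j, 0.1])
ht = hard_threshold(z, 2)               # keeps the entries 3.0 and 4.0j
```

In sparse WF this is applied after every gradient step; ties in magnitude are broken arbitrarily by the sort.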
ch16-ex12
Hard(Gerchberg--Saxton with Spectral Initialization) (a) Implement the Gerchberg--Saxton algorithm for Fourier phase retrieval with a support constraint.
(b) Setup: coded-mask Fourier measurements of a signal with known support.
(c) Compare convergence with: (i) random initialization, (ii) spectral initialization.
(d) Test a hybrid: spectral init + 50 GS iterations + WF refinement. Compare with pure WF.
The support constraint projection zeros out entries outside the known support.
Hybrid methods can combine the speed of GS with the convergence guarantee of WF.
GS implementation
Alternate between the Fourier domain (replace the amplitude with the measured $\sqrt{y}$, keep the current phase) and the spatial domain (apply the support constraint).
Initialization comparison
Random init: converges in 30% of runs, median error 0.15. Spectral init: converges in 90% of runs, error 0.02 after 500 iterations.
Hybrid method
Spectral init + 50 GS + 100 WF achieves error 0.005, matching pure WF (200 iter) at similar total cost. GS provides cheap early iterations; WF refines to high accuracy.
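One GS iteration is two projections; a 1-D sketch with a known-support constraint (the support pattern and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

def gs_step(z, meas_mag, support):
    """One Gerchberg-Saxton iteration for Fourier phase retrieval:
    enforce the measured Fourier magnitudes, then the known support."""
    Z = np.fft.fft(z)
    Z = meas_mag * np.exp(1j * np.angle(Z))   # keep phase, fix amplitude
    z = np.fft.ifft(Z)
    return z * support                        # zero entries outside support

N = 32
support = np.zeros(N)
support[:8] = 1.0                             # assumed known support
x = support * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
meas_mag = np.abs(np.fft.fft(x))
```

A basic correctness property: the true signal is a fixed point of `gs_step`, since both projections leave it unchanged.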
ch16-ex13
Hard(Dual Certificate for PhaseLift) Consider the noiseless PhaseLift SDP with Gaussian measurements.
(a) Write the Lagrangian dual of PhaseLift. What is the dual variable?
(b) Show that strong duality holds (Slater's condition).
(c) State the KKT conditions for optimality of $X^\star = x x^*$.
(d) Explain why constructing a dual certificate $S = I - \sum_i \lambda_i\, a_i a_i^*$ with $S \succeq 0$ and $S x = 0$ proves $x x^*$ is optimal.
The dual variables are $\lambda \in \mathbb{R}^M$ for the equality constraints and a dual PSD matrix $S \succeq 0$ for the conic constraint.
Slater: any strictly feasible point suffices.
Dual formulation
Primal: $\min_{X \succeq 0} \operatorname{tr}(X)$ s.t. $\operatorname{tr}(a_i a_i^* X) = y_i$, $i = 1, \dots, M$. Dual: $\max_{\lambda} \sum_i \lambda_i y_i$ s.t. $I - \sum_i \lambda_i\, a_i a_i^* \succeq 0$.
Slater condition
The constraint set is non-empty (it contains the true $x x^*$), and a strictly feasible point can be obtained by perturbing within the feasible set, so Slater's condition gives strong duality.
Dual certificate implies optimality
If $S = I - \sum_i \lambda_i\, a_i a_i^* \succeq 0$ and $S x = 0$, then $\lambda$ is dual feasible and complementary slackness $\operatorname{tr}(S\, x x^*) = x^* S x = 0$ holds. The objectives then coincide, $\sum_i \lambda_i y_i = x^*(I - S)x = \|x\|^2 = \operatorname{tr}(x x^*)$, so by strong duality $x x^*$ is primal optimal.
ch16-ex14
Hard(Measurement Diversity vs. Number of Masks) Consider an $N$-point signal with coded Fourier measurements.
(a) Generate $L$ coded masks. For each $L$, run 50 trials of Wirtinger flow and record the success rate (relative error below a small tolerance).
(b) Plot the success rate vs. $L$. Where is the phase transition?
(c) For a fixed $L$, vary the mask type: (i) random uniform phase, (ii) random $\pm 1$ (binary), (iii) random phase restricted to a limited range. Which provides the best diversity?
(d) Relate your findings to the measurement diversity requirement in Fourier phase retrieval theory.
The phase transition should occur at a small number of masks ($L$ in the low single digits) for random uniform masks.
Limited-range masks provide less diversity and may require more masks.
Phase transition
The success rate jumps from roughly 20% to 95% as $L$ increases by one mask near the transition. This matches the theoretical prediction that a few coded patterns suffice.
Mask type comparison
Random uniform phase: 95% success. Random $\pm 1$: 80% success (less diverse). Limited-range phase: 40% success (insufficient diversity).
Diversity interpretation
The masks must make the effective measurement matrix satisfy an RIP-like condition. Full phase randomness maximizes the incoherence.
ch16-ex15
Hard(Resolution Limits of Phaseless Imaging) (a) From magnitude-only Fourier data $|X(f)|^2$, show that the autocorrelation $r_x(\tau)$ is directly accessible via the Wiener--Khinchin theorem.
(b) What is the "resolution" of the autocorrelation image in terms of the original signal bandwidth $B$?
(c) Argue that successful phase retrieval restores the original resolution $\sim 1/B$.
(d) Simulate: create a scene with two point targets separated by roughly the resolution limit. Compare the autocorrelation image with the phase-retrieved image. Are the targets resolvable in each?
The power spectrum is the Fourier transform of the autocorrelation.
The autocorrelation has twice the spatial support of the signal but mixes spatial features.
Wiener--Khinchin
$|X(f)|^2 = \mathcal{F}\{r_x\}(f)$ by the Wiener--Khinchin theorem, so from magnitude-only data we can reconstruct $r_x$ by an inverse Fourier transform.
Autocorrelation resolution
The autocorrelation $r_x = x \star x$ has twice the spatial support of $x$ (correlation doubles the support). For two point targets at distance $d$, the autocorrelation has peaks at $0$, $+d$, and $-d$; the strong central peak and its sidelobes can mask the $\pm d$ peaks.
Phase retrieval restores resolution
Successful phase retrieval recovers $x$ at its native resolution $\sim 1/B$, so two targets at that separation are cleanly resolved. The autocorrelation image instead shows merged peaks; the targets are not individually resolvable there.
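The Wiener--Khinchin relation of part (a) is directly checkable in NumPy: the inverse FFT of the power spectrum equals the circular autocorrelation computed by brute force (1-D, real signal, sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(64)

power = np.abs(np.fft.fft(x)) ** 2          # magnitude-only Fourier data
autocorr = np.fft.ifft(power).real          # circular autocorrelation of x

# Brute-force circular autocorrelation for comparison
direct = np.array([np.dot(x, np.roll(x, -k)) for k in range(64)])
```

Note `autocorr[0]` is the signal energy $\|x\|^2$, the strong central peak discussed above.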
ch16-ex16
Hard(Convergence Rate Analysis) (a) Implement Wirtinger flow and record the phase-aligned error $\mathrm{dist}(x_t, x)$ at each iteration $t$, for $N = 64$ and $M/N \in \{4, 6, 8, 12\}$.
(b) Fit an exponential decay $\mathrm{err}_t \approx C\,\rho^t$ to the convergence curve. Estimate the contraction factor $\rho$.
(c) Vary $M/N$. How does $\rho$ depend on $M/N$?
(d) The theory predicts a contraction factor of the form $\rho \le 1 - c\,\mu/N$. Estimate $c$ from your data. Does it match the theoretical prediction?
Fit on the semilog plot: linear regression of $\log(\mathrm{err}_t)$ vs. $t$ gives $\log\rho$ as the slope.
The contraction factor should improve (decrease) with more measurements.
Exponential fit
On a semilog plot, $\log(\mathrm{err}_t) \approx \log C + t\log\rho$ is a straight line. Linear regression gives $\hat\rho = e^{\mathrm{slope}}$.
Dependence on $M/N$
| $M/N$ | Estimated $\rho$ | $(1-\rho)\,N$ |
|---|---|---|
| 4 | 0.985 | 0.96 |
| 6 | 0.980 | 1.28 |
| 8 | 0.975 | 1.60 |
| 12 | 0.970 | 1.92 |
Theory comparison
The estimated $(1-\rho)\,N$ increases roughly linearly with $M/N$, consistent with the theoretical bound $\rho \le 1 - c\,\mu/N$, where $c$ is a universal constant that improves with oversampling.
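The fitting step of part (b) can be sketched and checked on a synthetic, exactly exponential error sequence (a stand-in for recorded WF errors):

```python
import numpy as np

def fit_contraction(errors):
    """Fit rho from err_t ~ C * rho**t via linear regression on log(err)."""
    t = np.arange(len(errors))
    slope, _ = np.polyfit(t, np.log(errors), 1)   # slope = log(rho)
    return np.exp(slope)

# Synthetic decay with a known contraction factor
rho_true = 0.97
errs = 3.0 * rho_true ** np.arange(200)
```

The fit is invariant to the prefactor $C$, which only shifts the intercept of the regression line.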
ch16-ex17
Hard(Hybrid Phase Retrieval: Partial Phase Information) In practice, some receivers may provide partial phase information (e.g., quantized phase or relative phases).
(a) Formulate the hybrid problem: $M_c$ full complex measurements $c_j = b_j^* x$ together with $M_p$ magnitude-only measurements $y_i = |a_i^* x|^2$.
(b) Modify Wirtinger flow to incorporate both measurement types.
(c) For a fixed total measurement budget, vary the fraction of complex measurements from 0 to 1. Plot recovery error vs. the fraction of complex measurements.
(d) At what fraction does the hybrid method match pure complex (linear) recovery?
For complex measurements, the gradient contribution is linear; for magnitude-only, quadratic.
Even 10% complex measurements can dramatically improve recovery.
Hybrid loss function
$f(z) = \tfrac{1}{2}\sum_j |b_j^* z - c_j|^2 + \tfrac{w}{2M_p}\sum_i \big(|a_i^* z|^2 - y_i\big)^2$, with a weighting $w$ chosen to balance the two terms.
Modified gradient
The gradient adds a linear term from the complex measurements: $\nabla f(z) = \sum_j (b_j^* z - c_j)\, b_j + \tfrac{w}{M_p}\sum_i \big(|a_i^* z|^2 - y_i\big)\,(a_i^* z)\, a_i$.
Phase transition
At roughly 20% complex measurements, the hybrid method matches pure linear recovery quality. Even 5% complex measurements reduce the recovery error by about 10 dB compared to pure magnitude-only recovery.
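The modified gradient can be sketched as a sum of the two contributions, under the row-vector convention $c = Bx$, $y = |Ax|^2$ (the names, sizes, and unit weight are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

def hybrid_grad(B, c, A, y, z, w=1.0):
    """Wirtinger gradient of a combined loss: a linear least-squares term
    from complex measurements c = B x, plus a weighted intensity term
    from magnitude-only measurements y = |A x|^2."""
    g_lin = B.conj().T @ (B @ z - c)                       # linear in z
    b = A @ z
    g_mag = A.conj().T @ ((np.abs(b) ** 2 - y) * b) / len(y)  # WF term
    return g_lin + w * g_mag

N = 16
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
B = (rng.standard_normal((8, N)) + 1j * rng.standard_normal((8, N))) / np.sqrt(2)
A = (rng.standard_normal((64, N)) + 1j * rng.standard_normal((64, N))) / np.sqrt(2)
c = B @ x                      # complex (phase-bearing) measurements
y = np.abs(A @ x) ** 2         # magnitude-only measurements
```

At the true signal both residuals vanish, so the combined gradient is zero, which is the basic consistency check for the formulation.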
ch16-ex18
Challenge(Information-Theoretic Lower Bound) (a) For the phase retrieval model $y_i = |a_i^* x|^2 + n_i$ with Gaussian sensing vectors $a_i$ and additive Gaussian noise $n_i \sim \mathcal{N}(0, \sigma^2)$, compute the Fisher information matrix for $x$ from the measurements.
(b) Show that the Fisher information contribution of measurement $i$ scales as $|a_i^* x|^2\, a_i a_i^*$, rather than $a_i a_i^*$ as for linear measurements.
(c) Use the Cramer--Rao bound to derive a lower bound on the MSE of any unbiased estimator.
(d) Compare this lower bound with the empirical error of Wirtinger flow. How close is WF to the information-theoretic limit?
The Fisher information for $x$ involves $(a_i^* x)\, a_i$ terms.
WF is known to be near-optimal for Gaussian measurements.
Fisher information computation
For $y_i = |a_i^* x|^2 + n_i$ with $n_i \sim \mathcal{N}(0, \sigma^2)$: the mean is $\mu_i(x) = |a_i^* x|^2$ with Wirtinger gradient $(a_i^* x)\, a_i$, so $I(x) \propto \tfrac{1}{\sigma^2}\sum_i |a_i^* x|^2\, a_i a_i^*$ (in the real parametrization of $x$).
Scaling analysis
$\mathbb{E}\big[|a_i^* x|^2\, a_i a_i^*\big] = \|x\|^2 I + x x^*$, so the per-measurement information scales with the signal energy $\|x\|^2$. The resulting CRB on the MSE scales as $\sigma^2 N/(M\|x\|^2)$ up to constants.
WF optimality
Empirically, WF achieves MSE within a factor of 2--3 of the CRB at moderate-to-high SNR and oversampling, confirming near-optimal statistical efficiency.
ch16-ex19
Challenge(Burer--Monteiro for Low-Rank Phase Retrieval) (a) Formulate the PhaseLift SDP with Burer--Monteiro factorization: $X = U U^*$ with $U \in \mathbb{C}^{N \times R}$, $R \ll N$.
(b) Derive the gradient of the resulting non-convex problem with respect to .
(c) Implement gradient descent on $U$ for several small ranks $R$. Compare convergence with Wirtinger flow.
(d) For what values of $R$ does the Burer--Monteiro approach recover the correct rank-1 solution? Is $R = 1$ sufficient?
(e) Discuss: when might Burer--Monteiro be preferred over direct Wirtinger flow?
The gradient w.r.t. $U$ uses the chain rule: $\nabla_U f = \big(\sum_i r_i\, a_i a_i^*\big) U$ with residuals $r_i = \operatorname{tr}(a_i a_i^* U U^*) - y_i$, up to scaling.
With $R = 1$, the Burer--Monteiro factorization reduces to a single-vector flow on the intensity loss (essentially Wirtinger flow's objective).
Non-convex formulation
$\min_U \tfrac{1}{2M}\sum_i \big(\operatorname{tr}(a_i a_i^* U U^*) - y_i\big)^2$. Gradient: $\nabla_U f = \tfrac{1}{M}\sum_i r_i\, a_i a_i^* U$ with $r_i = \operatorname{tr}(a_i a_i^* U U^*) - y_i$.
Rank requirement
$R = 1$: a single-vector intensity flow; works but can converge slowly near saddle points. $R \ge 2$: the over-parametrized landscape has no spurious local minima and convergence is reliable. $R = 2$ or $3$ is recommended.
Comparison with WF
For a rank-1 ground truth: BM with $R = 2$ is about 30% slower per iteration than WF (the gradient involves $N \times R$ matrix operations). For large $N$: BM is far faster than solving the full SDP but slower than direct WF. BM is useful when the lifted solution itself is wanted (e.g., for uncertainty quantification).
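A sketch of the BM gradient under the row-vector convention $y_i = \| \text{(}i\text{-th row of } AU\text{)} \|^2$, with sizes and seed as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(10)

def bm_grad(A, y, U):
    """Wirtinger gradient of f(U) = (1/2M) sum_i (r_i)^2, where
    r_i = (model intensity of measurement i) - y_i and X = U U^*."""
    BU = A @ U                                  # i-th row: measurement i through U
    r = np.sum(np.abs(BU) ** 2, axis=1) - y     # residuals tr(a_i a_i^* UU^*) - y_i
    return A.conj().T @ (r[:, None] * BU) / len(y)

N, M = 12, 60
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2

U1 = x[:, None]                                    # R = 1 factor at ground truth
U2 = np.column_stack([x, np.zeros(N, dtype=complex)])  # R = 2, padded with zeros
```

Any $U$ with $U U^* = x x^*$ (e.g., the truth padded with zero columns) should give zero residuals and hence zero gradient.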
ch16-ex20
Challenge(Complete Phaseless RF Imaging Pipeline) Build an end-to-end phaseless RF imaging system:
(a) Scene: a 2-D image grid with 10 point scatterers and 2 extended targets (rectangles).
(b) System: MIMO radar (transmit/receive array sizes and frequency count as specified), center frequency 10 GHz, bandwidth 4 GHz, with random phase masks.
(c) Forward model: near-field Born approximation. Power detection: $y = |A x|^2$ (elementwise). SNR = 20 dB.
(d) Reconstruction pipeline: (i) spectral initialization; (ii) Wirtinger flow (200 iterations); (iii) sparse refinement: threshold to the $K$ largest components, then sparse WF (100 iterations); (iv) final ADMM + TV for edge preservation.
(e) Evaluation: NMSE, SSIM, support F1-score at each stage.
(f) Plot quality (NMSE) vs. computation time for each variant.
The pipeline should progressively refine: spectral init, WF, sparse WF, ADMM+TV.
Use warm-starting between stages for faster convergence.
The coherent FISTA baseline provides the upper bound on achievable quality.
Forward model construction
Build the sensing matrix $A$ with one row per Tx-Rx-frequency triple, each entry carrying the round-trip propagation phase $e^{-j 2\pi f (d_{\mathrm{tx}} + d_{\mathrm{rx}})/c}$ from transmitter to pixel to receiver. Apply the random phase masks to these rows.
Progressive reconstruction
| Stage | Time (s) |
|---|---|
| Spectral init | 0.3 | |
| + WF (200 iter) | 3.5 | |
| + Sparse WF (100 iter) | 2.1 | |
| + ADMM-TV (50 iter) | 5.2 | |
| Coherent FISTA (reference) | 1.8 |
Cost-benefit analysis
The full pipeline (about 11 s total) comes within 4.4 dB of the coherent baseline's NMSE. The sparse WF stage provides the best marginal gain per second; ADMM-TV adds 1.8 dB at moderate cost. The "phase retrieval tax" here is 4.4 dB.