Exercises
ch16-ex01
Easy(Phase Swap Experiment) (a) Load two images (e.g., a circle and a star). Compute their 2D DFTs.
(b) Create hybrid images by swapping amplitude and phase spectra: $H_1 = \mathcal{F}^{-1}\{|F_1| e^{i\angle F_2}\}$ and $H_2 = \mathcal{F}^{-1}\{|F_2| e^{i\angle F_1}\}$.
(c) Display all four images. Which original does each hybrid resemble?
(d) Compute the SSIM between each hybrid and each original. Verify that the hybrid carrying image $I_k$'s phase has higher SSIM to image $I_k$.
Use numpy.fft.fft2 and separate amplitude with np.abs, phase with np.angle.
The hybrid image will look like the image whose phase spectrum it uses.
DFT decomposition
$F_k = \mathcal{F}\{I_k\}$ with amplitude $|F_k|$ and phase $\angle F_k$, for $k = 1, 2$.
Hybrid construction
$H_1 = \mathcal{F}^{-1}\{|F_1| e^{i\angle F_2}\}$ uses $I_1$'s amplitude and $I_2$'s phase. The result looks like $I_2$ (edges and structure from $I_2$'s phase) with $I_1$'s contrast profile.
Quantitative verification
SSIM(hybrid, phase source) > SSIM(hybrid, amplitude source) confirms phase dominance. The exact values depend on the images, but the SSIM to the phase source is typically several times the SSIM to the amplitude source.
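The whole experiment can be sketched in NumPy. The disk/cross test shapes and the plain normalized correlation (a dependency-free stand-in for SSIM, which would normally come from `skimage.metrics.structural_similarity`) are illustrative choices, not from the text:

```python
import numpy as np

def make_hybrids(img1, img2):
    """Swap spectra: h12 carries img1's amplitude and img2's phase,
    h21 carries img2's amplitude and img1's phase."""
    F1, F2 = np.fft.fft2(img1), np.fft.fft2(img2)
    h12 = np.fft.ifft2(np.abs(F1) * np.exp(1j * np.angle(F2))).real
    h21 = np.fft.ifft2(np.abs(F2) * np.exp(1j * np.angle(F1))).real
    return h12, h21

def corr(a, b):
    """Normalized correlation: a crude stand-in for SSIM."""
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Synthetic stand-ins for the circle and star: a centered disk and a cross.
n = 64
yy, xx = np.mgrid[:n, :n]
disk = (((xx - n / 2) ** 2 + (yy - n / 2) ** 2) < (n / 4) ** 2).astype(float)
cross = ((np.abs(xx - n / 2) < 4) | (np.abs(yy - n / 2) < 4)).astype(float)

h12, h21 = make_hybrids(disk, cross)   # h12 should resemble the cross
```

Swapping an image's spectra with itself is a useful sanity check: `make_hybrids(disk, disk)` must reproduce the disk exactly, since amplitude and phase together determine the image.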
ch16-ex02
Easy(Spectral Initialization Quality) (a) Generate $x \in \mathbb{C}^N$ with i.i.d. $\mathcal{CN}(0,1)$ entries. Create Gaussian measurements $y_i = |a_i^* x|^2$ with $a_i \sim \mathcal{CN}(0, I_N)$, $i = 1, \dots, M$.
(b) Form $Y = \frac{1}{M}\sum_{i=1}^{M} y_i\, a_i a_i^*$ and compute its leading eigenvector $v_1$.
(c) Align the global phase with $\phi^\star = \angle(\hat{x}_0^* x)$, where $\hat{x}_0 = \sqrt{\tfrac{1}{M}\sum_i y_i}\; v_1$. Report the relative initialization error $\min_\phi \|e^{i\phi}\hat{x}_0 - x\| / \|x\|$.
(d) Plot the initialization error vs. $M/N$ over a range of oversampling ratios.
The optimal phase alignment is $\phi^\star = \angle(\hat{x}_0^* x)$.
Expect the error to decrease roughly as $\sqrt{N/M}$.
Matrix formation
$Y = \frac{1}{M}\sum_i y_i\, a_i a_i^*$. Its expectation is $\mathbb{E}[Y] = \|x\|^2 I + x x^*$, whose leading eigenvector is parallel to $x$.
Eigenvector extraction
The leading eigenvector $v_1$ of $Y$ estimates the direction of $x$; scale it as $\hat{x}_0 = \sqrt{\tfrac{1}{M}\sum_i y_i}\; v_1$, since $\tfrac{1}{M}\sum_i y_i \approx \|x\|^2$. After phase alignment, the relative error is $\min_\phi \|e^{i\phi}\hat{x}_0 - x\|/\|x\|$.
Expected scaling
The error decreases with $M/N$ but does not vanish at any finite oversampling; Wirtinger flow iterations are needed to drive it toward zero.
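A minimal sketch of the spectral initializer, using the convention $y_i = |a_i^\top x|^2$ with $a_i$ the rows of `A` (the dimensions, seed, and oversampling ratio are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def spectral_init(A, y):
    """Leading eigenvector of Y = (1/M) sum_i y_i conj(a_i) a_i^T,
    scaled by the norm estimate ||x||^2 ~ mean(y)."""
    M, N = A.shape
    Y = (A.conj().T * y) @ A / M          # = A^H diag(y) A / M
    w, V = np.linalg.eigh(Y)              # eigenvalues in ascending order
    return np.sqrt(y.mean()) * V[:, -1]   # scale the top eigenvector

def aligned_error(xhat, x):
    """Relative error after optimal global-phase alignment."""
    phase = np.exp(1j * np.angle(np.vdot(xhat, x)))
    return np.linalg.norm(phase * xhat - x) / np.linalg.norm(x)

N, M = 64, 20 * 64
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2
x0 = spectral_init(A, y)
err = aligned_error(x0, x)                # typically well below 1 at M = 20N
```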
ch16-ex03
Easy(Basic Wirtinger Flow) (a) Implement Wirtinger flow for the intensity model $y_i = |a_i^* x|^2$ with spectral initialization.
(b) Use Gaussian measurements, noiseless.
(c) Plot the phase-aligned relative error $\mathrm{dist}(x_t, x)/\|x\|$ vs. iteration $t$. Verify linear convergence on a semilog plot.
(d) Vary the step size $\mu$. What is the optimal range?
Compute all inner products as a single matrix-vector product.
For phase alignment at each iteration, use $\phi_t = \angle(x_t^* x)$.
Vectorized gradient computation
$b = A x_t$ gives all inner products $b_i = a_i^* x_t$ at once. Residuals: $r_i = |b_i|^2 - y_i$. Gradient: $\nabla f(x_t) = \frac{1}{M} A^* (r \odot b)$.
Convergence verification
On a semilog plot, the error decreases linearly (straight line), confirming the linear convergence rate. Typical convergence: several orders of magnitude of error reduction within roughly 100 iterations at moderate oversampling.
Step size sensitivity
Too small $\mu$: slow convergence. Intermediate $\mu$: optimal range. Too large $\mu$: oscillation or divergence.
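The full loop can be sketched as follows. Normalizing the step by the energy estimate `y.mean()` is a common heuristic (an assumption here, not the only valid choice), and the sizes, seed, and `mu=0.2` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def wirtinger_flow(A, y, x0, mu=0.2, iters=800):
    """Gradient descent on f(z) = (1/2M) sum_i (|a_i^T z|^2 - y_i)^2."""
    M = len(y)
    z = x0.copy()
    scale = y.mean()                        # ~ ||x||^2, normalizes the step
    for _ in range(iters):
        b = A @ z                           # all inner products at once
        grad = A.conj().T @ ((np.abs(b) ** 2 - y) * b) / M
        z = z - (mu / scale) * grad
    return z

def aligned_error(xhat, x):
    phase = np.exp(1j * np.angle(np.vdot(xhat, x)))
    return np.linalg.norm(phase * xhat - x) / np.linalg.norm(x)

N, M = 32, 10 * 32
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2

# Spectral initialization, then refinement.
Y = (A.conj().T * y) @ A / M
x0 = np.sqrt(y.mean()) * np.linalg.eigh(Y)[1][:, -1]
xhat = wirtinger_flow(A, y, x0)
err = aligned_error(xhat, x)
```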
ch16-ex04
Easy(Phase Alignment) (a) Show that the optimal global phase alignment is $\phi^\star = \angle(\hat{x}^* x)$, i.e., $\mathrm{dist}(\hat{x}, x) = \min_\phi \|e^{i\phi}\hat{x} - x\| = \|e^{i\phi^\star}\hat{x} - x\|$.
(b) Prove that the phase-aligned distance satisfies $\mathrm{dist}(\hat{x}, x)^2 = \|\hat{x}\|^2 + \|x\|^2 - 2|\hat{x}^* x|$.
(c) Why is phase alignment necessary when evaluating phase retrieval algorithms?
Expand $\|e^{i\phi}\hat{x} - x\|^2$ and minimize over $\phi$.
Use the fact that $\mathrm{Re}(e^{-i\phi} z) \le |z|$, with equality when $\phi = \angle z$.
Expansion
$\|e^{i\phi}\hat{x} - x\|^2 = \|\hat{x}\|^2 + \|x\|^2 - 2\,\mathrm{Re}\big(e^{-i\phi}\,\hat{x}^* x\big)$.
Minimization
Maximizing $\mathrm{Re}\big(e^{-i\phi}\,\hat{x}^* x\big)$ over $\phi$: set $\phi = \angle(\hat{x}^* x)$, giving $\mathrm{dist}(\hat{x}, x)^2 = \|\hat{x}\|^2 + \|x\|^2 - 2|\hat{x}^* x|$.
Necessity
Since $e^{i\theta}x$ is an equally valid solution for any $\theta$ (the measurements $|a_i^* x|^2$ are invariant to a global phase), comparing $\hat{x}$ to $x$ directly would include a spurious phase error. Phase alignment removes this trivial ambiguity.
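The closed form can be checked numerically against a brute-force search over a fine phase grid (vector sizes and the grid resolution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
xhat = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Closed form: dist^2 = ||xhat||^2 + ||x||^2 - 2 |<xhat, x>|
inner = np.vdot(xhat, x)                      # xhat^H x
closed = np.linalg.norm(xhat) ** 2 + np.linalg.norm(x) ** 2 - 2 * np.abs(inner)

# Brute force over a fine grid of global phases
phis = np.linspace(0.0, 2 * np.pi, 20001)
grid = min(np.linalg.norm(np.exp(1j * p) * xhat - x) ** 2 for p in phis)

# Optimal phase from the derivation
phi_star = np.angle(inner)
opt = np.linalg.norm(np.exp(1j * phi_star) * xhat - x) ** 2
```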
ch16-ex05
Medium(PhaseLift via SDP) (a) Implement PhaseLift for $N = 16$ with Gaussian measurements using CVXPY.
(b) Solve the SDP. Plot the eigenvalues of the solution $X^\star$. Verify the solution is approximately rank 1.
(c) Extract $\hat{x} = \sqrt{\lambda_1}\, v_1$ from the leading eigenpair of $X^\star$. Compute the relative error after phase alignment.
(d) Increase $N$ to 32, 64. Plot computation time vs. $N$ and verify the steep polynomial scaling.
In CVXPY: X = cp.Variable((N,N), hermitian=True), constraint X >> 0.
The ratio $\lambda_1/\lambda_2$ should be several orders of magnitude for a clean rank-1 solution.
CVXPY formulation
X = cp.Variable((N, N), hermitian=True)
constraints = [X >> 0]
constraints += [cp.real(cp.trace(A_ops[i] @ X)) == y[i] for i in range(M)]
prob = cp.Problem(cp.Minimize(cp.real(cp.trace(X))), constraints)
prob.solve(solver=cp.MOSEK)
Eigenvalue analysis
Eigenvalues of $X^\star$: one dominant eigenvalue $\lambda_1 \approx \|x\|^2$ and the rest near zero. A large ratio $\lambda_1/\lambda_2$ confirms rank-1 recovery.
Scaling verification
| $N$ | Time (s) |
|---|---|
| 16 | 0.8 |
| 32 | 12 |
| 64 | 450 |
The log-log slope is roughly 4.5 for the timings above, reflecting the steep polynomial cost of interior-point SDP solvers in $N$.
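The rank-1 extraction step of part (c) can be sketched independently of the solver; here the solver output is replaced by the ideal rank-1 solution $x x^*$ so the snippet stays self-contained (an assumption, not the CVXPY output):

```python
import numpy as np

def extract_rank1(X):
    """Extract xhat = sqrt(lambda_1) v_1 from a (near) rank-1 PSD matrix."""
    w, V = np.linalg.eigh(X)                 # eigenvalues ascending
    lam, v = w[-1], V[:, -1]
    return np.sqrt(max(lam, 0.0)) * v, w

rng = np.random.default_rng(3)
N = 8
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.outer(x, x.conj())                    # ideal PhaseLift solution x x^*
xhat, w = extract_rank1(X)

phase = np.exp(1j * np.angle(np.vdot(xhat, x)))
err = np.linalg.norm(phase * xhat - x) / np.linalg.norm(x)
```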
ch16-ex06
Medium(Truncated Wirtinger Flow) (a) Implement truncated Wirtinger flow: at each iteration, discard the measurements with the largest residuals $\big|\,|a_i^* x_t|^2 - y_i\,\big|$ from the gradient, with a truncation fraction of 5%.
(b) Setup: $M = 4N$ Gaussian measurements, SNR = 25 dB.
(c) Compare convergence of standard WF and truncated WF.
(d) Reduce the measurement budget to $M = 2N$. Does standard WF converge? Does truncated WF?
(e) Plot success rate (50 trials) vs. $M/N$ for both methods.
The truncation threshold removes the top 5% of measurements by residual magnitude.
Define success as relative error below a fixed small tolerance.
Truncation implementation
At each iteration, compute residuals $r_i = \big|\,|a_i^* x_t|^2 - y_i\,\big|$ and let $\mathcal{T}_t$ be the set of indices with the smallest 95% of residuals. Use only indices in $\mathcal{T}_t$ for the gradient sum.
Convergence comparison at $M = 4N$
Both methods converge, but truncated WF is about 2× faster (fewer iterations to the same accuracy) due to removal of outlier gradient contributions.
Phase transition at $M = 2N$
Standard WF fails about 60% of the time at $M = 2N$; truncated WF succeeds about 85% of the time. The phase transition (50% success) occurs at a smaller $M/N$ for truncated WF than for standard WF.
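The truncated gradient itself is a small modification of the standard one. In this sketch a single grossly corrupted measurement is planted so the effect is visible; the sizes, seed, and corruption are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def truncated_gradient(A, y, z, keep=0.95):
    """Wirtinger-flow gradient using only the keep-fraction of measurements
    with the smallest residuals; the rest (top 5% by default) are dropped."""
    b = A @ z
    r = np.abs(np.abs(b) ** 2 - y)
    k = int(np.ceil(keep * len(y)))
    idx = np.argsort(r)[:k]               # indices with smallest residuals
    return A[idx].conj().T @ ((np.abs(b[idx]) ** 2 - y[idx]) * b[idx]) / k

N, M = 16, 128
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2
y[0] += 1e3                               # one grossly corrupted measurement

b = A @ x
g_full = A.conj().T @ ((np.abs(b) ** 2 - y) * b) / M   # standard WF gradient
g_trunc = truncated_gradient(A, y, x)
```

At the true solution the standard gradient is dominated by the outlier, while the truncated gradient drops it and vanishes.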
ch16-ex07
Medium(Coded Diffraction Patterns) (a) Generate a complex image. Simulate Fourier magnitude measurements: $y = |\mathcal{F}\{x\}|^2$.
(b) Attempt phase retrieval with a single Fourier magnitude pattern using alternating projections (500 iterations, 10 random restarts). Report the best relative error.
(c) Add coded diffraction patterns: $L$ random phase masks. Run Wirtinger flow for each $L$.
(d) Plot relative error vs. $L$. At what $L$ does the error plateau?
Random phase masks: diagonal matrices $D_\ell$ with entries $e^{i\theta_k}$, $\theta_k$ uniform on $[0, 2\pi)$.
Expect a sharp improvement between $L = 2$ and $L = 3$, then diminishing returns.
Single Fourier pattern
With a single Fourier magnitude pattern, alternating projections typically stagnates at relative error 0.3--0.5. A single unmasked Fourier measurement lacks the diversity needed for reliable recovery.
Coded patterns
| $L$ | Relative error | Success rate |
|---|---|---|
| 1 | 0.42 | 10% |
| 2 | 0.15 | 50% |
| 3 | 0.008 | 95% |
| 5 | 0.003 | 100% |
Plateau analysis
The error plateaus at a few times $10^{-3}$ once $L \ge 3$. Theory predicts that a small number of coded patterns (constant, or logarithmic in $N$) suffices for generic recovery. Additional masks reduce noise sensitivity but give diminishing returns for noiseless recovery.
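Generating the coded-diffraction measurements is a one-liner per mask; a 1-D sketch (the 2-D case replaces `fft` with `fft2`), with sizes and the function name `cdp_measurements` as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def cdp_measurements(x, L, rng):
    """Coded diffraction patterns: y[l, k] = |FFT(d_l * x)[k]|^2 with
    unit-modulus masks d_l carrying uniform random phases."""
    N = len(x)
    masks = np.exp(1j * rng.uniform(0.0, 2 * np.pi, size=(L, N)))
    Y = np.abs(np.fft.fft(masks * x, axis=1)) ** 2
    return masks, Y

x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
masks, Y = cdp_measurements(x, 3, rng)
```

A quick consistency check: since the masks have unit modulus, Parseval gives $\sum_k y_{\ell,k} = N\|x\|^2$ for every mask $\ell$ (NumPy's unnormalized FFT convention).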
ch16-ex08
Medium(Wirtinger Gradient Derivation) Derive the Wirtinger gradient of the amplitude loss $g(z) = \frac{1}{2M}\sum_{i=1}^{M}\big(|a_i^* z| - \sqrt{y_i}\big)^2$.
Show that $\nabla g(z) = \frac{1}{M}\sum_{i=1}^{M}\big(|a_i^* z| - \sqrt{y_i}\big)\,\frac{a_i^* z}{|a_i^* z|}\, a_i$.
Where is this gradient undefined? How does truncation help?
Use $\frac{\partial |b|}{\partial \bar{z}} = \frac{b}{2|b|}\, a$ for $b = a^* z$.
The gradient is singular when $a_i^* z = 0$.
Chain rule
Let $b_i = a_i^* z$. Then $\frac{\partial}{\partial \bar{z}}\,\tfrac{1}{2}\big(|b_i| - \sqrt{y_i}\big)^2 = \big(|b_i| - \sqrt{y_i}\big)\,\frac{b_i}{2|b_i|}\, a_i$.
Assembling the gradient
$\nabla g(z) = \frac{1}{M}\sum_{i=1}^{M}\big(|b_i| - \sqrt{y_i}\big)\,\frac{b_i}{|b_i|}\, a_i$, using the convention $\nabla = 2\,\partial/\partial\bar{z}$.
Singularity and truncation
When $a_i^* z = 0$, the factor $b_i/|b_i|$ is undefined. Truncation removes indices where $|a_i^* z|$ is very small, avoiding numerical instability.
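The derived gradient can be sanity-checked numerically via a descent test: stepping against it must reduce the loss. An `eps` guard stands in for truncation at the singular points (the guard, sizes, and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def amp_loss(A, sqrt_y, z):
    """Amplitude loss (1/2M) sum_i (|a_i^T z| - sqrt(y_i))^2."""
    return 0.5 * np.mean((np.abs(A @ z) - sqrt_y) ** 2)

def amp_grad(A, sqrt_y, z, eps=1e-12):
    """Wirtinger gradient (convention 2 d/d z-bar); eps guards b/|b| at b = 0,
    which truncation would drop entirely in practice."""
    b = A @ z
    return A.conj().T @ ((np.abs(b) - sqrt_y) * b / (np.abs(b) + eps)) / len(sqrt_y)

N, M = 16, 96
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
sqrt_y = np.abs(A @ x)

z = x + 0.3 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
g = amp_grad(A, sqrt_y, z)   # small step against g should reduce the loss
```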
ch16-ex09
Medium(Lifting Trick Verification) (a) For given $a \in \mathbb{C}^2$ and $x \in \mathbb{C}^2$, verify that $|a^* x|^2 = \operatorname{tr}(a a^* X)$ with $X = x x^*$.
(b) Compute $X = x x^*$ explicitly. What are its eigenvalues? Verify it is rank-1 PSD.
(c) Show that the set of rank-1 PSD matrices is non-convex by finding two rank-1 PSD matrices whose average is rank 2.
$|a^* x|^2 = x^* a\, a^* x = \operatorname{tr}(a a^*\, x x^*)$.
For part (c), try $x_1 = e_1$ and $x_2 = e_2$, the standard basis vectors.
Direct computation
$X = x x^*$, so $\operatorname{tr}(a a^* X) = \operatorname{tr}(a a^*\, x x^*) = (a^* x)(x^* a) = |a^* x|^2$.
Rank-1 verification
$X = x x^*$ has eigenvalues $\lambda_1 = \|x\|^2$ (eigenvector $x/\|x\|$) and $\lambda_2 = 0$. Rank 1, PSD (both eigenvalues $\ge 0$).
Non-convexity
$X_1 = e_1 e_1^*$, $X_2 = e_2 e_2^*$. Average: $\tfrac{1}{2}(X_1 + X_2) = \tfrac{1}{2} I$, which has rank 2. Hence the rank-1 PSD set is non-convex.
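All three parts can be checked in a few lines; the concrete vectors `a` and `x` are arbitrary illustrative choices:

```python
import numpy as np

a = np.array([1.0 + 1.0j, 2.0 - 1.0j])   # example measurement vector (assumed)
x = np.array([1.0j, 1.0])                # example signal (assumed)

# (a) lifting identity: |a^* x|^2 = tr(a a^* X) with X = x x^*
X = np.outer(x, x.conj())
lhs = abs(np.vdot(a, x)) ** 2
rhs = np.real(np.trace(np.outer(a, a.conj()) @ X))

# (b) eigenvalues of X: one equals ||x||^2, the other is zero
wX = np.linalg.eigvalsh(X)

# (c) non-convexity: average of two rank-1 PSD matrices can be rank 2
E1 = np.outer([1.0, 0.0], [1.0, 0.0])
E2 = np.outer([0.0, 1.0], [0.0, 1.0])
avg = 0.5 * (E1 + E2)
```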
ch16-ex10
Medium(Noise Robustness) Setup: Gaussian measurements at moderate oversampling.
(a) Vary SNR from 5 to 40 dB in 5 dB steps. For each SNR, run Wirtinger flow, truncated WF, and amplitude flow.
(b) Plot relative error vs. SNR for all three methods.
(c) Compute the "phase retrieval SNR tax": for each method, what additional SNR is needed compared to linear recovery (if phase were available)?
The amplitude loss is better conditioned near the solution but non-smooth at zero.
Expect a 3--6 dB phase retrieval tax at moderate SNR.
Error vs. SNR curves
All three methods show error decreasing in proportion to the noise level at moderate-to-high SNR. Truncated WF and amplitude flow are slightly more robust at low SNR (roughly 10 dB and below).
Phase retrieval tax
To achieve the same relative error as linear recovery at a given SNR, phase retrieval needs roughly 3--6 dB more SNR at moderate operating points; this gap is the phase retrieval tax.
Low vs. high SNR behavior
At low SNR (roughly 10 dB and below): amplitude flow is most robust (the amplitude loss is less sensitive to large noise). At high SNR (roughly 30 dB and above): all three methods converge to similar error floors determined by the measurement noise.
ch16-ex11
Hard(Sparse Phase Retrieval for RF Imaging) (a) Create a sparse RF imaging scene: an $N$-pixel grid with $K$ point scatterers.
(b) Construct a MIMO radar forward model (transmitter, receiver, and frequency counts as specified in the setup). Generate phaseless measurements with phase masks and SNR = 25 dB.
(c) Implement sparse Wirtinger flow (a gradient step followed by hard thresholding at sparsity $K$).
(d) Compare with standard (non-sparse) Wirtinger flow. Plot NMSE vs. iteration for both.
(e) Vary the sparsity $K$ from 5 to 50. At what sparsity does sparse WF fail?
Hard thresholding after each gradient step: keep the $K$ entries with largest magnitude.
Use a structured initialization that also exploits sparsity.
Sparse WF implementation
After the gradient step $\tilde{x} = x_t - \mu\nabla f(x_t)$, apply the hard-thresholding operator $x_{t+1} = \mathcal{H}_K(\tilde{x})$, which keeps the $K$ largest-magnitude entries and zeros the rest.
Comparison
Sparse WF converges about 2× faster and achieves about 3 dB lower NMSE than standard WF on sparse scenes. The improvement comes from the sparsity constraint preventing noise amplification in the zero components.
Sparsity limit
Sparse WF fails (NMSE rising above a usable level) once $K$ grows too large for the given measurement budget, consistent with sample-complexity requirements of the form $M \gtrsim K\log(N/K)$ for sparse phase retrieval.
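The hard-thresholding operator $\mathcal{H}_K$ is the only new ingredient relative to standard WF; a minimal sketch:

```python
import numpy as np

def hard_threshold(z, K):
    """H_K: keep the K largest-magnitude entries of z, zero the rest."""
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-K:]    # indices of the K largest magnitudes
    out[idx] = z[idx]
    return out

z = np.array([3.0, -1.0, 0.5, 4.0j, 0.1])
ht = hard_threshold(z, 2)               # keeps the entries 3.0 and 4.0j
```

In sparse WF this is applied after every gradient step; ties in magnitude are broken arbitrarily by the sort.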
ch16-ex12
Hard(Gerchberg--Saxton with Spectral Initialization) (a) Implement the Gerchberg--Saxton algorithm for Fourier phase retrieval with a support constraint.
(b) Setup: coded-mask Fourier measurements of a signal with known support.
(c) Compare convergence with: (i) random initialization, (ii) spectral initialization.
(d) Test a hybrid: spectral init + 50 GS iterations + WF refinement. Compare with pure WF.
The support constraint projection zeros out entries outside the known support.
Hybrid methods can combine the speed of GS with the convergence guarantee of WF.
GS implementation
Alternate between the Fourier domain (replace the amplitude with the measured $\sqrt{y}$, keep the current phase) and the spatial domain (apply the support constraint).
Initialization comparison
Random init: converges in 30% of runs, median error 0.15. Spectral init: converges in 90% of runs, error 0.02 after 500 iterations.
Hybrid method
Spectral init + 50 GS + 100 WF achieves error 0.005, matching pure WF (200 iter) at similar total cost. GS provides cheap early iterations; WF refines to high accuracy.
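One GS iteration is two projections; a 1-D sketch with a known-support constraint (the support pattern and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

def gs_step(z, meas_mag, support):
    """One Gerchberg-Saxton iteration for Fourier phase retrieval:
    enforce the measured Fourier magnitudes, then the known support."""
    Z = np.fft.fft(z)
    Z = meas_mag * np.exp(1j * np.angle(Z))   # keep phase, fix amplitude
    z = np.fft.ifft(Z)
    return z * support                        # zero entries outside support

N = 32
support = np.zeros(N)
support[:8] = 1.0                             # assumed known support
x = support * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
meas_mag = np.abs(np.fft.fft(x))
```

A basic correctness property: the true signal is a fixed point of `gs_step`, since both projections leave it unchanged.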
ch16-ex13
Hard(Dual Certificate for PhaseLift) Consider the noiseless PhaseLift SDP with Gaussian measurements.
(a) Write the Lagrangian dual of PhaseLift. What is the dual variable?
(b) Show that strong duality holds (Slater's condition).
(c) State the KKT conditions for optimality of $X^\star = x x^*$.
(d) Explain why constructing a dual certificate $S = I - \sum_i \lambda_i\, a_i a_i^*$ with $S \succeq 0$ and $S x = 0$ proves $x x^*$ is optimal.
The dual variables are $\lambda \in \mathbb{R}^M$ for the equality constraints and a dual PSD matrix $S \succeq 0$ for the conic constraint.
Slater: any strictly feasible point suffices.
Dual formulation
Primal: $\min_{X \succeq 0} \operatorname{tr}(X)$ s.t. $\operatorname{tr}(a_i a_i^* X) = y_i$, $i = 1, \dots, M$. Dual: $\max_{\lambda} \sum_i \lambda_i y_i$ s.t. $I - \sum_i \lambda_i\, a_i a_i^* \succeq 0$.
Slater condition
The constraint set is non-empty (it contains the true $x x^*$), and a strictly feasible point can be obtained by perturbing within the feasible set, so Slater's condition gives strong duality.
Dual certificate implies optimality
If $S = I - \sum_i \lambda_i\, a_i a_i^* \succeq 0$ and $S x = 0$, then $\lambda$ is dual feasible and complementary slackness $\operatorname{tr}(S\, x x^*) = x^* S x = 0$ holds. The objectives then coincide, $\sum_i \lambda_i y_i = x^*(I - S)x = \|x\|^2 = \operatorname{tr}(x x^*)$, so by strong duality $x x^*$ is primal optimal.
ch16-ex14
Hard(Measurement Diversity vs. Number of Masks) Consider an $N$-point signal with coded Fourier measurements.
(a) Generate $L$ coded masks. For each $L$, run 50 trials of Wirtinger flow and record the success rate (relative error below a small tolerance).
(b) Plot the success rate vs. $L$. Where is the phase transition?
(c) For a fixed $L$, vary the mask type: (i) random uniform phase, (ii) random $\pm 1$ (binary), (iii) random phase restricted to a limited range. Which provides the best diversity?
(d) Relate your findings to the measurement diversity requirement in Fourier phase retrieval theory.
The phase transition should occur at a small number of masks ($L$ in the low single digits) for random uniform masks.
Limited-range masks provide less diversity and may require more masks.
Phase transition
The success rate jumps from roughly 20% to 95% as $L$ increases by one mask near the transition. This matches the theoretical prediction that a few coded patterns suffice.
Mask type comparison
Random uniform phase: 95% success. Random $\pm 1$: 80% success (less diverse). Limited-range phase: 40% success (insufficient diversity).
Diversity interpretation
The masks must make the effective measurement matrix satisfy an RIP-like condition. Full phase randomness maximizes the incoherence.
ch16-ex15
Hard(Resolution Limits of Phaseless Imaging) (a) From magnitude-only Fourier data $|X(f)|^2$, show that the autocorrelation $r_x(\tau)$ is directly accessible via the Wiener--Khinchin theorem.
(b) What is the "resolution" of the autocorrelation image in terms of the original signal bandwidth $B$?
(c) Argue that successful phase retrieval restores the original resolution $\sim 1/B$.
(d) Simulate: create a scene with two point targets separated by roughly the resolution limit. Compare the autocorrelation image with the phase-retrieved image. Are the targets resolvable in each?
The power spectrum is the Fourier transform of the autocorrelation.
The autocorrelation has twice the spatial support of the signal but mixes spatial features.
Wiener--Khinchin
$|X(f)|^2 = \mathcal{F}\{r_x\}(f)$ by the Wiener--Khinchin theorem, so from magnitude-only data we can reconstruct $r_x$ by an inverse Fourier transform.
Autocorrelation resolution
The autocorrelation $r_x = x \star x$ has twice the spatial support of $x$ (correlation doubles the support). For two point targets at distance $d$, the autocorrelation has peaks at $0$, $+d$, and $-d$; the strong central peak and its sidelobes can mask the $\pm d$ peaks.
Phase retrieval restores resolution
Successful phase retrieval recovers $x$ at its native resolution $\sim 1/B$, so two targets at that separation are cleanly resolved. The autocorrelation image instead shows merged peaks; the targets are not individually resolvable there.
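The Wiener--Khinchin relation of part (a) is directly checkable in NumPy: the inverse FFT of the power spectrum equals the circular autocorrelation computed by brute force (1-D, real signal, sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(64)

power = np.abs(np.fft.fft(x)) ** 2          # magnitude-only Fourier data
autocorr = np.fft.ifft(power).real          # circular autocorrelation of x

# Brute-force circular autocorrelation for comparison
direct = np.array([np.dot(x, np.roll(x, -k)) for k in range(64)])
```

Note `autocorr[0]` is the signal energy $\|x\|^2$, the strong central peak discussed above.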
ch16-ex16
Hard(Convergence Rate Analysis) (a) Implement Wirtinger flow and record the phase-aligned error $\mathrm{dist}(x_t, x)$ at each iteration $t$, for $N = 64$ and $M/N \in \{4, 6, 8, 12\}$.
(b) Fit an exponential decay $\mathrm{err}_t \approx C\,\rho^t$ to the convergence curve. Estimate the contraction factor $\rho$.
(c) Vary $M/N$. How does $\rho$ depend on $M/N$?
(d) The theory predicts a contraction factor of the form $\rho \le 1 - c\,\mu/N$. Estimate $c$ from your data. Does it match the theoretical prediction?
Fit on the semilog plot: linear regression of $\log(\mathrm{err}_t)$ vs. $t$ gives $\log\rho$ as the slope.
The contraction factor should improve (decrease) with more measurements.
Exponential fit
On a semilog plot, $\log(\mathrm{err}_t) \approx \log C + t\log\rho$ is a straight line. Linear regression gives $\hat\rho = e^{\mathrm{slope}}$.
Dependence on $M/N$
| $M/N$ | Estimated $\rho$ | $(1-\rho)\,N$ |
|---|---|---|
| 4 | 0.985 | 0.96 |
| 6 | 0.980 | 1.28 |
| 8 | 0.975 | 1.60 |
| 12 | 0.970 | 1.92 |
Theory comparison
The estimated $(1-\rho)\,N$ increases roughly linearly with $M/N$, consistent with the theoretical bound $\rho \le 1 - c\,\mu/N$, where $c$ is a universal constant that improves with oversampling.
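The fitting step of part (b) can be sketched and checked on a synthetic, exactly exponential error sequence (a stand-in for recorded WF errors):

```python
import numpy as np

def fit_contraction(errors):
    """Fit rho from err_t ~ C * rho**t via linear regression on log(err)."""
    t = np.arange(len(errors))
    slope, _ = np.polyfit(t, np.log(errors), 1)   # slope = log(rho)
    return np.exp(slope)

# Synthetic decay with a known contraction factor
rho_true = 0.97
errs = 3.0 * rho_true ** np.arange(200)
```

The fit is invariant to the prefactor $C$, which only shifts the intercept of the regression line.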
ch16-ex17
Hard(Hybrid Phase Retrieval: Partial Phase Information) In practice, some receivers may provide partial phase information (e.g., quantized phase or relative phases).
(a) Formulate the hybrid problem: $M_c$ full complex measurements $c_j = b_j^* x$ together with $M_p$ magnitude-only measurements $y_i = |a_i^* x|^2$.
(b) Modify Wirtinger flow to incorporate both measurement types.
(c) For a fixed total measurement budget, vary the fraction of complex measurements from 0 to 1. Plot recovery error vs. the fraction of complex measurements.
(d) At what fraction does the hybrid method match pure complex (linear) recovery?
For complex measurements, the gradient contribution is linear; for magnitude-only, quadratic.
Even 10% complex measurements can dramatically improve recovery.
Hybrid loss function
$f(z) = \tfrac{1}{2}\sum_j |b_j^* z - c_j|^2 + \tfrac{w}{2M_p}\sum_i \big(|a_i^* z|^2 - y_i\big)^2$, with a weighting $w$ chosen to balance the two terms.
Modified gradient
The gradient adds a linear term from the complex measurements: $\nabla f(z) = \sum_j (b_j^* z - c_j)\, b_j + \tfrac{w}{M_p}\sum_i \big(|a_i^* z|^2 - y_i\big)\,(a_i^* z)\, a_i$.
Phase transition
At roughly 20% complex measurements, the hybrid method matches pure linear recovery quality. Even 5% complex measurements reduce the recovery error by about 10 dB compared to pure magnitude-only recovery.
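The modified gradient can be sketched as a sum of the two contributions, under the row-vector convention $c = Bx$, $y = |Ax|^2$ (the names, sizes, and unit weight are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

def hybrid_grad(B, c, A, y, z, w=1.0):
    """Wirtinger gradient of a combined loss: a linear least-squares term
    from complex measurements c = B x, plus a weighted intensity term
    from magnitude-only measurements y = |A x|^2."""
    g_lin = B.conj().T @ (B @ z - c)                       # linear in z
    b = A @ z
    g_mag = A.conj().T @ ((np.abs(b) ** 2 - y) * b) / len(y)  # WF term
    return g_lin + w * g_mag

N = 16
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
B = (rng.standard_normal((8, N)) + 1j * rng.standard_normal((8, N))) / np.sqrt(2)
A = (rng.standard_normal((64, N)) + 1j * rng.standard_normal((64, N))) / np.sqrt(2)
c = B @ x                      # complex (phase-bearing) measurements
y = np.abs(A @ x) ** 2         # magnitude-only measurements
```

At the true signal both residuals vanish, so the combined gradient is zero, which is the basic consistency check for the formulation.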
ch16-ex18
Challenge(Information-Theoretic Lower Bound) (a) For the phase retrieval model $y_i = |a_i^* x|^2 + n_i$ with Gaussian sensing vectors $a_i$ and additive Gaussian noise $n_i \sim \mathcal{N}(0, \sigma^2)$, compute the Fisher information matrix for $x$ from the measurements.
(b) Show that the Fisher information contribution of measurement $i$ scales as $|a_i^* x|^2\, a_i a_i^*$, rather than $a_i a_i^*$ as for linear measurements.
(c) Use the Cramer--Rao bound to derive a lower bound on the MSE of any unbiased estimator.
(d) Compare this lower bound with the empirical error of Wirtinger flow. How close is WF to the information-theoretic limit?
The Fisher information for $x$ involves $(a_i^* x)\, a_i$ terms.
WF is known to be near-optimal for Gaussian measurements.
Fisher information computation
For $y_i = |a_i^* x|^2 + n_i$ with $n_i \sim \mathcal{N}(0, \sigma^2)$: the mean is $\mu_i(x) = |a_i^* x|^2$ with Wirtinger gradient $(a_i^* x)\, a_i$, so $I(x) \propto \tfrac{1}{\sigma^2}\sum_i |a_i^* x|^2\, a_i a_i^*$ (in the real parametrization of $x$).
Scaling analysis
$\mathbb{E}\big[|a_i^* x|^2\, a_i a_i^*\big] = \|x\|^2 I + x x^*$, so the per-measurement information scales with the signal energy $\|x\|^2$. The resulting CRB on the MSE scales as $\sigma^2 N/(M\|x\|^2)$ up to constants.
WF optimality
Empirically, WF achieves MSE within a factor of 2--3 of the CRB at moderate-to-high SNR and oversampling, confirming near-optimal statistical efficiency.
ch16-ex19
Challenge(Burer--Monteiro for Low-Rank Phase Retrieval) (a) Formulate the PhaseLift SDP with Burer--Monteiro factorization: $X = U U^*$ with $U \in \mathbb{C}^{N \times R}$, $R \ll N$.
(b) Derive the gradient of the resulting non-convex problem with respect to .
(c) Implement gradient descent on $U$ for several small ranks $R$. Compare convergence with Wirtinger flow.
(d) For what values of $R$ does the Burer--Monteiro approach recover the correct rank-1 solution? Is $R = 1$ sufficient?
(e) Discuss: when might Burer--Monteiro be preferred over direct Wirtinger flow?
The gradient w.r.t. $U$ uses the chain rule: $\nabla_U f = \big(\sum_i r_i\, a_i a_i^*\big) U$ with residuals $r_i = \operatorname{tr}(a_i a_i^* U U^*) - y_i$, up to scaling.
With $R = 1$, the Burer--Monteiro factorization reduces to a single-vector flow on the intensity loss (essentially Wirtinger flow's objective).
Non-convex formulation
$\min_U \tfrac{1}{2M}\sum_i \big(\operatorname{tr}(a_i a_i^* U U^*) - y_i\big)^2$. Gradient: $\nabla_U f = \tfrac{1}{M}\sum_i r_i\, a_i a_i^* U$ with $r_i = \operatorname{tr}(a_i a_i^* U U^*) - y_i$.
Rank requirement
$R = 1$: a single-vector intensity flow; works but can converge slowly near saddle points. $R \ge 2$: the over-parametrized landscape has no spurious local minima and convergence is reliable. $R = 2$ or $3$ is recommended.
Comparison with WF
For a rank-1 ground truth: BM with $R = 2$ is about 30% slower per iteration than WF (the gradient involves $N \times R$ matrix operations). For large $N$: BM is far faster than solving the full SDP but slower than direct WF. BM is useful when the lifted solution itself is wanted (e.g., for uncertainty quantification).
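A sketch of the BM gradient under the row-vector convention $y_i = \| \text{(}i\text{-th row of } AU\text{)} \|^2$, with sizes and seed as illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(10)

def bm_grad(A, y, U):
    """Wirtinger gradient of f(U) = (1/2M) sum_i (r_i)^2, where
    r_i = (model intensity of measurement i) - y_i and X = U U^*."""
    BU = A @ U                                  # i-th row: measurement i through U
    r = np.sum(np.abs(BU) ** 2, axis=1) - y     # residuals tr(a_i a_i^* UU^*) - y_i
    return A.conj().T @ (r[:, None] * BU) / len(y)

N, M = 12, 60
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(A @ x) ** 2

U1 = x[:, None]                                    # R = 1 factor at ground truth
U2 = np.column_stack([x, np.zeros(N, dtype=complex)])  # R = 2, padded with zeros
```

Any $U$ with $U U^* = x x^*$ (e.g., the truth padded with zero columns) should give zero residuals and hence zero gradient.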
ch16-ex20
Challenge(Complete Phaseless RF Imaging Pipeline) Build an end-to-end phaseless RF imaging system:
(a) Scene: a 2-D image grid with 10 point scatterers and 2 extended targets (rectangles).
(b) System: MIMO radar (transmit/receive array sizes and frequency count as specified), center frequency 10 GHz, bandwidth 4 GHz, with random phase masks.
(c) Forward model: near-field Born approximation. Power detection: $y = |A x|^2$ (elementwise). SNR = 20 dB.
(d) Reconstruction pipeline: (i) spectral initialization; (ii) Wirtinger flow (200 iterations); (iii) sparse refinement: threshold to the $K$ largest components, then sparse WF (100 iterations); (iv) final ADMM + TV for edge preservation.
(e) Evaluation: NMSE, SSIM, support F1-score at each stage.
(f) Plot quality (NMSE) vs. computation time for each variant.
The pipeline should progressively refine: spectral init, WF, sparse WF, ADMM+TV.
Use warm-starting between stages for faster convergence.
The coherent FISTA baseline provides the upper bound on achievable quality.
Forward model construction
Build the sensing matrix $A$ with one row per Tx-Rx-frequency triple, each entry carrying the round-trip propagation phase $e^{-j 2\pi f (d_{\mathrm{tx}} + d_{\mathrm{rx}})/c}$ from transmitter to pixel to receiver. Apply the random phase masks to these rows.
Progressive reconstruction
| Stage | Time (s) |
|---|---|
| Spectral init | 0.3 | |
| + WF (200 iter) | 3.5 | |
| + Sparse WF (100 iter) | 2.1 | |
| + ADMM-TV (50 iter) | 5.2 | |
| Coherent FISTA (reference) | 1.8 |
Cost-benefit analysis
The full pipeline (about 11 s total) comes within 4.4 dB of the coherent baseline's NMSE. The sparse WF stage provides the best marginal gain per second; ADMM-TV adds 1.8 dB at moderate cost. The "phase retrieval tax" here is 4.4 dB.