Non-Convex Methods: Wirtinger Flow
Wirtinger Flow β Non-Convex Gradient Descent for Phase Retrieval
Wirtinger flow (Candes, Li, and Soltanolkotabi, 2015) takes a radically different approach from PhaseLift: instead of convexifying the problem, it directly minimizes a non-convex loss function using gradient descent. Surprisingly, with a careful spectral initialization and sufficient measurements, gradient descent converges to the global optimum β despite the loss landscape having many saddle points.
The point is that the loss landscape, while non-convex globally, has no spurious local minima in a basin around the true solution. Spectral initialization places us inside this basin, and gradient descent does the rest. Wirtinger flow is the practical workhorse of modern phase retrieval.
Definition: The Intensity Loss Function
The Intensity Loss Function
Wirtinger flow minimizes the intensity-based loss:
This is a quartic function of β non-convex with potentially many saddle points.
Alternative: amplitude loss.
The amplitude loss has better conditioning near the solution but is non-smooth when . Truncated variants handle both losses.
Wirtinger Calculus
A framework for differentiating real-valued functions of complex variables. For , the Wirtinger gradient provides the steepest ascent direction, and is the gradient descent direction. Essential for optimizing over complex-valued signals.
Related: The Intensity Loss Function
Theorem: Wirtinger Gradient of the Intensity Loss
For the intensity loss , the Wirtinger gradient is:
Each term in the sum is the residual times the inner product times the measurement direction . Measurements with large residuals contribute more to the gradient β this is why truncation helps.
Wirtinger calculus basics
For , the Wirtinger derivative is: .
Chain rule application
Let . Then and .
Assembling the gradient
Applying the chain rule to each squared residual: . Summing and dividing by gives the stated formula.
Definition: Spectral Initialization
Spectral Initialization
The convergence of Wirtinger flow depends critically on initialization. Spectral initialization provides a starting point in the basin of attraction of the global optimum.
Algorithm:
- Form the weighted matrix .
- Compute the leading eigenvector: .
Intuition: , so the leading eigenvector of approximates (up to global phase).
Quality: With Gaussian measurements:
where as . Typically gives .
Spectral Initialization
An initialization strategy for non-convex phase retrieval that computes the leading eigenvector of the weighted matrix . Provides a starting point within a constant factor of the true signal, enabling gradient descent to converge globally.
Related: The Intensity Loss Function
Wirtinger Flow Algorithm
Complexity: per iteration; total with iterationsThe normalization by in line 5 ensures scale invariance: the step size does not depend on the signal energy.
Theorem: Convergence Guarantee for Wirtinger Flow
Suppose , (for a universal constant ), and is the spectral initialization. Then with step size , Wirtinger flow converges linearly:
with high probability. After iterations, the relative error is .
Total complexity: with Gaussian measurements. With FFT structure: .
The loss landscape has no spurious local minima in a neighborhood of (and its global phase rotations). Spectral initialization lands inside this neighborhood, and the restricted strong convexity of the loss in this region drives linear convergence.
Local restricted strong convexity
In the neighborhood , the intensity loss satisfies a restricted strong convexity condition: with high probability when .
Spectral initialization lands in $\mathcal{B}$
The spectral initialization satisfies with when .
Contraction per iteration
Combining restricted strong convexity with the Lipschitz property of : .
Iteration count
Iterating the contraction: after steps, .
Historical Note: Gerchberg--Saxton: The Grandfather of Phase Retrieval (1972)
1970sThe Gerchberg--Saxton (GS) algorithm, published in 1972 by Ralph Gerchberg and Owen Saxton, was the first practical algorithm for phase retrieval from Fourier magnitudes. It alternates between the Fourier and spatial domains, replacing the magnitude in each domain with the known constraints.
GS is an alternating projection method: it projects between the set of signals consistent with the Fourier magnitude and the set consistent with spatial constraints (support, non-negativity). On tree-structured constraint sets, alternating projections converge; on non-convex sets (like the magnitude constraint), convergence to the global optimum is not guaranteed.
Despite lacking convergence guarantees, GS and its descendants (Fienup's Hybrid Input-Output, error reduction) remained the standard algorithms for four decades, until Wirtinger flow provided the first provably convergent alternative.
Definition: Truncated Wirtinger Flow
Truncated Wirtinger Flow
Truncated Wirtinger Flow (Chen and Candes, 2017) discards measurements with large residuals from the gradient computation:
where .
Key improvement: Truncation reduces the measurement requirement from to β optimal up to constants.
Other variants:
- Reshaped Wirtinger flow: Uses the amplitude loss with truncation, achieving optimal without the log factor.
- Incremental (stochastic) Wirtinger flow: Uses one measurement per iteration for per-step cost β suitable for streaming data.
Wirtinger Flow Convergence
Demonstrates Wirtinger flow convergence on a 1D signal recovery task. The plot shows relative error versus iteration number.
Adjust the measurement ratio and noise level to observe:
- Linear convergence after spectral initialization
- Faster convergence with more measurements
- Error floor determined by noise level
- Effect of truncation on convergence speed
Parameters
Example: Wirtinger Flow vs. PhaseLift: Speed and Accuracy
Setup: , Gaussian measurements, SNR = 30 dB.
Compare recovery quality, computation time, and memory for PhaseLift, Wirtinger flow, truncated WF, and Gerchberg--Saxton.
Numerical comparison
| Method | Relative error | Time | Memory |
|---|---|---|---|
| PhaseLift (SDP) | 45 s | 128 MB | |
| Wirtinger flow (200 iter) | 0.15 s | 0.5 MB | |
| Truncated WF (100 iter) | 0.08 s | 0.5 MB | |
| Gerchberg--Saxton (500 iter) | 0.10 s | 0.5 MB |
Key observations
- Wirtinger flow matches PhaseLift accuracy in less time and less memory.
- Truncated WF is faster and slightly more accurate due to robustness against outlier gradients.
- Gerchberg--Saxton stagnates at a much higher error β it lacks spectral initialization and has no convergence guarantee.
Scaling comparison
At : Wirtinger flow takes 5 s (200 iterations), while PhaseLift is intractable. The gap widens dramatically with signal dimension.
Wirtinger Flow: Gradient Descent on the Non-Convex Landscape
Common Mistake: Random Initialization Fails for Wirtinger Flow
Mistake:
Using a random initialization (e.g., ) instead of spectral initialization. With random starting points, Wirtinger flow converges to the global optimum less than 5% of the time for β it gets trapped at saddle points or converges to spurious local minima.
Correction:
Always use spectral initialization. The leading eigenvector of costs only (via power iteration or Lanczos) and guarantees a starting point within the basin of attraction when .
Quick Check
What is the per-iteration computational cost of Wirtinger flow with Gaussian measurements and signal dimension ?
Each iteration computes all inner products at cost each, then assembles the gradient β total .
Key Takeaway
Wirtinger flow minimizes the non-convex intensity loss via gradient descent with Wirtinger calculus β per iteration. Spectral initialization places the starting point in the basin of attraction, enabling linear convergence to the global optimum with high probability. Truncated Wirtinger flow achieves optimal sample complexity by discarding outlier gradient contributions. Wirtinger flow is orders of magnitude faster than PhaseLift while achieving comparable accuracy β it is the practical method of choice for phase retrieval.