Ferkans — Interactive Telecom Tutor

PhaseLift — Convex Relaxation via Semidefinite Programming

The first rigorous approach to phase retrieval with provable guarantees is PhaseLift (Candès, Strohmer, and Voroninski, 2013). The key idea is lifting: reformulate the quadratic measurements as linear measurements of a rank-1 matrix, then drop the rank constraint to obtain a semidefinite program (SDP). This converts the non-linear, non-convex phase retrieval problem into a convex optimization problem.

Convexity buys guaranteed global optimality: there are no spurious local minima and no sensitivity to initialization. The price is computational, because we move from an $N$-dimensional signal to an $N \times N$ matrix variable.

Definition:
The Lifting Trick

The intensity measurement $y_i = |\langle \mathbf{a}_i, \mathbf{x}\rangle|^2$ can be rewritten as:

$y_i = \mathbf{a}_i^H \mathbf{x}\mathbf{x}^H \mathbf{a}_i = \text{tr}(\mathbf{a}_i\mathbf{a}_i^H \mathbf{X}) = \text{tr}(\mathbf{A}_i \mathbf{X}),$

where $\mathbf{X} = \mathbf{x}\mathbf{x}^H \in \mathbb{C}^{N \times N}$ and $\mathbf{A}_i = \mathbf{a}_i\mathbf{a}_i^H$ .

The measurements are now linear in $\mathbf{X}$ :

$y_i = \text{tr}(\mathbf{A}_i \mathbf{X}), \quad i = 1, \ldots, M.$

The constraint that $\mathbf{X} = \mathbf{x}\mathbf{x}^H$ is equivalent to: $\mathbf{X} \succeq 0$ and $\text{rank}(\mathbf{X}) = 1$ . The rank constraint is what makes the problem hard.
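The lifting identity can be checked numerically. The following is a minimal NumPy sketch; the sizes, the random seed, and all variable names are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 8, 20  # illustrative sizes (assumption)

# Complex Gaussian signal x and measurement vectors a_i (stored as rows of `a`)
x = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
a = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# Quadratic form: y_i = |<a_i, x>|^2 with <a_i, x> = a_i^H x
y_quadratic = np.abs(a.conj() @ x) ** 2

# Lifted form: y_i = tr(A_i X) with A_i = a_i a_i^H and X = x x^H
X = np.outer(x, x.conj())
y_lifted = np.array(
    [np.trace(np.outer(a_i, a_i.conj()) @ X).real for a_i in a]
)

# The two expressions should agree up to floating-point round-off
max_gap = np.max(np.abs(y_quadratic - y_lifted))
print(f"largest discrepancy: {max_gap:.2e}")
print("rank of X:", np.linalg.matrix_rank(X))
```

The measurements are quadratic in `x` but linear in `X`: doubling `X` doubles every `y_lifted` entry, whereas doubling `x` quadruples every `y_quadratic` entry.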

Lifting (Phase Retrieval)

A technique that replaces the unknown signal $\mathbf{x}$ by the rank-1 matrix $\mathbf{X} = \mathbf{x}\mathbf{x}^H$ , converting quadratic measurements into linear constraints on $\mathbf{X}$ . The rank-1 constraint is then relaxed to a semidefinite constraint $\mathbf{X} \succeq 0$ .

Related: The Phase Retrieval Problem

Theorem: PhaseLift Recovery Guarantee

Consider the semidefinite program (PhaseLift):

$\min_{\mathbf{X} \succeq 0} \; \text{tr}(\mathbf{X}) \quad \text{s.t.} \quad \text{tr}(\mathbf{A}_i\mathbf{X}) = y_i, \quad i = 1, \ldots, M,$

where $\mathbf{A}_i = \mathbf{a}_i\mathbf{a}_i^H$ and $y_i = |\langle \mathbf{a}_i, \mathbf{x}_0\rangle|^2$ .

If $\mathbf{a}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{I})$ independently and $M \geq CN\log N$ for a universal constant $C$, then with high probability the unique solution $\hat{\mathbf{X}}$ is rank-1, and $\hat{\mathbf{x}} = \sqrt{\lambda_1}\mathbf{v}_1$ recovers $\mathbf{x}_0$ up to global phase. Here $\lambda_1$ and $\mathbf{v}_1$ are the leading eigenvalue and unit-norm eigenvector of $\hat{\mathbf{X}}$, so the eigenvector is scaled by the square root of the leading eigenvalue.

On the set of PSD matrices with spectral norm at most 1, the trace is the convex envelope of the rank, so trace minimization is the natural convex surrogate for rank minimization. This is analogous to how the $\ell_1$ norm is the convex envelope of the $\ell_0$ "norm" on the $\ell_\infty$ unit ball in compressed sensing. When the measurements are sufficiently diverse (random Gaussian), the trace-minimizing solution has rank 1 with high probability.

Proof

Step 1: Feasibility

The true solution $\mathbf{X}_0 = \mathbf{x}_0\mathbf{x}_0^H$ is feasible: $\text{tr}(\mathbf{A}_i \mathbf{X}_0) = y_i$ by construction, and $\mathbf{X}_0 \succeq 0$ .

Step 2: Optimality via dual certificate

Construct a dual variable $\boldsymbol{\lambda} \in \mathbb{R}^M$ such that $\mathbf{I} - \sum_i \lambda_i \mathbf{A}_i \succeq 0$ and the complementary slackness condition holds: $(\mathbf{I} - \sum_i \lambda_i \mathbf{A}_i)\mathbf{X}_0 = 0$ . This certifies that $\mathbf{X}_0$ is optimal.
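To see why such a certificate suffices, here is a short worked step. The shorthand $\mathbf{S} = \mathbf{I} - \sum_i \lambda_i \mathbf{A}_i$ is introduced here for convenience and is not notation from the original text. For any feasible $\mathbf{X} \succeq 0$ :

$\text{tr}(\mathbf{X}) - \text{tr}(\mathbf{X}_0) = \text{tr}(\mathbf{S}(\mathbf{X} - \mathbf{X}_0)) + \sum_i \lambda_i\,\text{tr}(\mathbf{A}_i(\mathbf{X} - \mathbf{X}_0)) = \text{tr}(\mathbf{S}\mathbf{X}) \geq 0.$

The sum vanishes because $\mathbf{X}$ and $\mathbf{X}_0$ satisfy the same constraints $\text{tr}(\mathbf{A}_i\mathbf{X}) = \text{tr}(\mathbf{A}_i\mathbf{X}_0) = y_i$ , the term $\text{tr}(\mathbf{S}\mathbf{X}_0)$ is zero by complementary slackness, and $\text{tr}(\mathbf{S}\mathbf{X}) \geq 0$ because the trace of a product of two PSD matrices is nonnegative. Hence no feasible point has a smaller trace than $\mathbf{X}_0$ .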

Step 3: Uniqueness

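The dual certificate construction (via the golfing scheme) succeeds with high probability when $M \geq CN\log N$. When the matrix $\mathbf{I} - \sum_i \lambda_i \mathbf{A}_i$ is in addition strictly positive definite on the orthogonal complement of $\mathbf{x}_0$ and the measurement map is injective on the tangent space at $\mathbf{X}_0$, any feasible $\mathbf{X}$ with $\text{tr}(\mathbf{X}) \leq \text{tr}(\mathbf{X}_0) = \|\mathbf{x}_0\|^2$ must equal $\mathbf{X}_0$. Hence $\mathbf{X}_0$, which has rank 1, is the unique minimizer. $\blacksquare$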

Example: PhaseLift for a Small Problem

Setup: $N = 16$ (complex signal), $M = 128$ Gaussian measurements, noiseless.

The SDP has optimization variable $\mathbf{X} \in \mathbb{C}^{16 \times 16}$ (Hermitian PSD) with 128 linear equality constraints. Solve using an interior-point SDP solver and examine the eigenvalue distribution of $\hat{\mathbf{X}}$ .

Solution

Eigenvalue analysis

The solver returns $\hat{\mathbf{X}}$ with eigenvalues $\lambda_1 = 14.2$ , $\lambda_2 = 3 \times 10^{-8}$ , $\ldots$ — effectively rank 1 (ratio $\lambda_1/\lambda_2 > 10^8$ ).

Signal extraction

Extract $\hat{\mathbf{x}} = \sqrt{\lambda_1}\mathbf{v}_1$ . After global phase alignment, the relative error is $\|\hat{\mathbf{x}} - e^{j\hat{\phi}}\mathbf{x}_0\|/ \|\mathbf{x}_0\| < 10^{-6}$ .
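The extraction and phase-alignment steps can be sketched in NumPy as follows. The function names `extract_signal` and `aligned_relative_error` are hypothetical helpers introduced for illustration, and the demo uses an exact rank-1 matrix in place of a solver output.

```python
import numpy as np

def extract_signal(X_hat):
    """Return sqrt(lambda_1) * v_1 for a Hermitian PSD estimate X_hat."""
    X_herm = (X_hat + X_hat.conj().T) / 2      # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(X_herm)  # eigenvalues in ascending order
    lam1 = max(eigvals[-1], 0.0)               # guard against a tiny negative value
    return np.sqrt(lam1) * eigvecs[:, -1]

def aligned_relative_error(x_hat, x0):
    """min over phi of ||x_hat - e^{j phi} x0|| / ||x0||."""
    inner = np.vdot(x0, x_hat)                 # x0^H x_hat
    phase = inner / abs(inner) if abs(inner) > 0 else 1.0
    return np.linalg.norm(x_hat - phase * x0) / np.linalg.norm(x0)

# Demo on an exact rank-1 matrix (stand-in for a solver output; assumption)
rng = np.random.default_rng(1)
x0 = rng.standard_normal(16) + 1j * rng.standard_normal(16)
X_hat = np.outer(x0, x0.conj())
x_hat = extract_signal(X_hat)
err = aligned_relative_error(x_hat, x0)
print(f"relative error after phase alignment: {err:.2e}")
```

The optimal phase has a closed form: it is the phase of $\mathbf{x}_0^H \hat{\mathbf{x}}$, so no search over $\hat{\phi}$ is needed.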

Scalability assessment

Computation time: 0.8 s for $N = 16$ . For $N = 256$ (a modest image), $\mathbf{X}$ has $\sim 65{,}000$ unknowns and the SDP requires $O(N^6)$ time — hours. For $N = 4096$ ( $64 \times 64$ image): completely intractable.

Definition:
Noisy PhaseLift

For noisy measurements $y_i = |\langle \mathbf{a}_i, \mathbf{x}\rangle|^2 + \eta_i$ , the equality constraints become infeasible. The noisy PhaseLift formulation uses a relaxed constraint:

$\min_{\mathbf{X} \succeq 0} \; \text{tr}(\mathbf{X}) \quad \text{s.t.} \quad \sum_{i=1}^M (\text{tr}(\mathbf{A}_i\mathbf{X}) - y_i)^2 \leq \epsilon^2,$

where $\epsilon$ bounds the noise norm $\|\boldsymbol{\eta}\|_2$. Alternatively, a penalized form:

$\min_{\mathbf{X} \succeq 0} \; \sum_{i=1}^M (\text{tr}(\mathbf{A}_i\mathbf{X}) - y_i)^2 + \lambda\,\text{tr}(\mathbf{X}).$

The recovery error degrades gracefully: for the constrained form, $\|\hat{\mathbf{X}} - \mathbf{X}_0\|_F \leq C\epsilon/\sqrt{M}$ with high probability.
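For small problems the penalized form can be explored without an SDP solver. The sketch below uses proximal gradient descent, a swapped-in first-order technique rather than the interior-point approach discussed in this section: a gradient step on the data term, followed by a shift by $\lambda$ and projection onto the PSD cone. The function names, problem sizes, $\lambda$, and iteration count are untuned illustrative assumptions.

```python
import numpy as np

def measure(X, a):
    """A(X)_i = a_i^H X a_i = tr(A_i X); real-valued for Hermitian X."""
    return np.einsum("mi,ij,mj->m", a.conj(), X, a).real

def adjoint(r, a):
    """A^*(r) = sum_i r_i a_i a_i^H."""
    return np.einsum("m,mi,mj->ij", r, a, a.conj())

def project_psd(Z):
    """Project a (nearly) Hermitian matrix onto the PSD cone."""
    Z = (Z + Z.conj().T) / 2
    w, U = np.linalg.eigh(Z)
    return (U * np.clip(w, 0.0, None)) @ U.conj().T

def penalized_phaselift(a, y, lam, n_iter=3000):
    M, N = a.shape
    # Lipschitz constant of the gradient of sum_i (tr(A_i X) - y_i)^2
    B = (a.conj()[:, :, None] * a[:, None, :]).reshape(M, N * N)
    step = 1.0 / (2.0 * np.linalg.norm(B, 2) ** 2)
    X = np.zeros((N, N), dtype=complex)
    history = []
    for _ in range(n_iter):
        r = measure(X, a) - y
        history.append(np.sum(r**2) + lam * np.trace(X).real)  # objective at X
        grad = 2.0 * adjoint(r, a)
        # Proximal step for lam * tr(X) restricted to the PSD cone
        X = project_psd(X - step * (grad + lam * np.eye(N)))
    return X, np.array(history)

rng = np.random.default_rng(0)
N, M = 8, 48  # illustrative sizes with M < N^2 (assumption)
x0 = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
a = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(a.conj() @ x0) ** 2  # noiseless intensities

X_hat, history = penalized_phaselift(a, y, lam=1e-3)
X0 = np.outer(x0, x0.conj())
rel_err = np.linalg.norm(X_hat - X0) / np.linalg.norm(X0)
print(f"objective: {history[0]:.3e} -> {history[-1]:.3e}")
print(f"relative error in X: {rel_err:.2e}")
```

With step size $1/L$ the objective is non-increasing from one iteration to the next. How close $\hat{\mathbf{X}}$ gets to $\mathbf{X}_0$ depends on $\lambda$ and the iteration budget; a nonzero $\lambda$ biases the estimate slightly even without noise.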

Definition:
Scalable Variants of PhaseLift

Several approaches address PhaseLift's computational cost:

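1. PhaseCut (Waldspurger, d'Aspremont, and Mallat, 2015): Instead of lifting the signal, optimize over the unknown measurement phases $u_i = e^{j\phi_i}$ directly. Let $\mathbf{A} \in \mathbb{C}^{M \times N}$ be the matrix with rows $\mathbf{a}_i^H$ and let $\mathbf{b} = (\sqrt{y_1}, \ldots, \sqrt{y_M})$. Eliminating $\mathbf{x}$ by least squares gives

$\min_{\mathbf{u} \in \mathbb{C}^M,\; |u_i| = 1} \; \mathbf{u}^H \,\text{diag}(\mathbf{b})\,(\mathbf{I} - \mathbf{A}\mathbf{A}^\dagger)\,\text{diag}(\mathbf{b})\,\mathbf{u},$

whose semidefinite relaxation has the same structure as the MaxCut SDP (an $M \times M$ PSD variable with unit diagonal) and therefore admits fast block-coordinate solvers.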

2. Sketched PhaseLift (Yurtsever et al., 2017): Apply random projections to reduce the SDP dimension from $N^2$ to $O(N)$ , maintaining theoretical guarantees while reducing cost to $O(N^2)$ per iteration.

3. Low-rank SDP solvers (Burer--Monteiro): Set $\mathbf{X} = \mathbf{V}\mathbf{V}^H$ with $\mathbf{V} \in \mathbb{C}^{N \times r}$ ( $r \ll N$ ) and solve the non-convex problem over $\mathbf{V}$ . For rank-1 recovery, $r = 2$ or $3$ suffices.
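The Burer--Monteiro idea in item 3 can be sketched as gradient descent on the factor $\mathbf{V}$. The least-squares loss, the random initialization, and the backtracking line search below are illustrative choices made for this sketch, not a prescription from the cited work; the function names are hypothetical.

```python
import numpy as np

def bm_loss_and_grad(V, a, y):
    """f(V) = (1/2M) sum_i (||V^H a_i||^2 - y_i)^2 and its gradient in V."""
    M = a.shape[0]
    P = a.conj() @ V                          # row i holds a_i^H V, shape (M, r)
    r = np.sum(np.abs(P) ** 2, axis=1) - y    # tr(A_i V V^H) - y_i
    loss = np.sum(r**2) / (2 * M)
    grad = (2.0 / M) * (a.T @ (r[:, None] * P))  # (2/M) sum_i r_i a_i a_i^H V
    return loss, grad

def bm_descent(a, y, rank=2, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    N = a.shape[1]
    V = (rng.standard_normal((N, rank))
         + 1j * rng.standard_normal((N, rank))) / np.sqrt(2 * N)
    losses = []
    for _ in range(n_iter):
        loss, grad = bm_loss_and_grad(V, a, y)
        losses.append(loss)
        g2 = np.sum(np.abs(grad) ** 2)
        t = 1.0
        # Backtracking (Armijo) search: halve t until the loss drops enough
        while t > 1e-12 and bm_loss_and_grad(V - t * grad, a, y)[0] > loss - 0.5 * t * g2:
            t *= 0.5
        V = V - t * grad
    return V, np.array(losses)

rng = np.random.default_rng(0)
N, M = 8, 64  # illustrative sizes (assumption)
x0 = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
a = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
y = np.abs(a.conj() @ x0) ** 2

V, losses = bm_descent(a, y, rank=2)
print(f"loss: {losses[0]:.3e} -> {losses[-1]:.3e}")
```

The variable is now $N \times r$ instead of $N \times N$, so each iteration costs $O(MNr)$. If the iteration reaches a global minimizer, $\mathbf{V}\mathbf{V}^H$ matches $\mathbf{x}_0\mathbf{x}_0^H$, but a random start gives no such guarantee, which is the price of giving up convexity.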

Common Mistake: PhaseLift Does Not Scale to Real Imaging Problems

Mistake:

PhaseLift's elegant theory tempts one to use it for practical imaging. However, the computational cost is prohibitive:

$N$ (signal dim)	SDP variable size	Typical solve time
16	$16 \times 16 = 256$	1 s
64	$64 \times 64 = 4{,}096$	30 s
256	$256 \times 256 = 65{,}536$	hours
1024	$1024 \times 1024 \approx 10^6$	intractable

The $O(N^2)$ memory and $O(N^6)$ time of interior-point SDP solvers make PhaseLift impractical beyond $N \sim 256$ .

Correction:

PhaseLift is a theoretical milestone: it proves that phase retrieval from $O(N\log N)$ random Gaussian measurements is solvable in polynomial time. For practical imaging-scale problems ($N > 1000$), use the non-convex methods of Section 16.3 (Wirtinger flow and variants), which operate directly on the $N$-dimensional signal.

Convex (PhaseLift) vs. Non-Convex (Wirtinger Flow) Phase Retrieval

Property	PhaseLift (Convex)	Wirtinger Flow (Non-Convex)
Variable dimension	$N \times N$ matrix	$N$ -vector
Per-iteration cost	$O(N^{4.5})$ (SDP interior point)	$O(MN)$
Memory	$O(N^2)$	$O(MN)$
Measurement requirement	$M = O(N\log N)$	$M = O(N\log N)$ ; $O(N)$ with truncation
Initialization	Not needed (convex)	Critical (spectral initialization)
Convergence guarantee	Global optimum (convex)	Global optimum w.h.p. (spectral init + local regularity condition)
Practical for $N > 256$ ?	No	Yes

Quick Check

In PhaseLift, a signal $\mathbf{x} \in \mathbb{C}^{64}$ is "lifted" to a matrix $\mathbf{X} = \mathbf{x}\mathbf{x}^H$. Once the rank-1 constraint is dropped, how many real-valued unknowns does the Hermitian SDP variable $\mathbf{X}$ have (exploiting Hermitian symmetry)?

64

4096

8192

2016

Answer:

4096

A Hermitian $64 \times 64$ matrix has $64^2 = 4096$ real degrees of freedom: 64 real diagonal entries plus $64 \times 63/2 = 2016$ complex off-diagonal entries, each contributing 2 real numbers, for a total of $64 + 2 \times 2016 = 4096$.
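The count generalizes to any $N$. As a small illustrative sketch (the function names are hypothetical), the following compares the degrees of freedom of the lifted Hermitian variable with those of the original signal, which has $2N$ real parameters minus one for the unobservable global phase.

```python
def hermitian_real_dof(n):
    """Real degrees of freedom of an n x n Hermitian matrix."""
    diagonal = n                      # real diagonal entries
    off_diagonal = n * (n - 1) // 2   # complex entries above the diagonal
    return diagonal + 2 * off_diagonal

def signal_real_dof(n):
    """Real degrees of freedom of x in C^n up to a global phase."""
    return 2 * n - 1

print(hermitian_real_dof(64))  # 4096
print(signal_real_dof(64))     # 127
```

Lifting therefore inflates the number of unknowns from roughly $2N$ to $N^2$, which is the root of the scalability problem described above.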

🔧Engineering Note

SDP Solver Choices for PhaseLift

Interior-point SDP solvers (SeDuMi, MOSEK, SDPT3) provide guaranteed convergence but require $O(N^2)$ memory and $O(N^{4.5})$ per iteration. For PhaseLift with $N > 100$ :

• MOSEK is a fast commercial interior-point solver; handles $N \leq 500$ in minutes.
• SCS (splitting conic solver) uses first-order methods, with lower per-iteration cost but slower convergence; suitable for $N$ up to $\sim 2000$ at moderate accuracy.
• Burer--Monteiro factorization reduces to a non-convex problem in $\mathbb{C}^{N \times r}$; practical for $N > 1000$ but loses the convexity guarantee.

In research prototypes, CVXPY with MOSEK backend is the recommended interface for PhaseLift experiments.

Practical Constraints

• Interior-point SDP solvers require $O(N^2)$ memory, limiting $N \leq 500$ on typical workstations
• First-order SDP solvers (SCS) trade accuracy for scalability: convergence to $10^{-3}$ relative gap in $O(N^2)$ iterations
• GPU-accelerated SDP solvers exist but provide only $2\text{--}5\times$ speedup for the matrix sizes in phase retrieval

Key Takeaway

Lifting converts the quadratic phase retrieval problem into a linear problem over $N \times N$ PSD matrices: $y_i = \text{tr}(\mathbf{A}_i \mathbf{X})$ . PhaseLift minimizes $\text{tr}(\mathbf{X})$ subject to measurement constraints — a convex SDP that provably recovers rank-1 solutions with $M = O(N\log N)$ Gaussian measurements. PhaseLift provides the first polynomial-time guarantee for phase retrieval, establishing theoretical feasibility. However, $O(N^2)$ memory and $O(N^6)$ computation make it impractical beyond $N \sim 256$ .
