Manifold Optimization on the Complex Torus

Respect the Geometry

SDR lifts the problem to a convex space; element-wise BCD cycles through coordinates. Manifold optimization takes a different view: it respects the native geometry of the feasible set. The unit-modulus torus is a smooth manifold $\mathcal{M} = (S^1)^N$, and standard tools from Riemannian geometry (gradients, retractions, geodesics) yield convergent, efficient algorithms tailored to this space. Manifold methods scale far better than SDR and reach better solutions than pure element-wise updates. For large $N$ ($\geq 256$), manifold optimization is the algorithm of choice.

Definition: The Complex Unit-Modulus Manifold

The complex unit-modulus manifold is

$$\mathcal{M} = \{\boldsymbol{\phi} \in \mathbb{C}^N : |\phi_n| = 1\ \forall n\}.$$

$\mathcal{M}$ is the $N$-fold Cartesian product of complex unit circles: $\mathcal{M} = S^1 \times S^1 \times \cdots \times S^1$. Each circle is a one-dimensional smooth manifold, so $\mathcal{M}$ is an $N$-dimensional smooth manifold.

The tangent space at a point $\boldsymbol{\phi} \in \mathcal{M}$ is

$$T_{\boldsymbol{\phi}} \mathcal{M} = \{\boldsymbol{\xi} \in \mathbb{C}^N : \Re(\phi_n^* \xi_n) = 0\ \forall n\}.$$

Each tangent vector is "perpendicular" to $\boldsymbol{\phi}$ in a pointwise sense: it advances along the circle at each coordinate without leaving the torus.

The condition $\Re(\phi_n^* \xi_n) = 0$ is the linearization of $|\phi_n|^2 = 1$: differentiating $\phi_n^* \phi_n = 1$ gives $\phi_n^* \xi_n + \xi_n^* \phi_n = 2\Re(\phi_n^* \xi_n) = 0$. The tangent space is thus the per-coordinate orthogonal complement of $\boldsymbol{\phi}$.


Definition: Riemannian Gradient and Retraction

Given a smooth $f: \mathbb{C}^N \to \mathbb{R}$ and its Euclidean gradient $\nabla f(\boldsymbol{\phi}) \in \mathbb{C}^N$, the Riemannian gradient at $\boldsymbol{\phi} \in \mathcal{M}$ is the projection of $\nabla f$ onto the tangent space:

$$\operatorname{grad}_{\mathcal{M}} f(\boldsymbol{\phi}) = P_{T_{\boldsymbol{\phi}}\mathcal{M}}(\nabla f) = \nabla f - \Re(\boldsymbol{\phi}^* \odot \nabla f) \odot \boldsymbol{\phi},$$

where $\odot$ is element-wise multiplication.

A retraction $R_{\boldsymbol{\phi}}(\boldsymbol{\xi})$ maps a tangent vector $\boldsymbol{\xi} \in T_{\boldsymbol{\phi}}\mathcal{M}$ back to a point on $\mathcal{M}$. A simple choice is per-coordinate normalization:

$$R_{\boldsymbol{\phi}}(\boldsymbol{\xi}) = \frac{\boldsymbol{\phi} + \boldsymbol{\xi}}{|\boldsymbol{\phi} + \boldsymbol{\xi}|}\quad \text{(element-wise)}.$$

More sophisticated retractions use the exponential map (exact geodesic), but the simple normalization is typically sufficient.
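As a quick sanity check, the projection and retraction are one-liners in NumPy (a sketch; the point `phi` and gradient `g` below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

# A point on the torus and an arbitrary Euclidean gradient.
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N))       # |phi_n| = 1
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Tangent-space projection: xi = g - Re(phi^* . g) * phi (element-wise).
xi = g - np.real(np.conj(phi) * g) * phi
assert np.allclose(np.real(np.conj(phi) * xi), 0)     # tangency condition

# Retraction by per-coordinate normalization.
phi_new = (phi + xi) / np.abs(phi + xi)
assert np.allclose(np.abs(phi_new), 1)                # back on the torus
```

The two assertions verify exactly the defining properties above: the projected vector satisfies $\Re(\phi_n^* \xi_n) = 0$ at every coordinate, and the retracted point has unit modulus.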


Riemannian Gradient Ascent on $\mathcal{M}$

Complexity: $O(T \cdot N^2)$, with $T \sim 50$–$200$ iterations, each $O(N^2)$ for the gradient of the quadratic.
Input: initial point $\boldsymbol{\phi}^{(0)} \in \mathcal{M}$, initial step size $\alpha$, tolerance $\epsilon$.
Output: stationary point $\boldsymbol{\phi}^\star$.
1. For $t = 0, 1, 2, \ldots$:
2. \quad Compute the Euclidean gradient $\mathbf{g} = \nabla f(\boldsymbol{\phi}^{(t)})$.
3. \quad Project onto the tangent space: $\boldsymbol{\eta} = \mathbf{g} - \Re((\boldsymbol{\phi}^{(t)})^* \odot \mathbf{g}) \odot \boldsymbol{\phi}^{(t)}$.
4. \quad If $\|\boldsymbol{\eta}\| < \epsilon$: break.
5. \quad Backtrack: find $\alpha_t$ such that $f(R_{\boldsymbol{\phi}^{(t)}}(\alpha_t \boldsymbol{\eta})) > f(\boldsymbol{\phi}^{(t)})$ (ascent step).
6. \quad Retract: $\boldsymbol{\phi}^{(t+1)} = R_{\boldsymbol{\phi}^{(t)}}(\alpha_t \boldsymbol{\eta})$, i.e., element-wise normalize $\boldsymbol{\phi}^{(t)} + \alpha_t \boldsymbol{\eta}$ to unit modulus.
7. Return $\boldsymbol{\phi}^{(t)}$.

The per-iteration cost is dominated by the gradient $\nabla f$ of the quadratic, which is $O(N^2)$ for dense $\mathbf{A}$. Compared to SDR's $O(N^{6.5})$, the manifold method is roughly $N^{4.5}/T$ times faster, a decisive advantage at $N \geq 128$. Conjugate-gradient or L-BFGS variants on the manifold are used in practice for faster convergence (see the Manopt toolbox).
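The full ascent loop is short. The sketch below maximizes $f(\boldsymbol{\phi}) = \boldsymbol{\phi}^H \mathbf{A} \boldsymbol{\phi}$ for a Hermitian $\mathbf{A}$, taking the Euclidean gradient as $2\mathbf{A}\boldsymbol{\phi}$ (a convention chosen here) and using Armijo backtracking with an assumed constant $c = 10^{-4}$:

```python
import numpy as np

def riemannian_gradient_ascent(A, phi0, alpha0=1.0, tol=1e-8, max_iter=500):
    """Maximize f(phi) = phi^H A phi over the unit-modulus torus.

    A is Hermitian; the Euclidean gradient of f is taken as 2 A phi.
    Sketch with Armijo backtracking (c = 1e-4), not a tuned implementation.
    """
    phi = phi0.copy()
    f = lambda p: np.real(p.conj() @ A @ p)
    for _ in range(max_iter):
        g = 2 * A @ phi                                   # Euclidean gradient
        eta = g - np.real(np.conj(phi) * g) * phi         # tangent projection
        if np.linalg.norm(eta) < tol:
            break
        alpha = alpha0
        while alpha > 1e-12:
            cand = phi + alpha * eta
            cand /= np.abs(cand)                          # retraction
            # Armijo: accept alpha once sufficient increase is achieved.
            if f(cand) >= f(phi) + 1e-4 * alpha * np.linalg.norm(eta) ** 2:
                break
            alpha /= 2
        phi = cand
    return phi

# Smoke test on a random Hermitian PSD objective.
rng = np.random.default_rng(1)
N = 16
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = B.conj().T @ B                                        # Hermitian PSD
phi0 = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
phi = riemannian_gradient_ascent(A, phi0)
assert np.allclose(np.abs(phi), 1)
assert np.real(phi.conj() @ A @ phi) >= np.real(phi0.conj() @ A @ phi0)
```

For sparse or structured $\mathbf{A}$, replacing the dense matrix-vector product drops the per-iteration cost below $O(N^2)$.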

Theorem: Convergence of Riemannian Gradient Ascent

Assume $f$ is twice continuously differentiable on $\mathbb{C}^N$; the manifold $\mathcal{M}$ is compact (trivially so, as the complex torus). Riemannian gradient ascent with Armijo backtracking satisfies:

  1. $f(\boldsymbol{\phi}^{(t+1)}) > f(\boldsymbol{\phi}^{(t)})$ unless $\|\operatorname{grad}_{\mathcal{M}} f(\boldsymbol{\phi}^{(t)})\| = 0$.
  2. The sequence $\{\boldsymbol{\phi}^{(t)}\}$ has a convergent subsequence whose limit is a stationary point of $f$ on $\mathcal{M}$ (i.e., $\operatorname{grad}_{\mathcal{M}} f = 0$).
  3. Near a non-degenerate stationary point, convergence is linear, with rate governed by the Riemannian Hessian.

For the RIS QCQP, the method typically converges to a local maximum within 50–200 iterations.

The Riemannian gradient points in the direction of steepest ascent on the manifold, and Armijo backtracking ensures monotone improvement. Under smoothness, the sequence converges to a stationary point, the Riemannian analog of a zero gradient.

Riemannian Newton and Trust Region

Riemannian gradient ascent converges linearly; Riemannian Newton converges quadratically, roughly squaring the optimality gap at each iteration. Applying the Riemannian Hessian (the Euclidean Hessian plus a curvature correction from $\mathcal{M}$) to a vector costs $O(N^2)$, the same order as the gradient, so Newton-type methods buy faster outer convergence at modest extra cost per iteration. Practical implementations use the trust-region Newton method via the Manopt toolbox, which gives the fastest reliable convergence for the RIS QCQP.

Riemannian Gradient Ascent on $S^1 \times S^1$

For a 2-torus ($N = 2$) example, animate the trajectory of Riemannian gradient ascent. Each step computes a tangent vector orthogonal to the current $\boldsymbol{\phi}$, moves along that direction, and retracts back onto the torus. The trajectory spirals toward a local maximum of the objective.
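The trajectory itself (without the animation) takes only a few lines; the rank-1 objective, starting angles, and fixed step size below are illustrative choices:

```python
import numpy as np

# Trace a short ascent trajectory on the 2-torus for f(phi) = |c^H phi|^2.
c = np.array([1.0 + 0.5j, 0.8 - 0.3j])
f = lambda p: np.abs(np.vdot(c, p)) ** 2      # np.vdot(c, p) = c^H p
phi = np.exp(1j * np.array([2.5, -1.0]))      # initial angles
phi0 = phi.copy()
angles = [np.angle(phi)]                      # record trajectory for plotting
for _ in range(50):
    g = 2 * np.vdot(c, phi) * c               # Euclidean gradient 2(c^H phi) c
    eta = g - np.real(np.conj(phi) * g) * phi # tangent projection
    step = phi + 0.1 * eta
    phi = step / np.abs(step)                 # retraction to the torus
    angles.append(np.angle(phi))

assert f(phi) > f(phi0)   # the trajectory climbs toward a local maximum
```

Plotting `angles` in the $(\theta_1, \theta_2)$ plane reproduces the spiral described above.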

Example: Riemannian Gradient for the Rank-1 QCQP

For $f(\boldsymbol{\phi}) = |\mathbf{c}^H \boldsymbol{\phi}|^2$, compute the Euclidean and Riemannian gradients at $\boldsymbol{\phi} = \mathbf{1}$ (the all-ones vector).
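Sketch of a solution, assuming the convention $\nabla f(\boldsymbol{\phi}) = 2(\mathbf{c}^H \boldsymbol{\phi})\,\mathbf{c}$ for $f = \boldsymbol{\phi}^H \mathbf{c}\mathbf{c}^H \boldsymbol{\phi}$: at $\boldsymbol{\phi} = \mathbf{1}$ the Euclidean gradient is $2(\sum_n c_n^*)\,\mathbf{c}$, and since $\phi_n = 1$ the tangent projection strips the element-wise real part, leaving $\operatorname{grad}_{\mathcal{M}} f = j\,\Im(\nabla f)$. A numerical check with an arbitrary $\mathbf{c}$:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # illustrative c
phi = np.ones(N, dtype=complex)                            # phi = 1

# Euclidean gradient: grad_E = 2 c c^H phi = 2 (c^H phi) c,
# which at phi = 1 reduces to 2 (sum_n c_n^*) c.
g = 2 * np.vdot(c, phi) * c
assert np.allclose(g, 2 * np.sum(np.conj(c)) * c)

# Finite-difference check of the convention: Df[xi] = Re(g^H xi).
f = lambda p: np.abs(np.vdot(c, p)) ** 2
xi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
t = 1e-6
fd = (f(phi + t * xi) - f(phi - t * xi)) / (2 * t)
assert np.isclose(fd, np.real(np.vdot(g, xi)), rtol=1e-4, atol=1e-6)

# Riemannian gradient: at phi = 1 the projection strips the real part.
eta = g - np.real(np.conj(phi) * g) * phi
assert np.allclose(eta, 1j * np.imag(g))
```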

Manifold Gradient Ascent vs. SDR Convergence

Track the objective of Riemannian gradient ascent (with Armijo backtracking) over iterations for a fixed problem instance, and compare with the SDR upper bound and the element-wise BCD trajectory. The manifold method typically closes 80–90% of the gap to SDR within 50 iterations; element-wise BCD plateaus earlier, at a lower objective.


Common Mistake: Step Size Matters

Mistake:

"Just use a fixed step size Ξ±=0.1\alpha = 0.1 and iterate."

Correction:

Without backtracking, a fixed step size can cause non-monotone iterates and even divergence (overshooting). Always use Armijo backtracking: start from $\alpha$ and halve it until $f(R_{\boldsymbol{\phi}}(\alpha \boldsymbol{\eta})) \geq f(\boldsymbol{\phi}) + c\,\alpha \|\boldsymbol{\eta}\|^2$ for some $c \in (0, 1)$. This guarantees monotone progress at the cost of a few extra function evaluations per iteration. Modern implementations (Manopt) handle this automatically.
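A minimal backtracking helper for the ascent setting might look like this (the constants and the rank-1 test objective are illustrative):

```python
import numpy as np

def armijo_step(f, phi, eta, alpha0=1.0, c=1e-4, alpha_min=1e-12):
    """Halve alpha until the Armijo sufficient-increase condition holds:
    f(R_phi(alpha * eta)) >= f(phi) + c * alpha * ||eta||^2.
    """
    base = f(phi)
    alpha = alpha0
    while alpha > alpha_min:
        cand = phi + alpha * eta
        cand /= np.abs(cand)                  # retraction to the torus
        if f(cand) >= base + c * alpha * np.linalg.norm(eta) ** 2:
            return alpha, cand
        alpha /= 2
    return alpha, phi                         # no acceptable step found

# Usage on f(phi) = |c^H phi|^2, with eta the Riemannian gradient.
rng = np.random.default_rng(3)
N = 8
cvec = rng.standard_normal(N) + 1j * rng.standard_normal(N)
f = lambda p: np.abs(np.vdot(cvec, p)) ** 2
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
g = 2 * np.vdot(cvec, phi) * cvec             # assumed Euclidean gradient
eta = g - np.real(np.conj(phi) * g) * phi     # tangent projection
alpha, phi_new = armijo_step(f, phi, eta)
assert f(phi_new) >= f(phi)                   # monotone progress
```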

🔧 Engineering Note

Manopt: Let Someone Else Do the Work

The Manopt toolbox (MATLAB) and its Python/Julia ports implement Riemannian trust-region Newton out of the box for the complex unit-modulus manifold (among many others). Using Manopt:

  1. Define the objective ff and its Euclidean gradient.
  2. Select the manifold (complexcirclefactory or obliquecomplexfactory).
  3. Call trustregions(problem).

The toolbox handles tangent-space projection, retraction, Hessian approximation, and backtracking internally. For research and production alike, Manopt is the recommended black-box; avoid re-implementing Riemannian optimization unless you have a very specific reason.

Practical Constraints
  • Manopt at $N = 256$: typically 5–10 s per solve, $\sim 100$ iterations.
  • Pymanopt and the Julia port expose a near-identical API, making it easy to move across languages.
  • Warm-starting across AO iterations reduces per-solve time by 3–5×.