Exercises
ex-ch03-01
(Easy) Show that the intersection of two halfspaces $H_1 = \{\mathbf{x} : \mathbf{a}_1^\top\mathbf{x} \le b_1\}$ and $H_2 = \{\mathbf{x} : \mathbf{a}_2^\top\mathbf{x} \le b_2\}$ is convex.
Each halfspace is convex. What do you know about intersections of convex sets?
Apply intersection theorem
Each halfspace is convex (verified in Section 3.1). By the theorem Intersection of Convex Sets Is Convex, the intersection of convex sets is convex, so $H_1 \cap H_2$ is convex.
ex-ch03-02
(Easy) Determine whether $f(x) = |x|$ is convex on $\mathbb{R}$. (Note that $f$ is not differentiable at $x = 0$.)
Use the definition directly: check $f(\theta x + (1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$ for all $x, y$ and $\theta \in [0, 1]$.
Triangle inequality
$|\theta x + (1-\theta)y| \le |\theta x| + |(1-\theta)y| = \theta|x| + (1-\theta)|y|$ by the triangle inequality. This is exactly the definition of convexity, so $f$ is convex despite being non-differentiable at $0$.
ex-ch03-03
(Easy) Show that $f(\mathbf{x}) = \max_{i=1,\dots,n} x_i$ is convex.
Express as a pointwise supremum of linear (hence convex) functions.
Pointwise supremum
$f(\mathbf{x}) = \max_i \mathbf{e}_i^\top\mathbf{x}$, where $\mathbf{e}_i$ is the $i$-th standard basis vector. Each $\mathbf{e}_i^\top\mathbf{x}$ is linear, hence convex. The pointwise maximum of convex functions is convex (Theorem Operations That Preserve Convexity, rule 2).
ex-ch03-04
(Easy) Write the dual of the given LP.
Convert the LP to standard form first.
Standard form
Rewrite the objective and constraints in standard form.
Dual
Form the dual from the standard form, then simplify. Solving both problems shows that the optimal dual and primal values coincide: strong duality holds.
ex-ch03-05
(Easy) Verify the KKT conditions for the given problem: minimise $f(\mathbf{x})$ subject to the given inequality constraint. What are $\mathbf{x}^\star$ and $\lambda^\star$?
Rewrite the constraint in the standard form $g(\mathbf{x}) \le 0$.
KKT conditions
Stationarity: $\nabla f(\mathbf{x}^\star) + \lambda^\star \nabla g(\mathbf{x}^\star) = \mathbf{0}$. Primal feasibility: $g(\mathbf{x}^\star) \le 0$. Dual feasibility: $\lambda^\star \ge 0$. Complementary slackness: $\lambda^\star\, g(\mathbf{x}^\star) = 0$.
The constraint is active at the optimum (since the unconstrained minimum violates it), so $g(\mathbf{x}^\star) = 0$ and $\lambda^\star > 0$; solving the stationarity equation on the constraint boundary yields $\mathbf{x}^\star$ and $\lambda^\star$.
ex-ch03-06
(Easy) Compute one step of gradient descent on the given quadratic $f$, starting from the given point $\mathbf{x}_0$ with the given step size $\eta$.
The gradient descent update is $\mathbf{x}_1 = \mathbf{x}_0 - \eta\,\nabla f(\mathbf{x}_0)$.
Compute $\nabla f$ and evaluate it at $\mathbf{x}_0$.
Gradient computation
Evaluate $\nabla f(\mathbf{x}_0)$ and apply $\mathbf{x}_1 = \mathbf{x}_0 - \eta\,\nabla f(\mathbf{x}_0)$. In this case, one step reaches the exact minimiser because $f$ is quadratic and the step size is matched to its curvature ($\eta = 1/L$).
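As a quick numerical check of this update (the exercise's own $f$, $\mathbf{x}_0$, and step size are not reproduced above, so the sketch below uses a hypothetical quadratic whose step size is matched to its curvature):

```python
import numpy as np

# Hypothetical data: f(x) = 0.5 x^T Q x - b^T x with Q = 2I, so L = 2 and eta = 1/L.
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([2.0, -4.0])
x0 = np.zeros(2)
eta = 0.5                        # = 1/L

grad = Q @ x0 - b                # gradient of f at x0
x1 = x0 - eta * grad             # one gradient-descent step

x_star = np.linalg.solve(Q, b)   # exact minimiser: Q x = b
print(x1, x_star)                # identical here because Q = L*I and eta = 1/L
```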
ex-ch03-07
(Medium) Show that $f(\mathbf{x}) = \log\sum_{i=1}^{n} e^{x_i}$ (the log-sum-exp function) is convex.
Compute the Hessian and show it is PSD.
Alternatively, express $f$ as a pointwise supremum of linear functions plus a correction.
Hessian approach
Let $z_i = e^{x_i}$ and $S = \sum_i z_i$. The gradient is $\nabla f(\mathbf{x}) = \mathbf{p}$, the softmax vector with $p_i = z_i / S$. The Hessian is $\nabla^2 f(\mathbf{x}) = \mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top$.
For any $\mathbf{v}$: $\mathbf{v}^\top \nabla^2 f(\mathbf{x})\,\mathbf{v} = \sum_i p_i v_i^2 - \big(\sum_i p_i v_i\big)^2$.
By Cauchy–Schwarz applied to the probability distribution $\mathbf{p}$: $\big(\sum_i p_i v_i\big)^2 \le \big(\sum_i p_i\big)\big(\sum_i p_i v_i^2\big) = \sum_i p_i v_i^2$. Hence $\mathbf{v}^\top \nabla^2 f(\mathbf{x})\,\mathbf{v} \ge 0$ and $f$ is convex.
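A numerical sanity check of the Hessian formula $\nabla^2 f = \mathrm{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^\top$ (a minimal sketch; the test point is random, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)               # arbitrary test point

p = np.exp(x - x.max())              # softmax probabilities (shifted for stability)
p /= p.sum()
H = np.diag(p) - np.outer(p, p)      # Hessian of log-sum-exp at x

eigvals = np.linalg.eigvalsh(H)
print(eigvals.min() >= -1e-12)       # all eigenvalues non-negative: H is PSD
```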
ex-ch03-08
(Medium) Derive the dual of the QP: $\min_{\mathbf{x}} \tfrac{1}{2}\mathbf{x}^\top\mathbf{P}\mathbf{x} + \mathbf{q}^\top\mathbf{x}$ subject to $\mathbf{A}\mathbf{x} = \mathbf{b}$, where $\mathbf{P} \succ 0$.
Form the Lagrangian with a multiplier $\boldsymbol{\nu}$ for the equality constraint.
Minimise over $\mathbf{x}$ by setting the gradient to zero.
Lagrangian
$L(\mathbf{x}, \boldsymbol{\nu}) = \tfrac{1}{2}\mathbf{x}^\top\mathbf{P}\mathbf{x} + \mathbf{q}^\top\mathbf{x} + \boldsymbol{\nu}^\top(\mathbf{A}\mathbf{x} - \mathbf{b})$.
Minimise over $\mathbf{x}$
$\nabla_{\mathbf{x}} L = \mathbf{P}\mathbf{x} + \mathbf{q} + \mathbf{A}^\top\boldsymbol{\nu} = \mathbf{0}$, so $\mathbf{x}^\star(\boldsymbol{\nu}) = -\mathbf{P}^{-1}(\mathbf{q} + \mathbf{A}^\top\boldsymbol{\nu})$.
Dual function
Substituting back: $g(\boldsymbol{\nu}) = -\tfrac{1}{2}(\mathbf{q} + \mathbf{A}^\top\boldsymbol{\nu})^\top\mathbf{P}^{-1}(\mathbf{q} + \mathbf{A}^\top\boldsymbol{\nu}) - \mathbf{b}^\top\boldsymbol{\nu}$.
The dual problem: $\max_{\boldsymbol{\nu}}\ g(\boldsymbol{\nu})$ (unconstrained in $\boldsymbol{\nu}$). This is a concave QP in $\boldsymbol{\nu}$.
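A minimal numerical sketch of the derivation, using randomly generated $\mathbf{P} \succ 0$, $\mathbf{q}$, $\mathbf{A}$, $\mathbf{b}$ (hypothetical data, not from the exercise): it maximises $g(\boldsymbol{\nu})$ in closed form, recovers the primal minimiser, and checks that the primal and dual optimal values coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
M = rng.normal(size=(n, n))
P = M @ M.T + n * np.eye(n)          # hypothetical P > 0
q = rng.normal(size=n)
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

Pinv = np.linalg.inv(P)
# Maximise g(nu): its gradient -A P^{-1}(q + A^T nu) - b = 0 is a linear system.
nu = np.linalg.solve(A @ Pinv @ A.T, -(b + A @ Pinv @ q))
x = -Pinv @ (q + A.T @ nu)           # primal minimiser recovered from nu

primal = 0.5 * x @ P @ x + q @ x
u = q + A.T @ nu
dual = -0.5 * u @ Pinv @ u - b @ nu
print(np.allclose(A @ x, b), np.isclose(primal, dual))   # feasible; strong duality
```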
ex-ch03-09
(Medium) Water-filling over three sub-channels: given noise floors $N_1, N_2, N_3$ and a total power budget $P$, find the optimal power allocation and the total rate.
Start by assuming all channels are active, then check for negative allocations.
Try all 3 active
With all three channels active, the water level is $\mu = \frac{P + N_1 + N_2 + N_3}{3}$ and $p_i = \mu - N_i$. The allocation $p_3$ comes out negative (channel 3's noise floor exceeds the water level). Remove channel 3.
2 active channels
With channels 1 and 2 active: $\mu = \frac{P + N_1 + N_2}{2}$, giving $p_1 = \mu - N_1$, $p_2 = \mu - N_2$, and $p_3 = 0$. All allocations are non-negative.
Total rate
Total rate: $R = \log_2\!\big(1 + \tfrac{p_1}{N_1}\big) + \log_2\!\big(1 + \tfrac{p_2}{N_2}\big)$ bits/s/Hz.
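A sketch of the active-set procedure used above; the noise floors and power budget below are hypothetical stand-ins, since the exercise's own numbers are not reproduced here.

```python
import numpy as np

def water_fill(noise, P):
    """Maximise sum(log2(1 + p_i/N_i)) subject to sum(p) = P, p >= 0."""
    idx = np.argsort(noise)                           # quietest channels first
    active = len(noise)
    while active > 0:
        mu = (P + noise[idx[:active]].sum()) / active # candidate water level
        if mu > noise[idx[active - 1]]:               # all active allocations positive?
            break
        active -= 1                                   # drop the noisiest active channel
    return np.maximum(mu - noise, 0.0)

noise = np.array([0.1, 0.5, 2.0])                     # hypothetical noise floors
P = 1.0                                               # hypothetical power budget
p = water_fill(noise, P)
rate = np.log2(1 + p / noise).sum()
print(p, rate)                                        # the noisiest channel gets zero power
```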
ex-ch03-10
(Medium) Show that gradient descent with step size $\eta > 2/L$ diverges on the quadratic $f(x) = \tfrac{L}{2}x^2$.
Write the update rule and find the condition for $|x_k| \to \infty$.
Update rule
$x_{k+1} = x_k - \eta f'(x_k) = (1 - \eta L)x_k$, so $x_k = (1 - \eta L)^k x_0$.
Divergence condition
$|x_k| = |1 - \eta L|^k\,|x_0|$. For $|x_k| \to \infty$, we need $|1 - \eta L| > 1$. This holds iff $\eta L > 2$, i.e., $\eta > 2/L$.
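A two-line experiment illustrating the threshold (the curvature $L$ below is a hypothetical value):

```python
import numpy as np

L = 4.0                            # hypothetical curvature of f(x) = (L/2) x^2
grad = lambda x: L * x

def run(eta, x0=1.0, steps=20):
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)      # x_{k+1} = (1 - eta*L) x_k
    return x

print(abs(run(0.4)))   # eta < 2/L = 0.5: |1 - eta*L| = 0.6 < 1, iterates shrink
print(abs(run(0.6)))   # eta > 2/L: |1 - eta*L| = 1.4 > 1, iterates blow up
```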
ex-ch03-11
(Medium) Project the given vector $\mathbf{v} \in \mathbb{R}^3$ onto the probability simplex $\Delta = \{\mathbf{x} : x_i \ge 0,\ \sum_i x_i = 1\}$.
Sort the components and apply the simplex projection algorithm.
Sort
Sort the components in decreasing order: $u_1 \ge u_2 \ge u_3$.
Find $\rho$
For $j = 1$ and $j = 2$, the check $u_j > \frac{1}{j}\big(\sum_{i \le j} u_i - 1\big)$ passes; it fails at $j = 3$. So $\rho = 2$ and $\tau = \tfrac{1}{2}(u_1 + u_2 - 1)$.
Project
$x_i = \max(v_i - \tau, 0)$; the smallest component projects to $0$. Check: the entries are non-negative and sum to $1$.
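The sort-based projection in code; the input vector below is a hypothetical example, not the one from the exercise.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                        # sort in decreasing order
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.max(j[u - (css - 1) / j > 0])      # last index passing the check
    tau = (css[rho - 1] - 1) / rho              # threshold
    return np.maximum(v - tau, 0.0)

v = np.array([0.6, 1.2, -0.3])                  # hypothetical 3-vector
x = project_simplex(v)
print(x, x.sum())                               # non-negative entries summing to 1
```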
ex-ch03-12
(Medium) Show that $f(\mathbf{X}) = -\log\det\mathbf{X}$ is convex on the cone of positive definite matrices $\mathbb{S}^n_{++}$.
Restrict to a line and show the resulting scalar function is convex.
Restrict to a line
Let $g(t) = f(\mathbf{X} + t\mathbf{V})$, where $\mathbf{X} \succ 0$ and $\mathbf{V}$ is symmetric. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $\mathbf{X}^{-1/2}\mathbf{V}\mathbf{X}^{-1/2}$.
Second derivative
$g(t) = -\log\det(\mathbf{X} + t\mathbf{V}) = -\log\det\mathbf{X} - \sum_i \log(1 + t\lambda_i)$, so $g''(t) = \sum_i \frac{\lambda_i^2}{(1 + t\lambda_i)^2} \ge 0$. Hence $g$ is convex in $t$, and since this holds for all lines, $f$ is convex.
ex-ch03-13
(Medium) The Rosenbrock function is $f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2$. Is it convex? Find its global minimum.
Compute the Hessian at the critical point.
Global minimum
Setting $\nabla f = \mathbf{0}$: from $\partial f/\partial y = 200(y - x^2) = 0$ we get $y = x^2$, and then $\partial f/\partial x = -2(1 - x) = 0$ gives $x = 1$. The unique critical point is $(1, 1)$ with $f(1, 1) = 0$.
Convexity check
The Hessian is $\nabla^2 f(x, y) = \begin{pmatrix} 2 - 400(y - x^2) + 800x^2 & -400x \\ -400x & 200 \end{pmatrix}$. At $(1, 1)$ it is $\begin{pmatrix} 802 & -400 \\ -400 & 200 \end{pmatrix}$, with eigenvalues approximately $1001.6$ and $0.4$, so it is positive definite there.
However, at general points the Hessian can be indefinite: at $(0, 1)$, for example, the top-left entry is $2 - 400 = -398 < 0$.
So $f$ is not convex globally, but it has a unique global minimum at $(1, 1)$.
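A quick eigenvalue check of the Hessian at the two points discussed (assuming the standard Rosenbrock constants used above):

```python
import numpy as np

def rosenbrock_hessian(x, y):
    """Hessian of f(x, y) = (1 - x)^2 + 100 (y - x^2)^2."""
    return np.array([[2 - 400 * (y - x**2) + 800 * x**2, -400 * x],
                     [-400 * x, 200.0]])

for pt in [(1.0, 1.0), (0.0, 1.0)]:
    eig = np.linalg.eigvalsh(rosenbrock_hessian(*pt))
    print(pt, eig)   # (1,1): both positive; (0,1): one negative, so indefinite
```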
ex-ch03-14
(Medium) Show that the MMSE estimator $\hat{\mathbf{x}} = (\mathbf{H}^\top\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^\top\mathbf{y}$ solves the convex QP: $\min_{\mathbf{x}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 + \sigma^2\|\mathbf{x}\|^2$.
Expand the objective, take the gradient, and set it to zero.
Expand
$\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 + \sigma^2\|\mathbf{x}\|^2 = \mathbf{x}^\top(\mathbf{H}^\top\mathbf{H} + \sigma^2\mathbf{I})\mathbf{x} - 2\mathbf{y}^\top\mathbf{H}\mathbf{x} + \|\mathbf{y}\|^2$.
Gradient
The gradient is $2(\mathbf{H}^\top\mathbf{H} + \sigma^2\mathbf{I})\mathbf{x} - 2\mathbf{H}^\top\mathbf{y}$. Setting it to zero: $\hat{\mathbf{x}} = (\mathbf{H}^\top\mathbf{H} + \sigma^2\mathbf{I})^{-1}\mathbf{H}^\top\mathbf{y}$.
Convexity
The Hessian $2(\mathbf{H}^\top\mathbf{H} + \sigma^2\mathbf{I}) \succ 0$ for $\sigma^2 > 0$, confirming strict convexity.
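A numerical cross-check with hypothetical $\mathbf{H}$, $\mathbf{y}$, and $\sigma^2$: the closed-form MMSE solution coincides with the minimiser of the QP, here solved as an augmented least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, sigma2 = 8, 4, 0.5            # hypothetical dimensions and noise variance
H = rng.normal(size=(m, n))
y = rng.normal(size=m)

# Closed-form MMSE / ridge solution
x_mmse = np.linalg.solve(H.T @ H + sigma2 * np.eye(n), H.T @ y)

# Same QP written as min ||H_aug x - y_aug||^2 and solved by least squares
H_aug = np.vstack([H, np.sqrt(sigma2) * np.eye(n)])
y_aug = np.concatenate([y, np.zeros(n)])
x_ls, *_ = np.linalg.lstsq(H_aug, y_aug, rcond=None)

print(np.allclose(x_mmse, x_ls))    # both minimise ||y - Hx||^2 + sigma2 ||x||^2
```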
ex-ch03-15
(Hard) (Dual decomposition for multi-cell power control.) Consider $K$ single-antenna links sharing a band. Link $k$ transmits with power $p_k$ and achieves rate $R_k(\mathbf{p}) = \log_2\!\Big(1 + \frac{g_{kk} p_k}{\sigma^2 + \sum_{j \ne k} g_{kj} p_j}\Big)$. The network wants to maximise $\sum_k \log R_k(\mathbf{p})$ (proportional fair) subject to $0 \le p_k \le P_k^{\max}$ for each link.
- Explain why this is non-convex.
- Formulate the high-SINR approximation (treat interference as noise, fix interference) as a convex problem.
- Write the Lagrangian dual and interpret the dual variables.
In (1), note that $R_k$ is not concave jointly in $\mathbf{p} = (p_1, \dots, p_K)$ because of the interference coupling.
In (2), fix all $p_j$ for $j \ne k$ and solve for $p_k$; this gives a set of decoupled water-filling problems.
Non-convexity
$R_k$ is concave in $p_k$ alone (for fixed interference), but the interference term $\sum_{j \ne k} g_{kj} p_j$ in the denominator makes $R_k$ non-concave in the joint variable $\mathbf{p}$. Hence the problem is non-convex.
Fixed-interference approximation
Fix the interference at its current value: $I_k = \sigma^2 + \sum_{j \ne k} g_{kj}\bar{p}_j$. Then $R_k = \log_2(1 + g_{kk} p_k / I_k)$ is concave in $p_k$. Each link solves an independent water-filling problem.
Dual interpretation
Introducing a multiplier $\lambda_k$ for each power constraint gives the shadow price $\lambda_k$, the marginal utility of additional power for link $k$. An iterative algorithm alternates between updating powers (primal) and updating interference levels (akin to dual variables).
ex-ch03-16
(Hard) (SDP relaxation for MIMO detection.) The ML detection problem is $\min_{\mathbf{s} \in \{\pm 1\}^n} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2$.
- Show this is equivalent to maximising $2\mathbf{c}^\top\mathbf{s} - \mathbf{s}^\top\mathbf{Q}\mathbf{s}$ over $\mathbf{s} \in \{\pm 1\}^n$ for appropriate $\mathbf{Q}$ and $\mathbf{c}$.
- Write the SDP relaxation.
- Explain randomised rounding.
Expand $\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2$ and identify the terms depending on $\mathbf{s}$.
Reformulation
$\|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 = \mathbf{s}^\top\mathbf{H}^\top\mathbf{H}\mathbf{s} - 2\mathbf{y}^\top\mathbf{H}\mathbf{s} + \|\mathbf{y}\|^2$. Since $\|\mathbf{y}\|^2$ is constant for a given $\mathbf{y}$, minimising is equivalent to maximising $2\mathbf{c}^\top\mathbf{s} - \mathbf{s}^\top\mathbf{Q}\mathbf{s}$. Set $\mathbf{Q} = \mathbf{H}^\top\mathbf{H}$, $\mathbf{c} = \mathbf{H}^\top\mathbf{y}$.
SDP relaxation
Lift: set $\tilde{\mathbf{s}} = [\mathbf{s}^\top\ 1]^\top$ and $\mathbf{X} = \tilde{\mathbf{s}}\tilde{\mathbf{s}}^\top$, so the objective becomes $\mathrm{tr}(\mathbf{L}\mathbf{X})$ for a symmetric $\mathbf{L}$ assembled from $\mathbf{Q}$ and $\mathbf{c}$, with $\mathrm{diag}(\mathbf{X}) = \mathbf{1}$ and $\mathrm{rank}(\mathbf{X}) = 1$. Relax by dropping the rank constraint:
$\min_{\mathbf{X}}\ \mathrm{tr}(\mathbf{L}\mathbf{X})$ s.t. $\mathrm{diag}(\mathbf{X}) = \mathbf{1}$, $\mathbf{X} = \mathbf{X}^\top$, $\mathbf{X} \succeq 0$.
Randomised rounding
Sample $\boldsymbol{\xi} \sim \mathcal{N}(\mathbf{0}, \mathbf{X}^\star)$, then round: $\hat{\mathbf{s}} = \mathrm{sign}(\boldsymbol{\xi})$. Repeat multiple times and keep the best candidate (smallest residual $\|\mathbf{y} - \mathbf{H}\hat{\mathbf{s}}\|^2$).
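A sketch of the rounding step only, assuming a relaxation solution $\mathbf{X}^\star$ is already available from an SDP solver (the SDP itself is not solved here, and the helper name below is ours):

```python
import numpy as np

def randomized_rounding(X, H, y, trials=100, seed=0):
    """Round an SDP solution X (covariance of the lifted +/-1 vector) to a bit vector."""
    rng = np.random.default_rng(seed)
    n = X.shape[0] - 1                        # last coordinate is the homogenising +1
    C = np.linalg.cholesky(X + 1e-9 * np.eye(n + 1))
    best, best_val = None, np.inf
    for _ in range(trials):
        xi = C @ rng.normal(size=n + 1)       # sample xi ~ N(0, X)
        s = np.sign(xi[:n]) * np.sign(xi[n])  # de-homogenise: align signs with last entry
        val = np.linalg.norm(y - H @ s) ** 2
        if val < best_val:
            best, best_val = s, val
    return best, best_val
# X would come from solving the relaxation above with an SDP solver; omitted here.
```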
ex-ch03-17
(Hard) (Convergence of projected gradient descent.) Let $f$ be $L$-smooth and convex, and let $\mathcal{C}$ be a closed convex set. Show that projected gradient descent with $\eta = 1/L$ satisfies
$f(\mathbf{x}_k) - f(\mathbf{x}^\star) \le \dfrac{L\|\mathbf{x}_0 - \mathbf{x}^\star\|^2}{2k}$,
where $\mathbf{x}^\star$ is the constrained minimum.
Use the non-expansiveness of projection: $\|\Pi_{\mathcal{C}}(\mathbf{x}) - \Pi_{\mathcal{C}}(\mathbf{y})\| \le \|\mathbf{x} - \mathbf{y}\|$.
Use the descent lemma: $f(\mathbf{y}) \le f(\mathbf{x}) + \nabla f(\mathbf{x})^\top(\mathbf{y} - \mathbf{x}) + \tfrac{L}{2}\|\mathbf{y} - \mathbf{x}\|^2$.
Descent property
By $L$-smoothness and the update rule $\mathbf{x}_{k+1} = \Pi_{\mathcal{C}}\!\big(\mathbf{x}_k - \tfrac{1}{L}\nabla f(\mathbf{x}_k)\big)$: $f(\mathbf{x}_{k+1}) \le f(\mathbf{x}_k) - \tfrac{L}{2}\|\mathbf{x}_{k+1} - \mathbf{x}_k\|^2$. (This uses the gradient Lipschitz condition applied at the projected point.)
Distance contraction
By the projection optimality condition and first-order convexity: $f(\mathbf{x}_{k+1}) - f(\mathbf{x}^\star) \le \tfrac{L}{2}\big(\|\mathbf{x}_k - \mathbf{x}^\star\|^2 - \|\mathbf{x}_{k+1} - \mathbf{x}^\star\|^2\big)$. Telescoping over $k$ steps gives the bound.
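A minimal sketch on a hypothetical box-constrained least-squares problem; it only illustrates the qualitative $O(1/k)$ behaviour of the gap, not the exact constant in the bound.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
A = rng.normal(size=(n, n))
b = rng.normal(size=n)
L = np.linalg.norm(A, 2) ** 2                 # smoothness constant of 0.5||Ax - b||^2
proj = lambda x: np.clip(x, 0.0, 1.0)         # projection onto the box [0, 1]^n

x = np.zeros(n)
vals = []
for k in range(2000):
    grad = A.T @ (A @ x - b)
    x = proj(x - grad / L)                    # projected gradient step with eta = 1/L
    vals.append(0.5 * np.linalg.norm(A @ x - b) ** 2)

f_star = vals[-1]                             # proxy for the constrained optimum
print(vals[10] - f_star, vals[100] - f_star)  # the gap shrinks roughly like 1/k
```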
ex-ch03-18
(Hard) (Greedy antenna selection is near-optimal.) For a MIMO system with $N$ antennas and $M$ RF chains, the capacity with selected antenna set $S$ is $f(S) = \log\det\!\big(\mathbf{I} + \rho\sum_{i \in S}\mathbf{h}_i\mathbf{h}_i^\dagger\big)$.
- Show that $f$ is monotone: $f(S) \le f(T)$ for $S \subseteq T$.
- Show submodularity by proving the marginal gain from adding antenna $i$ decreases as the set grows.
For (1), use the fact that adding columns to $\mathbf{H}_S$ (the matrix whose columns are the selected $\mathbf{h}_i$) can only increase $\mathbf{H}_S\mathbf{H}_S^\dagger$ in the PSD order.
For (2), use the matrix determinant lemma for rank-one updates.
Monotonicity
Let $\mathbf{A}_S = \mathbf{I} + \rho\sum_{i \in S}\mathbf{h}_i\mathbf{h}_i^\dagger$. Then $\mathbf{A}_S \preceq \mathbf{A}_T$ for $S \subseteq T$ (adding PSD terms). Since $\log\det$ is monotone with respect to the PSD order, $f(S) \le f(T)$.
Submodularity
The marginal gain from adding antenna $i$ to set $S$ is $\Delta_i(S) = f(S \cup \{i\}) - f(S)$.
By the matrix determinant lemma: $\Delta_i(S) = \log\!\big(1 + \rho\,\mathbf{h}_i^\dagger\mathbf{A}_S^{-1}\mathbf{h}_i\big)$.
For $S \subseteq T$: $\mathbf{A}_S^{-1} \succeq \mathbf{A}_T^{-1}$, so $\Delta_i(S) \ge \Delta_i(T)$, which is exactly submodularity.
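Monotonicity plus submodularity is what makes the greedy rule near-optimal (it attains at least a $(1 - 1/e)$ fraction of the best $M$-subset's capacity). A sketch of the greedy selection, with a hypothetical channel matrix:

```python
import numpy as np

def greedy_select(H, M, rho=1.0):
    """Greedily pick M columns of H to maximise log det(I + rho * sum h_i h_i^H)."""
    Nr, N = H.shape
    S, A = [], np.eye(Nr, dtype=complex)      # A = I + rho * sum over selected antennas
    for _ in range(M):
        Ainv = np.linalg.inv(A)
        gains = [np.log(1 + rho * np.real(H[:, i].conj() @ Ainv @ H[:, i]))
                 if i not in S else -np.inf for i in range(N)]
        best = int(np.argmax(gains))          # antenna with the largest marginal gain
        S.append(best)
        A = A + rho * np.outer(H[:, best], H[:, best].conj())
    return S, np.linalg.slogdet(A)[1]         # selected set and achieved log det

rng = np.random.default_rng(4)                # hypothetical 4 x 8 Rayleigh channel
H = (rng.normal(size=(4, 8)) + 1j * rng.normal(size=(4, 8))) / np.sqrt(2)
print(greedy_select(H, M=3))
```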
ex-ch03-19
(Challenge) (Dual decomposition for NUM.) Consider flows $s = 1, \dots, S$ sharing links $l = 1, \dots, L$ in a network. Flow $s$ uses the links in the set $L(s)$ and achieves rate $x_s$. Link $l$ has capacity $c_l$. The network utility maximisation (NUM) problem is:
$\max_{\mathbf{x} \ge \mathbf{0}}\ \sum_s U_s(x_s)$ subject to $\sum_{s:\, l \in L(s)} x_s \le c_l$ for every link $l$.
Assume each $U_s$ is strictly concave and increasing (e.g., $U_s(x_s) = \log x_s$ for proportional fairness).
- Form the Lagrangian dual.
- Show the dual decomposes into independent sub-problems.
- Propose a distributed algorithm using gradient ascent on the dual.
- Interpret the dual variables as link prices.
Introduce a multiplier $\lambda_l \ge 0$ for each capacity constraint and write the Lagrangian. Group terms by flow index $s$.
Define the "path price" $q_s = \sum_{l \in L(s)} \lambda_l$. Each flow sees only its own price.
For the distributed algorithm, update prices via subgradient ascent: $\lambda_l \leftarrow \big[\lambda_l + \alpha\big(\sum_{s:\, l \in L(s)} x_s - c_l\big)\big]_+$.
Lagrangian
$L(\mathbf{x}, \boldsymbol{\lambda}) = \sum_s U_s(x_s) + \sum_l \lambda_l\Big(c_l - \sum_{s:\, l \in L(s)} x_s\Big)$.
Decomposition
$L(\mathbf{x}, \boldsymbol{\lambda}) = \sum_s\big[U_s(x_s) - q_s x_s\big] + \sum_l \lambda_l c_l$. For fixed $\boldsymbol{\lambda}$, each flow independently solves $\max_{x_s \ge 0}\ U_s(x_s) - q_s x_s$, where $q_s = \sum_{l \in L(s)} \lambda_l$ is the "path price."
Distributed algorithm
- Source update: $x_s \leftarrow \arg\max_{x_s \ge 0}\ \big[U_s(x_s) - q_s x_s\big]$ (for $U_s = \log$, this gives $x_s = 1/q_s$).
- Link update: $\lambda_l \leftarrow \big[\lambda_l + \alpha\big(\sum_{s:\, l \in L(s)} x_s - c_l\big)\big]_+$. This is dual subgradient ascent. Each source needs only its path price; each link needs only its aggregate load (a minimal sketch of the loop follows below).
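A minimal sketch of this primal-dual loop for $U_s = \log$ on a hypothetical two-link, three-flow topology (the routing matrix and capacities below are made up for illustration):

```python
import numpy as np

# R[l, s] = 1 if flow s crosses link l; c holds the link capacities.
R = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])             # 2 links, 3 flows
c = np.array([1.0, 2.0])
lam = np.ones(2)                            # link prices (dual variables)
alpha = 0.05                                # subgradient step size

for _ in range(3000):
    q = R.T @ lam                           # path price seen by each flow
    x = 1.0 / q                             # source update: argmax_x log(x) - q*x
    lam = np.maximum(lam + alpha * (R @ x - c), 1e-6)   # price update, kept positive

print(x, R @ x)                             # rates; link loads approach the capacities
```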
Interpretation
is the "congestion price" of link . TCP Vegas and similar protocols implement this decomposition implicitly: the round-trip time acts as the price signal.
ex-ch03-20
(Challenge) (Nesterov acceleration.) Prove that Nesterov's accelerated gradient method achieves an $O(1/k^2)$ convergence rate for $L$-smooth convex functions, matching the theoretical lower bound for first-order methods.
The algorithm maintains an iterate sequence $\{\mathbf{x}_k\}$ and a momentum sequence $\{\mathbf{v}_k\}$ with $\mathbf{v}_0 = \mathbf{x}_0$. For $k \ge 1$: take a gradient step of length $1/L$ at an extrapolated point $\mathbf{y}_k$ (a convex combination of $\mathbf{x}_{k-1}$ and $\mathbf{v}_{k-1}$), then update the momentum point $\mathbf{v}_k$.
Define a potential function $V_k = k(k-1)\big(f(\mathbf{x}_k) - f^\star\big) + 2L\|\mathbf{v}_k - \mathbf{x}^\star\|^2$ for the momentum sequence $\{\mathbf{v}_k\}$.
Show that $V_k$ is non-increasing.
Lyapunov argument (sketch)
Define the potential $V_k = k(k-1)\big(f(\mathbf{x}_k) - f^\star\big) + 2L\|\mathbf{v}_k - \mathbf{x}^\star\|^2$, where $f^\star = f(\mathbf{x}^\star)$ and $\{\mathbf{v}_k\}$ is the momentum sequence of the method.
Using the $L$-smoothness descent lemma and the first-order convexity condition, one shows $V_{k+1} \le V_k$, hence $V_k \le V_1$. Since $V_k \ge k(k-1)\big(f(\mathbf{x}_k) - f^\star\big)$: $f(\mathbf{x}_k) - f^\star \le \frac{V_1}{k(k-1)} = O(1/k^2)$, where $V_1$ is a constant determined by $L$ and the distance from the starting point to $\mathbf{x}^\star$.
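A sketch comparing plain gradient descent with one standard variant of Nesterov's method (momentum coefficient $(k-1)/(k+2)$, an assumed choice since the exercise's exact parameters are not reproduced above) on a hypothetical ill-conditioned quadratic:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
Q = np.diag(np.linspace(0.001, 1.0, n))     # eigenvalues in [0.001, 1], so L = 1
L = 1.0
grad = lambda x: Q @ x
f = lambda x: 0.5 * x @ Q @ x               # minimum value is 0

x0 = rng.normal(size=n)

# Plain gradient descent with step 1/L
x = x0.copy()
for _ in range(200):
    x = x - grad(x) / L

# Nesterov's accelerated gradient (momentum form)
x_prev, x_acc = x0.copy(), x0.copy()
for k in range(1, 201):
    y = x_acc + (k - 1) / (k + 2) * (x_acc - x_prev)   # extrapolated point
    x_prev, x_acc = x_acc, y - grad(y) / L             # gradient step at y

print(f(x), f(x_acc))   # the accelerated run reaches a much smaller objective value
```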