Exercises
ex-sp-ch08-01
Easy
Minimize $f(x, y) = (x - 3)^2 + (y + 1)^2$ using scipy.optimize.minimize
with method 'BFGS'. Provide the analytical gradient. Verify that
the solution is $(x^*, y^*) = (3, -1)$.
The gradient is $\nabla f = [2(x - 3),\ 2(y + 1)]$.
Start from $(x_0, y_0) = (0, 0)$.
Implementation
import numpy as np
from scipy.optimize import minimize
def f(x):
    return (x[0] - 3)**2 + (x[1] + 1)**2

def grad_f(x):
    return np.array([2*(x[0] - 3), 2*(x[1] + 1)])
res = minimize(f, [0, 0], method='BFGS', jac=grad_f)
print(f"x* = {res.x}") # [3.0, -1.0]
print(f"f(x*) = {res.fun:.2e}") # ~0
print(f"Converged: {res.success}")
ex-sp-ch08-02
Easy
Use scipy.optimize.brentq to find the root of $f(x) = e^x - 3x$
on the interval $[0, 1]$. Verify by plugging the root back into $f$.
Check that $f(0) > 0$ and $f(1) < 0$ (or vice versa) for a valid bracket.
Implementation
import numpy as np
from scipy.optimize import brentq
f = lambda x: np.exp(x) - 3*x
x_star = brentq(f, 0, 1)   # f(0) = 1 > 0, f(1) = e - 3 < 0: valid bracket
print(f"Root: {x_star:.10f}")
print(f"f(x*) = {f(x_star):.2e}")
ex-sp-ch08-03
Easy
Implement the soft-thresholding operator
$S_\lambda(v)_i = \operatorname{sign}(v_i)\max(|v_i| - \lambda, 0)$
and apply it to the vector $v = [-3, -1, 0.5, 0, 2, -0.3, 4]$ with
$\lambda = 1$. How many components become exactly zero?
Implementation
import numpy as np
def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0)
v = np.array([-3, -1, 0.5, 0, 2, -0.3, 4])
result = soft_threshold(v, 1.0)
print(f"Result: {result}")
# [-2, 0, 0, 0, 1, 0, 3]
print(f"Zeros: {np.sum(result == 0)}") # 4
ex-sp-ch08-04
Easy
Solve the linear program: maximize $3x + 5y$ subject to
$x + y \le 10$, $2x + y \le 14$, $x, y \ge 0$, using linprog.
Negate the objective for minimization.
Implementation
from scipy.optimize import linprog
c = [-3, -5] # negate for maximization
A_ub = [[1, 1], [2, 1]]
b_ub = [10, 14]
bounds = [(0, None), (0, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(f"x={res.x[0]:.1f}, y={res.x[1]:.1f}")
print(f"Max profit = {-res.fun:.1f}")
ex-sp-ch08-05
Easy
Use scipy.optimize.fixed_point to find the fixed point of
$g(x) = \cos x$ starting from $x_0 = 0.5$. Compare to the
analytical value (the Dottie number, $\approx 0.739085$).
Implementation
import numpy as np
from scipy.optimize import fixed_point
x_star = fixed_point(np.cos, 0.5)
print(f"Fixed point: {x_star:.10f}")
print(f"cos(x*) = {np.cos(x_star):.10f}")
print(f"|x* - cos(x*)| = {abs(x_star - np.cos(x_star)):.2e}")
ex-sp-ch08-06
Medium
Compare the iteration counts of Nelder-Mead, CG, BFGS, and Newton-CG on
the Rosenbrock function starting from $x_0 = (-1.5, 2.0)$. Provide the
analytical gradient and Hessian for the methods that use them.
The Hessian is
$\nabla^2 f = \begin{bmatrix} 2 - 400(y - 3x^2) & -400x \\ -400x & 200 \end{bmatrix}$.
Use result.nit for the iteration count and result.nfev for the number of
function evaluations.
Implementation
import numpy as np
from scipy.optimize import minimize
def rosen(x):
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2

def rosen_grad(x):
    return np.array([
        -2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
        200*(x[1] - x[0]**2)
    ])

def rosen_hess(x):
    return np.array([
        [2 - 400*(x[1] - 3*x[0]**2), -400*x[0]],
        [-400*x[0], 200]
    ])

x0 = np.array([-1.5, 2.0])
for method in ['Nelder-Mead', 'CG', 'BFGS', 'Newton-CG']:
    kw = {}
    if method in ('CG', 'BFGS', 'Newton-CG'):
        kw['jac'] = rosen_grad
    if method == 'Newton-CG':
        kw['hess'] = rosen_hess
    res = minimize(rosen, x0, method=method, **kw)
    print(f"{method:12s}: nit={res.nit:4d}, nfev={res.nfev:4d}, "
          f"f*={res.fun:.2e}")
ex-sp-ch08-07
Medium
Implement water-filling power allocation for $N = 12$ channels with total
power $P = 20$ and noise power $\sigma^2 = 1$. Channel gains are
$g_i = 10^{(20 - 3i)/10}$ for $i = 0, \dots, 11$. How many channels
receive zero power? Verify with CVXPY.
Sort channels by gain; use the iterative water-level formula.
In CVXPY: cp.Maximize(cp.sum(cp.log(1 + cp.multiply(gains, p)))).
Water-filling
import numpy as np
N = 12
gains = 10**((20 - 3*np.arange(N))/10)
P, sigma2 = 20.0, 1.0
noise_floor = sigma2 / gains
idx = np.argsort(noise_floor)
nf_sorted = noise_floor[idx]
for K in range(N, 0, -1):
    mu = (P + np.sum(nf_sorted[:K])) / K
    p = np.maximum(mu - nf_sorted, 0)
    if p[K-1] > 0:
        break
powers = np.zeros(N)
powers[idx] = p
print(f"Zero-power channels: {np.sum(powers == 0)}")
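The CVXPY check the exercise asks for can be appended to the script above. A minimal sketch (assumes the cvxpy package and reuses N, gains, P, and powers from the water-filling code):

import cvxpy as cp

p_var = cp.Variable(N, nonneg=True)
obj = cp.Maximize(cp.sum(cp.log(1 + cp.multiply(gains, p_var))))
cp.Problem(obj, [cp.sum(p_var) == P]).solve()
print(f"Max |p_wf - p_cvx| = {np.max(np.abs(powers - p_var.value)):.2e}")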
ex-sp-ch08-08
Medium
Solve the LASSO problem $\min_x \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_1$
for a random $50 \times 200$ system with 10 true nonzeros using both
CVXPY and ISTA. Compare the solutions and convergence.
For ISTA, use step size $t = 1/L$ where $L = \|A^\top A\|_2$.
Run ISTA for 1000 iterations and plot the objective vs iteration.
Implementation
import numpy as np
import cvxpy as cp
np.random.seed(42)
m, n, k = 50, 200, 10
A = np.random.randn(m, n)
x_true = np.zeros(n); x_true[:k] = np.random.randn(k)
b = A @ x_true + 0.1*np.random.randn(m)
lam = 0.5
# CVXPY
x_cv = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5*cp.sum_squares(A@x_cv-b) + lam*cp.norm(x_cv,1)))
prob.solve()
# ISTA
L = np.linalg.norm(A.T @ A, 2)
x_is = np.zeros(n)
for _ in range(1000):
    z = x_is - A.T @ (A @ x_is - b) / L                   # gradient step
    x_is = np.sign(z) * np.maximum(np.abs(z) - lam/L, 0)  # soft-threshold
print(f"||x_cvxpy - x_ista|| = {np.linalg.norm(x_cv.value - x_is):.4f}")
ex-sp-ch08-09
Medium
Solve the system of nonlinear equations $x^2 + y^2 = 25$,
$x^2 - y^2 = 3$ using fsolve from multiple starting points.
Find all four solutions.
Try starting points in each quadrant: $(\pm 3, \pm 3)$.
Provide the Jacobian for faster convergence.
Implementation
from scipy.optimize import fsolve
import numpy as np
def sys(v):
    x, y = v
    return [x**2 + y**2 - 25, x**2 - y**2 - 3]

def jac(v):
    x, y = v
    return [[2*x, 2*y], [2*x, -2*y]]

starts = [(3, 3), (-3, 3), (3, -3), (-3, -3)]
solutions = set()
for s in starts:
    sol, info, ier, msg = fsolve(sys, s, fprime=jac, full_output=True)
    if ier == 1:
        solutions.add(tuple(np.round(sol, 8)))
for s in sorted(solutions):
    print(f"({s[0]:.4f}, {s[1]:.4f})")
ex-sp-ch08-10
Medium
Implement the projection onto the probability simplex and test it on the
vector $v = [1.5, -0.5, 0.3, 0.8, -0.1]$. Verify that the result satisfies
the simplex constraints ($w_i \ge 0$, $\sum_i w_i = 1$).
Sort in decreasing order and find the threshold.
Implementation
import numpy as np
def proj_simplex(v):
    n = len(v)
    u = np.sort(v)[::-1]                    # sort in decreasing order
    cssv = np.cumsum(u) - 1
    rho = np.nonzero(u > cssv / np.arange(1, n + 1))[0][-1]
    theta = cssv[rho] / (rho + 1)
    return np.maximum(v - theta, 0)
v = np.array([1.5, -0.5, 0.3, 0.8, -0.1])
w = proj_simplex(v)
print(f"Projection: {w}")
print(f"Sum: {np.sum(w):.10f}")
print(f"All >= 0: {np.all(w >= -1e-10)}")
ex-sp-ch08-11
Medium
Use trust-constr to minimize $f(x) = x_1^2 + x_2^2$ subject to
$x_1 + x_2 = 1$ and $x_1 \ge 0.1$, $x_2 \ge 0.1$. Compare the result
with the analytical solution.
The analytical solution is $x^* = (0.5, 0.5)$ with $f^* = 0.5$.
Use LinearConstraint for the equality constraint.
Implementation
import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds
f = lambda x: x[0]**2 + x[1]**2
grad_f = lambda x: 2*x
hess_f = lambda x: 2*np.eye(2)
lc = LinearConstraint([[1, 1]], [1], [1])
bounds = Bounds([0.1, 0.1], [np.inf, np.inf])
res = minimize(f, [0.8, 0.2], method='trust-constr',  # start off the optimum
               jac=grad_f, hess=hess_f,
               constraints=lc, bounds=bounds)
print(f"x* = {res.x}")
print(f"f* = {res.fun:.6f}")
ex-sp-ch08-12
Hard
Implement FISTA (Fast ISTA) for the LASSO problem and compare its
convergence rate with ISTA on a random system. Plot $f(x_k) - f^*$ vs.
iteration for both methods on a log-log scale.
FISTA adds momentum:
$y_{k+1} = x_{k+1} + \frac{t_k - 1}{t_{k+1}}(x_{k+1} - x_k)$,
where $t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}$.
Use CVXPY to find $f^*$ as the reference.
Key idea
ISTA converges as $O(1/k)$ while FISTA converges as $O(1/k^2)$. On a
log-log plot of $f(x_k) - f^*$, ISTA should show slope $-1$ and FISTA
slope $-2$ (asymptotically).
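No reference implementation is given for this exercise; below is a minimal sketch of both methods on a random LASSO instance. The problem sizes, seed, and $\lambda$ are illustrative assumptions, not values from the exercise; plotting the objective gap is left to the reader.

import numpy as np

np.random.seed(0)
m, n = 50, 200                        # illustrative sizes
A = np.random.randn(m, n)
b = np.random.randn(m)
lam = 0.5
L = np.linalg.norm(A.T @ A, 2)        # Lipschitz constant of the gradient

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0)

def obj(x):
    return 0.5*np.sum((A @ x - b)**2) + lam*np.sum(np.abs(x))

# ISTA: proximal gradient step at the current iterate
x = np.zeros(n)
for _ in range(1000):
    x = soft(x - A.T @ (A @ x - b)/L, lam/L)
print(f"ISTA  final objective: {obj(x):.6f}")

# FISTA: the same step taken at an extrapolated point y
x = np.zeros(n)
y, t = x.copy(), 1.0
for _ in range(1000):
    x_new = soft(y - A.T @ (A @ y - b)/L, lam/L)
    t_new = (1 + np.sqrt(1 + 4*t**2))/2
    y = x_new + (t - 1)/t_new*(x_new - x)
    x, t = x_new, t_new
print(f"FISTA final objective: {obj(x):.6f}")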
ex-sp-ch08-13
Hard
Solve the SDP relaxation of MAX-CUT for a random graph with $n$ nodes and
edge probability $p$. Implement Goemans-Williamson randomized rounding and
compute the approximation ratio.
Use cp.Variable((n,n), symmetric=True) with X >> 0 and X[i,i] == 1.
Round using random hyperplanes: $x_i = \operatorname{sign}(v_i^\top r)$,
where $V^\top V = X$ and $r \sim \mathcal{N}(0, I)$.
Key idea
The Goemans-Williamson algorithm guarantees at least $\approx 0.878$ of
the optimal cut in expectation. With 1000 random hyperplanes, you should
find a good cut.
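No reference implementation is given; the following is a minimal sketch with assumed graph parameters ($n = 20$, $p = 0.5$) and 1000 rounding hyperplanes. It requires cvxpy with an SDP-capable solver (the bundled SCS works).

import numpy as np
import cvxpy as cp

np.random.seed(1)
n, p = 20, 0.5                                   # assumed graph size
W = (np.random.rand(n, n) < p).astype(float)
W = np.triu(W, 1)
W = W + W.T                                      # symmetric 0/1 adjacency

# SDP relaxation: maximize sum_{i<j} w_ij (1 - X_ij)/2
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(W, 1 - X)) / 4),
                  [X >> 0, cp.diag(X) == 1])
prob.solve()

# Goemans-Williamson rounding: V^T V = X, cut by random hyperplanes
V = np.linalg.cholesky(X.value + 1e-6*np.eye(n)).T

def cut_value(x):
    return np.sum(W * (1 - np.outer(x, x))) / 4

best = max(cut_value(np.sign(V.T @ np.random.randn(n)))
           for _ in range(1000))
print(f"SDP bound: {prob.value:.2f}, best cut: {best:.2f}, "
      f"ratio: {best/prob.value:.3f}")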
ex-sp-ch08-14
Hard
Implement Anderson acceleration for a fixed-point iteration $x = g(x)$
and compare convergence speed with plain iteration. Use history depth $m$.
Anderson acceleration finds weights over the last $m$ iterates that
minimize the (linearized) residual.
Use scipy.optimize.root(method="anderson").
Key idea
Anderson acceleration typically reduces iteration count by 3-10x compared to plain iteration. The residual norms should decrease much faster.
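No reference implementation is given; the sketch below uses $g(x) = \cos x$ as an assumed example map and history depth 5 (neither value is recoverable from this copy of the exercise). SciPy's Anderson mixing solves the residual equation $F(x) = g(x) - x = 0$.

import numpy as np
from scipy.optimize import root

g = np.cos                                   # assumed example map

# Plain fixed-point iteration
x, steps = 0.5, 0
while abs(g(x) - x) > 1e-12:
    x = g(x)
    steps += 1
print(f"Plain iteration: {steps} steps, x = {x:.10f}")

# Anderson mixing on the residual F(x) = g(x) - x
res = root(lambda x: g(x) - x, 0.5, method='anderson',
           options={'jac_options': {'M': 5}, 'fatol': 1e-12})
print(f"Anderson: {res.nit} iterations, x = {float(res.x):.10f}")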
ex-sp-ch08-15
Hard
Implement the 1D total variation proximal operator using Chambolle's dual
algorithm. Apply it to a noisy piecewise-constant signal (length $n$,
5 pieces, SNR = 10 dB) and compare the denoised output for different
$\lambda$ values.
Generate: constant segments of random heights + Gaussian noise.
Chambolle iteration:
$p^{k+1} = \dfrac{p^k + \tau\,\nabla(\operatorname{div} p^k - v/\lambda)}{1 + \tau\,|\nabla(\operatorname{div} p^k - v/\lambda)|}$ (elementwise),
with the denoised signal $x = v - \lambda \operatorname{div} p$.
Key idea
Small $\lambda$ preserves detail but retains noise; large $\lambda$
over-smooths. The optimal $\lambda$ trades off between the two. TV
denoising preserves sharp edges, unlike Gaussian smoothing.
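No reference implementation is given; below is a minimal sketch using forward differences ($\nabla$ = np.diff, with div as its negative adjoint), $\tau = 1/4$ (the 1D stability limit), and an assumed signal length of 500.

import numpy as np

def grad(x):                 # forward difference: R^n -> R^(n-1)
    return np.diff(x)

def div(p):                  # negative adjoint of grad: R^(n-1) -> R^n
    return np.concatenate(([p[0]], np.diff(p), [-p[-1]]))

def prox_tv1d(v, lam, tau=0.25, iters=500):
    p = np.zeros(len(v) - 1)             # dual variable, |p_i| <= 1
    for _ in range(iters):
        q = grad(div(p) - v/lam)
        p = (p + tau*q) / (1 + tau*np.abs(q))
    return v - lam*div(p)

np.random.seed(3)
clean = np.repeat(np.random.randn(5), 100)           # 5 pieces, length 500
sigma = np.sqrt(np.mean(clean**2)) * 10**(-10/20)    # SNR = 10 dB
noisy = clean + sigma*np.random.randn(len(clean))
for lam in (0.1, 1.0, 10.0):
    rmse = np.sqrt(np.mean((prox_tv1d(noisy, lam) - clean)**2))
    print(f"lambda = {lam:5.1f}: RMSE = {rmse:.4f}")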
ex-sp-ch08-16
Hard
Use scipy.optimize.minimize with method='Newton-CG', providing
only Hessian-vector products (via hessp), to minimize a logistic
regression loss on a large ($m \times n$) dataset. Compare the timing
with the full Hessian via hess.
The Hessian-vector product is $\nabla^2 f(w)\,v = X^\top (d \odot (Xv))$,
where $d_i = \sigma_i(1 - \sigma_i)$ and $\sigma_i = \sigma(x_i^\top w)$.
This avoids forming the Hessian matrix.
Key idea
For large $n$, a Hessian-vector product costs $O(mn)$ while
forming the $n \times n$ Hessian costs $O(mn^2)$.
The hessp approach should be faster when $n$ is large.
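No reference implementation is given; a minimal sketch of the timing comparison follows. The dataset size (2000 x 400) is an illustrative assumption.

import time
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

np.random.seed(4)
m, n = 2000, 400                       # illustrative sizes
X = np.random.randn(m, n)
w_true = np.random.randn(n)
y = (np.random.rand(m) < expit(X @ w_true)).astype(float)

def loss(w):
    z = X @ w
    return np.sum(np.logaddexp(0, z) - y*z)     # stable logistic loss

def grad(w):
    return X.T @ (expit(X @ w) - y)

def hess(w):
    s = expit(X @ w)
    return X.T @ ((s*(1 - s))[:, None] * X)     # forms the full n x n Hessian

def hessp(w, v):
    s = expit(X @ w)
    return X.T @ (s*(1 - s) * (X @ v))          # Hessian-vector product only

w0 = np.zeros(n)
for kw in ({'hess': hess}, {'hessp': hessp}):
    t0 = time.perf_counter()
    res = minimize(loss, w0, jac=grad, method='Newton-CG', **kw)
    print(f"{list(kw)[0]:5s}: {time.perf_counter() - t0:.3f} s, "
          f"f* = {res.fun:.4f}")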
ex-sp-ch08-17
Challenge
Implement ADMM (Alternating Direction Method of Multipliers) for the
basis pursuit denoising problem
$\min_x \|x\|_1$ subject to $\|Ax - b\|_2 \le \epsilon$.
Compare with CVXPY's solution on a random system with 5 true nonzeros.
ADMM splits the problem: one block update is an $\ell_2$-ball projection,
another is a soft-threshold.
Use the augmented Lagrangian with penalty parameter $\rho$.
Key idea
ADMM decomposes the problem into easy subproblems: a least-squares solve,
a soft-threshold, and an $\ell_2$-ball projection. Convergence may require
100-500 iterations, but each iteration is cheap.
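No reference implementation is given. One splitting consistent with the hints introduces $z = x$ and $y = Ax$, i.e. $\min \|z\|_1 + I_{\{\|y - b\|_2 \le \epsilon\}}(y)$ subject to $x = z$, $Ax = y$: the $x$-update is a least-squares solve, the $z$-update a soft-threshold, and the $y$-update an $\ell_2$-ball projection. A minimal sketch with illustrative sizes, $\rho = 1$, and $\epsilon = 0.1$:

import numpy as np
import cvxpy as cp

np.random.seed(5)
m, n, k = 40, 120, 5                           # illustrative sizes
A = np.random.randn(m, n)
x_true = np.zeros(n)
x_true[:k] = np.random.randn(k)
b = A @ x_true + 0.01*np.random.randn(m)
eps = 0.1

# CVXPY reference
x_cv = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x_cv, 1)),
           [cp.norm(A @ x_cv - b, 2) <= eps]).solve()

# ADMM: min ||z||_1 + I{||y - b|| <= eps}  s.t.  x = z, Ax = y
rho = 1.0
x, z, y = np.zeros(n), np.zeros(n), b.copy()
u1, u2 = np.zeros(n), np.zeros(m)
M = np.linalg.cholesky(np.eye(n) + A.T @ A)    # factor once, reuse
for _ in range(500):
    rhs = (z - u1) + A.T @ (y - u2)
    x = np.linalg.solve(M.T, np.linalg.solve(M, rhs))  # least-squares solve
    z = np.sign(x + u1)*np.maximum(np.abs(x + u1) - 1/rho, 0)  # soft-threshold
    r = A @ x + u2 - b
    y = b + r*min(1.0, eps/(np.linalg.norm(r) + 1e-12))  # eps-ball projection
    u1 += x - z
    u2 += A @ x - y

print(f"||x_admm - x_cvxpy|| = {np.linalg.norm(z - x_cv.value):.4f}")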
ex-sp-ch08-18
Challenge
Formulate and solve a robust beamforming problem as an SOCP in CVXPY:
minimize the transmit power $\|w\|_2^2$ subject to
$|w^H a(\theta_0)| \ge 1$ and $|w^H a(\theta_k)| \le \delta$ for
interferer directions $\theta_k$, where $a(\theta)$ is the steering
vector for a ULA with $N$ antennas.
Steering vector: $a_n(\theta) = e^{j\pi n \sin\theta}$ for
$n = 0, \dots, N-1$ (half-wavelength spacing).
The constraint $|w^H a(\theta_0)| \ge 1$ can be written as
$\operatorname{Re}(w^H a(\theta_0)) \ge 1$ by absorbing the phase.
Key idea
After fixing the phase of the desired-signal constraint, this becomes an
SOCP. CVXPY handles complex variables natively. The resulting beamformer
should show a beam toward $\theta_0$ and nulls toward the interferers.
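No reference implementation is given; a minimal sketch with assumed parameters ($N = 10$ antennas, $\theta_0 = 20°$, interferers at $-40°$ and $50°$, $\delta = 0.05$):

import numpy as np
import cvxpy as cp

N = 10                                   # assumed number of antennas
theta0 = np.deg2rad(20)                  # desired direction
interf = np.deg2rad([-40.0, 50.0])       # assumed interferer directions
delta = 0.05

def steer(theta):
    # ULA steering vector, half-wavelength spacing
    return np.exp(1j*np.pi*np.arange(N)*np.sin(theta))

w = cp.Variable(N, complex=True)
cons = [cp.real(np.conj(steer(theta0)) @ w) >= 1]            # phase absorbed
cons += [cp.abs(np.conj(steer(t)) @ w) <= delta for t in interf]
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), cons)
prob.solve()

print(f"Transmit power: {prob.value:.4f}")
for t in np.concatenate(([theta0], interf)):
    resp = abs(np.conj(steer(t)) @ w.value)
    print(f"|w^H a({np.rad2deg(t):6.1f} deg)| = {resp:.4f}")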