Exercises
ex-ch14-01
Easy. Write the ISTA update for LASSO in one line. Identify the step size and the proximal operator.
Gradient step + soft threshold.
Update
$x^{k+1} = S_{\lambda t}\big(x^k - t\,A^\top(Ax^k - y)\big)$, with step size $t \le 1/L$, $L = \|A^\top A\|_2$, and proximal operator the soft threshold $S_\tau(z)_i = \operatorname{sign}(z_i)\max(|z_i| - \tau, 0)$.
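The one-line update can be sketched in NumPy as follows (problem sizes, seed, and the choice $\lambda = 0.5$ are ours, for illustration only):

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau*||.||_1 (element-wise soft threshold)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista_step(x, A, y, lam, t):
    """One ISTA iteration: gradient step on 0.5||Ax-y||^2, then prox of lam*||.||_1."""
    return soft_threshold(x - t * A.T @ (A @ x - y), lam * t)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
y = rng.standard_normal(20)
t = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size 1/L with L = ||A^T A||_2
x = np.zeros(50)
for _ in range(200):
    x = ista_step(x, A, y, lam=0.5, t=t)
```

After a few hundred iterations the iterate is sparse and has a lower objective value than the zero vector.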
ex-ch14-02
Easy. FISTA improves ISTA's rate from $O(1/k)$ to $O(1/k^2)$. By how many iterations must ISTA run to match FISTA at iteration $k$?
Set $1/k_{\mathrm{ISTA}} = 1/k^2$.
Match rates
$C/k_{\mathrm{ISTA}} \approx C/k^2$ gives $k_{\mathrm{ISTA}} = \Theta(k^2)$: ISTA needs on the order of $k^2$ iterations.
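The rate-matching arithmetic, assuming (as above) the same constant in both bounds:

```python
# If ISTA error ~ C/k and FISTA error ~ C/k^2 with the same constant C
# (an assumption for illustration), matching FISTA at iteration k
# requires k^2 ISTA iterations.
k = 30
fista_err = 1.0 / k**2          # FISTA bound at iteration k
ista_iters = round(1.0 / fista_err)  # iterations for ISTA to reach the same bound
```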
ex-ch14-03
Easy. Write the ADMM updates for LASSO.
Split: $\min_{x,z}\ \tfrac12\|Ax - y\|^2 + \lambda\|z\|_1$ subject to $x - z = 0$.
Updates
$x^{k+1} = (A^\top A + \rho I)^{-1}\big(A^\top y + \rho(z^k - u^k)\big)$, $z^{k+1} = S_{\lambda/\rho}(x^{k+1} + u^k)$, $u^{k+1} = u^k + x^{k+1} - z^{k+1}$.
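The three updates translate directly into a minimal NumPy sketch (the penalty $\rho = 1$, sizes, and test signal are illustrative choices of ours):

```python
import numpy as np

def admm_lasso(A, y, lam, rho=1.0, iters=200):
    """ADMM for LASSO: x-update (ridge solve), z-update (soft threshold), u-update."""
    m, n = A.shape
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached: shared across iterations
    Aty = A.T @ y
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        x = M @ (Aty + rho * (z - u))                                  # x-update
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # z-update
        u = u + x - z                                                  # scaled dual
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = A @ x_true
x_hat = admm_lasso(A, y, lam=0.1)
```

Returning $z$ rather than $x$ gives the exactly sparse iterate.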
ex-ch14-04
Easy. Describe one OMP iteration in words.
Correlation, selection, least squares, residual.
Steps
(1) Compute correlations $c = A^\top r^k$; (2) add $j = \arg\max_i |c_i|$ to the support; (3) solve least-squares on the active set; (4) update the residual $r^{k+1} = y - A x^{k+1}$.
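The four steps above can be sketched as code (variable names and the test instance are ours):

```python
import numpy as np

def omp(A, y, K):
    """K iterations of OMP: correlate, select, least-squares, update residual."""
    m, n = A.shape
    r = y.copy()
    support = []
    for _ in range(K):
        c = A.T @ r                                   # (1) correlate residual with atoms
        j = int(np.argmax(np.abs(c)))                 # (2) select the best new atom
        if j not in support:
            support.append(j)
        x_S, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # (3) LS on active set
        r = y - A[:, support] @ x_S                   # (4) update residual
    x = np.zeros(n)
    x[support] = x_S
    return x, support

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100))
A /= np.linalg.norm(A, axis=0)        # unit-norm atoms
x_true = np.zeros(100)
x_true[[5, 50, 77]] = [3.0, -2.0, 1.5]
y = A @ x_true
x_hat, S = omp(A, y, K=3)
```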
ex-ch14-05
Easy. Explain why the posterior mean under the Bernoulli-Gaussian model is generically not sparse.
It averages over supports.
Mixture average
The posterior is a mixture over all $2^n$ supports. Entries of the posterior mean are weighted averages across supports; the weight on supports with $x_i \neq 0$ shrinks as the evidence against $x_i$ grows, but it does not make the mean of $x_i$ exactly zero unless the posterior collapses onto supports excluding $i$.
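A scalar instance makes the mixture-average point concrete. The toy model below (our choice of $p$ and noise level) is $x = s\,g$ with $s \sim \mathrm{Bern}(p)$, $g \sim \mathcal{N}(0,1)$, observed as $y = x + \mathcal{N}(0,\sigma^2)$; the posterior mean is the "nonzero support" weight times a shrunk estimate, and is never exactly zero for $y \neq 0$:

```python
import numpy as np

p, sig2 = 0.1, 0.25   # prior activity and noise variance (illustrative values)

def posterior_mean(y):
    """Posterior mean E[x|y] for the scalar Bernoulli-Gaussian model."""
    # Marginal likelihoods of the two supports s=0 and s=1
    l0 = (1 - p) * np.exp(-y**2 / (2 * sig2)) / np.sqrt(2 * np.pi * sig2)
    v1 = 1.0 + sig2
    l1 = p * np.exp(-y**2 / (2 * v1)) / np.sqrt(2 * np.pi * v1)
    w1 = l1 / (l0 + l1)           # posterior weight of the nonzero support
    return w1 * y / v1            # mixture average: w1 * E[x | y, s=1]

m_small = posterior_mean(0.1)     # heavily down-weighted, but not exactly 0
m_large = posterior_mean(3.0)     # nonzero support dominates
```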
ex-ch14-06
Medium. Prove that ISTA with step size $t \le 1/L$ monotonically decreases the LASSO objective and converges at rate $O(1/k)$.
Use Lipschitz continuity of the gradient of the smooth part and the non-expansiveness of the proximal operator.
Smooth descent
Let $f(x) = \tfrac12\|Ax - y\|^2$, whose gradient is Lipschitz with constant $L = \|A^\top A\|_2$. The descent lemma gives $f(x - t\nabla f(x)) \le f(x) - \tfrac{t}{2}\|\nabla f(x)\|^2$ for a full gradient step with $t \le 1/L$.
Prox non-expansiveness
$\|\operatorname{prox}_g(u) - \operatorname{prox}_g(v)\| \le \|u - v\|$ for any convex $g$, in particular $g = \lambda t\|\cdot\|_1$.
Combine
The composite iteration satisfies $F(x^k) - F^\star \le \frac{\|x^0 - x^\star\|^2}{2tk}$, i.e.\ the objective decreases at rate $O(1/k)$.
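A numeric sanity check of the descent property (not a proof; the instance is ours): with $t = 1/L$, the LASSO objective never increases along the ISTA iterates.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((25, 40))
y = rng.standard_normal(25)
lam = 0.3
L = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of the smooth part
t = 1.0 / L

def F(x):
    """LASSO objective 0.5||Ax-y||^2 + lam*||x||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(40)
vals = [F(x)]
for _ in range(100):
    z = x - t * A.T @ (A @ x - y)                     # gradient step
    x = np.sign(z) * np.maximum(np.abs(z) - lam * t, 0.0)  # soft threshold
    vals.append(F(x))
```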
ex-ch14-07
Medium. Derive the ADMM $x$-update for LASSO when $A \in \mathbb{R}^{m \times n}$ with $m \ll n$ (use the Woodbury identity).
The $n \times n$ solve becomes an $m \times m$ solve via Woodbury.
Woodbury
$(A^\top A + \rho I)^{-1} = \tfrac{1}{\rho}\Big(I - A^\top(\rho I_m + AA^\top)^{-1}A\Big)$.
Cost
One Cholesky factorization of $\rho I_m + AA^\top$ ($O(m^3)$, plus $O(m^2 n)$ to form it) shared across iterations; per-iteration cost becomes $O(mn)$.
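The identity can be checked numerically by comparing the Woodbury route against the direct $n \times n$ solve (sizes and $\rho$ are illustrative; `b` stands for the right-hand side $A^\top y + \rho(z - u)$):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, rho = 15, 200, 1.0
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)          # stands in for A^T y + rho (z - u)

# Direct: (A^T A + rho I)^{-1} b  -- an O(n^3) solve
x_direct = np.linalg.solve(A.T @ A + rho * np.eye(n), b)

# Woodbury: (1/rho) (b - A^T (rho I_m + A A^T)^{-1} A b)
# -- only an m x m system; factor once, reuse every iteration
small = rho * np.eye(m) + A @ A.T
x_wood = (b - A.T @ np.linalg.solve(small, A @ b)) / rho
```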
ex-ch14-08
Medium. Show that OMP with exact arithmetic recovers every $K$-sparse signal when the mutual coherence satisfies $\mu < \frac{1}{2K - 1}$.
Tropp 2004, coherence-based guarantee.
Inductive argument
At each iteration, the correlation of the residual with any in-support atom dominates the correlation with any out-of-support atom, provided the coherence is small.
Condition
An in-support atom correlates with the residual by at least $(1 - (K-1)\mu)\,x_{\max}$, while any out-of-support atom correlates by at most $K\mu\,x_{\max}$, where $x_{\max}$ is the largest residual coefficient. Requiring $1 - (K-1)\mu > K\mu$, i.e.\ $\mu < \frac{1}{2K-1}$, ensures OMP never selects a wrong atom and exits after exactly $K$ iterations.
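The condition is easy to evaluate for a concrete dictionary (the random dictionary below is our example; for generic Gaussian atoms the coherence bound is pessimistic, certifying only small $K$):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((64, 80))
A /= np.linalg.norm(A, axis=0)        # unit-norm atoms

G = A.T @ A
mu = np.max(np.abs(G - np.eye(80)))   # mutual coherence: max off-diagonal |<a_i, a_j>|

# Largest K certified by mu < 1/(2K - 1), i.e. K < (1 + 1/mu)/2
K_max = int(np.floor(0.5 * (1 + 1 / mu)))
```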
ex-ch14-09
Medium. Explain why hard thresholding is non-convex while soft thresholding is the proximal operator of a convex function.
Hard = prox of $\ell_0$ (non-convex); soft = prox of $\ell_1$ (convex).
Hard threshold
$H_\tau(z)_i = z_i\,\mathbf{1}\{|z_i| > \tau\}$. Discontinuous; it is the proximal operator of $\lambda\|x\|_0$ with $\tau = \sqrt{2\lambda}$, which is nonconvex.
Soft threshold
$S_\tau(z)_i = \operatorname{sign}(z_i)\max(|z_i| - \tau, 0)$ is $1$-Lipschitz and monotone, and equals $\operatorname{prox}_{\tau\|\cdot\|_1}$, where $\tau\|\cdot\|_1$ is convex.
Implication
Hard-thresholding algorithms (IHT) lack global convergence guarantees without extra assumptions such as RIP; soft-thresholding algorithms (ISTA) converge globally on the convex objective.
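The two operators side by side (inputs chosen by us to straddle the threshold); note the jump of the hard threshold at $|z| = \tau$ versus the continuous shrinkage of the soft threshold:

```python
import numpy as np

def hard(z, tau):
    """Hard threshold: keep entries above tau, zero the rest. Discontinuous at |z|=tau."""
    return np.where(np.abs(z) > tau, z, 0.0)

def soft(z, tau):
    """Soft threshold: prox of tau*||.||_1. Continuous (1-Lipschitz) shrinkage."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

z = np.array([-2.0, -1.01, -0.99, 0.5, 1.5])
h = hard(z, 1.0)
s = soft(z, 1.0)
```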
ex-ch14-10
Medium. Compute the proximal operator of the group $\ell_{2,1}$ norm.
Block soft threshold.
Group-wise
For each group $g$: $\operatorname{prox}_{\lambda\|\cdot\|_{2,1}}(z)_g = \Big(1 - \frac{\lambda}{\|z_g\|_2}\Big)_{+} z_g$.
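The block soft threshold as code (the group partition here is our example):

```python
import numpy as np

def prox_group(z, groups, lam):
    """Block soft threshold: prox of lam * sum_g ||z_g||_2 over a group partition."""
    out = np.zeros_like(z)
    for g in groups:
        ng = np.linalg.norm(z[g])
        if ng > lam:
            out[g] = (1.0 - lam / ng) * z[g]   # shrink the whole block toward zero
        # otherwise the entire group is set to zero
    return out

z = np.array([3.0, 4.0, 0.1, 0.2])
groups = [[0, 1], [2, 3]]      # illustrative partition into two blocks
p = prox_group(z, groups, lam=1.0)
```

The first block (norm $5$) is scaled by $1 - 1/5 = 0.8$; the second (norm $\approx 0.22$) is zeroed entirely.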
ex-ch14-11
Medium. Show that ADMM's primal and dual residuals converge to zero as $k \to \infty$ for convex LASSO.
Use Boyd et al. Sec. 3.2.
Lyapunov function
$V^k = \tfrac{1}{\rho}\|y^k - y^\star\|^2 + \rho\|z^k - z^\star\|^2$ (with $y^k$ the unscaled dual variable) is nonincreasing and bounded below.
Residual decay
$V^{k+1} \le V^k - \rho\|r^{k+1}\|^2 - \rho\|z^{k+1} - z^k\|^2$, where $r^k = x^k - z^k$ and $s^k = \rho(z^k - z^{k-1})$ are the primal and dual residuals. Summability of $\sum_k\big(\|r^k\|^2 + \|s^k\|^2\big)$ forces $r^k \to 0$ and $s^k \to 0$.
ex-ch14-12
Medium. For SBL, derive the M-step update $\gamma_i \leftarrow \mu_i^2 + \Sigma_{ii}$.
Type-II ML in the EM framework.
E-step
Compute the posterior $p(x \mid y; \gamma)$: Gaussian with covariance $\Sigma = (\sigma^{-2}A^\top A + \Gamma^{-1})^{-1}$ and mean $\mu = \sigma^{-2}\Sigma A^\top y$.
M-step
Maximize $\mathbb{E}_{x \sim p(x \mid y;\, \gamma^{\mathrm{old}})}\big[\log p(y, x; \gamma)\big]$ in $\gamma$: each $\gamma_i$ decouples and gives $\gamma_i^{\mathrm{new}} = \mathbb{E}[x_i^2] = \mu_i^2 + \Sigma_{ii}$.
ex-ch14-13
Hard. Prove Nesterov's $O(1/k^2)$ rate for FISTA on the LASSO objective.
Construct the potential and bound its increment.
Potential construction
Define $E_k = \tfrac{2}{L}t_k^2\big(F(x^k) - F^\star\big) + \|u^k - x^\star\|^2$ with $u^k = t_k x^k - (t_k - 1)x^{k-1}$ and $t_{k+1} = \tfrac{1 + \sqrt{1 + 4t_k^2}}{2}$.
Monotonicity
Using the descent lemma at the momentum point $w^k = x^k + \tfrac{t_k - 1}{t_{k+1}}(x^k - x^{k-1})$, one shows $E_{k+1} \le E_k$.
Conclude
$t_k \ge \tfrac{k+1}{2}$ gives $F(x^k) - F^\star \le \frac{2L\|x^0 - x^\star\|^2}{(k+1)^2}$.
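The momentum schedule from the proof can be run directly; as a numeric illustration (not a proof; instance and budget are ours), FISTA reaches a lower objective than plain ISTA for the same iteration budget:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((30, 60))
y = rng.standard_normal(30)
lam = 0.2
L = np.linalg.norm(A.T @ A, 2)

def prox_step(v, t):
    """Gradient step on the smooth part followed by soft thresholding."""
    z = v - t * A.T @ (A @ v - y)
    return np.sign(z) * np.maximum(np.abs(z) - lam * t, 0.0)

def F(x):
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

# FISTA with the Beck-Teboulle t_k schedule
x = np.zeros(60)
x_prev = np.zeros(60)
t_k = 1.0
for _ in range(100):
    t_next = (1 + np.sqrt(1 + 4 * t_k**2)) / 2
    w = x + ((t_k - 1) / t_next) * (x - x_prev)   # momentum point
    x_prev, x = x, prox_step(w, 1.0 / L)
    t_k = t_next

# Plain ISTA for the same budget
x_ista = np.zeros(60)
for _ in range(100):
    x_ista = prox_step(x_ista, 1.0 / L)
```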
ex-ch14-14
Hard. Prove the CoSaMP recovery guarantee: if $\delta_{4K} \le 0.1$, CoSaMP recovers $K$-sparse signals to accuracy $O(\|e\|_2)$ in the measurement noise $e$.
Four steps: identification, support merge, LS estimation, pruning. Use RIP at each step.
Identification
Correlate the residual with the columns of $A$; select the top $2K$ indices. RIP guarantees signal components dominate noise.
Merge and LS
Merge with the current support (at most $3K$ atoms); solve least-squares restricted to this support.
Pruning + induction
Keep the top $K$ magnitudes. Using the $4K$-RIP, the error contracts by a constant factor (e.g.\ $1/2$) per iteration, giving geometric convergence to an $O(\|e\|_2)$-neighborhood.
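The four proof steps mirror the algorithm itself, sketched here in NumPy (variable names and the noiseless test instance are ours):

```python
import numpy as np

def cosamp(A, y, K, iters=20):
    """CoSaMP: identify top 2K, merge supports, least-squares, prune to top K."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        r = y - A @ x
        c = np.abs(A.T @ r)
        omega = np.argsort(c)[-2 * K:]                     # identification: top 2K
        T = np.union1d(omega, np.flatnonzero(x))           # merge (<= 3K atoms)
        b_T, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)  # LS on merged support
        b = np.zeros(n)
        b[T] = b_T
        keep = np.argsort(np.abs(b))[-K:]                  # prune to top K
        x = np.zeros(n)
        x[keep] = b[keep]
    return x

rng = np.random.default_rng(7)
A = rng.standard_normal((50, 120))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(120)
x_true[[10, 30, 90]] = [1.0, -2.0, 1.5]
y = A @ x_true                     # noiseless, so recovery is exact
x_hat = cosamp(A, y, K=3)
```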
ex-ch14-15
Hard. Show that SBL fixed points have at most $m$ nonzero entries, where $m$ is the number of measurements.
Use the rank structure of the marginal covariance $\Sigma_y = \sigma^2 I + A\Gamma A^\top$.
Marginal likelihood
$p(y; \gamma) = \mathcal{N}(y; 0, \Sigma_y)$ with $\Sigma_y = \sigma^2 I + A\Gamma A^\top$, $\Gamma = \operatorname{diag}(\gamma)$.
Rank argument
$\Sigma_y$ is $m \times m$; each nonzero $\gamma_i$ adds at most rank $1$ (the term $\gamma_i a_i a_i^\top$). Once rank $m$ is reached, further positive $\gamma_i$ give no improvement to the marginal likelihood.
Conclude
Wipf-Rao (2004) show the unconstrained maxima lie on faces of the positive orthant with at most $m$ positive coordinates.