Exercises

ex-sp-ch26-01

Easy

Create an nn.Module that implements a single linear layer $y = Wx + b$ without using nn.Linear. Use nn.Parameter for $W$ and $b$.
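
One possible sketch; the layer sizes and the initialization scale below are arbitrary choices, not part of the exercise statement.

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    """Linear layer y = Wx + b built from raw nn.Parameter tensors."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # scale initial weights by 1/sqrt(in_features) to keep activations reasonable
        self.W = nn.Parameter(torch.randn(out_features, in_features) / in_features**0.5)
        self.b = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        return x @ self.W.T + self.b

layer = MyLinear(8, 4)
print(layer(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```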

ex-sp-ch26-02

Easy

Count the total number of parameters in an nn.Sequential model with layers [Linear(100, 256), ReLU, Linear(256, 128), ReLU, Linear(128, 10)]. Verify with sum(p.numel() for p in model.parameters()).
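
A minimal sketch of the check. The hand count per Linear layer is in_features * out_features weights plus out_features biases.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

manual = (100 * 256 + 256) + (256 * 128 + 128) + (128 * 10 + 10)
auto = sum(p.numel() for p in model.parameters())
print(manual, auto)  # both should be 60042
```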

ex-sp-ch26-03

Easy

Write a training loop for a linear regression model $y = wx + b$ on synthetic data $y = 3x + 2 + \epsilon$ where $\epsilon \sim \mathcal{N}(0, 0.1)$. Train for 200 steps and verify $w \approx 3$, $b \approx 2$.
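
One possible loop, assuming full-batch gradient descent; the input range, learning rate, and sample count are arbitrary choices, and any reasonable optimizer should recover the slope and intercept.

```python
import torch

torch.manual_seed(0)
x = torch.rand(1000, 1) * 2 - 1                    # inputs in [-1, 1]
y = 3 * x + 2 + 0.1 * torch.randn_like(x)          # y = 3x + 2 + noise

w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w, b], lr=0.5)

for step in range(200):
    loss = ((w * x + b - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(w.item(), b.item())                          # expect roughly 3 and 2
```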

ex-sp-ch26-04

Easy

Implement the overfit-one-batch test for a 3-layer MLP on random classification data with 5 classes. Verify the training loss reaches near zero within 500 steps.
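
A sketch of the test; the input dimension, hidden width, optimizer, and learning rate here are arbitrary. The point is that a correctly wired model should be able to memorize a single fixed batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 20)                      # one fixed batch
y = torch.randint(0, 5, (32,))               # random labels, 5 classes

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 5),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)  # aggressive lr is fine for memorization
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    loss = loss_fn(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

print(loss.item())  # should be near zero if the pipeline is correct
```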

ex-sp-ch26-05

Easy

Use a forward hook to record the output of the first hidden layer of an MLP during a forward pass. Print the mean and std of the recorded activations.
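
A minimal sketch, assuming a small Sequential MLP. The hook here captures the pre-activation output of the first Linear layer; hook model[1] instead if you want the post-ReLU activations.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
captured = {}

def hook(module, inputs, output):
    captured["first_hidden"] = output.detach()

handle = model[0].register_forward_hook(hook)
model(torch.randn(16, 10))
handle.remove()                               # always remove hooks when done

acts = captured["first_hidden"]
print(acts.mean().item(), acts.std().item())
```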

ex-sp-ch26-06

Medium

Implement a Dataset and DataLoader for the function $y = \sin(2\pi x) + 0.1\epsilon$ with 10000 samples, batch size 64, and an 80/20 train/validation split using random_split.
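
One possible implementation; sampling $x$ uniformly on [0, 1] is an assumption, since the exercise does not fix the input range.

```python
import torch
from torch.utils.data import Dataset, DataLoader, random_split

class SineDataset(Dataset):
    """10,000 (x, sin(2*pi*x) + 0.1*noise) pairs."""
    def __init__(self, n=10_000):
        self.x = torch.rand(n, 1)
        self.y = torch.sin(2 * torch.pi * self.x) + 0.1 * torch.randn(n, 1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

full = SineDataset()
n_train = int(0.8 * len(full))
train_set, val_set = random_split(full, [n_train, len(full) - n_train])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)
```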

ex-sp-ch26-07

Medium

Compare SGD (with momentum 0.9), Adam, and AdamW on a 4-layer MLP for the sine regression task. Plot training loss curves for all three.
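
A sketch of the comparison, assuming matplotlib for the plot; the network width, learning rates, and step count are arbitrary choices, and full-batch training is used to keep the example short.

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def make_model():
    # 4 linear layers: 1 -> 64 -> 64 -> 64 -> 1
    return nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

x = torch.rand(2048, 1)
y = torch.sin(2 * torch.pi * x) + 0.1 * torch.randn_like(x)

optimizers = {
    "SGD+momentum": lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
    "Adam": lambda p: torch.optim.Adam(p, lr=1e-3),
    "AdamW": lambda p: torch.optim.AdamW(p, lr=1e-3, weight_decay=1e-2),
}

for name, make_opt in optimizers.items():
    torch.manual_seed(0)                      # same initialization for a fair comparison
    model = make_model()
    opt = make_opt(model.parameters())
    losses = []
    for step in range(500):
        loss = nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
        losses.append(loss.item())
    plt.plot(losses, label=name)

plt.yscale("log"); plt.legend(); plt.show()
```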

ex-sp-ch26-08

Medium

Implement cosine annealing with warmup: linearly increase LR from 0 to lr_max over the first 10 epochs, then cosine decay to lr_min over the remaining epochs.
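
A sketch of the schedule as a plain function, plus one way to attach it via LambdaLR; the choice of lr_max, lr_min, and 100 total epochs is arbitrary.

```python
import math
import torch

def lr_at_epoch(epoch, total_epochs, warmup_epochs=10, lr_max=1e-3, lr_min=1e-5):
    """Linear warmup from 0 to lr_max over warmup_epochs, then cosine decay toward lr_min."""
    if epoch < warmup_epochs:
        return lr_max * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# LambdaLR expects a multiplier of the optimizer's base lr (here base lr = lr_max)
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda epoch: lr_at_epoch(epoch, total_epochs=100) / 1e-3)

for epoch in range(100):
    # ... train for one epoch here ...
    sched.step()
```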

ex-sp-ch26-09

Medium

Implement a training loop with gradient clipping and gradient norm logging. Print a warning if any gradient norm exceeds 10.0.
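
A minimal sketch on synthetic data; the clipping threshold of 1.0 is an arbitrary choice. Note that clip_grad_norm_ returns the total norm computed before clipping, which is what gets logged.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    # returns the total gradient norm *before* clipping
    total_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0))
    if total_norm > 10.0:
        print(f"step {step}: gradient norm {total_norm:.2f} exceeds 10.0")
    opt.step()
```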

ex-sp-ch26-10

Medium

Implement early stopping: stop training when validation loss has not improved for patience=10 consecutive epochs. Save the best model.
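
One possible implementation on synthetic data; full-batch "epochs" and the 200-epoch cap keep the sketch short, and the best model is kept in memory rather than written to disk.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x_tr, y_tr = torch.randn(512, 10), torch.randn(512, 1)
x_va, y_va = torch.randn(128, 10), torch.randn(128, 1)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

best_val, best_state, wait, patience = float("inf"), None, 0, 10
for epoch in range(200):
    loss = nn.functional.mse_loss(model(x_tr), y_tr)   # one full-batch "epoch"
    opt.zero_grad(); loss.backward(); opt.step()

    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(x_va), y_va).item()
    if val_loss < best_val:
        best_val, wait = val_loss, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        wait += 1
        if wait >= patience:
            print(f"stopping at epoch {epoch}, best val loss {best_val:.4f}")
            break

model.load_state_dict(best_state)  # restore the best checkpoint
```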

ex-sp-ch26-11

Hard

Implement mixed-precision training using torch.amp (automatic mixed precision). Compare training speed and memory usage with and without AMP on a 10-layer MLP with width 1024.
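
A sketch of the comparison, assuming a CUDA-capable GPU; the batch size and step count are arbitrary, and the exact torch.amp namespaces vary slightly across PyTorch versions.

```python
import time
import torch
import torch.nn as nn

device = "cuda"                                   # AMP speedups require a GPU
layers = []
for _ in range(10):
    layers += [nn.Linear(1024, 1024), nn.ReLU()]
model = nn.Sequential(*layers, nn.Linear(1024, 10)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.amp.GradScaler(device)

def run(use_amp, steps=200):
    x = torch.randn(512, 1024, device=device)
    y = torch.randint(0, 10, (512,), device=device)
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        opt.zero_grad()
        with torch.amp.autocast(device_type=device, enabled=use_amp):
            loss = nn.functional.cross_entropy(model(x), y)
        if use_amp:
            scaler.scale(loss).backward()
            scaler.step(opt)
            scaler.update()
        else:
            loss.backward()
            opt.step()
    torch.cuda.synchronize()
    return time.time() - start, torch.cuda.max_memory_allocated() / 2**20  # seconds, MiB

print("fp32:", run(False))
print("amp :", run(True))
```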

ex-sp-ch26-12

Hard

Build a modular training framework with a Trainer class that accepts a model, optimizer, loss function, and callbacks (e.g., logging, checkpointing, early stopping) as constructor arguments.
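
A minimal sketch of one possible design: callbacks are plain callables invoked with (trainer, epoch, logs), and a callback can set trainer.stop to end training early. The callback protocol and the tiny demo data are assumptions, not a prescribed API.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class Trainer:
    def __init__(self, model, optimizer, loss_fn, callbacks=None):
        self.model, self.optimizer, self.loss_fn = model, optimizer, loss_fn
        self.callbacks = callbacks or []
        self.stop = False                     # callbacks may set this to stop training

    def fit(self, loader, epochs):
        for epoch in range(epochs):
            total = 0.0
            for x, y in loader:
                loss = self.loss_fn(self.model(x), y)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()
                total += loss.item()
            logs = {"train_loss": total / len(loader)}
            for cb in self.callbacks:
                cb(self, epoch, logs)
            if self.stop:
                break

def print_logger(trainer, epoch, logs):
    print(f"epoch {epoch}: {logs}")

x, y = torch.randn(256, 8), torch.randn(256, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
Trainer(model, torch.optim.Adam(model.parameters()), nn.MSELoss(),
        callbacks=[print_logger]).fit(loader, epochs=3)
```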

ex-sp-ch26-13

Hard

Implement gradient accumulation to simulate a batch size of 512 using actual batches of 32 (accumulate over 16 steps before updating).
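
A minimal sketch on random data; dividing each micro-batch loss by the accumulation count makes the accumulated gradient match the mean over the simulated large batch.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
accum_steps = 16                 # 16 micro-batches of 32 = effective batch size 512

opt.zero_grad()
for step in range(160):          # 160 micro-batches = 10 optimizer updates
    x = torch.randn(32, 32)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    (loss / accum_steps).backward()          # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```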

ex-sp-ch26-14

Hard

Implement a custom autograd function for a "straight-through estimator" that applies hard thresholding in the forward pass but passes gradients through as if it were the identity in the backward pass.
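
A minimal sketch, thresholding at zero (the threshold location is an assumption): the forward pass produces a hard 0/1 output, while the backward pass passes the incoming gradient through unchanged.

```python
import torch

class HardThresholdSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()       # hard threshold in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output           # identity ("straight-through") in the backward pass

x = torch.randn(5, requires_grad=True)
y = HardThresholdSTE.apply(x)
y.sum().backward()
print(x.grad)                        # all ones, as if the op were the identity
```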

ex-sp-ch26-15

Challenge

Implement a neural network that learns the BPSK BER function $P_b = Q(\sqrt{2 E_b/N_0})$ from simulated data. Generate 10000 $(E_b/N_0, \text{BER})$ pairs via Monte Carlo, train an MLP, and compare the learned function to the analytical curve on a log scale.
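
One possible sketch: the Eb/N0 range, bits per Monte Carlo point, network size, and the choice to regress log10(BER) are all assumptions made to keep the example short; note that Q(sqrt(2 Eb/N0)) = 0.5 erfc(sqrt(Eb/N0)), which is used for the analytical curve.

```python
import math
import torch
import torch.nn as nn

def simulate_ber(ebno_db, n_bits=20_000):
    """Monte Carlo BER of BPSK over AWGN at the given Eb/N0 in dB."""
    ebno = 10 ** (ebno_db / 10)
    bits = torch.randint(0, 2, (n_bits,))
    symbols = 2.0 * bits - 1.0                          # {0,1} -> {-1,+1}
    noise = torch.randn(n_bits) / math.sqrt(2 * ebno)   # noise variance N0/2 with Eb = 1
    decisions = (symbols + noise) > 0
    return (decisions.long() != bits).float().mean().item()

# 10,000 (Eb/N0 in dB, BER) pairs; regress log10(BER) so low BERs are not ignored
ebno_db = torch.rand(10_000) * 8
ber = torch.tensor([simulate_ber(e.item()) for e in ebno_db])
mask = ber > 0                                          # drop zero-error trials before the log
x = ebno_db[mask].unsqueeze(1)
y = torch.log10(ber[mask]).unsqueeze(1)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

# compare to the analytical curve P_b = Q(sqrt(2 Eb/N0)) = 0.5 erfc(sqrt(Eb/N0))
grid = torch.linspace(0, 8, 50).unsqueeze(1)
analytic = 0.5 * torch.erfc(torch.sqrt(10 ** (grid / 10)))
with torch.no_grad():
    learned = 10 ** model(grid)
print(torch.cat([grid, learned, analytic], dim=1)[:5])  # plot these on a log scale
```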

ex-sp-ch26-16

Challenge

Implement data-parallel training using torch.nn.DataParallel or torch.nn.parallel.DistributedDataParallel. Measure the speedup (or slowdown) with 1 vs 2 GPUs on a large MLP.
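
A sketch using the simpler nn.DataParallel (DistributedDataParallel additionally needs process-group setup and a launcher, which is omitted here); it assumes a machine with at least two CUDA GPUs, and the model width, batch size, and step count are arbitrary.

```python
import time
import torch
import torch.nn as nn

def build():
    layers = []
    for _ in range(8):
        layers += [nn.Linear(4096, 4096), nn.ReLU()]
    return nn.Sequential(*layers, nn.Linear(4096, 10))

def benchmark(model, steps=50):
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    x = torch.randn(1024, 4096, device="cuda")
    y = torch.randint(0, 10, (1024,), device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    torch.cuda.synchronize()
    return time.time() - start

print("1 GPU :", benchmark(build().cuda()))

if torch.cuda.device_count() >= 2:
    # DataParallel scatters the batch across GPUs and gathers outputs on GPU 0
    multi = nn.DataParallel(build().cuda(), device_ids=[0, 1])
    print("2 GPUs:", benchmark(multi))
```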