Exercises
ex-sp-ch30-01
Easy: Implement scaled dot-product attention from scratch. Verify it on Q, K, V of shape (1, 4, 8).
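A minimal sketch, assuming PyTorch (the function name and shape convention are illustrative):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5      # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

q = k = v = torch.randn(1, 4, 8)
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape)      # torch.Size([1, 4, 8])
print(w.sum(dim=-1))  # each attention row sums to 1
```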
ex-sp-ch30-02
Easy: Compute sinusoidal positional encodings for d_model=64, max_len=100. Plot them as a heatmap.
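One possible implementation (PyTorch plus matplotlib; `sinusoidal_encoding` is an illustrative name):

```python
import math
import torch
import matplotlib.pyplot as plt

def sinusoidal_encoding(max_len, d_model):
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))               # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even dimensions: sine
    pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions: cosine
    return pe

pe = sinusoidal_encoding(max_len=100, d_model=64)
plt.imshow(pe, aspect='auto', cmap='RdBu')
plt.xlabel('embedding dimension'); plt.ylabel('position')
plt.colorbar(); plt.show()
```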
ex-sp-ch30-03
Easy: Create a causal mask for sequence length 8 and verify that masked positions get zero attention.
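A starting point, assuming PyTorch; the mask convention used here (True = may attend) is one of several in common use:

```python
import torch
import torch.nn.functional as F

L = 8
allowed = torch.tril(torch.ones(L, L, dtype=torch.bool))  # True on/below diagonal

scores = torch.randn(L, L)
weights = F.softmax(scores.masked_fill(~allowed, float('-inf')), dim=-1)

# Verify: every weight above the diagonal (a future position) is exactly zero.
assert torch.all(weights.triu(diagonal=1) == 0)
print(weights)
```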
ex-sp-ch30-04
Easy: Use nn.MultiheadAttention and compare its output to your from-scratch implementation.
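A sketch of the comparison setup (requires a reasonably recent PyTorch for the `average_attn_weights` argument):

```python
import torch
import torch.nn as nn

d_model, n_heads = 64, 4
mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
x = torch.randn(2, 10, d_model)
out, weights = mha(x, x, x, need_weights=True, average_attn_weights=False)

# To match a from-scratch implementation, reuse the module's parameters:
# in_proj_weight stacks W_q, W_k, W_v along dim 0; out_proj is W_o.
w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
print(out.shape, weights.shape)  # (2, 10, 64), (2, 4, 10, 10)
```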
ex-sp-ch30-05
Easy: Count parameters in a transformer block with d_model=512, n_heads=8, d_ff=2048.
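One way to check a hand count against PyTorch's own encoder block (the arithmetic in the comment is mine and worth verifying):

```python
import torch.nn as nn

layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, dim_feedforward=2048)
total = sum(p.numel() for p in layer.parameters())
# Hand count: attention 4*(512*512 + 512) = 1,050,624;
# feed-forward 512*2048 + 2048 + 2048*512 + 512 = 2,099,712;
# two LayerNorms 2*(512 + 512) = 2,048.  Sum: 3,152,384.
print(total)
```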
ex-sp-ch30-06
Medium: Implement multi-head attention from scratch with h=4 heads and d_model=64.
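A compact reference implementation, assuming PyTorch (class and variable names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=64, h=4):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x, mask=None):
        B, L, _ = x.shape
        # Project, then split d_model into h heads of size d_k.
        q, k, v = (w(x).view(B, L, self.h, self.d_k).transpose(1, 2)
                   for w in (self.w_q, self.w_k, self.w_v))
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        out = F.softmax(scores, dim=-1) @ v            # (B, h, L, d_k)
        out = out.transpose(1, 2).reshape(B, L, -1)    # concatenate heads
        return self.w_o(out)

x = torch.randn(2, 10, 64)
print(MultiHeadAttention()(x).shape)  # (2, 10, 64)
```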
ex-sp-ch30-07
Medium: Build a 4-layer transformer encoder and train it for sequence classification.
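A skeleton using the built-in encoder; positional encodings are omitted here and should be added in a full solution, and the vocabulary size and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn

class EncoderClassifier(nn.Module):
    def __init__(self, vocab_size, n_classes, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):
        z = self.encoder(self.embed(tokens))
        return self.head(z.mean(dim=1))  # mean-pool over the sequence

model = EncoderClassifier(vocab_size=1000, n_classes=2)
logits = model(torch.randint(0, 1000, (8, 32)))
print(logits.shape)  # (8, 2); train with nn.CrossEntropyLoss
```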
ex-sp-ch30-08
Medium: Implement a Vision Transformer (ViT) for CIFAR-10 (32x32 images, patch_size=4). Train for 20 epochs.
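The patch-embedding step is the part specific to ViT; a sketch, assuming PyTorch (the CLS token, encoder stack, and training loop follow the earlier exercises):

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Split a 32x32 image into 4x4 patches and project each to d_model."""
    def __init__(self, d_model=128, patch=4, in_ch=3):
        super().__init__()
        # A strided conv is equivalent to flatten-and-project per patch.
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch, stride=patch)

    def forward(self, x):                      # (B, 3, 32, 32)
        x = self.proj(x)                       # (B, d_model, 8, 8)
        return x.flatten(2).transpose(1, 2)    # (B, 64, d_model)

tokens = PatchEmbed()(torch.randn(2, 3, 32, 32))
print(tokens.shape)  # (2, 64, 128): 64 patch tokens per image
```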
ex-sp-ch30-09
Medium: Visualise attention weights for each head and interpret what patterns emerge.
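A plotting sketch, assuming the per-head weights come from nn.MultiheadAttention with average_attn_weights=False (random weights stand in for real ones here):

```python
import torch
import matplotlib.pyplot as plt

# Stand-in for real attention weights of shape (batch, heads, L, L).
weights = torch.softmax(torch.randn(1, 4, 8, 8), dim=-1)

fig, axes = plt.subplots(1, 4, figsize=(14, 3))
for h, ax in enumerate(axes):
    im = ax.imshow(weights[0, h], cmap='viridis')
    ax.set_title(f'head {h}')
    ax.set_xlabel('key position'); ax.set_ylabel('query position')
fig.colorbar(im, ax=axes.ravel().tolist())
plt.show()
```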
ex-sp-ch30-10
Medium: Compare transformer vs LSTM on a sequence classification task. Report accuracy and speed.
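A minimal timing harness, assuming PyTorch on CPU (model sizes are placeholders; the accuracy comparison needs a real dataset and training loop):

```python
import time
import torch
import torch.nn as nn

x = torch.randn(32, 128, 64)  # (batch, seq_len, features)

lstm = nn.LSTM(64, 64, num_layers=2, batch_first=True)
tfm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(64, nhead=4, batch_first=True), num_layers=2)

with torch.no_grad():
    for name, model in [('LSTM', lstm), ('Transformer', tfm)]:
        t0 = time.perf_counter()
        for _ in range(20):
            model(x)
        print(f'{name}: {(time.perf_counter() - t0) / 20 * 1e3:.1f} ms/forward')
```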
ex-sp-ch30-11
Hard: Implement a transformer decoder with causal masking and cross-attention.
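A sketch of one decoder block built from nn.MultiheadAttention (post-norm residuals; names are illustrative):

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, memory):
        L = x.size(1)
        # Boolean attn_mask: True marks positions that may NOT be attended.
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        x = self.n1(x + self.self_attn(x, x, x, attn_mask=causal)[0])
        # Cross-attention: queries from the decoder, keys/values from the encoder.
        x = self.n2(x + self.cross_attn(x, memory, memory)[0])
        return self.n3(x + self.ff(x))

x, mem = torch.randn(2, 6, 64), torch.randn(2, 10, 64)
print(DecoderBlock()(x, mem).shape)  # (2, 6, 64)
```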
ex-sp-ch30-12
Hard: Implement a KV-cache for efficient autoregressive generation. Measure the speedup.
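A single-head sketch of the idea, assuming PyTorch; in a real decoder the cache is kept per layer and per head:

```python
import torch

def attend_with_cache(q_t, k_t, v_t, cache):
    """One decoding step: append this step's K, V and attend over the cache."""
    if cache is None:
        cache = (k_t, v_t)
    else:
        cache = (torch.cat([cache[0], k_t], dim=1),
                 torch.cat([cache[1], v_t], dim=1))
    k, v = cache                                    # (B, t, d_k)
    scores = q_t @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    out = torch.softmax(scores, dim=-1) @ v         # (B, 1, d_k)
    return out, cache

cache = None
for t in range(5):  # each step costs O(t) instead of recomputing O(t^2)
    q = k = v = torch.randn(1, 1, 8)
    out, cache = attend_with_cache(q, k, v, cache)
print(cache[0].shape)  # (1, 5, 8)
```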
ex-sp-ch30-13
Hard: Implement learned positional embeddings and compare them to sinusoidal encodings on a task.
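A minimal sketch, assuming PyTorch (max_len and d_model are placeholders):

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    def __init__(self, max_len=512, d_model=64):
        super().__init__()
        # One trainable vector per position, learned with the rest of the model.
        self.pos = nn.Embedding(max_len, d_model)

    def forward(self, x):  # x: (B, L, d_model)
        idx = torch.arange(x.size(1), device=x.device)
        return x + self.pos(idx)  # broadcasts over the batch dimension

x = torch.randn(2, 16, 64)
print(LearnedPositionalEmbedding()(x).shape)  # (2, 16, 64)
```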
ex-sp-ch30-14
Hard: Implement rotary position embeddings (RoPE) and integrate them with multi-head attention.
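One common formulation rotates interleaved channel pairs by position-dependent angles; a sketch, assuming PyTorch (apply it to queries and keys before computing scores, not to values):

```python
import torch

def apply_rope(x, base=10000.0):
    """Rotate channel pairs of x by position-dependent angles.
    x: (B, heads, L, d_k) with d_k even."""
    B, H, L, d = x.shape
    theta = base ** (-torch.arange(0, d, 2).float() / d)     # (d/2,)
    angles = torch.arange(L).float()[:, None] * theta[None]  # (L, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin   # 2D rotation of each (x1, x2) pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 4, 10, 16)
print(apply_rope(q).shape)  # (1, 4, 10, 16)
```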
ex-sp-ch30-15
Challenge: Apply a transformer to OFDM channel estimation, treating subcarriers as tokens and pilot measurements as features.
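A possible starting point; the signal model here (64 subcarriers, pilots as real/imag feature pairs, a per-subcarrier regression head) is an assumption, not part of the exercise statement:

```python
import torch
import torch.nn as nn

# Hypothetical setup: 64 subcarriers, each carrying a complex pilot
# measurement represented as (real, imag) features.
n_subcarriers, d_model = 64, 128
pilots = torch.randn(8, n_subcarriers, 2)          # (batch, tokens, features)

embed = nn.Linear(2, d_model)                      # lift features to d_model
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, 2)                       # per-subcarrier (re, im) estimate

h_est = head(encoder(embed(pilots)))               # (8, 64, 2)
print(h_est.shape)  # one complex channel estimate per subcarrier
```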
ex-sp-ch30-16
Challenge: Implement FlashAttention-2 (tiled, IO-aware) and compare its memory usage to standard attention.
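A full IO-aware kernel is GPU code; the algorithmic core, tiled attention with an online softmax, can be prototyped in plain PyTorch first (this sketch only demonstrates the recurrence and is not itself memory-efficient):

```python
import torch

def tiled_attention(q, k, v, block=32):
    """Attention computed over key/value tiles with a running softmax."""
    scale = q.size(-1) ** -0.5
    L = k.size(1)
    m = torch.full(q.shape[:-1] + (1,), float('-inf'))  # running row max
    l = torch.zeros_like(m)                             # running normalizer
    o = torch.zeros_like(q)                             # running output
    for s in range(0, L, block):
        kb, vb = k[:, s:s+block], v[:, s:s+block]
        scores = q @ kb.transpose(-2, -1) * scale       # (B, Lq, block)
        m_new = torch.maximum(m, scores.max(-1, keepdim=True).values)
        p = torch.exp(scores - m_new)
        alpha = torch.exp(m - m_new)                    # rescale old statistics
        l = alpha * l + p.sum(-1, keepdim=True)
        o = alpha * o + p @ vb
        m = m_new
    return o / l

q, k, v = (torch.randn(1, 64, 32) for _ in range(3))
ref = torch.softmax(q @ k.transpose(-2, -1) / 32 ** 0.5, dim=-1) @ v
print(torch.allclose(tiled_attention(q, k, v), ref, atol=1e-5))  # True
```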