Chapter Summary
Key Points
1. nn.Module is the building block. Every PyTorch model is an nn.Module tree. Compose with nn.Sequential for chains, nn.ModuleList for indexed collections, and nn.ModuleDict for named lookups. Always call super().__init__() in your subclass, and invoke the model as model(x), not model.forward(x). (See the composition sketch after this list.)
2. The training loop has five steps: (1) optimizer.zero_grad(), (2) forward pass, (3) loss computation, (4) loss.backward(), (5) optimizer.step(). Call model.eval() and use torch.no_grad() for validation. This explicit loop gives full control over every aspect of training. (See the loop sketch below.)
3. Choose losses that match your task: MSE for regression (Gaussian noise assumption), CrossEntropyLoss for classification (categorical distribution), BCEWithLogitsLoss for multi-label. Never apply softmax before CrossEntropyLoss; it expects raw logits. Custom losses must be differentiable. (See the loss example below.)
4. Infrastructure scales training. Use a DataLoader with pin_memory=True and an appropriate num_workers to keep the GPU fed, cosine annealing for the learning-rate schedule, and checkpoints of model and optimizer state after each epoch. AdamW is the default optimizer. (See the setup sketch below.)
5. Debug systematically. Always verify with the overfit-one-batch test, monitor gradient norms to detect vanishing or exploding gradients, use gradient clipping for stability, and check for NaN in gradients and activations. (See the debugging sketch below.)
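A minimal sketch of the composition patterns in point 1, assuming an illustrative TinyClassifier with made-up layer sizes:

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, in_dim=32, hidden=64, n_classes=10):
        super().__init__()  # required before registering any submodules
        # nn.Sequential: a fixed chain of layers
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # nn.ModuleList: indexed collection, iterated manually in forward
        self.extra_blocks = nn.ModuleList(
            [nn.Linear(hidden, hidden) for _ in range(2)]
        )
        # nn.ModuleDict: named lookup of alternative heads
        self.heads = nn.ModuleDict({
            "logits": nn.Linear(hidden, n_classes),
            "embedding": nn.Identity(),
        })

    def forward(self, x, head="logits"):
        h = self.backbone(x)
        for block in self.extra_blocks:
            h = torch.relu(block(h))
        return self.heads[head](h)

model = TinyClassifier()
x = torch.randn(8, 32)
out = model(x)      # call the module itself, not model.forward(x)
print(out.shape)    # torch.Size([8, 10])
```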
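The five-step loop from point 2 in a minimal form; the toy data, model, and hyperparameters are placeholders:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data and model, purely illustrative.
X, y = torch.randn(256, 32), torch.randint(0, 10, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
val_loader = DataLoader(TensorDataset(X, y), batch_size=64)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

for epoch in range(5):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()          # (1) clear stale gradients
        logits = model(xb)             # (2) forward pass
        loss = criterion(logits, yb)   # (3) loss computation
        loss.backward()                # (4) backpropagate
        optimizer.step()               # (5) update parameters

    model.eval()                       # switch layers like dropout to eval mode
    correct = 0
    with torch.no_grad():              # no autograd graph needed for validation
        for xb, yb in val_loader:
            correct += (model(xb).argmax(dim=1) == yb).sum().item()
    print(f"epoch {epoch}: val acc {correct / len(X):.3f}")
```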
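Point 3 in code: CrossEntropyLoss consumes raw logits (it applies log-softmax internally), while BCEWithLogitsLoss pairs a sigmoid with binary cross-entropy for multi-label targets. All tensors here are made up:

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 10)                   # raw scores, no softmax applied
targets = torch.randint(0, 10, (4,))

ce = nn.CrossEntropyLoss()
loss_ok = ce(logits, targets)                 # correct: pass logits directly

# Wrong: softmaxing first runs without error but degrades gradients,
# because the loss applies log-softmax to already-normalized values.
loss_wrong = ce(torch.softmax(logits, dim=1), targets)

# Multi-label: one independent sigmoid per class, targets are 0/1 floats.
multi_logits = torch.randn(4, 5)
multi_targets = torch.randint(0, 2, (4, 5)).float()
bce = nn.BCEWithLogitsLoss()
loss_multi = bce(multi_logits, multi_targets)

# Regression: MSE between predictions and continuous targets.
mse = nn.MSELoss()
loss_reg = mse(torch.randn(4, 1), torch.randn(4, 1))
```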
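A sketch wiring together the infrastructure pieces from point 4; the file path, worker count, and hyperparameters are assumptions to be tuned per project:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def main():
    dataset = TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,)))
    loader = DataLoader(
        dataset, batch_size=64, shuffle=True,
        num_workers=4,    # tune to the machine; workers prefetch batches
        pin_memory=True,  # speeds up host-to-GPU copies when training on CUDA
    )

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
    num_epochs = 20
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
        scheduler.step()  # follow the cosine schedule, stepped once per epoch

        # Checkpoint model *and* optimizer (and scheduler) state so a run can resume.
        torch.save({
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "scheduler_state": scheduler.state_dict(),
        }, "checkpoint.pt")  # placeholder path


if __name__ == "__main__":  # needed for num_workers > 0 on spawn-based platforms
    main()
```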
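A sketch of the debugging checks from point 5: the overfit-one-batch test plus gradient-norm monitoring, clipping, and a NaN/Inf guard. The clipping threshold and step counts are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Overfit-one-batch test: loss on a single fixed batch should drop toward zero;
# if it does not, something in the model, loss, or optimizer wiring is broken.
xb, yb = torch.randn(16, 32), torch.randint(0, 10, (16,))
for step in range(500):
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()

    # clip_grad_norm_ returns the total norm *before* clipping,
    # so it doubles as a gradient-norm monitor.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    if not torch.isfinite(loss) or not torch.isfinite(grad_norm):
        raise RuntimeError(f"NaN/Inf at step {step}: loss={loss}, grad={grad_norm}")

    optimizer.step()
    if step % 100 == 0:
        print(f"step {step}: loss {loss.item():.4f}, grad norm {grad_norm.item():.2f}")
```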
Looking Ahead
Chapter 27 introduces convolutional neural networks (CNNs), which replace fully connected layers with parameter-sharing convolutions for spatial data. The nn.Module patterns, training loop, and debugging strategies from this chapter apply directly to all architectures in Part VI.