Chapter Summary

Key Points

  1. nn.Module is the building block. Every PyTorch model is an nn.Module tree. Compose with nn.Sequential for chains, nn.ModuleList for indexed collections, and nn.ModuleDict for named lookups. Always call super().__init__() and invoke the model as model(x), not model.forward(x) (see the composition sketch after this list).

  2. The training loop has five steps: (1) optimizer.zero_grad(), (2) forward pass, (3) loss computation, (4) loss.backward(), (5) optimizer.step(). Call model.eval() and wrap validation in torch.no_grad(). This explicit loop gives full control over every aspect of training (see the loop sketch after this list).

  3. Choose losses that match your task: MSE for regression (Gaussian noise assumption), CrossEntropyLoss for classification (categorical distribution), BCEWithLogitsLoss for multi-label problems. Never apply softmax before CrossEntropyLoss, and make sure custom losses are differentiable (see the loss sketch after this list).

  4. Infrastructure scales training. Use DataLoader with pin_memory=True and an appropriate num_workers so the GPU stays fed, schedule the learning rate with cosine annealing, default to AdamW as the optimizer, and checkpoint model and optimizer state after each epoch (see the infrastructure sketch after this list).

  5. Debug systematically. Always verify the pipeline with the overfit-one-batch test, monitor gradient norms to detect vanishing or exploding gradients, use gradient clipping for stability, and check for NaN in gradients and activations (see the debugging sketch after this list).

Looking Ahead

Chapter 27 introduces convolutional neural networks (CNNs), which replace fully connected layers with parameter-sharing convolutions for spatial data. The nn.Module patterns, training loop, and debugging strategies from this chapter apply directly to all architectures in Part VI.