Prerequisites & Notation

Before You Begin

This chapter assumes familiarity with NumPy array manipulation (Chapter 5), basic linear algebra (Chapter 6), and gradient-based optimisation concepts (Chapter 8). You should also have a working PyTorch installation (`pip install torch`).
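
Before continuing, it is worth confirming that PyTorch is actually importable. A minimal sketch of such a check (the printed version string depends on your environment):

```python
import importlib.util

# Verify that PyTorch is importable; if not, install it with `pip install torch`.
if importlib.util.find_spec("torch") is not None:
    import torch
    print("PyTorch", torch.__version__, "is ready")
else:
    print("PyTorch not found - run: pip install torch")
```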

  • NumPy arrays, broadcasting, and dtypes (Chapter 5)

    Self-check: Can you reshape, slice, and broadcast NumPy arrays confidently?

  • Linear algebra: matrix-vector products, eigendecomposition (Chapter 6)

    Self-check: Can you compute $\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b}$ in NumPy?

  • Gradient descent and optimisation (Chapter 8)

    Self-check: Do you know the update rule $\theta \leftarrow \theta - \eta \nabla_{\theta} L$?

  • Python classes, inheritance, and `__init__` / `__call__` protocols (Chapter 3)

    Self-check: Can you write a class with `super().__init__()` and override a method?
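
If any of these self-checks feel shaky, the following sketch runs through all four in a few lines of NumPy and plain Python (values are arbitrary and chosen only for illustration):

```python
import numpy as np

# --- Self-check 1: reshape, slice, and broadcast (Chapter 5) ---
a = np.arange(6).reshape(2, 3)        # shape (2, 3)
col = np.array([[10], [20]])          # shape (2, 1) broadcasts across columns
broadcast_sum = a + col               # shape (2, 3)

# --- Self-check 2: y = W x + b (Chapter 6) ---
W = np.array([[1.0, 0.0], [0.0, 2.0]])
x = np.array([3.0, 4.0])
bias = np.array([1.0, 1.0])
y = W @ x + bias                      # matrix-vector product plus bias

# --- Self-check 3: the gradient-descent update rule (Chapter 8) ---
theta = np.array([1.0, -1.0])
grad = np.array([0.5, -0.5])          # stands in for the gradient of L
eta = 0.1
theta = theta - eta * grad            # theta <- theta - eta * grad

# --- Self-check 4: inheritance, super().__init__(), __call__ (Chapter 3) ---
class Module:
    def __init__(self, name):
        self.name = name
    def __call__(self, x):
        return x

class Identity(Module):
    def __init__(self):
        super().__init__("identity")  # call the parent initialiser
    def __call__(self, x):            # override the parent's __call__
        return x
```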

Notation for This Chapter

Symbols and conventions used throughout this chapter.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{W}$, $\mathbf{b}$ | Weight matrix and bias vector of a linear layer | §01 |
| $\sigma(\cdot)$ | Activation function (ReLU, sigmoid, etc.) | §01 |
| $L(\hat{\mathbf{y}}, \mathbf{y})$ | Loss function measuring prediction error | §02 |
| $\eta$ | Learning rate | §02 |
| $\theta$ | Collective model parameters | §01 |
| $\nabla_{\theta} L$ | Gradient of loss with respect to parameters | §02 |
| $\hat{\mathbf{y}}$ | Model prediction (network output) | §02 |
| $B$ | Mini-batch size | §04 |
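
To see how these symbols map onto code, the following sketch performs one gradient-descent step for a single linear layer with a sigmoid activation and squared-error loss. This example is illustrative only (the model, values, and sigmoid choice are assumptions, not taken from the chapter), but every variable corresponds to a row of the table above:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))        # weight matrix W
b = np.zeros(2)                        # bias vector b
x = rng.standard_normal(3)             # input vector
y = np.array([1.0, 0.0])               # target
eta = 0.1                              # learning rate eta

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # activation sigma(.): sigmoid

z = W @ x + b
y_hat = sigma(z)                             # prediction y_hat
L = 0.5 * np.sum((y_hat - y) ** 2)           # loss L(y_hat, y)

# Gradient of L with respect to theta = (W, b), via the chain rule;
# sigmoid'(z) = y_hat * (1 - y_hat)
dz = (y_hat - y) * y_hat * (1.0 - y_hat)
dW = np.outer(dz, x)
db = dz

# Parameter update: theta <- theta - eta * grad
W -= eta * dW
b -= eta * db
```

A mini-batch version would stack $B$ inputs into a matrix and average the per-example gradients, which is where the batch size $B$ from the table enters.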