Prerequisites & Notation

Before You Begin

This chapter assumes familiarity with NumPy array manipulation (Chapter 5), basic linear algebra (Chapter 6), and gradient-based optimisation concepts (Chapter 8). You should also have a working PyTorch installation (`pip install torch`).
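
Before continuing, it is worth confirming that PyTorch is actually importable. A minimal sketch of such a check (the printed version string depends on your environment):

```python
import importlib.util

# Verify that PyTorch is importable; if not, install it with `pip install torch`.
if importlib.util.find_spec("torch") is not None:
    import torch
    print("PyTorch", torch.__version__, "is ready")
else:
    print("PyTorch not found - run: pip install torch")
```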

  • NumPy arrays, broadcasting, and dtypes (Chapter 5)

    Self-check: Can you reshape, slice, and broadcast NumPy arrays confidently?

  • Linear algebra: matrix-vector products, eigendecomposition (Chapter 6)

    Self-check: Can you compute $\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{b}$ in NumPy?

  • Gradient descent and optimisation (Chapter 8)

    Self-check: Do you know the update rule $\theta \leftarrow \theta - \eta \nabla_{\theta} L$?

  • Python classes, inheritance, and `__init__` / `__call__` protocols (Chapter 3)

    Self-check: Can you write a class with `super().__init__()` and override a method?
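
If any of these self-checks feel shaky, the following sketch runs through all four in a few lines of NumPy and plain Python (values are arbitrary and chosen only for illustration):

```python
import numpy as np

# --- Self-check 1: reshape, slice, and broadcast (Chapter 5) ---
a = np.arange(6).reshape(2, 3)        # shape (2, 3)
col = np.array([[10], [20]])          # shape (2, 1) broadcasts across columns
broadcast_sum = a + col               # shape (2, 3)

# --- Self-check 2: y = W x + b (Chapter 6) ---
W = np.array([[1.0, 0.0], [0.0, 2.0]])
x = np.array([3.0, 4.0])
bias = np.array([1.0, 1.0])
y = W @ x + bias                      # matrix-vector product plus bias

# --- Self-check 3: the gradient-descent update rule (Chapter 8) ---
theta = np.array([1.0, -1.0])
grad = np.array([0.5, -0.5])          # stands in for the gradient of L
eta = 0.1
theta = theta - eta * grad            # theta <- theta - eta * grad

# --- Self-check 4: inheritance, super().__init__(), __call__ (Chapter 3) ---
class Module:
    def __init__(self, name):
        self.name = name
    def __call__(self, x):
        return x

class Identity(Module):
    def __init__(self):
        super().__init__("identity")  # call the parent initialiser
    def __call__(self, x):            # override the parent's __call__
        return x
```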

Notation for This Chapter

Symbols and conventions used throughout this chapter.

| Symbol | Meaning | Introduced |
| --- | --- | --- |
| $\mathbf{W}$, $\mathbf{b}$ | Weight matrix and bias vector of a linear layer | §01 |
| $\sigma(\cdot)$ | Activation function (ReLU, sigmoid, etc.) | §01 |
| $L(\hat{\mathbf{y}}, \mathbf{y})$ | Loss function measuring prediction error | §02 |
| $\eta$ | Learning rate | §02 |
| $\theta$ | Collective model parameters | §01 |
| $\nabla_{\theta} L$ | Gradient of loss with respect to parameters | §02 |
| $\hat{\mathbf{y}}$ | Model prediction (network output) | §02 |
| $B$ | Mini-batch size | §04 |
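
To see how these symbols map onto code, the following sketch performs one gradient-descent step for a single linear layer with a sigmoid activation and squared-error loss. This example is illustrative only (the model, values, and sigmoid choice are assumptions, not taken from the chapter), but every variable corresponds to a row of the table above:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))        # weight matrix W
b = np.zeros(2)                        # bias vector b
x = rng.standard_normal(3)             # input vector
y = np.array([1.0, 0.0])               # target
eta = 0.1                              # learning rate eta

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # activation sigma(.): sigmoid

z = W @ x + b
y_hat = sigma(z)                             # prediction y_hat
L = 0.5 * np.sum((y_hat - y) ** 2)           # loss L(y_hat, y)

# Gradient of L with respect to theta = (W, b), via the chain rule;
# sigmoid'(z) = y_hat * (1 - y_hat)
dz = (y_hat - y) * y_hat * (1.0 - y_hat)
dW = np.outer(dz, x)
db = dz

# Parameter update: theta <- theta - eta * grad
W -= eta * dW
b -= eta * db
```

A mini-batch version would stack $B$ inputs into a matrix and average the per-example gradients, which is where the batch size $B$ from the table enters.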