Prerequisites & Notation

Prerequisites

This chapter builds on:

  • NLP Foundations (Chapter 34): tokenization, embeddings, attention
  • Deep learning (Chapters 26, 30): PyTorch, training loops, RNNs
  • Transformer basics (Chapter 32): encoder-decoder architecture

This chapter goes deep into how GPT-family models work, how they are trained at scale, and how reinforcement learning from human feedback (RLHF) aligns them with human preferences.

Definition: Notation for This Chapter

Symbol                        Meaning
$L$                           Number of transformer layers
$d_\text{model}$              Hidden dimension
$n_h$                         Number of attention heads
$d_k = d_\text{model}/n_h$    Per-head dimension
$N$                           Number of model parameters
$D$                           Dataset size (tokens)
$C$                           Compute budget (FLOPs)
$\theta$                      Model parameters
$\pi_\theta$                  Policy (the LLM as a policy)
$r_\phi$                      Reward model
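
To make the notation concrete, here is a minimal sketch of a model configuration that ties the architectural symbols together. The names (ModelConfig, n_layers, n_heads) are illustrative assumptions, not part of any library; the example values are in the ballpark of GPT-2 small ($L = 12$, $d_\text{model} = 768$, $n_h = 12$), used only to show how $d_k$ falls out of the other quantities.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Hypothetical config mapping the chapter's notation to concrete values."""
    n_layers: int   # L: number of transformer layers
    d_model: int    # d_model: hidden dimension
    n_heads: int    # n_h: number of attention heads

    @property
    def d_k(self) -> int:
        # d_k = d_model / n_h: per-head dimension
        assert self.d_model % self.n_heads == 0, "d_model must divide evenly across heads"
        return self.d_model // self.n_heads

# Illustrative values roughly matching GPT-2 small
cfg = ModelConfig(n_layers=12, d_model=768, n_heads=12)
print(cfg.d_k)  # 64
```

The remaining symbols ($N$, $D$, $C$, $\pi_\theta$, $r_\phi$) describe training-scale quantities and the RLHF setup rather than the architecture, and appear later in the chapter.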