Prerequisites & Notation

Prerequisites

This chapter builds on:

  • NLP Foundations (Chapter 34): tokenization, embeddings, attention
  • Deep learning (Chapters 26, 30): PyTorch, training loops, RNNs
  • Transformer basics (Chapter 32): encoder-decoder architecture

This chapter goes deep into how GPT-family models work, how they are trained at scale, and how reinforcement learning from human feedback (RLHF) aligns them with human preferences.

Definition: Notation for This Chapter

Symbol                        Meaning
$L$                           Number of transformer layers
$d_\text{model}$              Hidden dimension
$n_h$                         Number of attention heads
$d_k = d_\text{model}/n_h$    Per-head dimension
$N$                           Number of model parameters
$D$                           Dataset size (tokens)
$C$                           Compute budget (FLOPs)
$\theta$                      Model parameters
$\pi_\theta$                  Policy (the LLM as a policy)
$r_\phi$                      Reward model
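
To make the notation concrete, here is a minimal sketch of a model configuration that ties the architectural symbols together. The names (ModelConfig, n_layers, n_heads) are illustrative assumptions, not part of any library; the example values are in the ballpark of GPT-2 small ($L = 12$, $d_\text{model} = 768$, $n_h = 12$), used only to show how $d_k$ falls out of the other quantities.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Hypothetical config mapping the chapter's notation to concrete values."""
    n_layers: int   # L: number of transformer layers
    d_model: int    # d_model: hidden dimension
    n_heads: int    # n_h: number of attention heads

    @property
    def d_k(self) -> int:
        # d_k = d_model / n_h: per-head dimension
        assert self.d_model % self.n_heads == 0, "d_model must divide evenly across heads"
        return self.d_model // self.n_heads

# Illustrative values roughly matching GPT-2 small
cfg = ModelConfig(n_layers=12, d_model=768, n_heads=12)
print(cfg.d_k)  # 64
```

The remaining symbols ($N$, $D$, $C$, $\pi_\theta$, $r_\phi$) describe training-scale quantities and the RLHF setup rather than the architecture, and appear later in the chapter.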