Prerequisites & Notation

Prerequisites

This chapter assumes familiarity with:

  • NumPy and linear algebra (Chapters 5-6): matrix operations, SVD
  • PyTorch fundamentals (Chapter 26): tensors, autograd, nn.Module
  • Recurrent networks (Chapter 30): RNNs, LSTMs, sequence modeling

We introduce NLP from the ground up, starting with how to represent text as numbers, progressing through embeddings, and culminating in language modeling with both RNNs and transformers.

Definition: Notation for This Chapter

Symbol                                    Meaning
$\mathcal{V}$                             Vocabulary (set of all tokens)
$V = |\mathcal{V}|$                       Vocabulary size
$d$                                       Embedding dimension
$\mathbf{E} \in \mathbb{R}^{V \times d}$  Embedding matrix
$\mathbf{e}_w \in \mathbb{R}^d$           Embedding vector for word $w$
$T$                                       Sequence length
$w_t$                                     Token at position $t$
$P(w_t \mid w_{<t})$                      Next-token probability
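To make the notation concrete, here is a minimal NumPy sketch (NumPy per the Chapter 5-6 prerequisite) of how these symbols map onto array shapes; the specific values of $V$, $d$, and $T$ are illustrative, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

V = 10   # vocabulary size V = |V|  (toy value)
d = 4    # embedding dimension d   (toy value)
T = 5    # sequence length T       (toy value)

# Embedding matrix E in R^{V x d}: one row per token in the vocabulary.
E = rng.standard_normal((V, d))

# A sequence of T token indices w_1, ..., w_T.
tokens = rng.integers(0, V, size=T)

# Row lookup E[w] gives e_w; indexing with the whole sequence yields (T, d).
embedded = E[tokens]
print(embedded.shape)  # (5, 4)

# A toy next-token distribution P(w_t | w_{<t}): softmax over V logits,
# so the probabilities sum to 1.
logits = rng.standard_normal(V)
probs = np.exp(logits) / np.exp(logits).sum()
```

Every model in this chapter, RNN or transformer, ultimately produces such a length-$V$ probability vector at each position $t$.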