Part 7: Large Language Models and Advanced ML
Chapter 35: How Large Language Models Work
Advanced · ~150 min
Learning Objectives
- Explain the GPT architecture, including positional encoding, causal masking, and layer normalization
- Describe pre-training objectives and scaling laws for large language models
- Explain RLHF and alignment techniques using policy gradients
- Compare encoder (BERT), decoder (GPT), and encoder-decoder (T5) model families
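As a preview of the first objective, here is a minimal sketch of causal masking inside single-head self-attention. This is an illustrative NumPy example, not the chapter's reference implementation; the function and weight names (`causal_self_attention`, `Wq`, `Wk`, `Wv`) are hypothetical.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention with a causal mask (illustrative sketch).

    x: (T, d) sequence of token embeddings; Wq, Wk, Wv: (d, d) projections.
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)  # (T, T) attention logits
    # Causal mask: position t may attend only to positions <= t,
    # so future tokens cannot influence earlier outputs.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf
    # Row-wise softmax over the unmasked positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the mask, perturbing the last token leaves the outputs at all earlier positions unchanged, which is exactly the property that makes autoregressive (next-token) training valid.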