Part 7: Large Language Models and Advanced ML

Chapter 35: How Large Language Models Work

Difficulty: Advanced · Estimated time: ~150 min

Learning Objectives

  • Explain the GPT architecture, including positional encoding, causal masking, and layer normalization (a minimal sketch follows this list)
  • Describe pre-training objectives and scaling laws for large language models (see the scaling-law note below)
  • Explain RLHF and alignment techniques based on policy gradients (the RLHF objective is sketched below)
  • Compare encoder-only (BERT), decoder-only (GPT), and encoder-decoder (T5) model families
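
As a preview of the first objective, here is a minimal NumPy sketch of sinusoidal positional encoding and causal masking. It is illustrative only: the function names are our own, there is a single attention head with no learned projections, and layer normalization is omitted.

```python
# Illustrative sketch only: function names are our own, not from a library.
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal positional encodings, as in the original Transformer."""
    positions = np.arange(seq_len)[:, None]                # shape (seq_len, 1)
    dims = np.arange(d_model)[None, :]                     # shape (1, d_model)
    angles = positions / np.power(10000.0, (2 * (dims // 2)) / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])            # even dims use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])            # odd dims use cosine
    return encoding

def causal_self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with a causal mask:
    token i may only attend to tokens 0..i, enforcing left-to-right order."""
    q, k, v = x, x, x                                      # no learned projections here
    scores = q @ k.T / np.sqrt(x.shape[-1])                # (seq, seq) similarities
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # strictly above diagonal
    scores = np.where(future, -np.inf, scores)             # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v

# Tiny usage example: 4 tokens with 8-dimensional embeddings.
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(seq_len, d_model))
x = tokens + sinusoidal_positions(seq_len, d_model)       # add positions to embeddings
out = causal_self_attention(x)
print(out.shape)                                          # -> (4, 8)
```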

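For the second objective, scaling laws relate pre-training loss to model size and data. One widely used parametric form is the "Chinchilla" law of Hoffmann et al. (2022); the constants are fit empirically and are shown here only symbolically:

```latex
% Chinchilla-style parametric scaling law (Hoffmann et al., 2022).
% L: pre-training loss, N: parameter count, D: number of training tokens.
% E, A, B, \alpha, \beta are empirically fitted constants.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

The irreducible term E reflects the entropy of the data itself, while the other two terms shrink as parameters and training tokens grow.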
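For the third objective, RLHF fine-tuning is usually framed as maximizing a learned reward under a KL penalty that keeps the policy close to the pre-trained reference model. A standard form of the objective (as in InstructGPT-style training, with beta a tunable penalty coefficient) is:

```latex
% KL-regularized RLHF objective: r_phi is a learned reward model,
% pi_theta the policy being tuned, pi_ref the frozen reference model.
J(\theta) = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
  \left[ r_\phi(x, y) \right]
  - \beta \, \mathbb{E}_{x \sim \mathcal{D}}\!\left[
    \mathrm{KL}\!\left( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)
  \right]
```

Policy-gradient methods such as PPO then ascend the gradient of J(theta) using samples drawn from the current policy.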