Part 6: Deep Learning with PyTorch

Chapter 30: Attention and Transformer Architectures

Intermediate · ~150 min

Learning Objectives

  • Implement scaled dot-product attention and multi-head attention from scratch (see the sketch after this list)
  • Build a complete transformer encoder-decoder architecture
  • Implement a Vision Transformer (ViT) for image classification and related image-processing tasks
  • Adapt transformers for scientific data: sequences, grids, and point clouds
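
As a preview of the first objective, here is a minimal sketch of scaled dot-product attention in PyTorch. It computes softmax(QK^T / √d_k)·V, the operation the rest of the chapter builds on. The function name, signature, and shapes are illustrative assumptions, not the chapter's own code:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    q, k, v: tensors of shape (..., seq_len, d_k).
    mask: optional boolean tensor broadcastable to
          (..., seq_len, seq_len); True marks positions to hide.
    """
    d_k = q.size(-1)
    # Similarity scores between queries and keys, scaled by sqrt(d_k)
    # to keep softmax gradients stable as the head dimension grows.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention weights, rows sum to 1
    return weights @ v

# Example: self-attention over a batch of 2 sequences of length 5
x = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(x, x, x)  # shape (2, 5, 64)
```

Multi-head attention, covered in the chapter itself, runs several of these operations in parallel on learned projections of the inputs and concatenates the results.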

