Part 3: GPU Computing: CuPy and PyTorch Tensors

Chapter 13: Performance Patterns and Memory Management

Intermediate · ~120 min

Learning Objectives

Manage GPU memory allocation and avoid out-of-memory errors in large simulations
Implement batched operations to maximize GPU throughput
Use mixed precision (FP16/BF16) for up to roughly 2x speedup on supported hardware while keeping numerical error acceptable
Design data loading pipelines that overlap CPU I/O with GPU computation
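
As a rough preview of the arithmetic behind the memory and mixed-precision objectives, the sketch below estimates how many input samples fit in a fixed memory budget at different precisions. The helper names (`tensor_bytes`, `max_batch_size`), the sample shape, and the 2 GiB budget are illustrative assumptions, not part of the chapter, and the estimate ignores activations, gradients, and allocator overhead.

```python
# Bytes per element for common dense tensor dtypes.
BYTES_PER_ELEMENT = {"float64": 8, "float32": 4, "float16": 2, "bfloat16": 2}


def tensor_bytes(shape, dtype="float32"):
    """Return the raw storage size in bytes of a dense tensor."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEMENT[dtype]


def max_batch_size(sample_shape, dtype, budget_bytes):
    """Largest batch whose input tensor alone fits in budget_bytes.

    Hypothetical helper: it ignores activations, gradients, and
    allocator overhead, so treat the result as an upper bound.
    """
    per_sample = tensor_bytes(sample_shape, dtype)
    return budget_bytes // per_sample


sample = (3, 1024, 1024)   # one illustrative input sample (assumed shape)
budget = 2 * 1024**3       # assume 2 GiB is available for the input batch

fp32_batch = max_batch_size(sample, "float32", budget)
fp16_batch = max_batch_size(sample, "float16", budget)
print(fp32_batch, fp16_batch)  # prints: 170 341
```

Halving the bytes per element roughly doubles the batch that fits in the same budget, which is one reason reduced precision and batching are usually tuned together.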

Sections

Prerequisites & Notation

Memory Management on GPU

Batched Operations

Mixed Precision

Data Loading and Streaming

Multi-GPU and Distributed Computing

Chapter Summary

References & Further Reading
