Part 3: GPU Computing: CuPy and PyTorch Tensors

Chapter 13: Performance Patterns and Memory Management

Intermediate, ~120 min

Learning Objectives

  • Manage GPU memory allocation and avoid out-of-memory errors in large simulations
  • Implement batched operations for throughput-optimal GPU utilization
  • Use mixed precision (FP16/BF16) for up to roughly 2x speedup on supporting hardware while keeping numerical error acceptable
  • Design data loading pipelines that overlap CPU I/O with GPU computation
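
As a rough illustration of the last objective, the sketch below overlaps loading with computation using only the Python standard library. The names `load_batch`, `compute`, and `prefetching_loader` are hypothetical stand-ins invented for this example: `load_batch` simulates CPU-side I/O and `compute` simulates the GPU step. In PyTorch, `torch.utils.data.DataLoader` with `num_workers` and `pin_memory` serves a similar purpose; this is a minimal sketch of the idea, not that implementation.

```python
import queue
import threading
import time


def load_batch(i):
    """Hypothetical stand-in for CPU-side I/O (e.g. reading one batch from disk)."""
    time.sleep(0.01)  # simulated I/O latency
    return [float(i)] * 4


def compute(batch):
    """Hypothetical stand-in for the GPU computation on one batch."""
    time.sleep(0.01)  # simulated kernel time
    return sum(batch)


def prefetching_loader(num_batches, depth=2):
    """Yield batches in order while a background thread loads up to `depth` ahead."""
    q = queue.Queue(maxsize=depth)  # bounded queue caps memory held by prefetched batches
    sentinel = object()             # marks the end of the stream

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))    # blocks when the queue is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item


def run(num_batches=8):
    # While compute() works on batch k, the producer is already loading batch k+1.
    return [compute(batch) for batch in prefetching_loader(num_batches)]


results = run()
```

The bounded queue is the key design choice: `depth` limits how many batches sit in memory at once, which matters when each batch is large relative to available host or device memory.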
