Prerequisites & Notation

Before You Begin

This chapter introduces GPU computing concepts from the ground up, but assumes comfort with NumPy arrays, basic Python profiling, and a working understanding of computer memory (RAM, cache). If any of these feel unfamiliar, review the linked material first.

  • NumPy array creation, slicing, and vectorized operations (Chapter 5)

    Self-check: Can you explain why a + b on NumPy arrays is faster than a Python for-loop?

  • Linear algebra operations with NumPy and SciPy (Chapter 6)

    Self-check: Can you compute a matrix-vector product using the @ operator?

  • Basic understanding of computer memory: RAM, cache, bus

    Self-check: Do you know why accessing contiguous memory is faster than random access?

  • Python virtual environments and conda

    Self-check: Can you create a conda environment and install packages into it?
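The first self-check can be answered empirically. The sketch below (array size is an arbitrary choice of ours) computes the same sum two ways: the vectorized `a + b` dispatches a single call into NumPy's compiled inner loop, while the Python for-loop pays interpreter overhead on every element.

```python
import numpy as np

n = 100_000
a = np.arange(n, dtype=np.float64)
b = np.arange(n, dtype=np.float64)

# Vectorized: one call into NumPy's compiled C loop.
c_vec = a + b

# Element-by-element Python loop: one interpreted iteration,
# plus per-element indexing overhead, for every element.
c_loop = np.empty(n)
for i in range(n):
    c_loop[i] = a[i] + b[i]

# Both produce identical results; only the execution path differs.
assert np.array_equal(c_vec, c_loop)
```

Timing the two paths with `timeit` typically shows the loop running orders of magnitude slower; the gap is the interpreter overhead that vectorization avoids.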
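For the linear-algebra self-check, a matrix-vector product with the `@` operator looks like this (the matrix and vector here are illustrative values of ours):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])

# y[i] is the dot product of row i of A with x;
# A @ x is equivalent to np.matmul(A, x).
y = A @ x
```

Here `y` is `[3.0, 7.0]`: each entry is one row of `A` dotted with `x`.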
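The memory self-check can be made concrete with NumPy's `strides` attribute, which reports how many bytes separate consecutive elements along each axis. In a C-ordered (row-major) array, elements of a row sit next to each other in memory, so row-wise traversal is cache-friendly, while stepping down a column jumps a full row's worth of bytes each time. A minimal sketch, with an array shape of our choosing:

```python
import numpy as np

# 1000 x 1000 float64 array, C-ordered (row-major) by default.
M = np.zeros((1000, 1000), dtype=np.float64)

# strides: bytes to step to the next element along each axis.
axis0_stride, axis1_stride = M.strides
# Moving along a row (axis 1) advances 8 bytes: one float64,
# so consecutive row elements share cache lines.
# Moving down a column (axis 0) advances 8000 bytes,
# skipping an entire row and defeating the cache.
```

The same distinction reappears on the GPU, where neighboring threads reading neighboring addresses is the basis of coalesced memory access.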

Notation for This Chapter

The table below lists the symbols and conventions introduced in this chapter. GPU-specific terminology uses NVIDIA's CUDA nomenclature, which is the industry standard even when discussing vendor-neutral concepts.

| Symbol          | Meaning                                                          | Introduced |
| --------------- | ---------------------------------------------------------------- | ---------- |
| SM              | Streaming Multiprocessor — the basic compute unit on an NVIDIA GPU | s01        |
| CUDA core       | A single-precision floating-point execution unit within an SM    | s01        |
| N_threads       | Total number of threads in a kernel launch                       | s02        |
| B_x, B_y, B_z   | Block dimensions (threads per block in each direction)           | s02        |
| G_x, G_y, G_z   | Grid dimensions (blocks per grid in each direction)              | s02        |
| HBM             | High Bandwidth Memory — main GPU memory (global memory)          | s01        |
| T_xfer          | Host-to-device (or device-to-host) transfer time                 | s02        |
| T_kernel        | Kernel execution time on the GPU                                 | s04        |
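The grid and block dimensions determine the total thread count: N_threads is the product of the blocks per grid and the threads per block, N_threads = (G_x * G_y * G_z) * (B_x * B_y * B_z). A quick sketch with illustrative launch dimensions of our choosing:

```python
# Illustrative launch configuration (values are assumptions, not
# taken from the chapter): a 1-D grid of 1-D blocks.
Gx, Gy, Gz = 256, 1, 1    # blocks per grid in each direction
Bx, By, Bz = 128, 1, 1    # threads per block in each direction

# Total threads in the kernel launch.
n_threads = (Gx * Gy * Gz) * (Bx * By * Bz)
# 256 blocks x 128 threads/block = 32768 threads
```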