Prerequisites & Notation
Before You Begin
This chapter introduces GPU computing concepts from the ground up, but assumes comfort with NumPy arrays, basic Python profiling, and a working understanding of computer memory (RAM, cache). If any of these feel unfamiliar, review the linked material first.
- NumPy array creation, slicing, and vectorized operations (Chapter 5)
Self-check: Can you explain why `a + b` on NumPy arrays is faster than a Python for-loop?
- Linear algebra operations with NumPy and SciPy (Chapter 6)
Self-check: Can you compute a matrix-vector product using the @ operator?
- Basic understanding of computer memory: RAM, cache, bus
Self-check: Do you know why accessing contiguous memory is faster than random access?
- Python virtual environments and conda
Self-check: Can you create a conda environment and install packages into it?
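If you want to check your answers to the NumPy self-checks concretely, the short sketch below times a vectorized add against an element-wise Python loop and computes a matrix-vector product with the @ operator. The array sizes and values are illustrative, not from the chapter:

```python
import time
import numpy as np

# Illustrative arrays for the vectorization self-check.
a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# Vectorized add: one call into compiled C code over contiguous memory.
t0 = time.perf_counter()
c_vec = a + b
t_vec = time.perf_counter() - t0

# Python for-loop: interpreter overhead on every single element.
c_loop = np.empty_like(a)
t0 = time.perf_counter()
for i in range(len(a)):
    c_loop[i] = a[i] + b[i]
t_loop = time.perf_counter() - t0

assert np.allclose(c_vec, c_loop)  # same result, very different cost
print(f"vectorized: {t_vec:.4f}s  loop: {t_loop:.4f}s")

# Matrix-vector product self-check with the @ operator.
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])
print(M @ v)  # [3. 7.]
```

The vectorized version typically wins by one to two orders of magnitude on arrays this size, because NumPy dispatches the whole operation to optimized native code instead of interpreting a Python bytecode loop.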
Notation for This Chapter
The table below lists the symbols and conventions introduced in this chapter. GPU-specific terminology follows NVIDIA's CUDA nomenclature, which is the industry standard even when discussing vendor-neutral concepts.
| Symbol | Meaning | Introduced |
|---|---|---|
| SM | Streaming Multiprocessor — the basic compute unit on an NVIDIA GPU | s01 |
| CUDA core | A single-precision floating-point execution unit within an SM | s01 |
| N | Total number of threads in a kernel launch | s02 |
| blockDim | Block dimensions (threads per block in each direction) | s02 |
| gridDim | Grid dimensions (blocks per grid in each direction) | s02 |
| HBM | High Bandwidth Memory — main GPU memory (global memory) | s01 |
| T_transfer | Host-to-device (or device-to-host) transfer time | s02 |
| T_kernel | Kernel execution time on the GPU | s04 |
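The launch-size symbols relate multiplicatively: the total thread count N is the product of threads per block and blocks per grid across all dimensions. A minimal plain-Python sketch of that arithmetic, with an illustrative launch configuration (the variable names here are this sketch's, not CUDA API identifiers):

```python
import math

# Illustrative launch configuration: (x, y, z) extents.
block_dim = (16, 16, 1)   # threads per block in each direction
grid_dim = (64, 64, 1)    # blocks per grid in each direction

threads_per_block = math.prod(block_dim)        # 16 * 16 * 1 = 256
blocks_per_grid = math.prod(grid_dim)           # 64 * 64 * 1 = 4096
n_total = threads_per_block * blocks_per_grid   # N: total threads in the launch

print(n_total)  # 1048576
```

This is the same bookkeeping a real kernel launch performs implicitly: every block contributes `threads_per_block` threads, so N grows with both the block and grid dimensions.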