Setting Up the GPU Environment

The GPU Software Stack Is Deep

Getting a GPU working for scientific Python requires several software layers that must be version-compatible: kernel driver, CUDA toolkit, cuDNN, and Python libraries. Mismatched versions are the #1 cause of GPU setup failures.

The good news: conda has largely solved this problem with metapackages that install compatible CUDA+cuDNN+library bundles automatically. But understanding the stack helps when things go wrong.

Definition:

NVIDIA GPU Driver

The NVIDIA driver is the kernel-level software that communicates directly with the GPU hardware. It provides:

  • Hardware initialization and power management
  • Memory management (allocating GPU memory)
  • The CUDA Driver API (low-level interface)

Each driver version has a maximum supported CUDA version. You can run any CUDA toolkit version up to this maximum. The driver is backward compatible: a newer driver supports older CUDA applications.

# Check driver version and supported CUDA
nvidia-smi

The output shows the driver version (e.g., 545.23.08) and the maximum CUDA version (e.g., CUDA 12.3).

Definition:

CUDA Toolkit

The CUDA Toolkit includes:

  • nvcc: The NVIDIA CUDA compiler
  • Runtime libraries: libcudart, libcublas, libcufft, etc.
  • cuda-gdb: GPU debugger
  • nsight: Profiling and debugging tools
  • Header files for CUDA C/C++ development

For Python-only workflows, you often do not need the full toolkit installed system-wide. Libraries like CuPy and PyTorch bundle their own CUDA runtime via cudatoolkit conda packages.
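To see which CUDA runtime a library actually bundles (as opposed to what is installed system-wide), each library exposes its build version. A minimal sketch, assuming PyTorch and/or CuPy may or may not be present:

```python
def bundled_cuda_versions():
    """Report the CUDA runtime version each library was built against."""
    report = {}
    try:
        import torch
        # e.g. "12.1" for a CUDA build, None for a CPU-only build
        report["torch"] = torch.version.cuda
    except ImportError:
        report["torch"] = "not installed"
    try:
        import cupy
        # Integer such as 12010 for CUDA 12.1
        report["cupy"] = cupy.cuda.runtime.runtimeGetVersion()
    except Exception:  # not installed, or CUDA libraries missing
        report["cupy"] = "unavailable"
    return report

print(bundled_cuda_versions())
```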

# Check installed CUDA toolkit version
nvcc --version

The CUDA toolkit version and the driver's reported CUDA version can differ. The driver reports the maximum supported CUDA; the toolkit reports the installed CUDA runtime. They do not need to match: the installed runtime simply must not exceed the driver's maximum.
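That compatibility rule reduces to a version comparison. A hypothetical helper (not part of any CUDA API), handy when scripting environment checks:

```python
def cuda_compatible(driver_max: str, runtime: str) -> bool:
    """True if a CUDA runtime can run under a driver that reports
    the given maximum supported CUDA version."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    # Backward compatible: any runtime up to the driver's maximum works.
    return as_tuple(runtime) <= as_tuple(driver_max)

print(cuda_compatible("12.3", "11.8"))  # True: 11.8 is below the 12.3 maximum
print(cuda_compatible("12.3", "12.4"))  # False: runtime exceeds driver maximum
```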

Definition:

cuDNN (CUDA Deep Neural Network Library)

cuDNN is NVIDIA's GPU-accelerated library for deep learning primitives:

  • Convolution (forward, backward data, backward filter)
  • Pooling, normalization, activation functions
  • RNN and attention layers

cuDNN is required by PyTorch and TensorFlow for GPU-accelerated training and inference. It is not needed for CuPy or basic CUDA computing.

cuDNN versions must be compatible with the CUDA toolkit version: e.g., cuDNN 8.9.x works with CUDA 11.x and 12.x.
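To confirm which cuDNN build PyTorch links against, a hedged sketch (returns None when PyTorch is absent or was built without cuDNN):

```python
def linked_cudnn_version():
    """Return the cuDNN version PyTorch was built with (e.g. 8902
    for cuDNN 8.9.2), or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    # None when the build has no cuDNN support
    return torch.backends.cudnn.version()
```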

Example: Setting Up a Complete GPU Environment with Conda

Create a conda environment with PyTorch (CUDA 12.1), CuPy, and common scientific libraries. Verify that all libraries can see the GPU.
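The verification half of this exercise can be sketched as a small Python script. The library names are the usual ones; whether each import succeeds depends on the environment you created:

```python
def gpu_visibility_report():
    """Check which GPU libraries are importable and whether each sees a GPU.
    Values: True (GPU visible), False (installed, no GPU), None (not installed)."""
    report = {}
    try:
        import torch
        report["torch"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
    try:
        import cupy
        try:
            report["cupy"] = cupy.cuda.runtime.getDeviceCount() > 0
        except Exception:  # CUDARuntimeError when no device or driver
            report["cupy"] = False
    except ImportError:
        report["cupy"] = None
    return report

print(gpu_visibility_report())
```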

Example: Diagnosing Common GPU Setup Failures

torch.cuda.is_available() returns False on a machine with an NVIDIA GPU. Diagnose and fix the issue.
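A starting point for that diagnosis is a checklist script. This sketch covers the two most common culprits from this section (missing driver, CPU-only build); the messages are illustrative, not emitted by any real tool:

```python
import shutil

def diagnose_cuda_unavailable():
    """Heuristic checks for why torch.cuda.is_available() might be False."""
    findings = []
    # 1. Driver: nvidia-smi ships with the NVIDIA driver, so its absence
    #    on PATH usually means the driver is not installed.
    if shutil.which("nvidia-smi") is None:
        findings.append("nvidia-smi not found: NVIDIA driver may be missing")
    # 2. Library build: a CPU-only PyTorch wheel reports no CUDA version.
    try:
        import torch
        if torch.version.cuda is None:
            findings.append("CPU-only PyTorch build: reinstall a CUDA wheel")
    except ImportError:
        findings.append("PyTorch is not installed")
    return findings

for finding in diagnose_cuda_unavailable():
    print(finding)
```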

Example: GPU Computing with Docker

Run GPU-accelerated Python code inside a Docker container for reproducible environments.
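One way to script such a container launch from Python is to assemble the docker invocation first. The image tag below is a plausible example from NVIDIA's registry, and running it assumes Docker plus the NVIDIA Container Toolkit are installed on the host:

```python
def docker_gpu_command(image="nvidia/cuda:12.1.0-base-ubuntu22.04",
                       cmd=("nvidia-smi",)):
    """Build a docker invocation that exposes all host GPUs to the container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, *cmd]

# On a machine with Docker + nvidia-container-toolkit, passing this list to
# subprocess.run() would print the same table as nvidia-smi on the host.
print(" ".join(docker_gpu_command()))
```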

GPU Environment Setup Methods Compared

Method | Pros | Cons | Best For
--- | --- | --- | ---
conda install | Version management, easy rollback, cross-platform | Large downloads, conda channel complexity | Local development, research
pip + system CUDA | Lightweight, fast install | Manual CUDA management, version conflicts | Servers with pre-installed CUDA
Docker + nvidia-container | Perfect reproducibility, isolation | Docker overhead, disk space | Production, shared clusters
Cloud (Colab, SageMaker) | Zero setup, free/pay-per-use | Limited control, session limits | Prototyping, teaching

NVIDIA CUDA vs AMD ROCm Ecosystem

Aspect | NVIDIA CUDA | AMD ROCm
--- | --- | ---
Hardware | GeForce, Quadro, Tesla, A100, H100 | Radeon, Instinct MI200, MI300
Compiler | nvcc | hipcc
Runtime | CUDA Runtime API | HIP Runtime API
Python Libraries | CuPy, PyTorch, TensorFlow (native) | PyTorch (ROCm build), TensorFlow
Profiling | Nsight Systems/Compute, nvprof | rocprof, omniperf
Ecosystem Maturity | Dominant, 15+ years | Growing, improving rapidly
Cloud Availability | AWS, GCP, Azure, all providers | Azure (MI300X), some providers

Quick Check

Your nvidia-smi shows 'CUDA Version: 12.3' but you install PyTorch with CUDA 11.8. Will it work?

  • No: the CUDA versions must match exactly
  • Yes: the driver supports CUDA 11.8 since it supports up to 12.3 (correct)
  • Only with a compatibility shim

Common Mistake: Accidentally Installing CPU-Only PyTorch

Mistake:

Running pip install torch or conda install pytorch without specifying the CUDA channel, which installs the CPU-only build:

pip install torch  # CPU-only on most systems!

Correction:

Always specify the CUDA version explicitly:

# pip (from PyTorch website)
pip install torch --index-url https://download.pytorch.org/whl/cu121

# conda
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia

Verify with: python -c "import torch; print(torch.cuda.is_available())"

Historical Note: Evolution of the CUDA Ecosystem

2007-2024

CUDA 1.0 (2007) supported only C and required extensive boilerplate. CUDA 4.0 (2011) introduced unified virtual addressing (UVA), eliminating explicit host/device pointer tracking. CUDA 6.0 (2014) added Unified Memory, allowing the runtime to automatically migrate pages between host and device. CUDA 12.0 (2022) introduced lazy module loading and improved graph APIs. Each generation has simplified GPU programming while adding hardware capabilities.

Why This Matters: GPU Acceleration in 5G Base Stations

NVIDIA's Aerial SDK uses GPUs for real-time 5G signal processing. The software-defined baseband runs LDPC decoding, OFDM processing, and MIMO detection entirely on the GPU, replacing dedicated ASIC hardware. This enables flexible, software-upgradeable base stations that can adapt to new radio standards. The setup requires specific CUDA toolkit versions, MPS (Multi-Process Service) for sharing the GPU across multiple cells, and precise driver configurations.

See full treatment in Chapter 20

Key Takeaway

Use conda install pytorch pytorch-cuda=XX.X -c pytorch -c nvidia for the simplest GPU setup. Always verify with torch.cuda.is_available() and nvidia-smi. The #1 setup failure is version mismatch between driver, CUDA toolkit, and Python library builds.

cuDNN

NVIDIA's GPU-accelerated library for deep learning primitives (convolution, pooling, normalization, RNNs), required by PyTorch and TensorFlow.

Related: CUDA Toolkit

CUDA Toolkit

NVIDIA's SDK including the CUDA compiler (nvcc), runtime libraries, and tools for developing GPU-accelerated applications.

Related: cuDNN

ROCm

AMD's open-source platform for GPU computing, providing HIP (a CUDA-compatible API), compiler toolchain, and math libraries for AMD GPUs.

Related: CUDA Toolkit