Setting Up the GPU Environment
The GPU Software Stack Is Deep
Getting a GPU working for scientific Python requires several software layers that must be version-compatible: kernel driver, CUDA toolkit, cuDNN, and Python libraries. Mismatched versions are the #1 cause of GPU setup failures.
The good news: conda has largely solved this problem with metapackages that install compatible CUDA+cuDNN+library bundles automatically. But understanding the stack helps when things go wrong.
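As a quick sanity check, a short script can report every layer of the stack at once. This is a minimal sketch that assumes nvidia-smi is on the PATH and that a CUDA build of PyTorch is installed; adapt it to whichever libraries you actually use.
# Report each layer of the GPU stack in one place
import subprocess
import torch

driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()
print(f"Driver version:         {driver}")
print(f"CUDA (PyTorch build):   {torch.version.cuda}")
print(f"cuDNN (PyTorch build):  {torch.backends.cudnn.version()}")
print(f"GPU visible to PyTorch: {torch.cuda.is_available()}")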
Definition: NVIDIA GPU Driver
The NVIDIA driver is the kernel-level software that communicates directly with the GPU hardware. It provides:
- Hardware initialization and power management
- Memory management (allocating GPU memory)
- The CUDA Driver API (low-level interface)
Each driver version has a maximum supported CUDA version. You can run any CUDA toolkit version up to this maximum. The driver is backward compatible: a newer driver supports older CUDA applications.
# Check driver version and supported CUDA
nvidia-smi
The output shows the driver version (e.g., 545.23.08) and the maximum CUDA version (e.g., CUDA 12.3).
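For scripts and CI checks, nvidia-smi can also emit machine-readable output instead of the full table; nvidia-smi --help-query-gpu lists the available fields. For example:
# Query selected fields as CSV (handy in setup scripts)
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv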
Definition: CUDA Toolkit
The CUDA Toolkit includes:
- nvcc: the NVIDIA CUDA compiler
- Runtime libraries: libcudart, libcublas, libcufft, etc.
- cuda-gdb: GPU debugger
- nsight: profiling and debugging tools
- Header files for CUDA C/C++ development
For Python-only workflows, you often do not need the full toolkit installed system-wide. Libraries like CuPy and PyTorch bundle their own CUDA runtime via cudatoolkit conda packages.
# Check installed CUDA toolkit version
nvcc --version
The CUDA toolkit version and the driver's reported CUDA version can differ. The driver reports the maximum supported CUDA; the toolkit reports the installed CUDA runtime. They do not need to match: the runtime just must not exceed the driver's maximum.
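If CuPy is installed, both numbers can also be read programmatically. A small sketch (assuming a working CuPy install); the values are encoded as major*1000 + minor*10:
import cupy as cp
# CUDA runtime the library was built against (e.g., 12010 -> 12.1)
print(cp.cuda.runtime.runtimeGetVersion())
# Maximum CUDA supported by the installed driver (e.g., 12030 -> 12.3)
print(cp.cuda.runtime.driverGetVersion())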
Definition: cuDNN (CUDA Deep Neural Network Library)
cuDNN is NVIDIA's GPU-accelerated library for deep learning primitives:
- Convolution (forward, backward data, backward filter)
- Pooling, normalization, activation functions
- RNN and attention layers
cuDNN is required by PyTorch and TensorFlow for GPU-accelerated training and inference. It is not needed for CuPy or basic CUDA computing.
cuDNN versions must be compatible with the CUDA toolkit version: e.g., cuDNN 8.9.x works with CUDA 11.x and 12.x.
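With a CUDA build of PyTorch installed, you can confirm which cuDNN it was built against; a quick check:
import torch
print(torch.backends.cudnn.is_available())  # True if cuDNN can be used
print(torch.backends.cudnn.version())       # e.g., 8902 for cuDNN 8.9.2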
Example: Setting Up a Complete GPU Environment with Conda
Create a conda environment with PyTorch (CUDA 12.1), CuPy, and common scientific libraries. Verify that all libraries can see the GPU.
Create the environment
conda create -n gpu-sci python=3.11 -y
conda activate gpu-sci
# PyTorch with CUDA 12.1
conda install pytorch torchvision torchaudio \
pytorch-cuda=12.1 -c pytorch -c nvidia -y
# CuPy (matches CUDA 12.x)
conda install -c conda-forge cupy cuda-version=12 -y
# Scientific stack
conda install numpy scipy matplotlib jupyterlab -y
Verify GPU access
# PyTorch
import torch
print(f"PyTorch CUDA: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"CUDA version: {torch.version.cuda}")
# CuPy
import cupy as cp
print(f"CuPy CUDA: {cp.cuda.runtime.runtimeGetVersion()}")
x = cp.ones(1000)
print(f"CuPy works: {float(cp.sum(x))}")
Common output
PyTorch CUDA: True
GPU: NVIDIA A100-SXM4-80GB
CUDA version: 12.1
CuPy CUDA: 12010
CuPy works: 1000.0
Example: Diagnosing Common GPU Setup Failures
torch.cuda.is_available() returns False on a machine with an NVIDIA GPU. Diagnose and fix the issue.
Check driver
nvidia-smi
If this fails: the NVIDIA driver is not installed or the GPU is not detected. Install the driver:
# Ubuntu
sudo apt install nvidia-driver-545
sudo reboot
Check CUDA version compatibility
Compare the driver's CUDA version (from nvidia-smi) with PyTorch's required CUDA version:
import torch
print(torch.version.cuda) # e.g., "12.1"
If the driver supports CUDA 11.8 but PyTorch was built for CUDA 12.1, install the matching PyTorch:
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
Check environment isolation
Common cause: the wrong conda environment is active, or a CPU-only PyTorch was installed. Verify:
python -c "import torch; print(torch.__file__)"
conda list | grep pytorch
Ensure the pytorch-cuda package appears in the listing.
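A quick Python-side check makes the same diagnosis explicit. This sketch assumes only that PyTorch is importable:
import sys
import torch
print("Python:     ", sys.executable)                    # which environment is active
print("torch from: ", torch.__file__)                    # which install is imported
print("torch build:", torch.version.cuda or "CPU-only")  # None means a CPU-only build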
Example: GPU Computing with Docker
Run GPU-accelerated Python code inside a Docker container for reproducible environments.
Install NVIDIA Container Toolkit
# Ubuntu
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container.gpg
# Add the apt repository, signed with the keyring installed above
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Use NVIDIA's base images
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3-pip
RUN pip install torch cupy-cuda12x numpy scipy
COPY my_script.py /app/
WORKDIR /app
CMD ["python3", "my_script.py"]
Run with GPU access
docker build -t gpu-sci .
docker run --gpus all gpu-sci
The --gpus all flag passes all GPUs to the container.
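Before building a custom image, a quick smoke test confirms the toolkit is wired up correctly; the image tag below is just an example, and --gpus also accepts specific devices:
# Run nvidia-smi inside a stock CUDA image to verify GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
# Expose only GPU 0 to the container built above
docker run --rm --gpus device=0 gpu-sci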
GPU Environment Setup Methods Compared
| Method | Pros | Cons | Best For |
|---|---|---|---|
| conda install | Version management, easy rollback, cross-platform | Large downloads, conda channel complexity | Local development, research |
| pip + system CUDA | Lightweight, fast install | Manual CUDA management, version conflicts | Servers with pre-installed CUDA |
| Docker + nvidia-container | Perfect reproducibility, isolation | Docker overhead, disk space | Production, shared clusters |
| Cloud (Colab, SageMaker) | Zero setup, free/pay-per-use | Limited control, session limits | Prototyping, teaching |
NVIDIA CUDA vs AMD ROCm Ecosystem
| Aspect | NVIDIA CUDA | AMD ROCm |
|---|---|---|
| Hardware | GeForce, Quadro, Tesla, A100, H100 | Radeon, Instinct MI200, MI300 |
| Compiler | nvcc | hipcc |
| Runtime | CUDA Runtime API | HIP Runtime API |
| Python Libraries | CuPy, PyTorch, TensorFlow (native) | PyTorch (ROCm build), TensorFlow |
| Profiling | Nsight Systems/Compute, nvprof | rocprof, omniperf |
| Ecosystem Maturity | Dominant, 15+ years | Growing, improving rapidly |
| Cloud Availability | AWS, GCP, Azure, all providers | Azure (MI300X), some providers |
Quick Check
Your nvidia-smi shows 'CUDA Version: 12.3' but you install PyTorch with CUDA 11.8. Will it work?
No, the CUDA versions must match exactly
Yes, the driver supports CUDA 11.8 since it supports up to 12.3
Only with a compatibility shim
The driver's CUDA version is the maximum supported. Any runtime version <= 12.3 works.
Common Mistake: Accidentally Installing CPU-Only PyTorch
Mistake:
Running pip install torch or conda install pytorch without specifying the CUDA version or channel, which can install the CPU-only build:
pip install torch  # may install the CPU-only build (e.g., on Windows)
Correction:
Always specify the CUDA version explicitly:
# pip (from PyTorch website)
pip install torch --index-url https://download.pytorch.org/whl/cu121
# conda
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
Verify with: python -c "import torch; print(torch.cuda.is_available())"
Historical Note: Evolution of the CUDA Ecosystem
2007-2024: CUDA 1.0 (2007) supported only C and required extensive boilerplate. CUDA 4.0 (2011) introduced unified virtual addressing (UVA), eliminating explicit host/device pointer tracking. CUDA 6.0 (2014) added Unified Memory, allowing the runtime to automatically migrate pages between host and device. CUDA 12.0 (2022) introduced lazy module loading and improved graph APIs. Each generation has simplified GPU programming while adding hardware capabilities.
Why This Matters: GPU Acceleration in 5G Base Stations
NVIDIA's Aerial SDK uses GPUs for real-time 5G signal processing. The software-defined baseband runs LDPC decoding, OFDM processing, and MIMO detection entirely on the GPU, replacing dedicated ASIC hardware. This enables flexible, software-upgradeable base stations that can adapt to new radio standards. The setup requires specific CUDA toolkit versions, MPS (Multi-Process Service) for sharing the GPU across multiple cells, and precise driver configurations.
See full treatment in Chapter 20
Key Takeaway
Use conda install pytorch pytorch-cuda=XX.X -c pytorch -c nvidia for the simplest GPU setup. Always verify with torch.cuda.is_available() and nvidia-smi. The #1 setup failure is a version mismatch between the driver, the CUDA toolkit, and Python library builds.
cuDNN
NVIDIA's GPU-accelerated library for deep learning primitives (convolution, pooling, normalization, RNNs), required by PyTorch and TensorFlow.
Related: CUDA Toolkit
CUDA Toolkit
NVIDIA's SDK including the CUDA compiler (nvcc), runtime libraries, and tools for developing GPU-accelerated applications.
Related: cuDNN
ROCm
AMD's open-source platform for GPU computing, providing HIP (a CUDA-compatible API), compiler toolchain, and math libraries for AMD GPUs.
Related: CUDA Toolkit