Setting Up the GPU Environment

The GPU Software Stack Is Deep

Getting a GPU working for scientific Python requires several software layers that must be version-compatible: kernel driver, CUDA toolkit, cuDNN, and Python libraries. Mismatched versions are the #1 cause of GPU setup failures.

The good news: conda has largely solved this problem with metapackages that install compatible CUDA+cuDNN+library bundles automatically. But understanding the stack helps when things go wrong.

Definition:

NVIDIA GPU Driver

The NVIDIA driver is the kernel-level software that communicates directly with the GPU hardware. It provides:

  • Hardware initialization and power management
  • Memory management (allocating GPU memory)
  • The CUDA Driver API (low-level interface)

Each driver version has a maximum supported CUDA version. You can run any CUDA toolkit version up to this maximum. The driver is backward compatible: a newer driver supports older CUDA applications.

# Check driver version and supported CUDA
nvidia-smi

The output shows the driver version (e.g., 545.23.08) and the maximum CUDA version (e.g., CUDA 12.3).

Definition:

CUDA Toolkit

The CUDA Toolkit includes:

  • nvcc: The NVIDIA CUDA compiler
  • Runtime libraries: libcudart, libcublas, libcufft, etc.
  • cuda-gdb: GPU debugger
  • nsight: Profiling and debugging tools
  • Header files for CUDA C/C++ development

For Python-only workflows, you often do not need the full toolkit installed system-wide. Libraries like CuPy and PyTorch bundle their own CUDA runtime via cudatoolkit conda packages.
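To see which CUDA runtime a library actually bundles (as opposed to what is installed system-wide), each library exposes its build version. A minimal sketch, assuming PyTorch and/or CuPy may or may not be present:

```python
def bundled_cuda_versions():
    """Report the CUDA runtime version each library was built against."""
    report = {}
    try:
        import torch
        # e.g. "12.1" for a CUDA build, None for a CPU-only build
        report["torch"] = torch.version.cuda
    except ImportError:
        report["torch"] = "not installed"
    try:
        import cupy
        # Integer such as 12010 for CUDA 12.1
        report["cupy"] = cupy.cuda.runtime.runtimeGetVersion()
    except Exception:  # not installed, or CUDA libraries missing
        report["cupy"] = "unavailable"
    return report

print(bundled_cuda_versions())
```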

# Check installed CUDA toolkit version
nvcc --version

The CUDA toolkit version and the driver's reported CUDA version can differ. The driver reports the maximum supported CUDA; the toolkit reports the installed CUDA runtime. They do not need to match: the installed runtime simply must not exceed the driver's maximum.
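That compatibility rule reduces to a version comparison. A hypothetical helper (not part of any CUDA API), handy when scripting environment checks:

```python
def cuda_compatible(driver_max: str, runtime: str) -> bool:
    """True if a CUDA runtime can run under a driver that reports
    the given maximum supported CUDA version."""
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))
    # Backward compatible: any runtime up to the driver's maximum works.
    return as_tuple(runtime) <= as_tuple(driver_max)

print(cuda_compatible("12.3", "11.8"))  # True: 11.8 is below the 12.3 maximum
print(cuda_compatible("12.3", "12.4"))  # False: runtime exceeds driver maximum
```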

Definition:

cuDNN (CUDA Deep Neural Network Library)

cuDNN is NVIDIA's GPU-accelerated library for deep learning primitives:

  • Convolution (forward, backward data, backward filter)
  • Pooling, normalization, activation functions
  • RNN and attention layers

cuDNN is required by PyTorch and TensorFlow for GPU-accelerated training and inference. It is not needed for CuPy or basic CUDA computing.

cuDNN versions must be compatible with the CUDA toolkit version: e.g., cuDNN 8.9.x works with CUDA 11.x and 12.x.
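To confirm which cuDNN build PyTorch links against, a hedged sketch (returns None when PyTorch is absent or was built without cuDNN):

```python
def linked_cudnn_version():
    """Return the cuDNN version PyTorch was built with (e.g. 8902
    for cuDNN 8.9.2), or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    # None when the build has no cuDNN support
    return torch.backends.cudnn.version()
```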

Example: Setting Up a Complete GPU Environment with Conda

Create a conda environment with PyTorch (CUDA 12.1), CuPy, and common scientific libraries. Verify that all libraries can see the GPU.
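The verification half of this exercise can be sketched as a small Python script. The library names are the usual ones; whether each import succeeds depends on the environment you created:

```python
def gpu_visibility_report():
    """Check which GPU libraries are importable and whether each sees a GPU.
    Values: True (GPU visible), False (installed, no GPU), None (not installed)."""
    report = {}
    try:
        import torch
        report["torch"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
    try:
        import cupy
        try:
            report["cupy"] = cupy.cuda.runtime.getDeviceCount() > 0
        except Exception:  # CUDARuntimeError when no device or driver
            report["cupy"] = False
    except ImportError:
        report["cupy"] = None
    return report

print(gpu_visibility_report())
```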

Example: Diagnosing Common GPU Setup Failures

torch.cuda.is_available() returns False on a machine with an NVIDIA GPU. Diagnose and fix the issue.
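A starting point for that diagnosis is a checklist script. This sketch covers the two most common culprits from this section (missing driver, CPU-only build); the messages are illustrative, not emitted by any real tool:

```python
import shutil

def diagnose_cuda_unavailable():
    """Heuristic checks for why torch.cuda.is_available() might be False."""
    findings = []
    # 1. Driver: nvidia-smi ships with the NVIDIA driver, so its absence
    #    on PATH usually means the driver is not installed.
    if shutil.which("nvidia-smi") is None:
        findings.append("nvidia-smi not found: NVIDIA driver may be missing")
    # 2. Library build: a CPU-only PyTorch wheel reports no CUDA version.
    try:
        import torch
        if torch.version.cuda is None:
            findings.append("CPU-only PyTorch build: reinstall a CUDA wheel")
    except ImportError:
        findings.append("PyTorch is not installed")
    return findings

for finding in diagnose_cuda_unavailable():
    print(finding)
```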

Example: GPU Computing with Docker

Run GPU-accelerated Python code inside a Docker container for reproducible environments.
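One way to script such a container launch from Python is to assemble the docker invocation first. The image tag below is a plausible example from NVIDIA's registry, and running it assumes Docker plus the NVIDIA Container Toolkit are installed on the host:

```python
def docker_gpu_command(image="nvidia/cuda:12.1.0-base-ubuntu22.04",
                       cmd=("nvidia-smi",)):
    """Build a docker invocation that exposes all host GPUs to the container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, *cmd]

# On a machine with Docker + nvidia-container-toolkit, passing this list to
# subprocess.run() would print the same table as nvidia-smi on the host.
print(" ".join(docker_gpu_command()))
```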

GPU Environment Setup Methods Compared

Method | Pros | Cons | Best For
--- | --- | --- | ---
conda install | Version management, easy rollback, cross-platform | Large downloads, conda channel complexity | Local development, research
pip + system CUDA | Lightweight, fast install | Manual CUDA management, version conflicts | Servers with pre-installed CUDA
Docker + nvidia-container | Perfect reproducibility, isolation | Docker overhead, disk space | Production, shared clusters
Cloud (Colab, SageMaker) | Zero setup, free/pay-per-use | Limited control, session limits | Prototyping, teaching

NVIDIA CUDA vs AMD ROCm Ecosystem

Aspect | NVIDIA CUDA | AMD ROCm
--- | --- | ---
Hardware | GeForce, Quadro, Tesla, A100, H100 | Radeon, Instinct MI200, MI300
Compiler | nvcc | hipcc
Runtime | CUDA Runtime API | HIP Runtime API
Python Libraries | CuPy, PyTorch, TensorFlow (native) | PyTorch (ROCm build), TensorFlow
Profiling | Nsight Systems/Compute, nvprof | rocprof, omniperf
Ecosystem Maturity | Dominant, 15+ years | Growing, improving rapidly
Cloud Availability | AWS, GCP, Azure, all providers | Azure (MI300X), some providers

Quick Check

Your nvidia-smi shows 'CUDA Version: 12.3' but you install PyTorch with CUDA 11.8. Will it work?

  • No: the CUDA versions must match exactly
  • Yes: the driver supports CUDA 11.8 since it supports up to 12.3 (correct)
  • Only with a compatibility shim

Common Mistake: Accidentally Installing CPU-Only PyTorch

Mistake:

Running pip install torch or conda install pytorch without specifying the CUDA channel, which installs the CPU-only build:

pip install torch  # CPU-only on most systems!

Correction:

Always specify the CUDA version explicitly:

# pip (from PyTorch website)
pip install torch --index-url https://download.pytorch.org/whl/cu121

# conda
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia

Verify with: python -c "import torch; print(torch.cuda.is_available())"

Historical Note: Evolution of the CUDA Ecosystem

2007-2024

CUDA 1.0 (2007) supported only C and required extensive boilerplate. CUDA 4.0 (2011) introduced unified virtual addressing (UVA), eliminating explicit host/device pointer tracking. CUDA 6.0 (2014) added Unified Memory, allowing the runtime to automatically migrate pages between host and device. CUDA 12.0 (2022) introduced lazy module loading and improved graph APIs. Each generation has simplified GPU programming while adding hardware capabilities.

Why This Matters: GPU Acceleration in 5G Base Stations

NVIDIA's Aerial SDK uses GPUs for real-time 5G signal processing. The software-defined baseband runs LDPC decoding, OFDM processing, and MIMO detection entirely on the GPU, replacing dedicated ASIC hardware. This enables flexible, software-upgradeable base stations that can adapt to new radio standards. The setup requires specific CUDA toolkit versions, MPS (Multi-Process Service) for sharing the GPU across multiple cells, and precise driver configurations.

See full treatment in Chapter 20

Key Takeaway

Use conda install pytorch pytorch-cuda=XX.X -c pytorch -c nvidia for the simplest GPU setup. Always verify with torch.cuda.is_available() and nvidia-smi. The #1 setup failure is version mismatch between driver, CUDA toolkit, and Python library builds.

cuDNN

NVIDIA's GPU-accelerated library for deep learning primitives (convolution, pooling, normalization, RNNs), required by PyTorch and TensorFlow.

Related: CUDA Toolkit

CUDA Toolkit

NVIDIA's SDK including the CUDA compiler (nvcc), runtime libraries, and tools for developing GPU-accelerated applications.

Related: cuDNN

ROCm

AMD's open-source platform for GPU computing, providing HIP (a CUDA-compatible API), compiler toolchain, and math libraries for AMD GPUs.

Related: CUDA Toolkit