Interoperability: NumPy, CuPy, PyTorch

Definition:

Zero-Copy NumPy <-> PyTorch Conversion

CPU tensors and NumPy arrays can share the same memory:

import numpy as np
import torch

# NumPy -> PyTorch (zero-copy, shared memory)
a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a)        # shares memory
t[0] = 99.0
print(a[0])                     # 99.0; the mutation is visible!

# PyTorch -> NumPy (zero-copy for CPU tensors)
t2 = torch.randn(5)
a2 = t2.numpy()                 # shares memory

Critical constraint: .numpy() only works on CPU tensors that are not part of the autograd graph. For GPU tensors, call .cpu().numpy(), or .detach().cpu().numpy() if the tensor requires grad.

Zero-copy conversion preserves the dtype. PyTorch defaults to float32 while NumPy defaults to float64, so the result keeps the dtype of the source; be aware of this mismatch when mixing the two.
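To make the mismatch concrete, a short illustration:

```python
import numpy as np
import torch

# NumPy's default float is float64; the shared tensor keeps it.
a = np.array([1.0, 2.0])
t = torch.from_numpy(a)
print(t.dtype)                 # torch.float64, not PyTorch's default float32

# PyTorch's default float is float32; the shared array keeps that instead.
t2 = torch.ones(3)
print(t2.numpy().dtype)        # float32

# Convert explicitly when a model expects float32 (this makes a copy).
t32 = torch.from_numpy(a).float()
```

Feeding a silently-float64 tensor into a float32 model is a classic source of dtype errors, so convert at the boundary.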

Definition:

DLPack: Universal Tensor Exchange

DLPack is a protocol for zero-copy tensor sharing between frameworks, including GPU memory. NumPy (since 1.22) and PyTorch (since 1.10) support the standardized __dlpack__ protocol:

import torch
import numpy as np

# PyTorch -> anything via DLPack (the capsule is consumed
# by the receiving library's from_dlpack)
t = torch.randn(5, device="cuda")   # requires a CUDA-capable GPU
capsule = t.__dlpack__()

# Anything -> PyTorch
a = np.array([1.0, 2.0])
t = torch.from_dlpack(a)         # zero-copy from NumPy

# CuPy <-> PyTorch (GPU zero-copy)
import cupy as cp
c = cp.from_dlpack(t)            # PyTorch GPU -> CuPy
t2 = torch.from_dlpack(c)        # CuPy -> PyTorch GPU

DLPack is the lingua franca of tensor libraries: it works across NumPy, CuPy, PyTorch, JAX, and TensorFlow.

Definition:

Writing Backend-Agnostic Code

To write code that works with NumPy, CuPy, and PyTorch tensors interchangeably, use the Array API standard (NEP 47):

import array_api_compat
import numpy as np
import torch
import cupy as cp   # optional; only needed for the CuPy call below

def normalize(x):
    xp = array_api_compat.array_namespace(x)
    return x / xp.linalg.norm(x)

# Works with any backend:
normalize(np.array([3.0, 4.0]))        # NumPy
normalize(torch.tensor([3.0, 4.0]))    # PyTorch
normalize(cp.array([3.0, 4.0]))        # CuPy

The array_namespace function inspects the input and returns the appropriate module (numpy, torch, cupy), letting you write framework-agnostic scientific code.

The Array API standard covers a common subset of operations. For advanced features (autograd, custom CUDA kernels), you still need framework-specific code.
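The dispatch idea can be sketched by hand as well. The following is a toy stand-in for array_api_compat.array_namespace, not the real implementation, but it shows the core pattern of picking the module that owns the input's type:

```python
import numpy as np
import torch

def simple_namespace(x):
    """Toy dispatcher: return the module whose array type matches x."""
    if isinstance(x, torch.Tensor):
        return torch
    return np  # fall back to NumPy for ndarrays and array-likes

def normalize(x):
    xp = simple_namespace(x)
    return x / xp.linalg.norm(x)

print(normalize(np.array([3.0, 4.0])))       # [0.6 0.8]
print(normalize(torch.tensor([3.0, 4.0])))   # tensor([0.6000, 0.8000])
```

The real array_namespace additionally handles CuPy, JAX, dask, and version-specific namespace wrappers, which is why the library is preferable in practice.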

Theorem: Conditions for Zero-Copy Conversion

A zero-copy conversion between NumPy and PyTorch is possible if and only if:

  1. The tensor resides on CPU.
  2. The data type is supported by both libraries.
  3. The array's strides are non-negative. (Transposed or sliced views with positive strides can still be shared; negative-stride views such as a[::-1] cannot.)

If any condition fails, torch.from_numpy and .numpy() raise an error rather than copying implicitly; force an explicit copy with torch.tensor(a) or t.detach().cpu().numpy() when you need one.

Both NumPy and PyTorch use the strided memory model, so they can interpret each other's memory layouts directly, as long as the data is physically accessible (CPU) and laid out compatibly.
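A short sketch of where sharing succeeds and where it fails (assuming a recent PyTorch; the exact exception type and message may vary across versions):

```python
import numpy as np
import torch

# A transposed view is non-contiguous but has positive strides:
# PyTorch can share it zero-copy.
m = np.arange(6.0).reshape(2, 3)
t = torch.from_numpy(m.T)              # shares memory, shape (3, 2)

# A negative-stride view cannot be shared; from_numpy raises
# instead of silently copying.
rev = np.arange(6.0)[::-1]
try:
    torch.from_numpy(rev)
except (ValueError, RuntimeError) as e:
    print("not shareable:", e)

# Force an explicit, contiguous copy to proceed.
t_rev = torch.from_numpy(np.ascontiguousarray(rev))
```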

Example: SciPy + PyTorch Mixed Computation Pipeline

Use SciPy for sparse matrix assembly (which PyTorch lacks), convert to a dense PyTorch tensor, compute eigenvalues on GPU, and convert back to NumPy for plotting.
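One possible sketch of such a pipeline, using a 1-D Laplacian as a convenient test matrix (the matrix, size, and eigvalsh choice are illustrative, not prescribed by the text):

```python
import numpy as np
import scipy.sparse as sp
import torch

# 1) Assemble a sparse 1-D Laplacian in SciPy (PyTorch lacks sparse assembly helpers).
n = 100
lap = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# 2) Densify and hand off to PyTorch; use the GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
dense = torch.from_numpy(lap.toarray()).to(device)   # zero-copy on CPU; copies to GPU

# 3) Symmetric eigendecomposition on the chosen device.
eigvals = torch.linalg.eigvalsh(dense)

# 4) Back to NumPy (e.g. for matplotlib): must be on CPU first.
eigvals_np = eigvals.cpu().numpy()
print(eigvals_np[:3])
```

Each library does what it is best at: SciPy assembles, PyTorch computes, NumPy feeds the plotting layer.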

Example: Zero-Copy GPU Transfer Between CuPy and PyTorch

Demonstrate zero-copy GPU memory sharing between CuPy and PyTorch using DLPack.

Interoperability Transfer Benchmark

Compare the time and memory cost of different conversion methods (from_numpy, DLPack, explicit copy) across array sizes.
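A minimal CPU-only sketch of such a benchmark (timings vary by machine; from_dlpack on a NumPy array needs NumPy 1.22+ and PyTorch 1.10+). The zero-copy methods should stay roughly flat as the array grows, while the copying method scales with size:

```python
import time
import numpy as np
import torch

def bench(fn, n=100):
    """Average wall-clock time of fn over n runs."""
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n

for size in (1_000, 1_000_000):
    a = np.random.rand(size)
    zero_copy = bench(lambda: torch.from_numpy(a))   # shared memory
    dlpack    = bench(lambda: torch.from_dlpack(a))  # shared memory via DLPack
    full_copy = bench(lambda: torch.tensor(a))       # always copies
    print(f"{size:>9}: from_numpy {zero_copy:.2e}s  "
          f"from_dlpack {dlpack:.2e}s  tensor(copy) {full_copy:.2e}s")
```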

Parameters

Tensor Conversion Methods

Method               | Copy?            | GPU?               | Notes
---------------------|------------------|--------------------|------------------------------------------
torch.from_numpy(a)  | No (shared)      | CPU only           | Fastest for CPU NumPy arrays
torch.tensor(a)      | Yes (always)     | Any                | Creates independent copy
torch.as_tensor(a)   | No (if possible) | CPU only for NumPy | Smart: avoids copy when safe
t.numpy()            | No (shared)      | CPU only           | Requires .cpu() for GPU tensors
torch.from_dlpack(x) | No (shared)      | Yes                | Universal protocol, works with CuPy/JAX
cp.from_dlpack(t)    | No (shared)      | Yes                | PyTorch GPU -> CuPy GPU

Quick Check

After t = torch.from_numpy(a), you modify t[0] = 99. What happens to a[0]?

a[0] remains unchanged

a[0] becomes 99

A RuntimeError is raised

It depends on the dtype

Common Mistake: Calling .numpy() on GPU Tensors

Mistake:

Calling .numpy() on a CUDA tensor:

t = torch.randn(5, device="cuda")
a = t.numpy()   # raises: can't convert a CUDA tensor to NumPy

Correction:

Move to CPU first, and detach from the graph if needed:

a = t.detach().cpu().numpy()

Historical Note: The DLPack Standard

2010s-2020s

DLPack was proposed in 2017 by the DMLC (Distributed Machine Learning Community) to solve the growing fragmentation between tensor libraries. By 2022, it was adopted by NumPy, CuPy, PyTorch, JAX, and TensorFlow. The Python Array API consortium (2020-present) built on DLPack to define a standard set of array operations across all these libraries.

Key Takeaway

NumPy, CuPy, and PyTorch can share memory zero-copy on CPU (via from_numpy/.numpy()) and on GPU (via DLPack). Use the Array API standard for backend-agnostic code. Never call .numpy() on GPU tensors; call .cpu() first.

DLPack

A protocol for zero-copy tensor exchange between deep learning frameworks, supporting both CPU and GPU memory.

Related: Tensor

Array API Standard

A specification (NEP 47) defining a common subset of array operations that NumPy, PyTorch, CuPy, and JAX all implement, enabling backend-agnostic code.

Related: DLPack

NumPy-CuPy-PyTorch Interoperability

python
Complete examples of converting between NumPy, CuPy, and PyTorch with zero-copy methods, DLPack, and the Array API.
# Code from: ch12/python/interop.py

Backend-Agnostic Scientific Computing

python
Patterns for writing scientific code that runs on NumPy, CuPy, or PyTorch without modification.
# Code from: ch12/python/backend_agnostic.py