Interoperability: NumPy, CuPy, PyTorch
Definition: Zero-Copy NumPy <-> PyTorch Conversion
CPU tensors and NumPy arrays can share the same memory:
import numpy as np
import torch
# NumPy -> PyTorch (zero-copy, shared memory)
a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a) # shares memory
t[0] = 99.0
print(a[0]) # 99.0, the mutation is visible!
# PyTorch -> NumPy (zero-copy for CPU tensors)
t2 = torch.randn(5)
a2 = t2.numpy() # shares memory
Critical constraint: .numpy() only works on CPU tensors.
For GPU tensors, call .cpu().numpy() or .detach().cpu().numpy().
Zero-copy conversion preserves the dtype. Since PyTorch defaults to float32 and NumPy to float64, the converted array or tensor keeps the dtype of its source, so be aware of this mismatch.
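A small sketch of that dtype carry-over (variable names here are illustrative):
import numpy as np
import torch
a = np.zeros(3)                     # NumPy default dtype: float64
t = torch.from_numpy(a)
print(t.dtype)                      # torch.float64, inherited from the array
t32 = torch.zeros(3)                # PyTorch default dtype: float32
print(t32.numpy().dtype)            # float32, inherited from the tensor
# Cast explicitly if downstream code expects float32; note that casting
# copies, so the result no longer shares memory with the NumPy array.
t_f32 = torch.from_numpy(a).float()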
Definition: DLPack: Universal Tensor Exchange
DLPack is a protocol for zero-copy tensor sharing between
frameworks, including GPU memory. Since PyTorch 1.10 and NumPy 1.23,
the __dlpack__ protocol is supported natively in both libraries:
import torch
import numpy as np
# PyTorch -> anything via DLPack
t = torch.randn(5, device="cuda")
capsule = t.__dlpack__()
# Anything -> PyTorch
a = np.array([1.0, 2.0])
t_cpu = torch.from_dlpack(a) # zero-copy from NumPy
# CuPy <-> PyTorch (GPU zero-copy)
import cupy as cp
c = cp.from_dlpack(t) # PyTorch GPU -> CuPy
t2 = torch.from_dlpack(c) # CuPy -> PyTorch GPU
DLPack is the lingua franca of tensor libraries: it works across NumPy, CuPy, PyTorch, JAX, and TensorFlow.
Definition: Writing Backend-Agnostic Code
To write code that works with NumPy, CuPy, and PyTorch tensors interchangeably, use the Array API standard (NEP 47):
import array_api_compat
import numpy as np
import torch
import cupy as cp
def normalize(x):
    xp = array_api_compat.array_namespace(x)
    return x / xp.linalg.norm(x)
# Works with any backend:
normalize(np.array([3.0, 4.0]))      # NumPy
normalize(torch.tensor([3.0, 4.0]))  # PyTorch
normalize(cp.array([3.0, 4.0]))      # CuPy
The array_namespace function inspects the input and returns the
matching array namespace (numpy, torch, or cupy, wrapped for Array API
compatibility), letting you write framework-agnostic scientific code.
The Array API standard covers a common subset of operations. For advanced features (autograd, custom CUDA kernels), you still need framework-specific code.
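As one illustration of that boundary, here is a hypothetical helper (not from the chapter's code) that stays on the Array API path where possible and drops down to PyTorch-specific autograd only when the input is a torch.Tensor:
import array_api_compat
import torch
def gradient_of_norm(x):
    # Autograd is a framework-specific feature, so branch explicitly.
    if isinstance(x, torch.Tensor):
        x = x.detach().requires_grad_(True)
        torch.linalg.norm(x).backward()
        return x.grad
    # NumPy/CuPy path: use the analytic gradient d||x||/dx = x / ||x||
    xp = array_api_compat.array_namespace(x)
    return x / xp.linalg.norm(x)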
Example: SciPy + PyTorch Mixed Computation Pipeline
Use SciPy for sparse matrix assembly (which PyTorch lacks), convert to a dense PyTorch tensor, compute eigenvalues on GPU, and convert back to NumPy for plotting.
Implementation
import numpy as np
import scipy.sparse as sp
import torch
# Step 1: Build sparse Laplacian in SciPy
n = 100
L = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
# Step 2: Convert to dense PyTorch tensor
L_dense = torch.from_numpy(L.toarray()).to(torch.float64)
# Step 3: Compute eigenvalues (on GPU if available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
L_dev = L_dense.to(device)
eigvals = torch.linalg.eigvalsh(L_dev)
# Step 4: Back to NumPy for analysis
eigvals_np = eigvals.cpu().numpy()
print(f"Smallest eigenvalue: {eigvals_np[0]:.6f}")
print(f"Largest eigenvalue: {eigvals_np[-1]:.6f}")
print(f"Condition number: {eigvals_np[-1] / eigvals_np[0]:.2f}")
When This Pattern Is Useful
SciPy has the best sparse matrix support in Python. PyTorch has the best GPU linear algebra. Combining them gives you the best of both worlds for problems like PDE discretization followed by eigenvalue analysis.
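As a quick sanity check on that combination (a standalone sketch, not part of the chapter's code): the n x n tridiagonal (-1, 2, -1) Laplacian has known eigenvalues 2 - 2*cos(k*pi/(n+1)), which the PyTorch result should reproduce.
import numpy as np
import scipy.sparse as sp
import torch
n = 100
L = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
computed = torch.linalg.eigvalsh(torch.from_numpy(L.toarray())).numpy()
# Analytic eigenvalues of the Dirichlet Laplacian, sorted ascending
k = np.arange(1, n + 1)
analytic = np.sort(2.0 - 2.0 * np.cos(k * np.pi / (n + 1)))
print(np.allclose(analytic, computed))  # expected: True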
Example: Zero-Copy GPU Transfer Between CuPy and PyTorch
Demonstrate zero-copy GPU memory sharing between CuPy and PyTorch using DLPack.
Implementation
import torch
import cupy as cp
# Create on PyTorch GPU
t = torch.randn(1000, 1000, device="cuda", dtype=torch.float64)
# Zero-copy to CuPy
c = cp.from_dlpack(t)
print(f"Same pointer: {c.data.ptr == t.data_ptr()}") # True
# Modify in CuPy; the change is visible in PyTorch
c[0, 0] = 42.0
print(f"PyTorch sees change: {t[0, 0].item() == 42.0}") # True
# Use CuPy's unique features (e.g., custom kernels)
result_cp = cp.fft.fft2(c)
# Back to PyTorch (zero-copy)
result_pt = torch.from_dlpack(result_cp)
Why This Matters
CuPy provides custom CUDA kernels and cuFFT wrappers that PyTorch may not expose directly. DLPack lets you mix frameworks without any data copies on GPU.
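A minimal sketch of that workflow, assuming a CUDA device is available (the kernel below is illustrative, not from the chapter's code):
import torch
import cupy as cp
t = torch.arange(4, device="cuda", dtype=torch.float64)
c = cp.from_dlpack(t)            # zero-copy view of the same GPU buffer
# A custom CuPy elementwise kernel, run in place on the shared memory
square = cp.ElementwiseKernel(
    'float64 x', 'float64 y', 'y = x * x', 'square')
square(c, c)
print(t)                         # PyTorch sees [0., 1., 4., 9.] without any copy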
Interoperability Transfer Benchmark
Compare the time and memory cost of different conversion methods (from_numpy, DLPack, explicit copy) across array sizes.
Tensor Conversion Methods
| Method | Copy? | GPU? | Notes |
|---|---|---|---|
| torch.from_numpy(a) | No (shared) | CPU only | Fastest for CPU NumPy arrays |
| torch.tensor(a) | Yes (always) | Any | Creates independent copy |
| torch.as_tensor(a) | No (if possible) | CPU only for NumPy | Smart: avoids copy when safe |
| t.numpy() | No (shared) | CPU only | Requires .cpu() for GPU tensors |
| torch.from_dlpack(x) | No (shared) | Yes | Universal protocol, works with CuPy/JAX |
| cp.from_dlpack(t) | No (shared) | Yes | PyTorch GPU -> CuPy GPU |
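A minimal CPU-only timing sketch of these paths (the bench helper and array size are illustrative; zero-copy conversions should stay roughly constant-time while torch.tensor scales with the data size):
import time
import numpy as np
import torch
def bench(label, fn, repeats=100):
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    print(f"{label:<22s} {(time.perf_counter() - start) / repeats * 1e6:8.1f} us")
a = np.random.rand(10_000_000)   # ~80 MB of float64
bench("from_numpy (shared)", lambda: torch.from_numpy(a))
bench("as_tensor (shared)", lambda: torch.as_tensor(a))
bench("from_dlpack (shared)", lambda: torch.from_dlpack(a))
bench("tensor (copy)", lambda: torch.tensor(a))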
Quick Check
After t = torch.from_numpy(a), you modify t[0] = 99.
What happens to a[0]?
a[0] remains unchanged
a[0] becomes 99
A RuntimeError is raised
It depends on the dtype
a[0] becomes 99: zero-copy sharing means both names point to the same memory.
Common Mistake: Calling .numpy() on GPU Tensors
Mistake:
Calling .numpy() on a CUDA tensor:
t = torch.randn(5, device="cuda")
a = t.numpy() # RuntimeError!
Correction:
Move to CPU first, and detach from the graph if needed:
a = t.detach().cpu().numpy()
Historical Note: The DLPack Standard
2010s-2020s: DLPack was proposed in 2017 by the DMLC (Distributed Machine Learning Community) to solve the growing fragmentation between tensor libraries. By 2022, it was adopted by NumPy, CuPy, PyTorch, JAX, and TensorFlow. The Python Array API consortium (2020-present) built on DLPack to define a standard set of array operations across all these libraries.
Key Takeaway
NumPy, CuPy, and PyTorch can share memory zero-copy on CPU
(via from_numpy/.numpy()) and on GPU (via DLPack). Use
the Array API standard for backend-agnostic code. Never call
.numpy() on GPU tensors; call .cpu() first.
DLPack
A protocol for zero-copy tensor exchange between deep learning frameworks, supporting both CPU and GPU memory.
Related: Tensor
Array API Standard
A specification (NEP 47) defining a common subset of array operations that NumPy, PyTorch, CuPy, and JAX all implement, enabling backend-agnostic code.
Related: DLPack
NumPy-CuPy-PyTorch Interoperability
# Code from: ch12/python/interop.py
Backend-Agnostic Scientific Computing
# Code from: ch12/python/backend_agnostic.py