Tensors vs. NumPy Arrays

Definition:

PyTorch Tensor

A PyTorch tensor is a multi-dimensional array stored in contiguous memory that supports:

  1. Device placement: CPU or GPU (cuda, mps).
  2. Automatic differentiation: optional gradient tracking.
  3. NumPy-compatible API: slicing, broadcasting, the @ operator.
import torch

x = torch.tensor([1.0, 2.0, 3.0])            # from list
y = torch.zeros(3, 4, dtype=torch.float64)    # explicit dtype
z = torch.randn(2, 3, device="cuda")          # directly on GPU

Internally, a tensor is a view into a Storage object with a shape, stride, and offset: the same strided-memory model as NumPy.
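A quick way to see this model from Python (untyped_storage() is the accessor in recent PyTorch releases; older versions expose it as storage()):

import torch

x = torch.arange(12).reshape(3, 4)
print(x.stride())            # (4, 1): advance 4 elements per row, 1 per column
print(x.storage_offset())    # 0: this view starts at the front of its storage

row = x[1]                   # slicing returns a view, not a copy
print(row.storage_offset())  # 4: row 1 begins 4 elements into the same storage
print(x.untyped_storage().data_ptr() == row.untyped_storage().data_ptr())  # True: shared memory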

Unlike NumPy arrays, tensors default to float32, not float64. This is intentional: single precision is sufficient for most deep learning, halves memory traffic relative to float64, and runs substantially faster on GPUs (many consumer GPUs execute float64 at a small fraction of float32 throughput). For scientific computing, you may want to explicitly request torch.float64.

Definition:

Device Placement and Transfer

Every tensor lives on a specific device. Moving data between devices is explicit:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x_cpu = torch.randn(1000)
x_gpu = x_cpu.to(device)        # CPU -> GPU copy
x_back = x_gpu.cpu()            # GPU -> CPU copy
x_gpu2 = x_gpu.cuda(1)          # GPU 0 -> GPU 1

Key rule: all operands in a computation must reside on the same device. PyTorch raises RuntimeError if you try to combine tensors on different devices. This is a deliberate design choice: implicit transfers hide performance bugs.

Use torch.cuda.synchronize() before timing GPU operations; CUDA launches are asynchronous, so CPU-side timers undercount without synchronization.
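For instance, a minimal timing harness built on this rule (a sketch that assumes a CUDA device is present; the matrix size is illustrative):

import time
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

torch.cuda.synchronize()              # drain pending kernels before starting the clock
t0 = time.perf_counter()
c = a @ b                             # the launch returns immediately; work continues on the GPU
torch.cuda.synchronize()              # block until the matmul actually finishes
print(f"matmul: {time.perf_counter() - t0:.4f} s")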

Definition:

Tensor Data Types

PyTorch provides a rich set of data types:

| Category | Types                               |
|----------|-------------------------------------|
| Float    | float16, bfloat16, float32, float64 |
| Integer  | int8, int16, int32, int64           |
| Complex  | complex64, complex128               |
| Boolean  | bool                                |

Casting is explicit via .to(dtype) or convenience methods:

x = torch.tensor([1, 2, 3])          # int64
y = x.float()                         # -> float32
z = x.to(torch.complex128)            # -> complex128

Promotion rules: PyTorch follows NumPy-like type promotion but defaults to float32 (not float64) for floating-point literals.
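Two promotions worth verifying for yourself (a small sketch of the documented behavior):

import torch

i = torch.tensor([1, 2, 3])       # int64
f = i + 1.5                        # integer tensor + Python float -> default float dtype
print(f.dtype)                     # torch.float32

d = torch.tensor([1.0], dtype=torch.float64)
print((f + d).dtype)               # float32 + float64 -> torch.float64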

Definition:

In-Place Operations

Operations suffixed with _ modify the tensor in place:

x = torch.randn(3)
x.add_(1.0)        # x += 1.0, no new allocation
x.mul_(2.0)        # x *= 2.0
x.zero_()           # fill with zeros
x.clamp_(0, 1)     # clamp between 0 and 1

In-place ops save memory but have two caveats:

  • They break autograd when the overwritten values are needed for the backward pass; PyTorch raises a RuntimeError at backward time.
  • They can silently corrupt shared views (just like NumPy).

The convention is simple: if a method name ends with _, it is in-place.

In scientific computing, in-place ops are useful for iterative algorithms where you update a state tensor repeatedly (e.g., gradient descent steps). But avoid them inside autograd-tracked computations.

Definition:

Tensor Creation Functions

PyTorch mirrors NumPy's creation API with slightly different names:

| NumPy                | NumPy → PyTorch         |
|----------------------|-------------------------|
| np.zeros(shape)      | torch.zeros(shape)      |
| np.ones(shape)       | torch.ones(shape)       |
| np.eye(n)            | torch.eye(n)            |
| np.arange(n)         | torch.arange(n)         |
| np.linspace(a, b, n) | torch.linspace(a, b, n) |
| np.random.randn(n)   | torch.randn(n)          |
| np.empty(shape)      | torch.empty(shape)      |

The *_like family copies shape, dtype, and device:

y = torch.zeros_like(x)   # same shape/dtype/device as x

Theorem: Contiguity and Stride Formula

A tensor with shape $(d_0, d_1, \ldots, d_{n-1})$ is contiguous (C-order) if and only if its strides satisfy

$$\text{stride}_k = \prod_{i=k+1}^{n-1} d_i,$$

with the empty product giving $\text{stride}_{n-1} = 1$.

A transposed tensor typically has non-contiguous strides. Calling .contiguous() copies data into a new contiguous block.

Just like NumPy, PyTorch tensors are views into flat memory. The stride tells you how many elements to skip in each dimension. Transpose merely swaps the strides without moving data; this is O(1) but leaves the layout non-contiguous.
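A short sketch that makes the stride bookkeeping visible:

import torch

a = torch.arange(6).reshape(2, 3)
print(a.stride(), a.is_contiguous())   # (3, 1) True

b = a.t()                              # transpose swaps the strides; no data moves
print(b.stride(), b.is_contiguous())   # (1, 3) False

c = b.contiguous()                     # copies into a fresh C-ordered block
print(c.stride(), c.is_contiguous())   # (2, 1) True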

Dtype Performance Benchmark

Compare the speed of matrix multiplication across dtypes (float16, float32, float64) and matrix sizes.
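One possible sketch of such a benchmark (the size, repeat count, and CPU fallback are illustrative choices, not part of the exercise; float16 is skipped on CPU, where some builds do not support half-precision matmul):

import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dtypes = [torch.float16, torch.float32, torch.float64] if device == "cuda" else [torch.float32, torch.float64]

def bench(dtype, n=1024, reps=10):
    a = torch.randn(n, n, dtype=dtype, device=device)
    b = torch.randn(n, n, dtype=dtype, device=device)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / reps

for dt in dtypes:
    print(f"{dt}: {bench(dt):.5f} s per matmul")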


Example: Creating and Moving Tensors Between Devices

Create a $100 \times 100$ random matrix on CPU, move it to GPU (if available), compute its matrix product with its transpose, and move the result back. Time each step.
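A hedged sketch of one solution (on a CPU-only machine every step simply runs on the CPU):

import time
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def timed(label, fn):
    t0 = time.perf_counter()
    out = fn()
    if device.type == "cuda":
        torch.cuda.synchronize()      # wait for GPU work before reading the clock
    print(f"{label}: {time.perf_counter() - t0:.6f} s")
    return out

x = timed("create on CPU", lambda: torch.randn(100, 100))
x_dev = timed("move to device", lambda: x.to(device))
prod = timed("x @ x.T", lambda: x_dev @ x_dev.T)
back = timed("move back to CPU", lambda: prod.cpu())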

Example: In-Place Gradient Descent on a Quadratic

Minimize $f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x}$ where $\mathbf{A} = \text{diag}(1, 2, 3)$ using in-place tensor operations for the update step $\mathbf{x} \leftarrow \mathbf{x} - \alpha \nabla f$.
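A minimal sketch using sub_ for the update (the step size and iteration count are illustrative):

import torch

A = torch.diag(torch.tensor([1.0, 2.0, 3.0]))
x = torch.ones(3)
alpha = 0.1

for _ in range(200):
    grad = A @ x          # gradient of (1/2) x^T A x is A x
    x.sub_(alpha * grad)  # in-place: x <- x - alpha * grad, no new allocation for x

print(x)                  # approaches the minimizer at the origin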

Example: Views vs. Copies in PyTorch

Demonstrate the difference between reshape (returns a view when possible), clone (always copies), and contiguous (copies only when needed).
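A possible demonstration; comparing data_ptr() values reveals whether storage was copied:

import torch

x = torch.arange(6)
v = x.reshape(2, 3)                    # view: shares storage with x
v[0, 0] = 99
print(x[0])                            # tensor(99): the write went through the view

c = x.clone()                          # always allocates a fresh copy
print(c.data_ptr() == x.data_ptr())    # False

t = v.t()                              # non-contiguous view
r = t.reshape(6)                       # no single stride can flatten t, so reshape copies
print(r.data_ptr() == t.data_ptr())    # False

print(v.contiguous() is v)             # True: already contiguous, so no copy is made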

Tensor Memory Layout Explorer

Visualize how different shapes and strides map tensor elements to physical memory positions. Observe how transpose swaps strides without copying data.
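The mapping the explorer visualizes is a dot product of indices with strides; a minimal sketch (flat_position is a hypothetical helper, not a PyTorch API):

import torch

def flat_position(t, idx):
    # physical position = storage offset + sum(index_k * stride_k)
    return t.storage_offset() + sum(i * s for i, s in zip(idx, t.stride()))

a = torch.arange(12).reshape(3, 4)
print(flat_position(a, (1, 2)))      # 1*4 + 2*1 = 6
print(flat_position(a.t(), (2, 1)))  # also 6: a.t()[2, 1] is the same element as a[1, 2]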


PyTorch Tensor Ecosystem

The PyTorch tensor sits at the center of the ecosystem: it shares memory layout with NumPy arrays, can live on CPU or GPU, and supports automatic differentiation through the autograd engine.

Quick Check

What is the default floating-point dtype for torch.randn(5)?

  • float16
  • float32
  • float64
  • bfloat16

Common Mistake: Silent Precision Loss with Default float32

Mistake:

Performing scientific computations with the default float32 without realizing the precision loss compared to NumPy's float64.

Correction:

Explicitly set dtype=torch.float64 for computations requiring high precision (e.g., numerical integration, eigenvalue problems):

x = torch.randn(100, dtype=torch.float64)
torch.set_default_dtype(torch.float64)  # or change the global default

Common Mistake: Cross-Device Operation Error

Mistake:

Trying to add a CPU tensor to a GPU tensor:

a = torch.randn(5)                    # CPU
b = torch.randn(5, device="cuda")     # GPU
c = a + b   # RuntimeError!

Correction:

Move both tensors to the same device before operating:

c = a.to(b.device) + b

Historical Note: From Torch to PyTorch

2000s-2010s

PyTorch (2017) descends from Torch, a tensor library begun in 2002 by Ronan Collobert and collaborators at IDIAP, and from its Lua-based successor Torch7, later developed with contributors at NYU and elsewhere. Facebook AI Research (FAIR) rewrote the frontend in Python while keeping the C/CUDA backend. The name "tensor" in this context follows the tradition of multilinear algebra, where a tensor is a multi-indexed quantity that transforms according to specific rules under change of basis, though in practice PyTorch tensors are simply multi-dimensional arrays.

Historical Note: Define-by-Run vs. Define-and-Run

2010s

PyTorch popularized the define-by-run paradigm (dynamic computational graphs), where the graph is built during execution rather than compiled beforehand. This was a departure from TensorFlow 1.x's static graph approach and proved more natural for researchers. Chainer (2015) pioneered this idea; PyTorch adopted and refined it.

Tensor

A multi-dimensional array in PyTorch that supports GPU acceleration and automatic differentiation. Analogous to NumPy's ndarray.

Related: Device, dtype

Device

The hardware location where a tensor's data resides: cpu, cuda (NVIDIA GPU), or mps (Apple Silicon GPU).

Related: Tensor

dtype

The data type of tensor elements (e.g., torch.float32, torch.complex128, torch.int64).

Related: Tensor

Stride

A tuple indicating how many elements to skip in memory to advance one position along each dimension. Determines whether a tensor is contiguous.

In-Place Operation

A tensor operation suffixed with _ that modifies the tensor's data without allocating new memory (e.g., x.add_(1)).

NumPy vs. PyTorch API Comparison

| Feature             | NumPy                           | PyTorch                            |
|---------------------|---------------------------------|------------------------------------|
| Default float dtype | float64                         | float32                            |
| GPU support         | No (need CuPy)                  | Built-in (.cuda())                 |
| Autograd            | No                              | Built-in (requires_grad)           |
| In-place convention | out= parameter                  | Method ending with _               |
| Random creation     | np.random.randn(n)              | torch.randn(n)                     |
| Matrix multiply     | A @ B                           | A @ B (identical)                  |
| Complex support     | np.complex128                   | torch.complex128                   |
| View/reshape        | np.reshape (view when possible) | torch.reshape (view when possible) |

Key Takeaway

PyTorch tensors are NumPy arrays with superpowers: GPU placement, automatic differentiation, and in-place operations. The API is deliberately similar to NumPy, but watch out for the default dtype (float32 vs. float64) and the requirement that all operands share the same device.

Tensor Basics and Device Management

Complete examples of tensor creation, dtype casting, device management, in-place operations, views, strides, and CPU vs. GPU benchmarks.

Code from: ch12/python/tensor_basics.py