Chapter Summary
Key Points
1. PyTorch tensors are NumPy arrays with GPU support and autograd. The API is deliberately similar, but the default dtype is float32 (not float64). Use torch.float64 explicitly for high-precision scientific computing. All operands must reside on the same device.
2. Autograd computes exact gradients via reverse-mode AD. Set requires_grad=True on parameters, call .backward() on a scalar loss, and read gradients from .grad. Gradients accumulate by default, so zero them between iterations. Use torch.no_grad() for inference and .detach() to sever graph connections (see the first sketch after this list).
3. Complex tensors use Wirtinger calculus. PyTorch returns the conjugate Wirtinger derivative, which is the steepest-descent direction for real-valued losses of complex parameters. This makes gradient descent on complex parameters work identically to the real case (second sketch below).
4. Batch your linear algebra. torch.linalg mirrors NumPy's API but adds batched operations over leading dimensions and GPU acceleration. Processing 10,000 matrices at once is vastly faster than looping; this is essential for MIMO-OFDM and Monte Carlo simulations (third sketch below).
5. Use zero-copy conversion between frameworks. torch.from_numpy and .numpy() share memory on CPU. DLPack enables zero-copy GPU sharing between PyTorch, CuPy, and JAX. The Array API standard lets you write framework-agnostic code (fourth sketch below).
6. Avoid in-place ops in autograd, NaN gradients from degenerate decompositions, and .numpy() on GPU tensors. These are the three most common sources of bugs when using PyTorch for scientific computing (fifth sketch below).
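The first sketch illustrates points 1 and 2 on a toy linear fit; the data, learning rate, and parameter names are invented for illustration. It shows explicit float64 tensors, requires_grad=True on the parameters, a scalar loss, .backward(), manual updates under torch.no_grad(), and zeroing of the accumulated gradients each step.

```python
import torch

# Explicit float64 (the default would be float32). Move both tensors to the
# same device, e.g. .to("cuda"), if you want the GPU version of this loop.
x = torch.linspace(0.0, 1.0, 100, dtype=torch.float64)
y_true = 3.0 * x + 2.0

w = torch.zeros((), dtype=torch.float64, requires_grad=True)
b = torch.zeros((), dtype=torch.float64, requires_grad=True)

for _ in range(200):
    loss = ((w * x + b - y_true) ** 2).mean()   # scalar loss
    loss.backward()                             # exact reverse-mode gradients in w.grad, b.grad
    with torch.no_grad():                       # keep the parameter update out of the graph
        w -= 0.5 * w.grad
        b -= 0.5 * b.grad
    w.grad.zero_()                              # gradients accumulate; zero them each iteration
    b.grad.zero_()

print(w.item(), b.item())                       # approaches 3.0 and 2.0
```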
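The second sketch applies the same update rule to a complex parameter: because .grad holds the conjugate Wirtinger derivative of the real-valued loss, plain gradient descent works unchanged. The toy channel vector h and target z_true are assumptions made up for illustration.

```python
import torch

# Toy setup: observations y = h * z_true with a complex unknown z.
h = torch.tensor([1.0 + 0.0j, 0.5 - 0.5j, -0.3 + 0.8j], dtype=torch.complex128)
z_true = torch.tensor(1.5 - 0.5j, dtype=torch.complex128)
y = h * z_true

z = torch.zeros((), dtype=torch.complex128, requires_grad=True)
for _ in range(200):
    loss = (h * z - y).abs().pow(2).sum()    # real-valued scalar loss of a complex parameter
    loss.backward()
    with torch.no_grad():
        z -= 0.1 * z.grad                    # same update form as the real-valued case
    z.grad.zero_()

print(z.detach())                            # approaches (1.5 - 0.5j)
```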
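The third sketch shows batched linear algebra: torch.linalg.solve broadcasts over the leading batch dimension, so 10,000 small systems are solved in one call (and on the GPU if the tensors are moved there first). The 4x4 complex matrices are an illustrative stand-in for a MIMO-style workload, not data from the chapter.

```python
import torch

batch, n = 10_000, 4
H = torch.randn(batch, n, n, dtype=torch.complex64)    # 10,000 small "channel" matrices
y = torch.randn(batch, n, 1, dtype=torch.complex64)    # matching right-hand sides

x = torch.linalg.solve(H, y)        # one batched solve instead of a Python loop
print(x.shape)                      # torch.Size([10000, 4, 1])
print((H @ x - y).abs().max())      # small residual confirms the batched solve
```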
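The fourth sketch demonstrates the CPU zero-copy bridge of point 5: torch.from_numpy and .numpy() expose the same buffer, so a write through one view is visible through the other. The DLPack GPU path needs CuPy or JAX installed, so it is only noted in a comment.

```python
import numpy as np
import torch

a = np.zeros(4, dtype=np.float64)
t = torch.from_numpy(a)          # tensor view of the NumPy buffer, no copy
t[0] = 7.0                       # write through the tensor...
print(a)                         # ...and NumPy sees it: [7. 0. 0. 0.]

b = t.numpy()                    # back to NumPy, still the same memory
print(np.shares_memory(a, b))    # True

# On GPU, torch.from_dlpack plays the analogous zero-copy role for CuPy/JAX arrays.
```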
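The fifth sketch reproduces two of point 6's pitfalls: calling .numpy() on a tensor that tracks gradients (the fix is .detach().cpu().numpy(), with .cpu() needed when the tensor lives on the GPU), and modifying a tensor in place after autograd has saved it for the backward pass. The NaN-gradient case requires a degenerate decomposition and is omitted; the values here are arbitrary.

```python
import torch

w = torch.ones(3, requires_grad=True)
# w.numpy() raises here; detach first (and move to CPU first if w were on the GPU):
w_np = w.detach().cpu().numpy()

x = torch.ones(3, requires_grad=True)
y = torch.exp(x)        # exp saves its output for the backward pass
y += 1                  # in-place edit of that saved tensor
try:
    y.sum().backward()
except RuntimeError as err:
    print("autograd error:", err)   # "...modified by an inplace operation..."
```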
Looking Ahead
Chapter 13 moves to SciPy's optimization module, where PyTorch's autograd can provide exact gradients to optimizers that traditionally require finite-difference approximations. The interoperability patterns from Section 12.5 let you combine SciPy's solvers with PyTorch's GPU-accelerated gradient computation.
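As a preview of that combination, the sketch below (an illustration under assumed names, not code from Chapter 13) wraps a PyTorch-computed loss so that scipy.optimize.minimize receives exact autograd gradients via jac=True instead of finite-difference approximations; the Rosenbrock objective is just a stand-in.

```python
import numpy as np
import torch
from scipy.optimize import minimize

def value_and_grad(x_np):
    # Copy SciPy's iterate into a grad-tracking float64 tensor.
    x = torch.tensor(x_np, requires_grad=True)
    loss = 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2
    loss.backward()
    return loss.item(), x.grad.numpy()

# jac=True tells SciPy the callable returns (value, gradient): no finite differences.
result = minimize(value_and_grad, x0=np.array([-1.2, 1.0]), jac=True, method="BFGS")
print(result.x)   # converges to [1., 1.]
```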