Control Flow and Comprehensions

Lazy Evaluation for Large Datasets

Scientific computing often processes datasets too large to fit in memory all at once — millions of channel realizations, terabytes of radar returns, or streaming sensor data. Python's generators and comprehensions provide lazy evaluation: producing values on demand instead of materializing entire collections. This section covers the tools that make this possible.

Definition: Generators and yield

A generator function uses yield instead of return. When called, it returns a generator object — a lazy iterator that produces values one at a time:

def fibonacci(n: int):
    """Generate first n Fibonacci numbers lazily."""
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b

# No computation until iteration begins
for fib in fibonacci(1_000_000):
    if fib > 1000:
        break

The generator pauses at each yield and resumes when the next value is requested. Memory usage is O(1) regardless of n.

Theorem: Generator Space Complexity

A generator function that yields n values uses O(s) memory, where s is the size of the generator's local state (stack frame), independent of n. In contrast, materializing the same sequence as a list requires O(n · e) memory, where e is the size of each element.

A generator suspends its stack frame between yields. Only the local variables (not the yielded values) are kept in memory. The consumer processes one value at a time.
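This difference is easy to observe with `sys.getsizeof`. A sketch (exact byte counts vary by Python version, but the relationship holds):

```python
import sys

# A generator object has a small, fixed footprint; a list grows with n.
gen = (x * x for x in range(1_000_000))   # nothing computed yet
lst = [x * x for x in range(1_000)]       # all 1,000 values materialized

print(sys.getsizeof(gen))  # small, independent of the range size
print(sys.getsizeof(lst))  # grows with the element count (container only;
                           # the int objects themselves add more)
```

Note that `getsizeof` on a list reports only the pointer array, not the elements it references, so the true gap is even larger than these numbers suggest.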

Definition: List, Dict, and Set Comprehensions

Comprehensions are concise syntax for building collections from iterables:

# List comprehension
squares = [x**2 for x in range(10)]

# Dict comprehension
snr_linear = {db: 10**(db/10) for db in range(-5, 25, 5)}

# Set comprehension
unique_lengths = {len(word) for word in vocabulary}

# Generator expression (lazy — note parentheses, not brackets)
total = sum(x**2 for x in range(1_000_000))  # O(1) memory

Generator expressions look like list comprehensions but use parentheses. They produce values lazily and are ideal inside functions like sum(), max(), any(), all().
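Because these consumers pull values one at a time, any() and all() also short-circuit. A small illustration:

```python
# any() short-circuits: it stops pulling values from the generator
# as soon as one satisfies the condition, so the remaining
# 999,991 squares are never computed.
found = any(x**2 > 50 for x in range(1_000_000))
print(found)  # True (stops after x == 8, since 8**2 == 64 > 50)
```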

Definition: itertools — The Iterator Toolkit

The itertools module provides efficient building blocks for iteration:

import itertools

# Product: all combinations of parameters (grid search)
snrs = [0, 5, 10, 15, 20]
n_antennas = [2, 4, 8, 16]
for snr, nt in itertools.product(snrs, n_antennas):
    run_simulation(snr, nt)

# Chain: concatenate multiple iterables
all_results = itertools.chain(train_results, val_results, test_results)

# islice: take first n items from any iterator
first_100 = list(itertools.islice(huge_generator(), 100))

# combinations: all k-element subsets
# Useful for enumerating antenna pairs
for ant_i, ant_j in itertools.combinations(range(N), 2):
    compute_correlation(ant_i, ant_j)

Definition: functools — Higher-Order Functions

Key functions from functools:

import numpy as np
from functools import partial, lru_cache, reduce

# partial: freeze some arguments
simulate_rayleigh = partial(simulate, channel_model="rayleigh")
simulate_rayleigh(snr_db=10)  # channel_model is already set

# lru_cache: memoize expensive computations
@lru_cache(maxsize=128)
def steering_vector(theta: float, n_elements: int) -> tuple:
    """Cache steering vectors for repeated angles."""
    return tuple(np.exp(1j * np.pi * np.arange(n_elements) * np.sin(theta)))

# reduce: cumulative application
from operator import mul
factorial_10 = reduce(mul, range(1, 11))  # 3628800

generator

A function that uses yield to produce values lazily, one at a time. Generators are iterators with O(1) memory overhead regardless of the number of values produced.

Related: iterator

iterator

An object implementing the __next__ method, which returns the next value or raises StopIteration. All generators are iterators, but not all iterators are generators.
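To see the distinction, here is a hand-written iterator (a hypothetical Countdown class for illustration) that implements __next__ directly rather than using yield:

```python
class Countdown:
    """An iterator that is not a generator: __next__ written by hand."""

    def __init__(self, start: int):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]
```

A generator function producing the same sequence would replace all of this bookkeeping with a three-line loop containing a single yield.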

Example: Parameter Sweep with itertools.product

You want to run a simulation for every combination of SNR ∈ {0, 5, 10, 15, 20} dB, antenna count ∈ {2, 4, 8}, and channel model ∈ {rayleigh, ricean}. Write a clean parameter sweep using itertools.product.
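One way to sketch this (the run_simulation call is a hypothetical placeholder for your actual simulation function):

```python
import itertools

snrs_db = [0, 5, 10, 15, 20]
antenna_counts = [2, 4, 8]
channel_models = ["rayleigh", "ricean"]

# itertools.product replaces three nested loops with one flat loop.
combos = list(itertools.product(snrs_db, antenna_counts, channel_models))
print(len(combos))  # 30 = 5 * 3 * 2

for snr_db, n_ant, model in combos:
    pass  # run_simulation(snr_db, n_ant, model)  (hypothetical call)
```

Materializing combos as a list is convenient for progress reporting; if the grid were huge, iterating over itertools.product directly would keep memory at O(1).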

Example: Streaming Data with Generators

Write a generator that reads a large CSV file of measurement data line by line, yielding only rows where the SNR exceeds a threshold. The file may be larger than RAM.
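A minimal sketch, assuming the CSV has a header row with an 'snr_db' column (adapt the column name to your data):

```python
import csv
from typing import Iterator

def high_snr_rows(path: str, threshold_db: float) -> Iterator[dict]:
    """Yield CSV rows whose SNR exceeds threshold_db, one at a time.

    Only one row is held in memory at any moment, so the file may
    be larger than RAM.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if float(row["snr_db"]) > threshold_db:
                yield row

# Usage: the caller drives the iteration, pulling rows on demand
# for row in high_snr_rows("measurements.csv", threshold_db=10.0):
#     process(row)
```

Because the with block wraps the loop, the file stays open exactly as long as the consumer keeps iterating and is closed when the generator is exhausted or garbage-collected.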

Generators and Itertools Patterns

Demonstrates generators, generator expressions, itertools.product, itertools.chain, functools.partial, and lru_cache with benchmarks.
Code: ch01/python/generators_demo.py

Comprehension Patterns for Scientific Computing

List, dict, set comprehensions and generator expressions for scientific computing: parameter conversions, matrix construction, and the walrus operator.
Code: ch01/python/comprehensions_patterns.py

Parameter Sweep Framework

A complete parameter sweep framework using itertools.product, dataclasses, CSV/JSON output, and progress reporting.
Code: ch01/python/itertools_parameter_sweep.py

Generator vs. List Memory Usage

Compare memory consumption of a list comprehension vs. a generator expression as the number of elements grows. The list stores all values in memory; the generator produces them on demand.
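One way to measure this yourself is with the standard-library tracemalloc module (a sketch; absolute numbers depend on the interpreter):

```python
import tracemalloc

def peak_bytes(fn) -> int:
    """Peak memory (bytes) allocated while running fn()."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

n = 100_000
list_peak = peak_bytes(lambda: sum([x * x for x in range(n)]))  # materializes all n values
gen_peak = peak_bytes(lambda: sum(x * x for x in range(n)))     # streams one value at a time

print(f"list peak: {list_peak:,} bytes")
print(f"gen peak:  {gen_peak:,} bytes")
```

The list version's peak scales linearly with n; the generator version's peak stays roughly constant as n grows.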

Parameters: n = 1,000,000

Common Mistake: Generators Are Single-Use

Mistake:

Trying to iterate over a generator twice:

gen = (x**2 for x in range(10))
first_pass = list(gen)   # [0, 1, 4, 9, ...]
second_pass = list(gen)  # [] — empty! Generator exhausted.

Correction:

If you need multiple passes, either:

  1. Materialize to a list: data = list(gen) and iterate data
  2. Use a generator function and call it again:
def squares(n):
    return (x**2 for x in range(n))
first = list(squares(10))
second = list(squares(10))

Historical Note: The Walrus Operator (:=)

Python 3.8 (2019) introduced the assignment expression := (nicknamed the "walrus operator" for its resemblance to a walrus lying on its side). It allows assignment within expressions:

# Without walrus:
line = f.readline()
while line:
    process(line)
    line = f.readline()

# With walrus:
while line := f.readline():
    process(line)

PEP 572 was one of the most controversial Python proposals ever, ultimately leading to Guido van Rossum stepping down as BDFL.
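The walrus operator also pairs naturally with comprehensions, where it avoids computing the same expression twice (f here is an arbitrary illustrative function):

```python
def f(x):
    return x * x

# (y := f(x)) binds f(x) to y inside the comprehension, so the
# function is called once per element instead of twice.
results = [y for x in range(10) if (y := f(x)) > 25]
print(results)  # [36, 49, 64, 81]
```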

Quick Check

What is the memory usage of sum(x**2 for x in range(10_000_000))?

O(1) — the generator expression produces values lazily

O(n) — all 10 million squares are stored in memory

O(√n) — Python optimizes large generators

Key Takeaway

Use generators and generator expressions for large data. They provide O(1) memory usage regardless of dataset size. Use itertools.product for parameter sweeps, functools.partial for partial application, and lru_cache for memoization.