Control Flow and Comprehensions
Lazy Evaluation for Large Datasets
Scientific computing often processes datasets too large to fit in memory all at once: millions of channel realizations, terabytes of radar returns, or streaming sensor data. Python's generators and comprehensions provide lazy evaluation, producing values on demand instead of materializing entire collections. This section covers the tools that make this possible.
Definition: Generators and yield
A generator function uses yield instead of return. When called, it returns a generator object, a lazy iterator that produces values one at a time:
def fibonacci(n: int):
    """Generate first n Fibonacci numbers lazily."""
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
# No computation until iteration begins
for fib in fibonacci(1_000_000):
    if fib > 1000:
        break
The generator pauses at each yield and resumes when the next value is requested. Memory usage is O(1) regardless of n.
Theorem: Generator Space Complexity
A generator function that yields n values uses O(s) memory, where s is the size of the generator's local state (stack frame), independent of n. In contrast, materializing the same sequence as a list requires O(n·e) memory, where e is the size of each element.
A generator suspends its stack frame between yields. Only the local variables (not the yielded values) are kept in memory. The consumer processes one value at a time.
Generator memory model
When a generator function is called, Python allocates a single frame object containing the function's local variables, the bytecode instruction pointer, and a reference to the enclosing scope. This frame has fixed size independent of how many values will be yielded.
Comparison with list
A list comprehension [f(x) for x in range(n)] allocates:
- The list header: O(1)
- n pointers to elements: O(n)
- n element objects: O(n·e)
Total: O(n·e).
Conclusion
The generator uses O(s) memory (a single fixed-size frame) regardless of n.
For sum(x**2 for x in range(n)), the generator version uses O(1) memory while the list version sum([x**2 for x in range(n)]) uses O(n). This difference becomes critical for large n, where the list may exceed available RAM.
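A quick way to see this in practice is to compare the size of a generator object with the size of the equivalent list. A minimal sketch; exact byte counts vary by Python version and platform:
import sys

n = 1_000_000
gen = (x**2 for x in range(n))   # generator object: one suspended frame
lst = [x**2 for x in range(n)]   # list: n pointers plus n int objects

# getsizeof reports the container only, not the referenced int objects
print(sys.getsizeof(gen))   # a few hundred bytes, independent of n
print(sys.getsizeof(lst))   # ~8 MB for the pointer array alone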
Definition: List, Dict, and Set Comprehensions
Comprehensions are concise syntax for building collections from iterables:
# List comprehension
squares = [x**2 for x in range(10)]
# Dict comprehension
snr_linear = {db: 10**(db/10) for db in range(-5, 25, 5)}
# Set comprehension
unique_lengths = {len(word) for word in vocabulary}
# Generator expression (lazy; note parentheses, not brackets)
total = sum(x**2 for x in range(1_000_000))  # O(1) memory
Generator expressions look like list comprehensions but use
parentheses. They produce values lazily and are ideal inside
functions like sum(), max(), any(), all().
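Because any() and all() short-circuit, pairing them with a generator expression can avoid computing most of the sequence. A small illustrative sketch; the measurement values and threshold_db are made up for the example:
measurements = [3.2, 7.8, 12.5, 18.1, 9.4]
threshold_db = 15.0

# Stops at the first value above threshold; later elements are never tested
has_outlier = any(snr > threshold_db for snr in measurements)

# Stops at the first failure
all_positive = all(snr > 0 for snr in measurements)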
Definition: itertools (The Iterator Toolkit)
The itertools module provides efficient building blocks for iteration:
import itertools
# Product: all combinations of parameters (grid search)
snrs = [0, 5, 10, 15, 20]
n_antennas = [2, 4, 8, 16]
for snr, nt in itertools.product(snrs, n_antennas):
    run_simulation(snr, nt)
# Chain: concatenate multiple iterables
all_results = itertools.chain(train_results, val_results, test_results)
# islice: take first n items from any iterator
first_100 = list(itertools.islice(huge_generator(), 100))
# combinations: all k-element subsets
# Useful for enumerating antenna pairs
for ant_i, ant_j in itertools.combinations(range(N), 2):
    compute_correlation(ant_i, ant_j)
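islice also composes with infinite iterators such as itertools.count, which is useful when a stream has no natural end. A minimal sketch:
import itertools

# An infinite stream of frame indices; islice takes a finite window
frame_ids = itertools.count(start=0)
first_batch = list(itertools.islice(frame_ids, 10))  # [0, 1, ..., 9]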
Definition: functools (Higher-Order Functions)
Key functions from functools:
from functools import partial, lru_cache, reduce
import numpy as np  # needed by steering_vector below
# partial: freeze some arguments
simulate_rayleigh = partial(simulate, channel_model="rayleigh")
simulate_rayleigh(snr_db=10) # channel_model is already set
# lru_cache: memoize expensive computations
@lru_cache(maxsize=128)
def steering_vector(theta: float, n_elements: int) -> tuple:
"""Cache steering vectors for repeated angles."""
return tuple(np.exp(1j * np.pi * np.arange(n_elements) * np.sin(theta)))
# reduce: cumulative application
from operator import mul
factorial_10 = reduce(mul, range(1, 11)) # 3628800
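Functions wrapped by lru_cache expose cache_info() for checking hit rates. A short usage sketch reusing steering_vector from above:
# Repeated angles hit the cache instead of recomputing the exponential
for theta in [0.1, 0.2, 0.1, 0.1]:
    steering_vector(theta, n_elements=8)

print(steering_vector.cache_info())
# CacheInfo(hits=2, misses=2, maxsize=128, currsize=2)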
generator
A function that uses yield to produce values lazily, one at a time.
Generators are iterators with O(1) memory overhead regardless of
the number of values produced.
Related: iterator
iterator
An object implementing the __next__ method, which returns the
next value or raises StopIteration. All generators are iterators,
but not all iterators are generators.
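To make the distinction concrete, here is a minimal hand-written iterator (an illustrative sketch): it implements __iter__ and __next__ directly and involves no yield at all:
class Countdown:
    """Iterator counting down from n to 1; an iterator but not a generator."""
    def __init__(self, n: int):
        self.current = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]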
Example: Parameter Sweep with itertools.product
You want to run a simulation for every combination of
SNR ∈ {0, 5, 10, 15, 20} dB, antenna count ∈ {2, 4, 8},
and channel model ∈ {rayleigh, ricean}.
Write a clean parameter sweep using itertools.product.
Define parameter grid
import itertools
snrs = [0, 5, 10, 15, 20]
antennas = [2, 4, 8]
models = ["rayleigh", "ricean"]
configs = list(itertools.product(snrs, antennas, models))
print(f"Total configurations: {len(configs)}")
# Total configurations: 30
Run the sweep
results = {}
for snr, nt, model in itertools.product(snrs, antennas, models):
    ber = run_simulation(snr_db=snr, n_antennas=nt, channel=model)
    results[(snr, nt, model)] = ber
This replaces three nested for loops with a single flat loop,
making the code cleaner and the parameter space explicit.
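For grids with many axes, one common variant (a sketch, not part of the example above) keeps parameters labeled by building a dict per configuration:
import itertools

param_grid = {
    "snr_db": [0, 5, 10, 15, 20],
    "n_antennas": [2, 4, 8],
    "channel": ["rayleigh", "ricean"],
}

# One dict per configuration, e.g. {'snr_db': 0, 'n_antennas': 2, 'channel': 'rayleigh'}
for values in itertools.product(*param_grid.values()):
    config = dict(zip(param_grid.keys(), values))
    run_simulation(**config)  # run_simulation as defined elsewhere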
Example: Streaming Data with Generators
Write a generator that reads a large CSV file of measurement data line by line, yielding only rows where the SNR exceeds a threshold. The file may be larger than RAM.
Generator with filtering
import csv
from typing import Iterator
def filter_measurements(
    filepath: str,
    min_snr: float
) -> Iterator[dict]:
    """Yield rows with SNR above threshold, one at a time."""
    with open(filepath, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            if float(row['snr_db']) >= min_snr:
                yield {
                    'snr_db': float(row['snr_db']),
                    'ber': float(row['ber']),
                    'timestamp': row['timestamp'],
                }
# Process millions of rows with O(1) memory
high_snr = filter_measurements('measurements.csv', min_snr=15.0)
for measurement in high_snr:
    process(measurement)
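Because filter_measurements returns an iterator, it composes directly with itertools. For example, previewing a few matching rows (a quick usage sketch; reading stops after the fifth match):
import itertools

preview = list(itertools.islice(
    filter_measurements('measurements.csv', min_snr=15.0), 5))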
Generators and Itertools Patterns
# Code from: ch01/python/generators_demo.py
Comprehension Patterns for Scientific Computing
# Code from: ch01/python/comprehensions_patterns.py
Parameter Sweep Framework
# Code from: ch01/python/itertools_parameter_sweep.py
Generator vs. List Memory Usage
Compare memory consumption of a list comprehension vs. a generator expression as the number of elements grows. The list stores all values in memory; the generator produces them on demand.
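The interactive comparison is not reproduced here, but the same measurement can be made with the standard tracemalloc module. A minimal sketch; peak values are approximate and vary by Python version:
import tracemalloc

n = 1_000_000

tracemalloc.start()
total = sum([x**2 for x in range(n)])   # list: all n squares materialized
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
total = sum(x**2 for x in range(n))     # generator: one value at a time
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"list peak: {peak_list:,} bytes, generator peak: {peak_gen:,} bytes")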
Common Mistake: Generators Are Single-Use
Mistake:
Trying to iterate over a generator twice:
gen = (x**2 for x in range(10))
first_pass = list(gen)   # [0, 1, 4, 9, ...]
second_pass = list(gen)  # [] (empty! The generator is exhausted.)
Correction:
If you need multiple passes, either:
- Materialize to a list: data = list(gen), then iterate over data
- Use a generator function and call it again:
def squares(n):
    return (x**2 for x in range(n))
first = list(squares(10))
second = list(squares(10))
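A third option, beyond the two above, is itertools.tee, which splits one iterator into independent branches. Note that tee buffers values one branch has consumed and another has not, so it is not free:
import itertools

gen = (x**2 for x in range(10))
a, b = itertools.tee(gen, 2)   # two independent views of the same stream
first = list(a)
second = list(b)               # works: b kept its own position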
Historical Note: The Walrus Operator (:=)
Python 3.8 (2019) introduced the assignment expression :=
(nicknamed the "walrus operator" for its resemblance to a walrus
lying on its side). It allows assignment within expressions:
# Without walrus:
line = f.readline()
while line:
    process(line)
    line = f.readline()
# With walrus:
while line := f.readline():
    process(line)
PEP 572 was one of the most controversial Python proposals ever, ultimately leading to Guido van Rossum stepping down as BDFL.
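The operator also fits naturally inside comprehensions, computing a value once and using it for both the filter and the result. An illustrative sketch; frames and estimate_snr are hypothetical stand-ins:
frames = [[2.0, 4.0], [12.0, 16.0], [6.0, 8.0]]   # hypothetical frame data

def estimate_snr(frame) -> float:
    """Hypothetical SNR estimate (stand-in for a real estimator)."""
    return sum(frame) / len(frame)

# estimate_snr runs once per frame; its result is filtered on and kept
good = [snr for frame in frames if (snr := estimate_snr(frame)) > 10.0]
print(good)   # [14.0]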
Quick Check
What is the memory usage of sum(x**2 for x in range(10_000_000))?
O(1): the generator expression produces values lazily
O(n): all 10 million squares are stored in memory
O(log n): Python optimizes large generators
Generator expressions yield one value at a time. Only the running sum and current value are in memory.
Key Takeaway
Use generators and generator expressions for large data.
They provide O(1) memory usage regardless of dataset size.
Use itertools.product for parameter sweeps, functools.partial
for partial application, and lru_cache for memoization.