Core Data Structures Refresher

Choosing the Right Container

Scientific code spends most of its time in NumPy/PyTorch. But the glue around those hot paths β€” configuration parsing, result aggregation, bookkeeping β€” uses standard Python data structures. Choosing the wrong container turns O(1)O(1) lookups into O(n)O(n) scans and makes code fragile. This section sharpens your intuition for when to reach for each tool.

Definition:

Time Complexity of Core Operations

The performance of Python's built-in containers:

Operation list dict set deque
Index [i] O(1)O(1) O(1)O(1) N/A O(n)O(n)
Append/push right O(1)βˆ—O(1)^* β€” β€” O(1)O(1)
Prepend/push left O(n)O(n) β€” β€” O(1)O(1)
Search x in O(n)O(n) O(1)O(1) O(1)O(1) O(n)O(n)
Delete by value O(n)O(n) O(1)O(1) O(1)O(1) O(n)O(n)
Sort O(nlog⁑n)O(n \log n) β€” β€” β€”

βˆ—^* Amortized β€” occasional resizes cost O(n)O(n) but average out.

Theorem: Amortized Cost of List Append

Python's list.append() has amortized O(1)O(1) time complexity. Specifically, a sequence of nn append operations on an initially empty list takes O(n)O(n) total time, even though individual appends occasionally cost O(n)O(n) due to reallocation.

CPython doubles the underlying array capacity when it runs out of space. This means resizes happen at append 1,2,4,8,16,…1, 2, 4, 8, 16, \ldots, and the total work across all resizes is 1+2+4+β‹―+n=2nβˆ’1=O(n)1 + 2 + 4 + \cdots + n = 2n - 1 = O(n).

Theorem: Expected-Case Hash Table Lookup

Under the simple uniform hashing assumption (each key is equally likely to hash to any slot), lookup in a hash table with nn elements and mm slots has expected time O(1+n/m)O(1 + n/m). When the load factor Ξ±=n/m\alpha = n/m is kept bounded (Python keeps Ξ±<2/3\alpha < 2/3), this simplifies to expected O(1)O(1).

A hash table maps keys to slots via a hash function. With a good hash function and a bounded load factor, the expected number of keys per slot (chain length) is constant, so lookup takes constant expected time.

Definition:

collections.defaultdict

A dict subclass that calls a factory function for missing keys:

from collections import defaultdict

# Group results by SNR level
results = defaultdict(list)
for snr, ber in measurements:
    results[snr].append(ber)

Without defaultdict, you would need if snr not in results: results[snr] = [] before every append β€” verbose and error-prone.

Definition:

collections.Counter

A dict subclass for counting hashable objects:

from collections import Counter

# Count bit errors
errors = Counter(detected_bits ^ true_bits)
print(errors[1])  # Number of bit errors (1s in XOR result)

# Most common elements
symbol_counts = Counter(received_symbols)
print(symbol_counts.most_common(5))

Definition:

collections.deque

A double-ended queue supporting O(1)O(1) append and pop from both ends:

from collections import deque

# Sliding window of recent measurements
window = deque(maxlen=100)
for sample in data_stream:
    window.append(sample)
    # window automatically discards oldest when full
    moving_avg = sum(window) / len(window)

Use deque whenever you need a fixed-size buffer, a queue, or efficient prepending.

Definition:

collections.namedtuple and dataclasses

Named tuples provide lightweight, immutable records:

from collections import namedtuple
Point = namedtuple('Point', ['x', 'y', 'z'])
p = Point(1.0, 2.0, 3.0)
print(p.x, p.y)  # 1.0 2.0

Dataclasses (Python 3.7+) are mutable and more flexible:

from dataclasses import dataclass

@dataclass
class SimConfig:
    snr_db: float = 10.0
    n_antennas: int = 4
    n_realizations: int = 1000
    channel_model: str = "rayleigh"

The @dataclass decorator auto-generates __init__, __repr__, and __eq__. Prefer dataclasses for parameter containers.

dataclass

A Python class decorated with @dataclass that auto-generates __init__, __repr__, __eq__, and optionally __hash__. Ideal for parameter containers in scientific code.

Related: dunder method

Example: Choosing the Right Data Structure

You are running a Monte Carlo simulation that generates 10,000 channel realizations. For each realization, you need to: (a) store the BER result, (b) check if a particular channel matrix has been seen before (exact match), and (c) maintain a running window of the last 50 BER values. Which data structures should you use for each task?

Data Structures Performance Benchmark

python
Benchmarks list vs. deque for prepend, dict vs. list for lookup, and Counter for frequency analysis. Prints timing results.
# Code from: ch01/python/data_structures_benchmark.py
# Load from backend supplements endpoint

Dataclass Patterns for Simulations

python
Demonstrates dataclass patterns: nested configs, results containers, frozen (immutable) configs, and factory methods for antenna arrays.
# Code from: ch01/python/dataclass_patterns.py
# Load from backend supplements endpoint

Data Structure Operation Costs

Compare the empirical time of common operations (append, lookup, delete) across Python's built-in data structures as the container size grows.

Parameters
100000

Dynamic Array Growth Animation

Watch Python's list grow as elements are appended. The animation shows capacity doublings, element copies during resizes, and the amortized cost converging to O(1)O(1) per operation.

Parameters
64

Python Operator Dispatch Mechanism

Python Operator Dispatch Mechanism
How Python dispatches a + b: check a.__add__, fall back to b.__radd__, raise TypeError if neither works.

Historical Note: Dictionaries Became Ordered in Python 3.7

2018

Before Python 3.7, dict iteration order was implementation-dependent. CPython 3.6 made insertion order preservation an implementation detail; Python 3.7 made it a language guarantee. This means you can now rely on for k, v in params.items() iterating in insertion order, which is useful for ordered parameter sweeps and reproducible experiment configs.

Why This Matters: Dataclasses as Simulation Configs

In wireless simulation, experiments typically involve sweeping over many parameters (SNR, number of antennas, channel model, coding rate). A @dataclass provides a clean, typed, self-documenting container for these parameters β€” much better than passing loose **kwargs or raw dictionaries. We will use this pattern throughout Books 1 and 2 whenever we configure simulations.

See full treatment in Classes and Inheritance

Quick Check

Which data structure provides O(1) append AND O(1) prepend?

collections.deque

list

dict

tuple

Key Takeaway

Match the data structure to the access pattern. Lists for ordered sequences, sets for membership testing, dicts for key-value lookup, deques for double-ended access. Dataclasses for parameter containers. The wrong choice turns O(1)O(1) into O(n)O(n) and makes code fragile.