Core Data Structures Refresher
Choosing the Right Container
Scientific code spends most of its time in NumPy/PyTorch. But the glue around those hot paths (configuration parsing, result aggregation, bookkeeping) uses standard Python data structures. Choosing the wrong container turns O(1) lookups into O(n) scans and makes code fragile. This section sharpens your intuition for when to reach for each tool.
Definition: Time Complexity of Core Operations
The performance of Python's built-in containers:
| Operation | list | dict | set | deque |
|---|---|---|---|---|
| Index `[i]` | O(1) | N/A | N/A | O(n) |
| Append/push right | O(1)* | O(1)* | O(1)* | O(1) |
| Prepend/push left | O(n) | N/A | N/A | O(1) |
| Search `x in` | O(n) | O(1) | O(1) | O(n) |
| Delete by value | O(n) | O(1) | O(1) | O(n) |
| Sort | O(n log n) | N/A | N/A | N/A |
* Amortized O(1): occasional resizes cost O(n) but average out.
Theorem: Amortized Cost of List Append
Python's list.append() has amortized O(1) time complexity.
Specifically, a sequence of n append operations on an initially
empty list takes O(n) total time, even though individual appends
occasionally cost O(n) due to reallocation.
CPython doubles the underlying array capacity when it runs out of space. This means resizes happen at appends 1, 2, 4, 8, ..., and the total work across all resizes is O(n).
Consider the total cost of all resize operations over n appends.
The resize at capacity 2^k copies 2^k elements. Sum the geometric series.
Setup
Let T(n) be the total cost of n append operations. Each append costs O(1) unless a resize is triggered, in which case it costs O(m), where m is the current capacity.
Counting resize costs
With a doubling strategy, resizes occur when the list size reaches powers of 2: 1, 2, 4, 8, ..., 2^floor(log2 n). The total resize cost is 1 + 2 + 4 + ... + 2^floor(log2 n) < 2n = O(n), since a geometric series is bounded by twice its largest term.
Total cost
The total cost is T(n) = n · O(1) + O(n) = O(n). Therefore the amortized cost per append is T(n)/n = O(1).
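You can see the over-allocation directly by watching `sys.getsizeof` as a list grows. This is a minimal sketch; the exact resize points depend on the CPython version:

```python
import sys

# Record the allocated size after each append: it stays flat between
# resizes, then jumps when CPython reallocates with extra capacity.
sizes = []
lst = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Indices where the allocation grew, i.e. where a resize happened.
jumps = [i for i in range(1, len(sizes)) if sizes[i] > sizes[i - 1]]
print(jumps)  # resize points; exact values depend on the CPython version
```

The jumps become sparser as the list grows, which is exactly the geometric-growth behavior the proof relies on.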
Theorem: Expected-Case Hash Table Lookup
Under the simple uniform hashing assumption (each key is equally likely to hash to any slot), lookup in a hash table with n elements and m slots has expected time O(1 + α), where α = n/m is the load factor. When the load factor is kept bounded (Python keeps α < 2/3), this simplifies to expected O(1).
A hash table maps keys to slots via a hash function. With a good hash function and a bounded load factor, the expected number of keys per slot (chain length) is constant, so lookup takes constant expected time.
Expected chain length
Under simple uniform hashing, each of the n keys is placed in one of m slots independently and uniformly. The expected number of keys in any given slot is n/m = α.
Lookup cost
An unsuccessful lookup examines the entire chain in the target slot, costing O(1 + α) in expectation. A successful lookup examines, on average, about 1 + α/2 elements.
Bounded load factor
Python's dict resizes when α exceeds 2/3, keeping α = O(1)
at all times. Therefore the expected lookup cost is
O(1 + α) = O(1).
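The constant-time claim is easy to check empirically. A rough sketch (absolute timings vary by machine, but the trend does not):

```python
from timeit import timeit

# Membership test cost as the container grows: the set lookup stays
# roughly flat, while the list scan grows linearly with n.
for n in (1_000, 10_000, 100_000):
    data = list(range(n))
    as_set = set(data)
    target = n - 1  # worst case for the list scan
    t_list = timeit(lambda: target in data, number=100)
    t_set = timeit(lambda: target in as_set, number=100)
    print(f"n={n:>7,}: list {t_list:.5f}s  set {t_set:.5f}s")
```

At n = 100,000 the gap is typically several orders of magnitude.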
Definition: collections.defaultdict
A dict subclass that calls a factory function for missing keys:
from collections import defaultdict
# Group results by SNR level
results = defaultdict(list)
for snr, ber in measurements:
results[snr].append(ber)
Without defaultdict, you would need if snr not in results: results[snr] = []
before every append, which is verbose and error-prone.
Definition: collections.Counter
A dict subclass for counting hashable objects:
from collections import Counter
# Count bit errors
errors = Counter(detected_bits ^ true_bits)
print(errors[1]) # Number of bit errors (1s in XOR result)
# Most common elements
symbol_counts = Counter(received_symbols)
print(symbol_counts.most_common(5))
Definition: collections.deque
A double-ended queue supporting append and pop from both ends:
from collections import deque
# Sliding window of recent measurements
window = deque(maxlen=100)
for sample in data_stream:
window.append(sample)
# window automatically discards oldest when full
moving_avg = sum(window) / len(window)
Use deque whenever you need a fixed-size buffer, a queue, or
efficient prepending.
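As a queue, a deque sketch looks like this (job names are illustrative):

```python
from collections import deque

# A deque works as a FIFO queue: append on the right, popleft on the
# left, both O(1). (list.pop(0) would be O(n): it shifts every element.)
queue = deque()
for job in ["load", "filter", "estimate"]:
    queue.append(job)

order = []
while queue:
    order.append(queue.popleft())
print(order)  # ['load', 'filter', 'estimate']
```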
Definition: collections.namedtuple and dataclasses
Named tuples provide lightweight, immutable records:
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y', 'z'])
p = Point(1.0, 2.0, 3.0)
print(p.x, p.y) # 1.0 2.0
Dataclasses (Python 3.7+) are mutable and more flexible:
from dataclasses import dataclass
@dataclass
class SimConfig:
snr_db: float = 10.0
n_antennas: int = 4
n_realizations: int = 1000
channel_model: str = "rayleigh"
The @dataclass decorator auto-generates __init__, __repr__,
and __eq__. Prefer dataclasses for parameter containers.
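A quick sketch of what the generated methods buy you, reusing the SimConfig above:

```python
from dataclasses import dataclass

@dataclass
class SimConfig:
    snr_db: float = 10.0
    n_antennas: int = 4
    n_realizations: int = 1000
    channel_model: str = "rayleigh"

a = SimConfig()
b = SimConfig(snr_db=10.0)
print(a)       # auto-generated __repr__ shows every field by name
print(a == b)  # True: field-by-field __eq__ is generated for free
```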
Example: Choosing the Right Data Structure
You are running a Monte Carlo simulation that generates 10,000 channel realizations. For each realization, you need to: (a) store the BER result, (b) check if a particular channel matrix has been seen before (exact match), and (c) maintain a running window of the last 50 BER values. Which data structures should you use for each task?
BER storage: use a list
A list is the natural choice for ordered, append-only results:
ber_results: list[float] = []
for trial in range(10_000):
ber = run_simulation(trial)
ber_results.append(ber) # O(1) amortized
Duplicate detection: use a set
For membership testing, use a set. Since NumPy arrays
are not hashable, convert to a hashable form:
seen_channels: set[bytes] = set()
h_bytes = H.tobytes()
if h_bytes not in seen_channels: # O(1)
seen_channels.add(h_bytes)
Sliding window: use a deque
from collections import deque
recent = deque(maxlen=50)
for ber in ber_results:
recent.append(ber) # O(1), auto-drops oldest
avg = sum(recent) / len(recent)
Data Structures Performance Benchmark
# Code from: ch01/python/data_structures_benchmark.py
Dataclass Patterns for Simulations
# Code from: ch01/python/dataclass_patterns.py
Data Structure Operation Costs
Compare the empirical time of common operations (append, lookup, delete) across Python's built-in data structures as the container size grows.
Dynamic Array Growth Animation
Watch Python's list grow as elements are appended. The animation shows capacity doublings, element copies during resizes, and the amortized cost converging to O(1) per operation.
Python Operator Dispatch Mechanism
For a + b, Python first checks a.__add__; if that returns NotImplemented,
it falls back to b.__radd__, and raises TypeError if neither works.
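The dispatch can be sketched with a toy numeric type (the Meters class is purely illustrative):

```python
class Meters:
    """Toy numeric type illustrating Python's binary-operator dispatch."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # tells Python to try the other operand

    def __radd__(self, other):
        # Called for `other + self` after other.__add__ returns NotImplemented
        return self.__add__(other)

print((Meters(3) + 2).value)  # 5: uses Meters.__add__
print((2 + Meters(3)).value)  # 5: int.__add__ fails, falls back to __radd__
```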
Common Mistake: Using a List for Membership Testing
Mistake:
Checking if x in my_list inside a loop over n elements, giving
O(n^2) total time. Common in code that accumulates "seen" items.
Correction:
Use a set for membership testing: if x in my_set is O(1) on average.
Convert the list to a set if you need repeated lookups:
seen = set(my_list) # O(n) once
for item in other_items:
if item in seen: # O(1) per check
...
Historical Note: Dictionaries Became Ordered in Python 3.7
Before Python 3.7, dict iteration order was implementation-dependent.
CPython 3.6 made insertion-order preservation an implementation detail;
Python 3.7 (2018) made it a language guarantee. This means you can now rely
on for k, v in params.items() iterating in insertion order, which is
useful for ordered parameter sweeps and reproducible experiment configs.
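A minimal sketch of relying on that guarantee (the parameter names are illustrative):

```python
# dict preserves insertion order (guaranteed since Python 3.7), so a
# sweep configuration iterates in exactly the order it was written.
params = {"snr_db": [0, 5, 10], "n_antennas": [2, 4], "model": ["rayleigh"]}
print(list(params))  # ['snr_db', 'n_antennas', 'model']
```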
Why This Matters: Dataclasses as Simulation Configs
In wireless simulation, experiments typically involve sweeping over
many parameters (SNR, number of antennas, channel model, coding rate).
A @dataclass provides a clean, typed, self-documenting container
for these parameters β much better than passing loose **kwargs or
raw dictionaries. We will use this pattern throughout Books 1 and 2
whenever we configure simulations.
See full treatment in Classes and Inheritance
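One sweep pattern worth knowing is dataclasses.replace, which copies a base config with a few fields changed. A sketch using the SimConfig fields from earlier:

```python
from dataclasses import dataclass, replace

@dataclass
class SimConfig:
    snr_db: float = 10.0
    n_antennas: int = 4
    n_realizations: int = 1000
    channel_model: str = "rayleigh"

base = SimConfig(channel_model="rayleigh")
# replace() returns a modified copy, so one base config spawns a sweep
sweep = [replace(base, snr_db=s) for s in (0.0, 5.0, 10.0, 15.0)]
print([c.snr_db for c in sweep])  # [0.0, 5.0, 10.0, 15.0]
print(sweep[0].n_antennas)        # 4: untouched fields keep base values
```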
Quick Check
Which data structure provides O(1) append AND O(1) prepend?
collections.deque
list
dict
tuple
Answer: collections.deque. A deque (double-ended queue) supports O(1) operations at both ends.
Key Takeaway
Match the data structure to the access pattern. Lists for ordered sequences, sets for membership testing, dicts for key-value lookup, deques for double-ended access. Dataclasses for parameter containers. The wrong choice turns O(1) into O(n) and makes code fragile.