Pandas for Tabular Experiment Data

Why Pandas for Simulation Data

Simulation campaigns produce tabular data: each row is one experiment (SNR, modulation, channel, seed) and each column is a metric (BER, throughput, latency). Pandas DataFrames handle this naturally with groupby, pivot, merge, and built-in plotting.

Definition:
DataFrame — The Core Pandas Object

A DataFrame is a 2D table with labeled rows (index) and columns:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'snr_db': [0, 5, 10, 15, 20],
    'modulation': ['BPSK'] * 5,
    'ber': [0.08, 0.006, 1.3e-4, 8e-7, 3e-11],
    'throughput': [0.92, 0.99, 1.0, 1.0, 1.0],
})
print(df.dtypes)      # column types
print(df.shape)        # (5, 4)
print(df.describe())   # summary statistics

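Always check df.dtypes after loading data. A single non-numeric entry, such as a text placeholder for a failed run, is enough to make an entire column load as strings when you expect numbers.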
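A minimal sketch of that check, assuming a hypothetical results file in which one failed run logged its BER as the text "failed". The file contents and the name df_raw are illustrative only; pd.to_numeric with errors='coerce' is one way to recover a numeric column.

```python
import io
import pandas as pd

# Hypothetical raw results: one failed run logged its BER as the text 'failed'
raw = io.StringIO(
    "snr_db,modulation,ber\n"
    "0,BPSK,0.08\n"
    "5,BPSK,failed\n"
    "10,BPSK,1.3e-4\n"
)
df_raw = pd.read_csv(raw)
print(df_raw.dtypes)   # 'ber' is loaded as text, not as a float column

# Coerce to numeric; entries that cannot be parsed become NaN
df_raw['ber'] = pd.to_numeric(df_raw['ber'], errors='coerce')
print(df_raw.dtypes)   # 'ber' is now float64
print(df_raw['ber'].isna().sum())  # number of entries that failed to parse
```

Counting the NaN entries afterwards shows how many rows were affected, which is usually worth knowing before they are dropped or averaged over.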

Definition:
groupby — Split-Apply-Combine

groupby splits the DataFrame by a column, applies a function to each group, and combines the results:

# Mean BER by modulation
df.groupby('modulation')['ber'].mean()

# Multiple aggregations
df.groupby('modulation').agg(
    mean_ber=('ber', 'mean'),
    std_ber=('ber', 'std'),
    n_trials=('ber', 'count'),
)

# Group by multiple columns (requires a 'channel' column, which the
# five-row example DataFrame above does not have)
df.groupby(['modulation', 'channel'])['throughput'].median()
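The calls above can be tried on a small table that has more than one group. The sketch below uses made-up BER values and a hypothetical 'channel' column (AWGN and Rayleigh are assumed labels, not taken from the text) to show the shape of the results.

```python
import pandas as pd

# Hypothetical sweep with made-up values: two modulations, two channel models
sweep = pd.DataFrame({
    'modulation': ['BPSK'] * 4 + ['QPSK'] * 4,
    'channel':    ['AWGN', 'AWGN', 'Rayleigh', 'Rayleigh'] * 2,
    'snr_db':     [0, 10, 0, 10] * 2,
    'ber':        [0.08, 1e-4, 0.15, 0.02, 0.16, 2e-3, 0.25, 0.05],
})

# One group per modulation; the result is a Series indexed by modulation
mean_ber = sweep.groupby('modulation')['ber'].mean()
print(mean_ber)

# One group per (modulation, channel) pair; the result has a two-level index
per_pair = sweep.groupby(['modulation', 'channel'])['ber'].median()
print(per_pair)

# Individual groups are addressed with a tuple of the group keys
print(per_pair.loc[('BPSK', 'AWGN')])
```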

Definition:
pivot_table and melt — Reshape Data

pivot_table reshapes long-form data into a matrix:

table = df.pivot_table(
    values='ber', index='modulation', columns='snr_db',
    aggfunc='mean'
)

melt does the reverse — wide to long:

long = table.reset_index().melt(
    id_vars='modulation', var_name='snr_db', value_name='ber'
)

Seaborn and Plotly expect "tidy" (long-form) data. Use melt to convert wide tables to tidy format.
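As a sketch of the round trip between the two layouts, the example below pivots a small long-form table and melts it back. The values are made up and the variable names (long_df, wide, tidy) are illustrative.

```python
import pandas as pd

# Hypothetical long-form results: one row per (modulation, snr_db) point
long_df = pd.DataFrame({
    'modulation': ['BPSK', 'BPSK', 'QPSK', 'QPSK'],
    'snr_db':     [0, 10, 0, 10],
    'ber':        [0.08, 1e-4, 0.16, 2e-3],
})

# Long -> wide: modulations as rows, SNR points as columns
wide = long_df.pivot_table(
    values='ber', index='modulation', columns='snr_db', aggfunc='mean'
)
print(wide)

# Wide -> long: back to one row per (modulation, snr_db) pair
tidy = wide.reset_index().melt(
    id_vars='modulation', var_name='snr_db', value_name='ber'
)
print(tidy)
```

The row order after melt differs from the original table, but each (modulation, snr_db, ber) triple is preserved.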

Theorem: Pandas Vectorization

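Pandas operations on whole columns are vectorized (implemented in C/NumPy). For a DataFrame with $N$ rows, the three approaches below all cost $O(N)$; they differ in where the loop runs, and therefore in the constant factor:

Vectorized: df['ber_db'] = 10 * np.log10(df['ber']) is $O(N)$ with the loop in C
apply: df['ber_db'] = df['ber'].apply(lambda x: 10*np.log10(x)) is $O(N)$ with one Python function call per element
iterrows: for _, row in df.iterrows() is $O(N)$ with a Python loop that also builds a Series for every row, typically 10-100x slower than the vectorized form

Prefer vectorized operations to apply, and apply to iterrows.

Vectorized operations run the per-element loop in compiled code on the underlying NumPy arrays, so the Python interpreter is invoked once per column rather than once per element.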
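The difference is easy to measure. The sketch below times the three approaches on synthetic data; the helper timed and the table size are arbitrary choices for illustration, and the absolute times depend on the machine and the pandas version.

```python
import time
import numpy as np
import pandas as pd

# Synthetic BER values in (1e-6, 1), used only for timing
rng = np.random.default_rng(0)
bench = pd.DataFrame({'ber': rng.uniform(1e-6, 1.0, size=10_000)})

def timed(label, func):
    """Run func once, print its wall-clock time, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label:>10}: {time.perf_counter() - start:.4f} s")
    return result

vec = timed('vectorized', lambda: 10 * np.log10(bench['ber']))
app = timed('apply', lambda: bench['ber'].apply(lambda x: 10 * np.log10(x)))
itr = timed('iterrows', lambda: pd.Series(
    [10 * np.log10(row['ber']) for _, row in bench.iterrows()],
    index=bench.index,
))

# All three approaches compute the same values; only the run time differs
assert np.allclose(vec, app) and np.allclose(vec, itr)
```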

Example: Analyzing Simulation Results with Pandas

Load BER simulation results from CSV, compute summary statistics grouped by modulation, and create a pivot table.

Solution

Implementation

import pandas as pd
import numpy as np

# Load results
df = pd.read_csv('ber_results.csv')
print(f"Loaded {len(df)} experiments")
print(df.head())

# Summary by modulation
summary = df.groupby('modulation').agg(
    mean_ber=('ber', 'mean'),
    min_ber=('ber', 'min'),
    max_ber=('ber', 'max'),
    n_trials=('ber', 'count'),
)
print(summary)

# Pivot: BER matrix
pivot = df.pivot_table(
    values='ber', index='modulation', columns='snr_db'
)
print(pivot.to_string(float_format='%.2e'))
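The example above assumes that ber_results.csv exists and has at least the columns modulation, snr_db, and ber. To try the same steps without that file, the sketch below reads a small in-memory stand-in; the values and the extra seed column are made up for illustration.

```python
import io
import pandas as pd

# Made-up stand-in for ber_results.csv: two seeds per (modulation, snr_db) point
csv_text = """modulation,snr_db,seed,ber
BPSK,0,1,0.080
BPSK,0,2,0.076
BPSK,10,1,0.00012
BPSK,10,2,0.00014
QPSK,0,1,0.160
QPSK,0,2,0.150
QPSK,10,1,0.0021
QPSK,10,2,0.0019
"""
demo = pd.read_csv(io.StringIO(csv_text))

# Summary by modulation, pooled over SNR points and seeds
demo_summary = demo.groupby('modulation').agg(
    mean_ber=('ber', 'mean'),
    n_trials=('ber', 'count'),
)
print(demo_summary)

# pivot_table averages the two seeds of each point (aggfunc defaults to 'mean')
demo_pivot = demo.pivot_table(values='ber', index='modulation', columns='snr_db')
print(demo_pivot.to_string(float_format='{:.2e}'.format))
```

Because pivot_table aggregates duplicates, repeated seeds at the same operating point collapse into a single cell instead of raising an error.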

Example: Merging Simulation and Configuration Data

Merge a simulation results DataFrame with a configuration DataFrame to analyze how hardware parameters affect performance.

Solution

Implementation

# Results table
results = pd.DataFrame({
    'config_id': [1, 1, 2, 2, 3, 3],
    'snr_db': [10, 20, 10, 20, 10, 20],
    'ber': [1e-3, 1e-5, 5e-4, 2e-6, 2e-3, 8e-5],
})

# Config table
configs = pd.DataFrame({
    'config_id': [1, 2, 3],
    'n_antennas': [2, 4, 2],
    'modulation': ['QPSK', 'QPSK', '16-QAM'],
})

# Merge
full = results.merge(configs, on='config_id')
print(full)

# Now you can group by hardware config
full.groupby('n_antennas')['ber'].mean()
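merge performs an inner join by default, so result rows whose config_id has no match in the configuration table are dropped silently. A hedged sketch of one way to guard against that, using the how, validate, and indicator parameters of merge; the tables are made up, and config 4 is deliberately left without a configuration.

```python
import pandas as pd

results = pd.DataFrame({
    'config_id': [1, 1, 2, 2, 4],   # config 4 has no entry in configs
    'snr_db':    [10, 20, 10, 20, 10],
    'ber':       [1e-3, 1e-5, 5e-4, 2e-6, 3e-3],
})
configs = pd.DataFrame({
    'config_id':  [1, 2, 3],
    'n_antennas': [2, 4, 2],
})

# how='left' keeps every result row; validate raises if config_id is not
# unique in configs; indicator adds a '_merge' column recording the match status
checked = results.merge(
    configs, on='config_id', how='left',
    validate='many_to_one', indicator=True,
)
print(checked)

# Result rows that found no matching configuration
orphans = checked[checked['_merge'] == 'left_only']
print(orphans['config_id'].tolist())
```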

Example: Pandas Method Chaining

Use method chaining to filter, transform, and summarize in one expression.

Solution

Implementation

(df
 .query('snr_db >= 5')
 .assign(ber_db=lambda x: 10 * np.log10(x['ber']))
 .groupby('modulation')
 .agg(mean_ber_db=('ber_db', 'mean'))
 .sort_values('mean_ber_db')
)

Pandas Operation Performance

Operation	Speed	Use When
Vectorized (df['a'] + df['b'])	Fastest (C/NumPy)	Simple arithmetic, comparisons
.apply(func)	Medium (Python call per element, row, or group)	Complex per-row/group logic
.iterrows()	Slowest (Python per-row)	Last resort only
.groupby().agg()	Fast (C aggregations)	Summary statistics
.merge()	Fast (hash join)	Combining tables

Why This Matters: Managing Simulation Campaigns with Pandas

A typical wireless system simulation campaign sweeps over 5+ parameters (SNR, modulation, coding rate, antenna config, channel model) with multiple seeds per point. This generates thousands of rows. Pandas groupby + agg computes mean BER, confidence intervals, and throughput percentiles across seeds, while pivot_table creates the result matrices that go directly into papers via to_latex().
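A minimal sketch of that workflow on a synthetic campaign. The BER and throughput values are generated at random and carry no physical meaning, and the confidence interval uses a normal approximation, which is an assumption rather than something prescribed by the text.

```python
import numpy as np
import pandas as pd

# Synthetic campaign: 2 modulations x 3 SNR points x 20 seeds, made-up values
rng = np.random.default_rng(1)
rows = [
    {'modulation': mod, 'snr_db': snr, 'seed': seed,
     'ber': base * 10 ** (-snr / 10) * rng.uniform(0.8, 1.2),
     'throughput': rng.uniform(0.9, 1.0)}
    for mod, base in [('BPSK', 0.08), ('QPSK', 0.16)]
    for snr in (0, 5, 10)
    for seed in range(20)
]
campaign = pd.DataFrame(rows)

# Aggregate across seeds for each (modulation, snr_db) operating point
stats = campaign.groupby(['modulation', 'snr_db']).agg(
    mean_ber=('ber', 'mean'),
    std_ber=('ber', 'std'),
    n_seeds=('ber', 'count'),
    p5_throughput=('throughput', lambda s: s.quantile(0.05)),
)
# Half-width of an approximate 95% interval for the mean (normal approximation)
stats['ci95'] = 1.96 * stats['std_ber'] / np.sqrt(stats['n_seeds'])
print(stats)

# Result matrix: mean BER with modulations as rows and SNR points as columns
matrix = campaign.pivot_table(values='ber', index='modulation', columns='snr_db')
print(matrix)
```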

Common Mistake: SettingWithCopyWarning

Mistake:

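Assigning through chained indexing: df[df['snr_db'] > 10]['ber'] = 0. The first indexing step returns a new object, so the assignment may modify that temporary copy and leave df unchanged.

Correction:

Select the rows and the column in a single .loc call: df.loc[df['snr_db'] > 10, 'ber'] = 0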
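The two forms can be compared directly on a small table. In this sketch the warning that pandas normally emits for the chained form is suppressed so that the output stays readable; the three-row DataFrame is illustrative.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'snr_db': [0, 10, 20], 'ber': [0.08, 1.3e-4, 3e-11]})

# Chained indexing: the boolean mask produces a new object, so the assignment
# lands on that temporary and df is left unchanged
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[df['snr_db'] > 10]['ber'] = 0
after_chained = df['ber'].tolist()
print(after_chained)        # the last entry is still 3e-11

# Single .loc call: row selection and assignment both act on df itself
df.loc[df['snr_db'] > 10, 'ber'] = 0
after_loc = df['ber'].tolist()
print(after_loc)            # the last entry is now 0.0
```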

Key Takeaway

Store simulation results as tidy DataFrames: one row per observation. Use groupby + agg for summary statistics, pivot_table for result matrices, and merge to join configuration and results. Never loop over rows — vectorize.

Quick Check

What does df.groupby('modulation')['ber'].mean() return?

A DataFrame with all columns

A Series with the mean BER for each modulation

A single number (the overall mean)

An error because groupby requires two arguments

Answer:

A Series with the mean BER for each modulation

groupby splits by modulation, selects 'ber', and computes the mean per group.

DataFrame

A 2D labeled data structure in Pandas with rows (index) and columns, similar to a spreadsheet or SQL table.

groupby

A Pandas operation that splits a DataFrame into groups by one or more columns, applies a function to each group, and combines the results.

Tidy Data

A data format where each row is one observation and each column is one variable, enabling easy analysis with groupby, plotting, and machine learning.
