Pandas for Tabular Experiment Data
Why Pandas for Simulation Data
Simulation campaigns produce tabular data: each row is one experiment (SNR, modulation, channel, seed) and each column is a metric (BER, throughput, latency). Pandas DataFrames handle this naturally with groupby, pivot, merge, and built-in plotting.
Definition: DataFrame (The Core Pandas Object)
A DataFrame is a 2D table with labeled rows (index) and columns:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'snr_db': [0, 5, 10, 15, 20],
    'modulation': ['BPSK'] * 5,
    'ber': [0.08, 0.006, 1.3e-4, 8e-7, 3e-11],
    'throughput': [0.92, 0.99, 1.0, 1.0, 1.0],
})
print(df.dtypes) # column types
print(df.shape) # (5, 4)
print(df.describe()) # summary statistics
Always check df.dtypes after loading data. Columns may be
strings when you expect numbers.
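A minimal sketch of that dtype check, assuming a hypothetical results file in which one BER entry was logged as the text "failed". The CSV is built in memory with io.StringIO so the snippet runs without a file on disk; pd.to_numeric with errors='coerce' turns unparseable entries into NaN.

```python
import io

import pandas as pd

# Hypothetical CSV contents: one 'ber' entry is text instead of a number.
raw_csv = io.StringIO(
    "snr_db,modulation,ber\n"
    "0,BPSK,0.08\n"
    "5,BPSK,failed\n"
    "10,BPSK,1.3e-4\n"
)
raw = pd.read_csv(raw_csv)
print(raw.dtypes)  # 'ber' is not a float column because of the text entry

# Coerce to numeric: entries that cannot be parsed become NaN.
clean = raw.assign(ber=pd.to_numeric(raw["ber"], errors="coerce"))
print(clean.dtypes)

n_bad = int(clean["ber"].isna().sum())
print(f"{n_bad} row(s) had a non-numeric BER")
```

Counting the coerced NaN values, as done here, is one way to notice bad rows before they silently distort a mean.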
Definition: groupby (Split-Apply-Combine)
groupby splits the DataFrame by a column, applies a function to
each group, and combines the results:
# Mean BER by modulation
df.groupby('modulation')['ber'].mean()
# Multiple aggregations
df.groupby('modulation').agg(
    mean_ber=('ber', 'mean'),
    std_ber=('ber', 'std'),
    n_trials=('ber', 'count'),
)
# Group by multiple columns (assumes df also has a 'channel' column,
# which the five-row example above does not define)
df.groupby(['modulation', 'channel'])['throughput'].median()
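A self-contained sketch of split-apply-combine. The small DataFrame below is made up for illustration (the values, the 'seed' column, and the 'channel' column are assumptions, not results from the text) so that every group contains more than one row.

```python
import pandas as pd

# Illustrative data only: two modulations, two channels, two seeds each.
sim = pd.DataFrame({
    "modulation": ["BPSK"] * 4 + ["QPSK"] * 4,
    "channel": ["AWGN", "AWGN", "Rayleigh", "Rayleigh"] * 2,
    "seed": [0, 1] * 4,
    "ber": [1e-4, 3e-4, 2e-3, 4e-3, 2e-4, 4e-4, 5e-3, 7e-3],
    "throughput": [1.0, 1.0, 0.9, 0.8, 2.0, 2.0, 1.7, 1.5],
})

# Single key, single column: the result is a Series indexed by modulation.
mean_ber = sim.groupby("modulation")["ber"].mean()
print(mean_ber)

# Two keys: the result is a Series with a (modulation, channel) MultiIndex.
median_tp = sim.groupby(["modulation", "channel"])["throughput"].median()
print(median_tp)

# Named aggregation returns a DataFrame with one column per statistic.
stats = sim.groupby("modulation").agg(
    mean_ber=("ber", "mean"),
    std_ber=("ber", "std"),
    n_trials=("ber", "count"),
)
print(stats)
```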
Definition: pivot_table and melt (Reshape Data)
pivot_table reshapes long-form data into a matrix:
table = df.pivot_table(
    values='ber', index='modulation', columns='snr_db',
    aggfunc='mean'
)
melt does the reverse, converting wide to long:
long = table.reset_index().melt(
    id_vars='modulation', var_name='snr_db', value_name='ber'
)
Seaborn and Plotly expect "tidy" (long-form) data. Use melt
to convert wide tables to tidy format.
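The round trip between the two shapes can be sketched as follows. The data is illustrative only, and the final cast of 'snr_db' back to int is a precaution on the assumption that melt may return that column with a generic dtype.

```python
import pandas as pd

# Illustrative long-form data: one row per (modulation, snr_db) point.
long_df = pd.DataFrame({
    "modulation": ["BPSK", "BPSK", "16-QAM", "16-QAM"],
    "snr_db": [0, 10, 0, 10],
    "ber": [0.08, 1.3e-4, 0.15, 1.5e-3],
})

# Long -> wide: rows are modulations, columns are SNR values.
wide = long_df.pivot_table(
    values="ber", index="modulation", columns="snr_db", aggfunc="mean"
)
print(wide)

# Wide -> long again. reset_index turns 'modulation' back into a column.
tidy = wide.reset_index().melt(
    id_vars="modulation", var_name="snr_db", value_name="ber"
)
# Restore an integer dtype for the SNR column and a stable row order.
tidy["snr_db"] = tidy["snr_db"].astype(int)
tidy = tidy.sort_values(["modulation", "snr_db"]).reset_index(drop=True)
print(tidy)
```

Because each (modulation, snr_db) pair appears once here, the mean in pivot_table leaves the values unchanged and the round trip recovers the original rows.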
Theorem: Pandas Vectorization
Pandas operations on columns are vectorized (implemented in C/NumPy). For a DataFrame with n rows, all three approaches below do O(n) work, but they differ in where the per-row loop runs:
- Vectorized: df['ber_db'] = 10 * np.log10(df['ber']) (loop runs in C)
- apply: df['ber_db'] = df['ber'].apply(lambda x: 10*np.log10(x)) (loop runs in Python)
- iterrows: for _, row in df.iterrows() (loop runs in Python, 10-100x slower)
Always prefer vectorized operations over apply, and apply over iterrows.
Vectorized operations avoid per-element interpreter overhead by operating directly on the underlying NumPy arrays.
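A rough way to see the difference on your own machine is sketched below. The synthetic BER values, the row count, and the helper name timed are arbitrary choices for illustration; the printed timings depend on hardware and pandas version, so no particular ratio is claimed.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic BER values in [1e-9, 1e-1); the size is an arbitrary choice.
bench = pd.DataFrame({"ber": 10.0 ** rng.uniform(-9, -1, size=10_000)})


def timed(fn):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - start


# 1) Vectorized: one NumPy call over the whole column.
vec, t_vec = timed(lambda: 10 * np.log10(bench["ber"]))

# 2) apply: one Python function call per element.
app, t_app = timed(lambda: bench["ber"].apply(lambda x: 10 * np.log10(x)))

# 3) iterrows: builds a Series object per row, then a Python-level call.
itr, t_itr = timed(
    lambda: pd.Series(
        [10 * np.log10(row["ber"]) for _, row in bench.iterrows()],
        index=bench.index,
    )
)

# All three compute the same values; only the cost differs.
assert np.allclose(vec, app) and np.allclose(vec, itr)
print(f"vectorized: {t_vec:.4f} s, apply: {t_app:.4f} s, iterrows: {t_itr:.4f} s")
```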
Example: Analyzing Simulation Results with Pandas
Load BER simulation results from CSV, compute summary statistics grouped by modulation, and create a pivot table.
Implementation
import pandas as pd
import numpy as np
# Load results
df = pd.read_csv('ber_results.csv')
print(f"Loaded {len(df)} experiments")
print(df.head())
# Summary by modulation
summary = df.groupby('modulation').agg(
    mean_ber=('ber', 'mean'),
    min_ber=('ber', 'min'),
    max_ber=('ber', 'max'),
    n_trials=('ber', 'count'),
)
print(summary)
# Pivot: BER matrix
pivot = df.pivot_table(
    values='ber', index='modulation', columns='snr_db'
)
# to_string expects float_format to be a one-argument formatter function
print(pivot.to_string(float_format='{:.2e}'.format))
Example: Merging Simulation and Configuration Data
Merge a simulation results DataFrame with a configuration DataFrame to analyze how hardware parameters affect performance.
Implementation
# Results table
results = pd.DataFrame({
    'config_id': [1, 1, 2, 2, 3, 3],
    'snr_db': [10, 20, 10, 20, 10, 20],
    'ber': [1e-3, 1e-5, 5e-4, 2e-6, 2e-3, 8e-5],
})
# Config table
configs = pd.DataFrame({
    'config_id': [1, 2, 3],
    'n_antennas': [2, 4, 2],
    'modulation': ['QPSK', 'QPSK', '16-QAM'],
})
# Merge
full = results.merge(configs, on='config_id')
print(full)
# Now you can group by hardware config
full.groupby('n_antennas')['ber'].mean()
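By default, merge performs an inner join, so result rows whose config_id has no match in the config table are dropped without warning. The sketch below shows one way to guard against that using the how, validate, and indicator parameters of merge; the unmatched config_id 4 is a hypothetical added for illustration.

```python
import pandas as pd

results = pd.DataFrame({
    "config_id": [1, 1, 2, 2, 4],   # config 4 is deliberately missing below
    "snr_db": [10, 20, 10, 20, 10],
    "ber": [1e-3, 1e-5, 5e-4, 2e-6, 3e-3],
})
configs = pd.DataFrame({
    "config_id": [1, 2, 3],
    "n_antennas": [2, 4, 2],
    "modulation": ["QPSK", "QPSK", "16-QAM"],
})

# Default (inner) join: the row with config_id 4 disappears.
inner = results.merge(configs, on="config_id")

# Left join keeps every result row; validate raises if config_id is
# duplicated in configs; indicator adds a '_merge' column for auditing.
checked = results.merge(
    configs, on="config_id", how="left",
    validate="many_to_one", indicator=True,
)
unmatched = checked[checked["_merge"] == "left_only"]
print(f"inner: {len(inner)} rows, left: {len(checked)} rows")
print(unmatched[["config_id", "snr_db", "ber"]])
```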
Example: Pandas Method Chaining
Use method chaining to filter, transform, and summarize in one expression.
Implementation
(df
    .query('snr_db >= 5')
    .assign(ber_db=lambda x: 10 * np.log10(x['ber']))
    .groupby('modulation')
    .agg(mean_ber_db=('ber_db', 'mean'))
    .sort_values('mean_ber_db')
)
Pandas Operation Performance
| Operation | Speed | Use When |
|---|---|---|
| Vectorized (df['a'] + df['b']) | Fastest (C/NumPy) | Simple arithmetic, comparisons |
| .apply(func) | Medium (Python per-group) | Complex per-row/group logic |
| .iterrows() | Slowest (Python per-row) | Last resort only |
| .groupby().agg() | Fast (C aggregations) | Summary statistics |
| .merge() | Fast (hash join) | Combining tables |
Why This Matters: Managing Simulation Campaigns with Pandas
A typical wireless system simulation campaign sweeps over 5+ parameters
(SNR, modulation, coding rate, antenna config, channel model) with
multiple seeds per point. This generates thousands of rows.
Pandas groupby + agg computes mean BER, confidence intervals,
and throughput percentiles across seeds, while pivot_table creates
the result matrices that go directly into papers via to_latex().
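The per-seed aggregation described above might look like the following sketch. The campaign grid and all values are random placeholders, the column names are assumptions, and the 95% interval uses a normal approximation, which is only one of several possible constructions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic campaign: 2 modulations x 3 SNR points x 20 seeds.
# All numbers are random placeholders, not simulation results.
grid = pd.MultiIndex.from_product(
    [["QPSK", "16-QAM"], [0, 10, 20], range(20)],
    names=["modulation", "snr_db", "seed"],
).to_frame(index=False)
grid["ber"] = 10.0 ** rng.uniform(-6, -2, size=len(grid))
grid["throughput"] = rng.uniform(0.5, 1.0, size=len(grid))

# Aggregate across seeds for every (modulation, snr_db) point.
summary = grid.groupby(["modulation", "snr_db"]).agg(
    mean_ber=("ber", "mean"),
    std_ber=("ber", "std"),
    n_seeds=("ber", "count"),
    tp_p05=("throughput", lambda s: s.quantile(0.05)),
    tp_p50=("throughput", "median"),
)

# Normal-approximation 95% interval on the mean BER (an assumption;
# other constructions may suit small seed counts better).
half_width = 1.96 * summary["std_ber"] / np.sqrt(summary["n_seeds"])
summary["ber_ci_low"] = summary["mean_ber"] - half_width
summary["ber_ci_high"] = summary["mean_ber"] + half_width

# Result matrix: one row per modulation, one column per SNR point.
matrix = grid.pivot_table(
    values="ber", index="modulation", columns="snr_db", aggfunc="mean"
)
print(summary)
print(matrix)
```

Calling matrix.to_latex() would then produce the table source; depending on the pandas version, that call may require the optional jinja2 dependency.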
Common Mistake: SettingWithCopyWarning
Mistake:
Assigning to a slice of a DataFrame:
df[df['snr'] > 10]['ber'] = 0 (this may modify a copy, not df).
Correction:
Use .loc for assignment:
df.loc[df['snr'] > 10, 'ber'] = 0
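A small runnable sketch of the .loc pattern, using a made-up five-row frame named demo. It also shows an explicit .copy() for the case where a filtered subset should be modified without touching the original.

```python
import pandas as pd

demo = pd.DataFrame({
    "snr": [0, 5, 10, 15, 20],
    "ber": [0.08, 0.006, 1.3e-4, 8e-7, 3e-11],
})

# Boolean mask selecting the rows to change.
high_snr = demo["snr"] > 10

# One indexing call with a row mask and a column label writes into demo itself.
demo.loc[high_snr, "ber"] = 0
print(demo)

# To modify a filtered subset independently of demo, copy it explicitly.
subset = demo[demo["snr"] <= 10].copy()
subset["ber"] = subset["ber"] * 2
print(subset)
```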
Key Takeaway
Store simulation results as tidy DataFrames: one row per observation.
Use groupby + agg for summary statistics, pivot_table for
result matrices, and merge to join configuration and results.
Never loop over rows; vectorize.
Quick Check
What does df.groupby('modulation')['ber'].mean() return?
A DataFrame with all columns
A Series with the mean BER for each modulation
A single number (the overall mean)
An error because groupby requires two arguments
groupby splits by modulation, selects 'ber', and computes the mean per group.
DataFrame
A 2D labeled data structure in Pandas with rows (index) and columns, similar to a spreadsheet or SQL table.
groupby
A Pandas operation that splits a DataFrame into groups by one or more columns, applies a function to each group, and combines the results.
Tidy Data
A data format where each row is one observation and each column is one variable, enabling easy analysis with groupby, plotting, and machine learning.