Seaborn for Statistical Plots
Why Seaborn for Statistical Visualization
Matplotlib is general-purpose; Seaborn is opinionated about statistics.
It creates box plots, violin plots, heatmaps, and CDFs with sensible
defaults and tight Pandas integration. When analyzing simulation
results across multiple parameter sweeps, Seaborn's hue, col, and
row mappings let you split one plot by several factors in a single function call.
Definition: Figure-Level vs. Axes-Level Functions
Seaborn provides two function tiers:
- Axes-level (sns.boxplot, sns.violinplot, sns.heatmap): draw on a single ax, composable with Matplotlib subplots.
- Figure-level (sns.catplot, sns.relplot, sns.displot): create their own Figure with a FacetGrid and handle multi-panel layouts automatically via the col and row parameters.
import matplotlib.pyplot as plt
import seaborn as sns

# Axes-level: composable
fig, ax = plt.subplots()
sns.boxplot(data=df, x='modulation', y='ber', ax=ax)

# Figure-level: self-contained
g = sns.catplot(data=df, x='snr', y='ber', col='modulation',
                kind='box')
Figure-level functions return a FacetGrid, not an Axes.
Access the underlying axes with g.axes.flat.
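As a minimal sketch of working with the returned grid, the snippet below builds a small synthetic DataFrame (the values are invented purely for illustration) and customizes every panel through g.axes.flat.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data for illustration only; these are not simulation results.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'modulation': np.repeat(['BPSK', 'QPSK'], 40),
    'snr': np.tile(np.repeat([0, 10], 20), 2),
    'ber': 10.0 ** rng.uniform(-5, -1, size=80),
})

# Figure-level call: returns a FacetGrid with one panel per modulation.
g = sns.catplot(data=demo, x='snr', y='ber', col='modulation', kind='box')

# g.axes is a 2-D array of Matplotlib Axes; .flat iterates over every panel.
for ax in g.axes.flat:
    ax.set_yscale('log')
    ax.grid(True, which='both', alpha=0.3)
```

Anything that applies to the whole figure (size, titles, saving) goes through g or g.figure, while per-panel tweaks go through the individual Axes objects.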
Definition: Box Plot Anatomy
A box plot summarizes a distribution with a handful of robust statistics:
- Box: Q1 (25th percentile) to Q3 (75th percentile)
- Median line: Q2 (50th percentile)
- Whiskers: extend to the most extreme data points within 1.5 × IQR of Q1 and Q3
- Outliers: points beyond the whiskers, drawn individually
where IQR = Q3 - Q1 is the interquartile range.
sns.boxplot(data=df, x='channel', y='throughput',
hue='scheduler', palette='Set2')
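To make the anatomy concrete, the following sketch computes the same quantities by hand with NumPy. The helper name box_stats and the sample values are hypothetical, and the quartiles use NumPy's default linear interpolation, which may differ slightly from other quantile conventions.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return the quantities a Tukey-style box plot draws."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        'q1': q1, 'median': median, 'q3': q3, 'iqr': iqr,
        # Whiskers stop at the most extreme points inside the fences.
        'whisker_low': inside.min(), 'whisker_high': inside.max(),
        'outliers': x[(x < lo_fence) | (x > hi_fence)],
    }

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

For this sample the upper fence is Q3 + 1.5 × IQR = 7.75 + 6.75 = 14.5, so the upper whisker ends at 9 and the value 30 is reported as an outlier.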
Definition: Violin Plot
A violin plot combines a box plot with a mirrored kernel density estimate (KDE), showing the full distribution shape:
sns.violinplot(data=df, x='antenna_config', y='capacity',
inner='quartile', cut=0)
The inner='quartile' option draws quartile lines inside the violin;
cut=0 clips the KDE at the data range.
Theorem: KDE Bandwidth Selection
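A kernel density estimate with bandwidth $h$ is:
$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$
where $K$ is the kernel and $x_1, \dots, x_n$ are the samples.
Too small an $h$ overfits (jagged); too large an $h$ oversmooths (loses modes).
Silverman's rule gives
$$h = 1.06\,\hat{\sigma}\,n^{-1/5}$$
for Gaussian kernels, where $\hat{\sigma}$ is the sample standard deviation. Seaborn's kdeplot and
violinplot default to the closely related Scott's rule (bw_method='scott'); pass bw_method='silverman' to use Silverman's rule instead.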
Think of KDE as placing a small bell curve at each data point and summing them up. The bandwidth controls each bell's width.
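The bell-curve picture can be written out directly. The sketch below is a hand-rolled Gaussian KDE in NumPy, not Seaborn's implementation. It assumes one common form of Silverman's rule, h = 1.06 · σ · n^(-1/5), and the function names and the bimodal sample are illustrative.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n**(-1/5)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-1 / 5)

def gaussian_kde_manual(x, grid, h):
    """Evaluate f_hat(t) = 1/(n h) * sum_i K((t - x_i) / h) on a grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - x[None, :]) / h          # shape (len(grid), n)
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (x.size * h)

rng = np.random.default_rng(1)
# Bimodal synthetic sample, chosen only to make the bandwidth effect visible.
sample = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])
grid = np.linspace(-6, 6, 601)

h = silverman_bandwidth(sample)
density = gaussian_kde_manual(sample, grid, h)
# Riemann-sum check: a density should integrate to roughly 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Re-running gaussian_kde_manual with h / 5 and 5 * h shows the jagged and oversmoothed extremes. For strongly bimodal samples like this one, a rule of thumb based on the overall standard deviation tends to oversmooth, so it is worth inspecting a smaller bandwidth as well.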
Example: Analyzing BER Simulation Results with Seaborn
Given a DataFrame with columns modulation, channel,
and ber, create box plots of BER by modulation for each channel type.
Data setup
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)
# log10 of the nominal BER per modulation (toy values)
base_log_ber = {'BPSK': -4, 'QPSK': -3, '16-QAM': -2}
records = []
for mod in ['BPSK', 'QPSK', '16-QAM']:
    for ch in ['AWGN', 'Rayleigh']:
        # Rayleigh fading costs one decade of BER in this toy model
        ch_penalty = 0 if ch == 'AWGN' else 1
        for trial in range(50):
            ber = 10**(base_log_ber[mod] + ch_penalty + 0.3*rng.standard_normal())
            records.append({'modulation': mod, 'channel': ch, 'ber': ber})
df = pd.DataFrame(records)
Plotting
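# log_scale in categorical plots needs seaborn 0.13 or newer;
# hue is assigned explicitly because palette without hue is deprecated there
g = sns.catplot(data=df, x='modulation', y='ber', hue='modulation',
                col='channel', kind='box', log_scale=True,
                palette='Set2', legend=False, height=4, aspect=1.2)
g.set_axis_labels('Modulation', 'BER')
g.figure.suptitle('BER by Modulation and Channel', y=1.02)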
Example: Correlation Heatmap
Visualize the correlation matrix of simulation parameters using
sns.heatmap with annotated values.
Implementation
import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df holds numeric columns snr, ber, throughput, and latency
corr = df[['snr', 'ber', 'throughput', 'latency']].corr()
fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, vmin=-1, vmax=1, ax=ax,
            square=True, linewidths=0.5)
ax.set_title('Parameter Correlation Matrix')
Example: Empirical CDF Comparison
Compare the CDF of throughput across different scheduling algorithms.
Implementation
fig, ax = plt.subplots(figsize=(7, 4))
for scheduler in df['scheduler'].unique():
    subset = df[df['scheduler'] == scheduler]
    sns.ecdfplot(data=subset, x='throughput', ax=ax,
                 label=scheduler)
ax.set(xlabel='Throughput (Mbps)', ylabel='CDF',
       title='Throughput CDF by Scheduler')
ax.legend()
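For reference, the empirical CDF that ecdfplot draws is a simple step function: sort the samples and assign each the fraction of observations at or below it. A minimal NumPy sketch follows; the helper name ecdf and the sample values are hypothetical.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of samples <= each value."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

x, y = ecdf([3.0, 1.0, 4.0, 1.5, 2.0])
# x -> [1.0, 1.5, 2.0, 3.0, 4.0], y -> [0.2, 0.4, 0.6, 0.8, 1.0]
```

Reading the curve at a given throughput gives the fraction of samples at or below that value, which is why ECDFs are convenient for comparing tail behavior across schedulers without choosing a bin width or bandwidth.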
Statistical Plot Type Explorer
Compare box, violin, strip, and swarm plots on the same simulation data to see which reveals the most information.
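A static version of this comparison can be sketched with the four axes-level functions on a 2 × 2 grid. The scheduler labels and throughput values below are synthetic placeholders, with one group made deliberately bimodal so the plot types differ visibly.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(7)
# Synthetic throughput samples; scheduler names and values are placeholders.
demo = pd.DataFrame({
    'scheduler': np.repeat(['RR', 'PF'], 60),
    'throughput': np.concatenate([
        rng.normal(20, 3, 60),                      # unimodal group
        rng.normal(15, 2, 30), rng.normal(30, 2, 30),  # bimodal group
    ]),
})

plotters = {
    'box': sns.boxplot,
    'violin': sns.violinplot,
    'strip': sns.stripplot,
    'swarm': sns.swarmplot,
}

# One axes-level call per panel, all sharing the same y-axis.
fig, axes = plt.subplots(2, 2, figsize=(9, 6), sharey=True)
for ax, (name, plot) in zip(axes.flat, plotters.items()):
    plot(data=demo, x='scheduler', y='throughput', ax=ax)
    ax.set_title(name)
fig.tight_layout()
```

In a comparison like this, the box plot typically hides the two modes of the second group, while the violin, strip, and swarm panels expose them.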
Common Mistake: Mixing Figure-Level and Axes-Level Functions
Mistake:
Passing an ax parameter to a figure-level function like sns.catplot(..., ax=ax).
Figure-level functions create their own Figure; they emit a UserWarning and ignore the ax argument.
Correction:
Use the axes-level equivalent: sns.boxplot(data=df, ax=ax) instead
of sns.catplot(data=df, kind='box', ax=ax).
Quick Check
What information does a violin plot show that a box plot does not?
- The mean value
- The full distribution shape (density estimate)
- The standard deviation
- Outlier positions
Answer: the full distribution shape. A violin plot shows the KDE, revealing multimodality and skewness that box plots hide.
Key Takeaway
Seaborn + Pandas = effortless statistical plots. Structure your
simulation results as a tidy DataFrame with one row per observation,
then use hue, col, and row to explore multiple factors in a
single function call.
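Simulation scripts often emit wide tables with one column per configuration. As a small sketch of the reshaping step, the snippet below converts a hypothetical wide table to tidy form with pandas' melt. The BER values are placeholders, not simulation output.

```python
import pandas as pd

# Hypothetical wide-format results: one BER column per modulation.
wide = pd.DataFrame({
    'snr_db': [0, 5, 10],
    'BPSK': [1e-1, 1e-2, 1e-3],
    'QPSK': [2e-1, 2e-2, 2e-3],
})

# Reshape so that each row is a single (snr_db, modulation, ber) observation.
tidy = wide.melt(id_vars='snr_db', var_name='modulation', value_name='ber')
```

After melting, modulation is an ordinary column that can be passed to hue, col, or row.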
Kernel Density Estimation (KDE)
A non-parametric method to estimate the probability density function of a random variable by smoothing individual data points with a kernel function.
FacetGrid
A Seaborn class that creates a grid of subplots conditioned on data variables, enabling multi-panel exploration with one line of code.
Historical Note: Seaborn's Origin
Michael Waskom created Seaborn in 2012 while a PhD student in neuroscience at Stanford. Frustrated by the amount of Matplotlib boilerplate needed for common statistical plots, he built Seaborn as a high-level wrapper. The name comes from Sam Seaborn, a character in The West Wing, following the Python community's tradition of naming projects after Monty Python and other cultural references.