Seaborn for Statistical Plots

Why Seaborn for Statistical Visualization

Matplotlib is general-purpose; Seaborn is opinionated about statistics. It creates box plots, violin plots, heatmaps, and CDFs with sensible defaults and tight Pandas integration. When analyzing simulation results across multiple parameter sweeps, Seaborn's hue, col, and row mappings let you explore high-dimensional data in one line of code.

Definition:

Figure-Level vs. Axes-Level Functions

Seaborn provides two function tiers:

  • Axes-level (sns.boxplot, sns.violinplot, sns.heatmap): draw on a single ax, so they compose with Matplotlib subplots.
  • Figure-level (sns.catplot, sns.relplot, sns.displot): create their own Figure backed by a FacetGrid and handle multi-panel layouts automatically via the col and row parameters.
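
import matplotlib.pyplot as plt
import seaborn as sns

# df is a tidy DataFrame with columns 'modulation', 'snr', and 'ber'

# Axes-level: composable, draws on the Axes you pass in
fig, ax = plt.subplots()
sns.boxplot(data=df, x='modulation', y='ber', ax=ax)

# Figure-level: self-contained, one panel per modulation
g = sns.catplot(data=df, x='snr', y='ber', col='modulation',
                kind='box')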

Figure-level functions return a FacetGrid, not an Axes. Access the underlying axes with g.axes.flat.
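
As a minimal sketch of working with the returned FacetGrid, the snippet below builds a small synthetic DataFrame (the values are random placeholders, not simulation results) and applies a log scale to every panel through g.axes.flat. The column names follow the snippet above; everything else is an assumption for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for simulation output (random placeholder values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "snr": rng.choice([0, 5, 10], size=300),
    "modulation": rng.choice(["BPSK", "QPSK", "16QAM"], size=300),
    "ber": 10.0 ** rng.uniform(-5, -1, size=300),
})

# Figure-level call: returns a FacetGrid with one panel per modulation.
g = sns.catplot(data=df, x="snr", y="ber", col="modulation", kind="box")

# g.axes is a 2-D array of Matplotlib Axes; .flat visits every panel.
for ax in g.axes.flat:
    ax.set_yscale("log")                 # BER spans several decades
    ax.grid(True, which="both", alpha=0.3)

g.set_axis_labels("SNR (dB)", "BER")
n_panels = len(list(g.axes.flat))
```

Anything that works on a Matplotlib Axes (scales, grids, annotations) can be applied inside the loop; grid-wide settings such as axis labels go through FacetGrid methods like set_axis_labels.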

Definition:

Box Plot Anatomy

A box plot summarizes a distribution with five numbers:

  • Box: Q1 (25th percentile) to Q3 (75th percentile)
  • Median line: Q2 (50th percentile)
  • Whiskers: extend to the most extreme data points that lie within 1.5 × IQR of Q1 and Q3
  • Outliers: points beyond the whiskers

where IQR = Q3 - Q1 is the interquartile range.

sns.boxplot(data=df, x='channel', y='throughput',
            hue='scheduler', palette='Set2')
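
To make the anatomy concrete, the following sketch computes the box plot statistics by hand with NumPy. The sample is synthetic, with two outliers planted deliberately, and the 1.5 × IQR factor matches the default whis=1.5 used by Matplotlib and Seaborn.

```python
import numpy as np

# Synthetic sample: a bell-shaped bulk plus two planted outliers.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(50.0, 5.0, size=200), [90.0, 95.0]])

# Box: quartiles and median.
q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1

# Fences sit 1.5 * IQR beyond the box edges.
lo_fence = q1 - 1.5 * iqr
hi_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
lo_whisker = x[x >= lo_fence].min()
hi_whisker = x[x <= hi_fence].max()

# Everything outside the fences is drawn as an individual outlier point.
outliers = x[(x < lo_fence) | (x > hi_fence)]

print(f"Q1={q1:.2f}  median={median:.2f}  Q3={q3:.2f}  IQR={iqr:.2f}")
print(f"whiskers: [{lo_whisker:.2f}, {hi_whisker:.2f}]  outliers: {outliers.size}")
```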

Definition:

Violin Plot

A violin plot combines a box plot with a mirrored kernel density estimate (KDE), showing the full distribution shape:

sns.violinplot(data=df, x='antenna_config', y='capacity',
               inner='quartile', cut=0)

The inner='quartile' option draws quartile lines inside the violin; cut=0 clips the KDE at the data range.

Theorem: KDE Bandwidth Selection

A kernel density estimate with bandwidth $h$ is:

$$\hat{f}(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)$$

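Too small an $h$ overfits (a jagged estimate); too large an $h$ oversmooths (modes merge and disappear). For Gaussian kernels, Silverman's rule of thumb gives $h_{\text{opt}} = 1.06\,\hat{\sigma}\,N^{-1/5}$. Seaborn's kdeplot and violinplot default to the closely related Scott's rule (bw_method='scott'), which has the same $N^{-1/5}$ scaling; in kdeplot, bw_method and bw_adjust change the smoothing.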

Think of KDE as placing a small bell curve at each data point and summing them up. The bandwidth controls each bell's width.
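
The "sum of bell curves" picture can be written out directly. The sketch below is a plain NumPy implementation of the estimator with a standard normal kernel, not Seaborn's internal code; the helper name gaussian_kde_1d and the bimodal test sample are assumptions made for illustration. It evaluates the estimate at three bandwidths around the 1.06 σ̂ N^(-1/5) rule of thumb.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, h):
    """Evaluate (1 / (N h)) * sum_i K((x - x_i) / h) with a standard normal K."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - samples[None, :]) / h           # shape (grid, N)
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # one bell per sample
    return kernel.sum(axis=1) / (samples.size * h)

# Synthetic bimodal sample (placeholder data).
rng = np.random.default_rng(2)
samples = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(2.0, 0.5, 300)])

# Rule-of-thumb bandwidth from the text: 1.06 * sigma_hat * N^(-1/5).
h_rule = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)

grid = np.linspace(-15.0, 15.0, 3001)
dx = grid[1] - grid[0]
bandwidths = {"small": 0.2 * h_rule, "rule": h_rule, "large": 4.0 * h_rule}
densities = {name: gaussian_kde_1d(samples, grid, h) for name, h in bandwidths.items()}

# Each estimate is a density, so it should integrate to roughly 1.
areas = {name: float(d.sum() * dx) for name, d in densities.items()}
for name in bandwidths:
    print(f"{name:>5}: h={bandwidths[name]:.3f}  area={areas[name]:.4f}  "
          f"peak={densities[name].max():.3f}")
```

Plotting the three curves against grid shows the trade-off: the small bandwidth follows individual samples, while the large one flattens the two modes into a single broad hump.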

Example: Analyzing BER Simulation Results with Seaborn

Given a DataFrame with columns snr_db, modulation, channel, and ber, create box plots of BER by modulation for each channel type.
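
One possible approach, sketched under the assumption that the DataFrame is tidy (one row per simulated trial): use sns.catplot with col='channel' so each channel type gets its own panel. The data here is randomly generated only to make the snippet runnable; the channel and modulation labels are placeholders.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder data: BER values drawn log-uniformly, NOT real simulation output.
rng = np.random.default_rng(3)
rows = []
for channel in ["AWGN", "Rayleigh"]:
    for modulation in ["BPSK", "QPSK", "16QAM"]:
        for snr_db in [0, 5, 10]:
            ber = 10.0 ** rng.uniform(-5, -1, size=20)
            rows.extend(
                {"snr_db": snr_db, "modulation": modulation,
                 "channel": channel, "ber": b}
                for b in ber
            )
df = pd.DataFrame(rows)

# One panel per channel, one box per modulation.
g = sns.catplot(data=df, x="modulation", y="ber", col="channel",
                kind="box", height=4, aspect=1.0)

for ax in g.axes.flat:
    ax.set_yscale("log")       # BER is naturally read on a log axis

g.set_axis_labels("Modulation", "BER")
g.set_titles("{col_name} channel")
```

Adding hue='snr_db' would split each box by SNR within the same panels, at the cost of a busier plot.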

Example: Correlation Heatmap

Visualize the correlation matrix of simulation parameters using sns.heatmap with annotated values.
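
A minimal sketch of this example follows. The parameter names (snr_db, throughput, ber_log10, delay_ms) and the relationships between them are invented for illustration; only DataFrame.corr and sns.heatmap with annot=True are the point.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder parameters with built-in correlations (hypothetical columns).
rng = np.random.default_rng(4)
n = 500
snr_db = rng.uniform(0, 30, n)
params = pd.DataFrame({
    "snr_db": snr_db,
    "throughput": 2.0 * snr_db + rng.normal(0, 5, n),     # rises with SNR
    "ber_log10": -0.15 * snr_db + rng.normal(0, 0.5, n),  # falls with SNR
    "delay_ms": rng.normal(10, 2, n),                     # unrelated
})

corr = params.corr()   # Pearson correlation by default

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, center=0, square=True, ax=ax)
ax.set_title("Correlation of simulation parameters")
fig.tight_layout()
```

Fixing vmin=-1 and vmax=1 with a diverging colormap keeps zero correlation at the neutral color, so the sign of each entry is readable at a glance.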

Example: Empirical CDF Comparison

Compare the CDF of throughput across different scheduling algorithms.

Statistical Plot Type Explorer

Compare box, violin, strip, and swarm plots on the same simulation data to see which reveals the most information.
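
A static version of this comparison can be sketched with four axes-level calls on shared subplots. The data is synthetic: one group is unimodal and the other is deliberately bimodal, which is a case where the four plot types differ in what they reveal. Column names are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder data: "AWGN" is unimodal, "Fading" is deliberately bimodal.
rng = np.random.default_rng(6)
df = pd.DataFrame({
    "channel": np.repeat(["AWGN", "Fading"], 80),
    "throughput": np.concatenate([
        rng.normal(50, 5, 80),
        rng.normal(30, 3, 40),
        rng.normal(60, 3, 40),
    ]),
})

# Same data, four axes-level plot types, each with its own keyword arguments.
plotters = [
    ("box", sns.boxplot, {}),
    ("violin", sns.violinplot, {"inner": "quartile", "cut": 0}),
    ("strip", sns.stripplot, {"size": 3, "alpha": 0.6}),
    ("swarm", sns.swarmplot, {"size": 3}),
]

fig, axes = plt.subplots(1, 4, figsize=(16, 4), sharey=True)
for ax, (name, plot, kwargs) in zip(axes, plotters):
    plot(data=df, x="channel", y="throughput", ax=ax, **kwargs)
    ax.set_title(name)
fig.tight_layout()
```

In this setup the box plot summarizes the bimodal group with a single wide box, while the violin, strip, and swarm plots show the two clusters.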

Common Mistake: Mixing Figure-Level and Axes-Level Functions

Mistake:

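Passing an ax parameter to a figure-level function, as in sns.catplot(..., ax=ax). Figure-level functions create their own Figure and do not draw on the Axes you pass; Seaborn typically warns and discards the ax argument, so the plot ends up in a new figure while your subplot stays empty.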

Correction:

Use the axes-level equivalent: sns.boxplot(data=df, ax=ax) instead of sns.catplot(data=df, kind='box', ax=ax).

Quick Check

What information does a violin plot show that a box plot does not?

The mean value

The full distribution shape (density estimate)

The standard deviation

Outlier positions

Key Takeaway

Seaborn + Pandas = effortless statistical plots. Structure your simulation results as a tidy DataFrame with one row per observation, then use hue, col, and row to explore multiple factors in a single function call.

Kernel Density Estimation (KDE)

A non-parametric method to estimate the probability density function of a random variable by smoothing individual data points with a kernel function.

FacetGrid

A Seaborn class that creates a grid of subplots conditioned on data variables, enabling multi-panel exploration with one line of code.

Historical Note: Seaborn's Origin

2012

Michael Waskom created Seaborn in 2012 while a PhD student in neuroscience at Stanford. Frustrated by the amount of Matplotlib boilerplate needed for common statistical plots, he built Seaborn as a high-level wrapper. The name comes from Sam Seaborn, a character in The West Wing, following the Python community's tradition of naming projects after Monty Python and other cultural references.