References & Further Reading
References
- F. Perez and B. E. Granger, IPython: A System for Interactive Scientific Computing, Computing in Science & Engineering, 2007
The foundational paper on IPython, describing the architecture of the interactive computing environment that evolved into Jupyter.
- T. Kluyver et al., Jupyter Notebooks — A Publishing Format for Reproducible Computational Workflows, ELPUB, 2016
Describes the Jupyter notebook format and its role in reproducible computational research.
- W. McKinney, Python for Data Analysis, O'Reilly, 2017
The definitive Pandas book, written by the creator of Pandas. Covers DataFrames, groupby, merge, and time series in depth.
- H. Wickham, Tidy Data, Journal of Statistical Software, 2014
Formalizes the concept of tidy data: each variable is a column, each observation is a row, and each type of observational unit is a table. Foundational for modern data analysis.
- M. Wouts, jupytext Documentation, 2024
Official documentation for jupytext, covering pairing formats, configuration, and integration with JupyterLab.
Further Reading
Advanced Pandas techniques
M. Harrison, *Effective Pandas*, 2nd ed., 2022
Covers advanced topics like MultiIndex, method chaining, window functions, and performance optimization.
Notebook best practices
J. VanderPlas, *Reproducible Data Analysis in Jupyter* (YouTube series)
Practical guidelines for structuring notebooks for reproducibility, including Git integration strategies.
Interactive widgets
ipywidgets documentation: https://ipywidgets.readthedocs.io/
Complete reference for building interactive notebook interfaces with sliders, buttons, and output widgets.
papermill for production notebooks
Netflix tech blog: https://netflixtechblog.com/scheduling-notebooks-348e6c14cfd6
Netflix's approach to using papermill for production data pipelines, running hundreds of notebooks daily.