Two-line summary
One of my favourite things about the field of data science is the number of successful people who are self-taught. The following list consists of all the resources I've come across or have been recommended to me over the years, a list that I will update on a monthly basis.
Table of contents
- Introduction
- Statistical foundations
- Applied data science
- Experimentation
- Data visualization
- Econometrics
- Bayesian methods
- Causal inference
- Engineering
- Fun & entertaining
- Leadership & business
Introduction
When I look across the data science teams I've worked with over the years, I'm fortunate enough to have met folks with vastly differing career experiences: botanists, engineers, baristas, marketers, economists, government officials... you name it! This cultivates such a fantastic work environment because each person's training (whether formal or informal) brings unique value to the conversation.
It is also precisely why I don't believe higher education is absolutely necessary. The structured curriculum and accreditation are what led me to pursue a master's degree, but I have smarter and more senior colleagues who got to where they are by being self-taught. The only prerequisite is discipline.
Every now and then, I'll meet someone with such mastery of a particular topic that I ask, "how do you know so much about this?" I usually get some vague variant of, "I saw a thread on Twitter" or "there's this podcast episode" or "my friend sent me a blog post". Tired of the elusiveness, I started prying for more details and crafting this post.
Whenever I come across content that could potentially improve my data science knowledge, I will update the list and slowly read my way through it. Updates will soon include free courses, blog posts, academic papers and Twitter threads as well. I hope that this can act as a digital bookshelf for those looking to break into the industry or improve their skills.
I'd love to hear any additions you would recommend in the comments below.
Statistical foundations
- Probability and Statistics for Engineering and the Sciences (by) Jay L. Devore
- How to Lie with Statistics (by) Darrell Huff
- Applied Predictive Modeling (by) Max Kuhn, Kjell Johnson
- Regression Modeling Strategies - With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis (by) Frank E. Harrell, Jr
Applied data science
- Product Analytics - Applied Data Science for Actionable Consumer Insights (by) Joanne Rodrigues
- Introduction to Algorithmic Marketing - Artificial Intelligence for Marketing Operations (by) Ilya Katsov
- Data-driven Science and Engineering - Machine Learning, Dynamical Systems and Control (by) Steven L. Brunton, J. Nathan Kutz
- Data-driven Modeling & Scientific Computation - Methods for Complex Systems & Big Data (by) J. Nathan Kutz
- Forecasting - Principles and Practice (by) Rob J Hyndman, George Athanasopoulos
- Python Machine Learning - Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow (by) Sebastian Raschka & Vahid Mirjalili
Experimentation
- Hypothesis Testing - An Intuitive Guide for Making Data Driven Decisions (by) Jim Frost
- Trustworthy Online Controlled Experiments - A Practical Guide to A/B Testing (by) Ron Kohavi, Diane Tang, Ya Xu
- Design and Analysis of Experiments (by) Douglas C. Montgomery
- Statistics for Experimenters - Design, Innovation and Discovery (by) George E. P. Box, J. Stuart Hunter, William G. Hunter
Data visualization
- Envisioning Information (by) Edward R. Tufte
- Good Charts (by) Scott Berinato
- Guide to Information Graphics - The Dos & Don'ts of Presenting Data, Facts, and Figures (by) Dona M. Wong
- Information is Beautiful (by) David McCandless
- Knowledge is Beautiful (by) David McCandless
- Show Me the Numbers - Designing Tables and Graphs to Enlighten (by) Stephen Few
- Storytelling with Data - A Data Visualization Guide for Business Professionals (by) Cole Nussbaumer Knaflic
- The Visual Display of Quantitative Information (by) Edward R. Tufte
Econometrics
- Principles of Econometrics (by) R. Carter Hill, William E. Griffiths, Guay C. Lim
- Mastering 'Metrics (by) Joshua D. Angrist, Jörn-Steffen Pischke
- Mostly Harmless Econometrics (by) Joshua D. Angrist, Jörn-Steffen Pischke
Bayesian methods
- Bayesian Methods for Hackers - Probabilistic Programming and Bayesian Inference (by) Cameron Davidson-Pilon
- Bayesian Data Analysis (by) Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin
- Statistical Rethinking - A Bayesian Course with Examples in R and Stan (by) Richard McElreath
Causal inference
- Causal Inference - The Mixtape (by) Scott Cunningham
- Causal Inference in Statistics: A Primer (by) Judea Pearl, Madelyn Glymour, Nicholas P. Jewell
- Elements of Causal Inference - Foundations and Learning Algorithms (by) Jonas Peters, Dominik Janzing, Bernhard Schölkopf
- The Book of Why - The New Science of Cause and Effect (by) Judea Pearl and Dana Mackenzie
- Causal Inference and Discovery in Python (by) Aleksander Molak
Engineering
- Data Pipelines with Apache Airflow (by) Bas Harenslak, Julian de Ruiter
- Python for Data Analysis - Data Wrangling with Pandas, NumPy, and IPython (by) Wes McKinney
- Python in a Nutshell - A Desktop Quick Reference (by) Alex Martelli, Anna Ravenscroft, Steve Holden
- The Art of Doing Science and Engineering - Learning to Learn (by) Richard W. Hamming
- The Data Warehouse Toolkit - The Definitive Guide to Dimensional Modeling (by) Ralph Kimball, Margy Ross
- An Elegant Puzzle - Systems of Engineering Management (by) Will Larson
- [Free courses] Learn Analytics Engineering with dbt: https://courses.getdbt.com/collections
Fun & entertaining
- Birth of a Theorem - A Mathematical Adventure (by) Cédric Villani
- Cribsheet - A Data-driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (by) Emily Oster
- Dataclysm - Love, Sex, Race and Identity, What our Online Lives Tell Us about our Offline Selves (by) Christian Rudder
- Freakonomics - A Rogue Economist Explores the Hidden Side of Everything (by) Steven D. Levitt & Stephen J. Dubner
- Scorecasting - The Hidden Influences behind how Sports are Played and Games are Won (by) Tobias J. Moskowitz & L. Jon Wertheim
Leadership & business
- Crucial Conversations - Tools for Talking When Stakes are High (by) Kerry Patterson, Joseph Grenny, Ron McMillan, Al Switzler
- Lean In - Women, Work and the Will to Lead (by) Sheryl Sandberg
- Scale - The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies and Companies (by) Geoffrey West
- Radical Candor - Be a Kick-ass Boss without Losing your Humanity (by) Kim Scott
- Team Topologies - Organizing Business and Technology Teams for Fast Flow (by) Manuel Pais, Matthew Skelton
- The Cold Start Problem - How to Start and Scale Network Effects (by) Andrew Chen
- The Making of a Manager (by) Julie Zhuo
- Thinking in Systems (by) Donella H. Meadows
1 comment
Great list! This really helps in building a structured approach to learning