


The ultimate data science reading list


A curated collection of textbooks, academic papers, blog posts and other non-fiction content surrounding data science, updated monthly.

by Mojan Benham

A year ago


Two-line summary

2 minute read

One of my favourite things about the field of data science is the number of successful people who are self-taught. The following list consists of all the resources I've come across or have been recommended over the years, and I will update it on a monthly basis.


Introduction

When I look across the data science teams I've worked with over the years, I'm fortunate enough to have met folks with vastly differing career experiences: botanists, engineers, baristas, marketers, economists, government officials... you name it! This cultivates such a fantastic work environment because each person's training (whether formal or informal) brings unique value to the conversation.

It is also precisely why I don't believe higher education is absolutely necessary. The structured curriculum and accreditation are what led me to pursue a master's degree, but I have smarter and more senior colleagues who got to where they are by being self-taught. The only prerequisite is discipline.

Every now and then, I'll meet someone with such mastery of a particular topic that I ask, "How do you know so much about this?" I usually get some vague variant of "I saw a thread on Twitter", "there's this podcast episode" or "my friend sent me a blog post". Tired of the elusiveness, I started prying for more details and crafting this post.

Whenever I come across content that could potentially improve my data science knowledge, I will update the list and slowly read my way through it. Updates will soon include free courses, blog posts, academic papers and Twitter threads as well. I hope that this can act as a digital bookshelf for those looking to break into the industry or improve their skills.

I'd love to hear any additions you would recommend in the comments below.

Statistical foundations

  • Probability and Statistics for Engineering and the Sciences (by) Jay L. Devore
  • How to Lie with Statistics (by) Darrell Huff
  • Applied Predictive Modeling (by) Max Kuhn, Kjell Johnson
  • Regression Modeling Strategies - With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis (by) Frank E. Harrell, Jr

Applied data science

  • Product Analytics - Applied Data Science for Actionable Consumer Insights (by) Joanne Rodrigues
  • Introduction to Algorithmic Marketing - Artificial Intelligence for Marketing Operations (by) Ilya Katsov
  • Data-driven Science and Engineering - Machine Learning, Dynamical Systems and Control (by) Steven L. Brunton, J. Nathan Kutz
  • Data-driven Modeling & Scientific Computation - Methods for Complex Systems & Big Data (by) J. Nathan Kutz
  • Forecasting: Principles and Practice (by) Rob J Hyndman
  • Python Machine Learning - Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow (by) Sebastian Raschka & Vahid Mirjalili

Experimentation

  • Hypothesis Testing - An Intuitive Guide for Making Data Driven Decisions (by) Jim Frost
  • Trustworthy Online Controlled Experiments - A Practical Guide to A/B Testing (by) Ron Kohavi, Diane Tang, Ya Xu
  • Design and Analysis of Experiments (by) Douglas C. Montgomery
  • Statistics for Experimenters - Design, Innovation and Discovery (by) George E. P. Box, J. Stuart Hunter, William G. Hunter

Data visualization

  • Envisioning Information (by) Edward R. Tufte
  • Good Charts (by) Scott Berinato
  • Guide to Information Graphics - The Dos & Don'ts of Presenting Data, Facts, and Figures (by) Dona M. Wong
  • Information is Beautiful (by) David McCandless
  • Knowledge is Beautiful (by) David McCandless
  • Show Me the Numbers - Designing Tables and Graphs to Enlighten (by) Stephen Few
  • Storytelling with Data - A Data Visualization Guide for Business Professionals (by) Cole Nussbaumer Knaflic 
  • The Visual Display of Quantitative Information (by) Edward R. Tufte

Econometrics

  • Principles of Econometrics (by) R. Carter Hill, William E. Griffiths, Guay C. Lim
  • Mastering 'Metrics (by) Joshua D. Angrist, Jörn-Steffen Pischke
  • Mostly Harmless Econometrics (by) Joshua D. Angrist, Jörn-Steffen Pischke

Bayesian methods

  • Bayesian Methods for Hackers - Probabilistic Programming and Bayesian Inference (by) Cameron Davidson-Pilon
  • Bayesian Data Analysis (by) Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin
  • Statistical Rethinking - A Bayesian Course with Examples in R and Stan (by) Richard McElreath

Causal inference

  • Causal Inference - The Mixtape (by) Scott Cunningham
  • Causal Inference in Statistics: A Primer (by) Judea Pearl, Madelyn Glymour, Nicholas P. Jewell
  • Elements of Causal Inference - Foundations and Learning Algorithms (by) Jonas Peters, Dominik Janzing, Bernhard Schölkopf
  • The Book of Why - The New Science of Cause and Effect (by) Judea Pearl and Dana Mackenzie

Engineering

  • Data Pipelines with Apache Airflow (by) Bas Harenslak, Julian de Ruiter
  • Python for Data Analysis - Data Wrangling with Pandas, NumPy, and IPython (by) Wes McKinney
  • Python in a Nutshell - A Desktop Quick Reference (by) Alex Martelli, Anna Ravenscroft, Steve Holden
  • The Art of Doing Science and Engineering - Learning to Learn (by) Richard W. Hamming
  • The Data Warehouse Toolkit - The Definitive Guide to Dimensional Modeling (by) Ralph Kimball, Margy Ross
  • The Elegant Puzzle - Systems of Engineering Management (by) Will Larson
  • [Free courses] Learn Analytics Engineering with dbt: https://courses.getdbt.com/collections

Fun & entertaining

  • Birth of a Theorem - A Mathematical Adventure (by) Cédric Villani
  • Cribsheet - A Data-driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (by) Emily Oster
  • Dataclysm - Love, Sex, Race and Identity, What our Online Lives Tell Us about our Offline Selves (by) Christian Rudder
  • Freakonomics - A Rogue Economist Explores the Hidden Side of Everything (by) Steven D. Levitt & Stephen J. Dubner
  • Scorecasting - The Hidden Influences behind how Sports are Played and Games are Won (by) Tobias J. Moskowitz & L. Jon Wertheim

Leadership & business

  • Crucial Conversations - Tools for Talking When Stakes are High (by) Kerry Patterson, Joseph Grenny, Ron McMillan, Al Switzler
  • Lean In - Women, Work and the Will to Lead (by) Sheryl Sandberg
  • Scale - The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies and Companies (by) Geoffrey West
  • Radical Candor - Be a Kick-ass Boss without Losing your Humanity (by) Kim Scott
  • Team Topologies: Organizing Business and Technology Teams for Fast Flow (by) Manuel Pais, Matthew Skelton
  • The Cold Start Problem - How to Start and Scale Network Effects (by) Andrew Chen
  • The Making of a Manager (by) Julie Zhuo
  • Thinking in Systems (by) Donella H. Meadows

1 comment


  • Great list! This really helps in building a structured approach to learning

    Akash Banerjee on
