

The ultimate data science reading list

A curated collection of textbooks, academic papers, blog posts and other non-fiction content surrounding data science, updated monthly.

by Mojan Benham

A year ago

Two-line summary

2 minute read

One of my favourite things about the field of data science is the number of successful people who are self-taught. The following list collects the resources I've come across or been recommended over the years, and I will update it on a monthly basis.



When I look across the data science teams I've worked with over the years, I'm fortunate enough to have met folks with vastly differing career experiences: botanists, engineers, baristas, marketers, economists, government officials... you name it! This cultivates such a fantastic work environment because each person's training (whether formal or informal) brings unique value to the conversation.

It is also precisely why I don't believe higher education is absolutely necessary. The structured curriculum and accreditation are what led me to pursue a master's degree, but I have smarter and more senior colleagues who got to where they are by being self-taught. The only prerequisite is discipline.

Every now and then, I'll meet someone with such mastery of a particular topic that I ask, "how do you know so much about this?" I usually get some vague variant of, "I saw a thread on Twitter" or "there's this podcast episode" or "my friend sent me a blog post". Tired of the elusiveness, I started prying for more details and crafting this post.

Whenever I come across content that could potentially improve my data science knowledge, I will update the list and slowly read my way through it. Updates will soon include free courses, blog posts, academic papers and Twitter threads as well. I hope that this can act as a digital bookshelf for those looking to break into the industry or improve their skills.

I'd love to hear any additions you would recommend in the comments below.

Applied data science

  • Data-driven Science and Engineering - Machine Learning, Dynamical Systems and Control (by) Steven L. Brunton, J. Nathan Kutz
  • Data-driven Modeling & Scientific Computation - Methods for Complex Systems & Big Data (by) J. Nathan Kutz
  • Forecasting: Principles and Practice (by) Rob J Hyndman
  • Introduction to Algorithmic Marketing - Artificial Intelligence for Marketing Operations (by) Ilya Katsov
  • Python Machine Learning - Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow (by) Sebastian Raschka & Vahid Mirjalili
  • Trustworthy Online Controlled Experiments - A Practical Guide to A/B Testing (by) Ron Kohavi, Diane Tang, Ya Xu

Data visualization

  • Envisioning Information (by) Edward R. Tufte
  • Good Charts (by) Scott Berinato
  • Guide to Information Graphics - The Dos & Don'ts of Presenting Data, Facts, and Figures (by) Dona M. Wong
  • Information is Beautiful (by) David McCandless
  • Knowledge is Beautiful (by) David McCandless
  • Show Me the Numbers - Designing Tables and Graphs to Enlighten (by) Stephen Few
  • Storytelling with Data - A Data Visualization Guide for Business Professionals (by) Cole Nussbaumer Knaflic
  • The Visual Display of Quantitative Information (by) Edward R. Tufte

Economics & causality

  • Bayesian Methods for Hackers - Probabilistic Programming and Bayesian Inference (by) Cameron Davidson-Pilon
  • Causal Inference - The Mixtape (by) Scott Cunningham
  • Causal Inference in Statistics - A Primer (by) Judea Pearl, Madelyn Glymour, Nicholas P. Jewell
  • Elements of Causal Inference - Foundations and Learning Algorithms (by) Jonas Peters, Dominik Janzing, Bernhard Schölkopf
  • Principles of Econometrics (by) R. Carter Hill, William E. Griffiths, Guay C. Lim
  • Mastering 'Metrics (by) Joshua D. Angrist, Jörn-Steffen Pischke
  • Mostly Harmless Econometrics (by) Joshua D. Angrist, Jörn-Steffen Pischke
  • The Book of Why - The New Science of Cause and Effect (by) Judea Pearl, Dana Mackenzie


  • Data Pipelines with Apache Airflow (by) Bas Harenslak, Julian de Ruiter
  • Python for Data Analysis - Data Wrangling with Pandas, NumPy, and IPython (by) Wes McKinney
  • Python in a Nutshell - A Desktop Quick Reference (by) Alex Martelli, Anna Ravenscroft, Steve Holden
  • The Art of Doing Science and Engineering - Learning to Learn (by) Richard W. Hamming
  • The Data Warehouse Toolkit - The Definitive Guide to Dimensional Modeling (by) Ralph Kimball, Margy Ross
  • An Elegant Puzzle - Systems of Engineering Management (by) Will Larson
  • [Free courses] Learn Analytics Engineering with dbt:

Fun & entertaining

  • Birth of a Theorem - A Mathematical Adventure (by) Cédric Villani
  • Cribsheet - A Data-driven Guide to Better, More Relaxed Parenting, from Birth to Preschool (by) Emily Oster
  • Dataclysm - Love, Sex, Race and Identity, What our Online Lives Tell Us about our Offline Selves (by) Christian Rudder
  • Freakonomics - A Rogue Economist Explores the Hidden Side of Everything (by) Steven D. Levitt & Stephen J. Dubner
  • Scorecasting - The Hidden Influences behind how Sports are Played and Games are Won (by) Tobias J. Moskowitz & L. Jon Wertheim

Leadership & business

  • Crucial Conversations - Tools for Talking When Stakes are High (by) Kerry Patterson, Joseph Grenny, Ron McMillan, Al Switzler
  • Lean In - Women, Work and the Will to Lead (by) Sheryl Sandberg
  • Scale - The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies and Companies (by) Geoffrey West
  • Radical Candor - Be a Kick-ass Boss without Losing your Humanity (by) Kim Scott
  • Team Topologies - Organizing Business and Technology Teams for Fast Flow (by) Manuel Pais, Matthew Skelton
  • The Cold Start Problem - How to Start and Scale Network Effects (by) Andrew Chen
  • The Making of a Manager (by) Julie Zhuo
  • Thinking in Systems (by) Donella H. Meadows


  • Applied Predictive Modeling (by) Max Kuhn, Kjell Johnson
  • Bayesian Data Analysis (by) Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin
  • How to Lie with Statistics (by) Darrell Huff
  • Probability and Statistics for Engineering and the Sciences (by) Jay L. Devore
  • Regression Modeling Strategies - With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis (by) Frank E. Harrell, Jr.
  • Statistical Rethinking - A Bayesian Course with Examples in R and Stan (by) Richard McElreath
  • Statistics for Experimenters - Design, Innovation and Discovery (by) George E. P. Box, J. Stuart Hunter, William G. Hunter

1 comment

  • Great list! This really helps in building a structured approach to learning

    Akash Banerjee on
