Geek Christmas: 10 best books to entertain a Data Scientist

Ah, the annual Christmas epiphany: shopping for people we’d like to think we know is hard. To help those looking for the perfect gift for a Data Scientist friend, or those wanting to treat themselves to a good read, here is a roundup of books that won’t let any data geek down. The selection is subjective: I deliberately left out the classics and focused on less obvious choices that are guaranteed to entertain and enlighten.

In no particular order:

  1. Weapons of Math Destruction. How Big Data Increases Inequality and Threatens Democracy, Cathy O’Neil (2016) A fascinating read on the corporate and government applications of mathematical models. Cathy O’Neil discusses the black-box algorithms and built-in biases that affect our everyday lives in jobs, credit, and education. A stimulating must-read for modern times.
  2. Dataclysm. Who We Are (When We Think No One’s Looking), Christian Rudder (2014) Christian Rudder, a co-founder of OkCupid, mined the website’s data to find out what love is, in numbers. Scary and fun at the same time, it’s a data-packed account of our lies, unconscious preferences, and occasional racism in internet dating. To quote Nick Paumgarten from The New Yorker: “he doesn’t wring or clap his hands over the big-data phenomenon (see N.S.A., Google ads, that sneaky Fitbit) so much as plunge them into big data and attempt to pull strange creatures from the murky depths.”
  3. The Signal and the Noise. Why So Many Predictions Fail but Some Don’t, Nate Silver (2012) Named one of the 100 Most Influential People of 2009 by Time, Nate Silver rose to fame after correctly calling 49 out of 50 states in the 2008 US election, and killed it in 2012 by calling all 50 states and the District of Columbia. The Signal and the Noise is a deep dive into the world of applied probability. Challenging traditional (“frequentist”) statistics, Silver makes a case for the complexity involved in event forecasting, supporting his observations with studies of poker, the financial crash, and climate change.
  4. Fooled by Randomness. The Hidden Role of Chance in Life and in the Markets, Nassim Nicholas Taleb (2005) Overshadowed by its follow-up bestseller The Black Swan, Fooled by Randomness is a little masterpiece in its own right. Nassim Nicholas Taleb will leave you disillusioned about our decision-making abilities: we’re riddled with biases, and tend to see patterns in largely random collections of events. More often than we’d like to admit, it’s luck, not skill or hard work, that drives success. Normative sciences get bashed too, with this memorable quote: “normative economics is like religion without the aesthetics”.
  5. Statistics Done Wrong, Alex Reinhart (2015) A cautionary tale on applied statistics that’s both shocking and educational. Statistical mistakes are common, and are not necessarily the domain of amateurs. Alex Reinhart discusses the typical errors – some motivated by cold-blooded deception – present in statistical analysis, published literature, and peer-reviewed papers. The book leaves you questioning everything you know, but not frustrated: Statistics Done Wrong is both a framework for evaluating research and a guide to creating meaningful analyses.
  6. Outliers: The Story of Success, Malcolm Gladwell (2008) Malcolm Gladwell tells absorbing stories: Outliers, often dubbed autobiographical, is his tale of fortune and misfortune. Looking at stories of major success, he pulls together statistical evidence to examine their alleged and real contributing factors. The picture that emerges is that intelligence, or even perseverance, means little compared to what wealth, social background, or even the month of birth can achieve. Perhaps not your typical statistical read, it’s an eye-opening view that encourages inclusiveness and fairness in the products we create. It’s an unusual perspective, something like Nassim Nicholas Taleb meets Cathy O’Neil in a book.
  7. Show Me the Numbers: Designing Tables and Graphs to Enlighten, Stephen Few (2004) A must-have for those excited about telling stories with data. Information visualisation is a skill, and this book is a practical selection of great advice, awful ideas to avoid, and some pragmatism helpful in mastering that skill.
  8. Bad Science, Ben Goldacre (2008) Ben Goldacre’s work is a one-man crusade against pseudo-science. He examines the evidence behind trends in healthcare: nutritionists, homeopathy, and alternative medicine, to name a few. Goldacre cautions against the misleading jargon and PR tricks we fall for, and argues for transparency in research. Its main focus may be the health industry, but the mindset is universally applicable, as much now as it was 10 years ago. After all, we’ve just learnt that most of the water companies in the UK use the medieval practice of water dowsing despite no scientific evidence for its effectiveness.
  9. Moneyball: The Art of Winning an Unfair Game, Michael Lewis (2004) It could just as well be called Moneyball: Database Management in Real Life. A captivating story of geeks taking over the baseball field (not literally). Statistics turned scouting methods in baseball around: the whole industry started to study numbers, entirely dismissing the conventional wisdom it had followed for decades. It’s a universal tale of science over sentiment that any geek will appreciate.
  10. Randomness, Deborah J. Bennett (1999) Funny story: when I ordered this book, Amazon sent me another one. Perhaps they thought I wanted something random so much that my wish deserved to be granted. Randomness is a great introductory book on probability: a careful study of chance through its role in the history of humanity, with detours into philosophy and religion, up to its modern applications in technology. Unusually for books of its kind, the mathematical problems are explained in plain, fluent language.

Comment or tweet if you know of a book that would make a good addition to this list. My Christmas shopping cart needs filling!

Post Scriptum: Reddit recommended that I add How Not to Be Wrong: The Power of Mathematical Thinking by Jordan Ellenberg to the list, and I purchased the new Michael Lewis book – The Undoing Project – on Kahneman and Tversky. Their Heuristics and Biases sits on my shelf barely touched, as I keep switching focus to other projects, but one day I’ll get to it, I promise.

Oh, and I also subscribed to Scribd to read The Post-Truth Era by Ralph Keyes. The book is out of print now, and Scribd seems to be the only place to read it without spending a fortune.