Geek Christmas: 10 best books to entertain a Data Scientist

Joost Swarte for New Yorker, August 2015 (detail)

Oh, the yearly Christmas epiphany: shopping for people we’d like to think we know is hard. To help those looking for the perfect gift for a Data Scientist friend, or those wanting to treat themselves to a good read, here is a roundup of books that will let no data geek down. The selection is subjective: I deliberately left out the classics and focused on less obvious choices that are guaranteed to entertain and enlighten.

Colourful nonsense: what does your data visualisation actually say?

The democratisation of data visualisation tools has brought us two major changes: we can make great analytical products faster. We can deceive more easily, too.

Previously a domain ruled by statisticians and IT departments, analytics has now opened up to anyone with a laptop. For marketers and managers alike, BI apps such as Tableau, QlikView, or MS Excel have become a commodity. The tools have matured too: programming fluency has been replaced by drag-and-drop interfaces. A visually stunning chart is literally a click away. The software picks the graph type and the colour scheme for us. For more ambitious users there are plenty of adjustment options, albeit within the range of pre-programmed configurations. While some of these visual endeavours lead to great analytical products, others result in colourful nonsense.

About managers who love aggregations a bit too much

Summary: Business instinct | When sums add up | Data-driven decision patching 

This is a story about companies that like aggregations a bit too much. Data-driven decision making seems to be the new holy grail in management, but can the numbers always be trusted? What is key in data-savvy businesses: the people, the right technology, or – spoiler alert – is it something more fundamental? These questions become particularly urgent in the new economy, as failing to embrace data can be a major impediment to growth or, worse, a death sentence for the business.

Harmful statistics: John Oliver on forensic science

John Oliver goes John Oliver on forensic science and it’s a must-watch. Forensic science brought statistics to the courts, but it is itself only rarely exposed to scientific scrutiny. The justice system’s failure to question forensic methodology, together with a common fear of scientific jargon, has led to innocent people being convicted of crimes they haven’t committed. Courts have given life sentences based solely on bite marks, partial fingerprints, and hair collected at the crime scene, despite there being no proof that these methods are infallible.

Watch the Last Week Tonight episode below and make sure to read the excellent investigative article by Jordan Smith in The Intercept that the show was based on.


The search is long and the goal is elusive: Data Scientist (Desperately) Wanted

Summary: Wanted: Data Scientist | A bird in the hand is worth two in the bush | A little stir | The infallible art of taking steps back

In this article I will look at how organisations can engineer their own Data Science team without losing their minds in the process or spending big money. As more and more companies want to be data-driven, they join the frantic search for the right staff to fuel these initiatives. Finding data-fluent people is not easy: according to McKinsey, “by 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” The hunt is on for a skill set that is still relatively new to the market and is only starting to be taught at universities. My belief is that fishing for a Data Scientist ‘superstar’ is often counterproductive and inevitably leads to the realisation that one person cannot do it all. Instead, investing in appropriate training for current staff can bring long-lasting benefits to the company.

Taking down the NPS Score: KO by Probability

Summary: Intro | Measuring the importance of gossip | Too good to be true | Arbitrary measures produce arbitrary results | tl;dr

Disclaimer: The probability computation in this article is a recap of a talk delivered by Professor Mark Whitehorn at the University of Dundee in 2015, and at the PASS Business Analytics Conference in San Jose, CA in 2014. Opinions expressed are my own.

Aren’t we post-NPS hype yet? Such was my thinking until a random article came up on my feed: as one of its core objectives, a tech giant was planning to improve its Net Promoter Score by 2020. A quick internet search told me there are some companies very excited about increasing their NPS. Google Trends suggests that interest in the Net Promoter methodology has been growing steadily since 2004; a mortal blow to my presumption. There is something problematic about the Net Promoter methodology that I’d like to talk about: on one hand it is an indicator of outstanding business delivery, on the other a possibly dangerous framework for workforce assessment. This article decomposes the NPS algorithm, reviews the criticism of it, and tests its validity from the probability perspective. I have based the scenario and the probability computation on an excellent talk delivered by Professor Mark Whitehorn. If you happen to be a manager or a person whose performance is scored with NPS, if you are into probability computations, or if you simply like debunking managerial fads, then this is a tale for you.
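For readers who have not met the score before, here is a minimal sketch of the commonly published NPS formula, not the computation from the talk. It assumes the standard 0 to 10 survey scale, where ratings of 9 and 10 count as promoters and 0 to 6 as detractors; the function name `net_promoter_score` and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Return the Net Promoter Score (from -100 to 100) for 0-10 ratings.

    Assumes the standard buckets: 9-10 promoters, 7-8 passives,
    0-6 detractors. Passives only count towards the total.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Percentage of promoters minus percentage of detractors.
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey: 5 promoters, 2 passives, 3 detractors.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 2, 9]
print(net_promoter_score(sample))  # 20.0
```

Note that the formula treats a 6 exactly like a 0 and ignores 7s and 8s entirely, so very different sets of answers can produce the same score.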

It’s gonna be #phun

I will be at the Oracle Code conference on the 6th of June in Brussels, redefining fun with Python on the Oracle database. The idea is to give a live demo of working with tables, handling JSON files, (new) spatial data queries, data visualisation, and a couple of good practices for database application development with Python.

There will be snakes, ridiculous maps, and misbehaving queries. Come if you’re in town, it’s gonna be #phun


Butterfly effect: OECD’s data visualisation fail leads to media panic

Summary: Intro | A case of tl;dr | Where was the graph police? | A quick fix

This is a short story about a graph that could have been done better and an article that went awry. The Organisation for Economic Co-operation and Development (OECD), one of the most powerful research bodies there is, has published an excellent report on the influence of robotics on the job market, the conclusion of which was misinterpreted in an influential Polish newspaper, Krytyka Polityczna, and in other media. In this post I will analyse both the article and the report, and then theorise on what went wrong and who (or what) is to blame.

Getting Philosophical About a Line Chart. Data Visualisation from Scratch P.3

Today’s “from scratch” example with D3 is a must-have element of any data visualisation portfolio: a line chart. Line charts are great for visualising changes in data over time. Just as in the previous posts in the series, my visualisation is a variation of a piece of code I found on the web. I started with a basic template created by Mike Bostock and then reworked some of its elements to boost its usability and readability. As with the previous examples, all the code can be downloaded, reused, and adjusted, and it scales up or down to include extra data series or to remove one.
