Summary: Business instinct | When sums add up | Data-driven decision patching
This is a story about companies who like aggregations a bit too much. Data-driven decision making seems to be the new holy grail in management, but can the numbers always be trusted? What is key in data-savvy businesses: the people, the right technology, or – spoiler alert – is it something more fundamental? These questions become particularly urgent in the new economy as failing to embrace data can be a major growth impediment or worse, a dead sentence to the business.
Continue reading “About managers who love aggregations a bit too much”
John Oliver goes John Oliver on forensic science and it’s a must watch. Forensic science brought statistics to courts, but itself it’s only rarely exposed to scientific scrutiny. Justice system’s failure to question forensics methodology and common fear of scientific jargon has led to convicting innocent people for crimes they haven’t commited. Courts have given life sentences solely based on bite marks, partial fingerprints, and hair collected at crime scene despite of no proof these methods are infallible.
Watch John Oliver’s Last Week Tonight episode below and make sure to read the excellent investigative article by Jordan Smith on the Intercept that the show was based on.
Summary: Wanted: Data Scientist | A bird in the hand is worth two in the bush | A little stir | The infallible art of taking steps back
In this article I will look at how organisations can engineer their own Data Science team without loosing their mind in the process nor spending big money. As more and more companies want to be data-driven, they join the frantic search for the right staff to fuel these initiatives. Finding a data-fluent resource is not easy: according to McKinsey, “by 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” The hunt is on for a skill set that is still relatively new to the market, and it is only starting to be taught at the universities. My belief is that fishing for a Data Scientist ‘superstar’ is often counterproductive and inevitably leads to a realisation that one person cannot do it all. Instead, investing in appropriate training of the current staff can lead to long-lasting benefits for the company.
Continue reading “The search is long and the goal is elusive: Data Scientist (Desperately) Wanted”
Summary: Intro | Measuring the importance of gossip | Too good to be true | Arbitrary measures produce arbitrary results | tl;dr
Disclaimer: The probability computation of the article is a recap of a talk delivered by Professor Mark Whitehorn at the University of Dundee in 2015, and at PASS Business Analytics Conference in San Jose, CA in 2014. Opinions expressed are my own.
Aren’t we post-NPS hype yet? Such was my thinking until a random article came up on my feed: as one of its core objectives, a tech giant was planning to improve its Net Promoter Score by 2020. A quick internet search told me there are some companies very excited about increasing their NPS. Google Trends suggests the Net Promoter methodology is on the steady growth rate since 2004; a mortal blow to my presumption. There is something problematic about the Net Promoter methodology that I’d like to talk about: on one hand an indicator of an outstanding business delivery, on the other a possibly dangerous framework for workforce assessment. This article decomposes the NPS algorithm, reviews its criticism, and tests its validity from the probability perspective. I have based the scenario and the probability computation on an excellent talk delivered by professor Mark Whitehorn. If you happen to be a manager, a person whose performance is scored with NPS, you are into probability computations, or simply you like debunking managerial fads then this is a tale for you.
Continue reading “Taking down the NPS Score: KO by Probability”
I will be at Oracle Code conference on the 6th of June in Brussels, redefining fun with Python on Oracle database. The idea is to live demo working with tables, handling JSON files, (new) spatial data queries, data visualisation, and a couple of good practices of database application development with Python.
There will be snakes, ridiculous maps, and misbehaving queries. Come if you’re in town, it’s gonna be #phun
It’s said that the majority of every analytical project’s is taken up by data preparation tasks. But is this estimation trustworthy?
Continue reading “Why is data preparation so hard and are we getting worse at it?”
Summary: Intro | A case of tl;dr | Where was the graph police? | A quick fix
This is a short story about a graph that could have been done better and an article that has gone awry. The Organisation for Economic Co-operation and Development
(OECD), one of the most powerful research bodies there is, has published an excellent report on the influence of robotics on the job market: conclusion of which was misinterpreted in a Polish influential newspaper, Krytyka Polityczna, and in other media. In this post I will analyze both the article and the report, to then theorize on what has gone wrong and who (or what) is to blame.
Today’s “from scratch” example with D3 is a must-have element of any data visualisation portfolio: a line chart. Line charts are great of visualizing changes in data over time. Just as in the previous posts in the series, my visualisation is a variation of a piece of code I found on the web. I started with a basic template created by Mike Bostock and then re-worked some of its elements to boost its usability & readability. As with the previous examples, all code can be downloaded, reused, adjusted, and it scales up and down to include extra data series or to remove one.
Continue reading “Getting Philosophical About a Line Chart. Data Visualisation from Scratch P.3”
Summary: Intro | A Simple Bar Chart | A Multi-Series Bar Chart
If this post was a painting, it would probably be one of Mark Ryden’s works: it seems I have just gone and done a one detailed blog post. The funny thing is that it’s about bar charts, and everything has already been said about bar charts. In fact a bar chart is a graph so simple, this post should never have been written: yet, the simpleness of a bar chart is actually it’s most dangerous trap. It’s very easy to overdo, and with so few elements it’s tempting to tweak or enhance at least some of them. So this blog post is, above all, about resistance. I will look at what – and why – constitutes as a good bar chart, what are the best practices, and how to fight the horror vacui of a simple plot. We will use D3.js and the blank canvas we have built with zero coding skills in the last post to create a reusable template of a simple bar graph, and then of a multi-series bar graph. This is part of a data visualisation with D3 series, throughout which we will create a set of graphics that can be easily re-purposed for data visualisation projects.
Continue reading “Stories from a Bar (Chart). Data Visualisation from Scratch P.2”
Summary: Intro | About D3.js | Initial Setup & Python Server | Canvas Setup
In the following series I will cover the basics of data visualisation. There are many data visualisation tools available (free & paid versions) on the market, so for an everyday analyst the knowledge of how to build graphs from scratch is not essential. However, most (if not all) of these pre-built tools fall short as soon as any customisation is required: it could be a graph type that is not supported, or the design that cannot be adjusted to follow the company branding guidelines. Therefore, there are cases when the knowledge of how to build something yourself is essential.
Continue reading “D3 Canvas Setup with 0 Coding Skills. Data Visualisation from Scratch P.1”