Butterfly effect: OECD’s data visualisation fail leads to media panic

Summary: Intro | A case of tl;dr | Where was the graph police? | A quick fix

This is a short story about a graph that could have been done better and an article that went awry. The Organisation for Economic Co-operation and Development (OECD), one of the most powerful research bodies there is, has published an excellent report on the influence of robotics on the job market. Its conclusion was misinterpreted in an influential Polish magazine, Krytyka Polityczna, and in other media. In this post I will analyze both the article and the report, and then theorize about what went wrong and who (or what) is to blame.

Robot

You call it cute, but maybe it’s your future boss | Photo by Joseph Chan

The future of robotics is an immensely interesting subject. We are on the brink of our generation’s industrial revolution, whose social and economic effects are unknown and dreaded by many. The article I came across, “Robots are coming, jobs are going away” by Jaroslav Fiala, looks at the implications of robots for the job market. The article was originally written for a Czech portal, then translated and published in a popular Polish opinion magazine, Krytyka Polityczna. Naturally, I got excited by the click-bait title and read it (I’m a native Polish speaker).

A case of tl;dr

The article opens with the following statistic: According to the recent OECD study, developed countries with the highest risk of jobs going extinct are the Czech Republic and Slovakia (with Poland being right behind as #6). That means the citizens of these countries could lose their jobs in the near future (…)

Roughly translated from: Jak pokazuje niedawne badanie OECD, najwyższe ryzyko masowego zaniku miejsc pracy pośród krajów rozwiniętych występuje… w Czechach i na Słowacji [Polska jest tuż za podium – na 6. miejscu – przyp. red.]. Oznacza to, że ludzie mogą wkrótce stracić pracę (…)

It’s a scary prospect. Luckily, it’s not true, at least not according to the linked OECD publication. OECD researchers found that while many current jobs could be fully or partially automated (but not, technically speaking, made extinct), the situation in Slovakia and the Czech Republic is comparatively better than in Spain or Germany. The report literally says that. Busted! Yet I’d argue it’s unfair to dismiss the article as poorly researched. Although the author (and the original publishing website, aaand the Polish website…) certainly wasn’t (weren’t) thorough enough, I think it was easy to misinterpret the OECD publication. It has to do with how we humans read information (and yes, perhaps robotics can fix that too); in other words, it’s a classic case of tl;dr. The author scanned the report, found the main graph, analysed it, and came to the wrong conclusion.

To illustrate my point, I’ve cut and pasted the graph from the report. Completely context-free, here it is:

Oecd

The risk of job loss because of automation is less substantial than sometimes claimed but many jobs will see radical change | Source: OECD

I recommend spending a minute to read the graph before continuing with the post.

Where was the graph police?

The OECD report’s authors make their case perfectly clear in writing. But someone has done great harm to the graph illustrating their main finding. A number of things seem off about this visualisation:

The title. What does this caption actually say? “The risk of job loss because of automation is less substantial than sometimes claimed but many jobs will see radical change”. Spoiler: once you read it three times, it sinks in. This is not the main problem, but one of its roots: the graph’s caption says job loss risk, but the study talks about the degree of task automation (and so do, rightfully, the data series). Perhaps a less ambiguous title would help us, the tl;dr crowd.
Percentages everywhere. There are percentages on the Y axis, and percentages in the legend. The Y axis values correspond to the percentage of workers affected by task automation, while the values in the legend describe the degree of this automation. Yet these are not called degrees, they go by Change in tasks (50-70% risk) and Automatable (>70% risk). Couldn’t we just call them medium and high?
What’s the medium risk rate for the Slovak Republic? 36% or 46%? You don’t know? It’s a stacked bar chart, so the values are added: the bar ends at 46%, of which 10% is the high-risk segment, ergo Slovakia has 36% of jobs facing a medium degree of automation. It’s statisticians’ insider knowledge; for anyone else this graph could just as well read 10% high risk, 46% medium risk, because the graph gives us no guidance on how to decode it.
Unfortunate colour choices. What’s that greyish band in between the bars? Is it a shadow? I’ll tell you: it’s an optical illusion. It’s a trick the light plays on us: white looks greyish when surrounded by dark blue.
Data points ordering. This is why the graph has been so badly misread: the caption says job loss risk, the bars are in descending order, so the intuitive conclusion is that the countries are listed from the highest degree of potential job loss in the future to the lowest. This is why the Czech Republic and the Slovak Republic are singled out in the article. That conclusion is wrong: the bars represent all degrees of automation; only the grey part of the bar stands for the high risk. If ordered by the highest risk of automation, Germany, Austria, and Spain have much more to worry about. What the Czech Republic and Slovakia do have is the highest share of workers who will be affected by automation to any degree.
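To make the last two points concrete, here is a minimal Python sketch of how a stacked bar is decoded and how the ranking changes with the sort key. Only the Slovak Republic figures (10% high, 36% medium) come from this post; "Country A" and "Country B" and their numbers are made-up placeholders, not OECD data, and the names `shares` and `bar_top` are mine.

```python
# Shares of workers, in percent. Only the Slovak Republic row is taken from
# the post; "Country A" and "Country B" are hypothetical placeholders.
shares = {
    "Slovak Republic": {"high": 10, "medium": 36},
    "Country A": {"high": 12, "medium": 28},  # hypothetical values
    "Country B": {"high": 8, "medium": 30},   # hypothetical values
}


def bar_top(entry):
    """Height at which a stacked bar ends: the sum of its two segments."""
    return entry["high"] + entry["medium"]


# Ordering used by the graph: total share of workers affected in any degree.
by_total = sorted(shares, key=lambda c: bar_top(shares[c]), reverse=True)

# Ordering a reader assumes from the caption: high-risk share only.
by_high = sorted(shares, key=lambda c: shares[c]["high"], reverse=True)

print("Slovak bar ends at:", bar_top(shares["Slovak Republic"]))
print("Ordered by total:", by_total)
print("Ordered by high risk:", by_high)
```

With these placeholder numbers the Slovak Republic leads the first list but not the second, which is the kind of mix-up the article fell into.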

All these points taken separately are problematic; all together they are destructive to readability.

A quick fix

Enough complaining and pointing fingers. Here I propose a template that could have been a better visualisation for the dataset in question:

Risk is calculated as the percentage of tasks that are automatable. For jobs at high risk this is more than 70% of tasks; for jobs at medium risk, between 50% and 70%.
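The definition above can be sketched as a small Python function. The function name `risk_band`, the label for jobs under 50%, and the treatment of the exact boundaries (50% and 70% counted as medium) are my assumptions; the graph's legend only says "50-70%" and ">70%".

```python
def risk_band(automatable_share: float) -> str:
    """Classify a job by the percentage of its tasks that are automatable.

    Assumption: exactly 50% and exactly 70% fall into the medium band.
    """
    if not 0 <= automatable_share <= 100:
        raise ValueError("share must be a percentage between 0 and 100")
    if automatable_share > 70:
        return "high"
    if automatable_share >= 50:
        return "medium"
    # The graph does not name this band; the label is a placeholder.
    return "below medium"


for share in (85, 70, 60, 20):
    print(share, "->", risk_band(share))
```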

The bars have a separate axis each. Hovering over a bar displays its corresponding value. No ambiguity in the data series scale, no extra calculation effort. I’ve also simplified the chart’s name and the labels of the data series. The average (mean) is now distinguished from the rest of the series (did you notice it was there?) by its styling.
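As a rough, text-only stand-in for that template (not the interactive chart itself), the idea of one axis per series can be sketched in Python. The function `render_panel`, its `width` and `scale_max` parameters, and the `#` bars are my own illustration; the only data used is the Slovak Republic pair quoted earlier in this post.

```python
def render_panel(title, values, width=40, scale_max=100):
    """Return one text panel in which every bar starts at zero.

    Each value is printed next to its bar, so nothing has to be
    subtracted or read off a shared, stacked axis.
    """
    lines = [title]
    for country, pct in values.items():
        bar = "#" * round(pct / scale_max * width)
        lines.append(f"{country:<16} {bar} {pct}%")
    return "\n".join(lines)


# One panel per series instead of one stacked bar.
high = {"Slovak Republic": 10}
medium = {"Slovak Republic": 36}

print(render_panel("High risk", high))
print()
print(render_panel("Medium risk", medium))
```

Because each series gets its own panel, the 36% is printed as 36% rather than hidden as the difference between 46% and 10%.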

It feels horrible to pick on the OECD because they do a fantastic job in general. The report is a great piece of research too. This case is not about their work, though: it is about the extent of responsibility high-profile institutions have for the content they produce. Look at the impact of this report: a reprint in a high-visibility opinion magazine; stirring concern for the wrong problem; possibly causing the public to resent or block innovation instead of embracing it. As the OECD has great standards already, imagine what the scale of the problem could be for institutions with less rigorous communication principles (think: most government reports).

Post scriptum

I practice tl;dr daily (emails, Facebook feed, media), so I can sympathize. Yes, it is the role of a journalist to quadruple-check their sources of information, but mistakes like this can be justified. Yet I fear some subtleties of the OECD publication were simply ignored in the article: the author’s aim was to scare, not to educate. The cited reports are not as pessimistic as what he makes of them. The OECD study presents a holistic vision of a workplace where some tasks are automated, and new job positions emerge. Even more, it states that “some estimates suggest that for each job created by the high-tech industry, around five additional, complementary jobs are created.” The other cited report indeed projects 47% of jobs in the US going into extinction in the next 20 years. However, the alarming finding of the report is not that the market will change, but that we might not be able to cover the unemployment and wages gap fast enough, as was the case after the industrial revolution. If this type of journalism does not qualify as fiction yet, should we call it #post-truth?

-Eve-
