Point-in-polygon is a textbook problem in geographical analysis: given a list of geocoordinates return those that fall within a boundary of an area. You could feed the algorithm a list of all European cities and it will recognise which of them belong to Sri Lanka and which to a completely random shape you drew on planet Earth. It applies to many scenarios: analyses that aren’t based on administrative boundaries, situations in which polygons change over time, or problems that aren’t geographical at all, like computer graphics. Not so long ago, I turned to point-in-polygon to generate a set of towns and villages to plot on a map of Poland from 1933. Such list has not been made available on the web and I wasn’t super keen on typing out thousands of locations. Instead, I used that mathematical cookie-cutter to extract only those locations from today’s Poland, Ukraine, Belarus, and Russia that were present within the interwar Poland boundaries. In this post I will show how to perform a point-in-polygon analysis in R and possibly automate a significant chunk of data preparation for map visualisations.Continue reading “R | Point-in-polygon, a mathematical cookie-cutter”
More often than not geographical data visualisation is performed on a a single country or a cluster of countries rather than on all 195 of them. Just as typically, acquired datasets have more features than what’s needed for the analysis. While D3.js allows for filtering the datasets so that we have full control over the visualisation’s output, the size of original datasets can slow down your website load times. To reduce this impact, datasets can be cropped beforehand. This post will explain how to shrink a standard Eurostat geographical dataset to just a handful of countries with OGR2OGR.Continue reading “Extracting countries from GeoJSON with OGR2OGR”
Summary: Intro | Measuring the importance of gossip | Too good to be true | Arbitrary measures produce arbitrary results | tl;dr
Disclaimer: The probability computation of the article is a recap of a talk delivered by Professor Mark Whitehorn at the University of Dundee in 2015, and at PASS Business Analytics Conference in San Jose, CA in 2014. Opinions expressed are my own.
Aren’t we post-NPS hype yet? Such was my thinking until a random article came up on my feed: as one of its core objectives, a tech giant was planning to improve its Net Promoter Score by 2020. A quick internet search told me there are some companies very excited about increasing their NPS. Google Trends suggests the Net Promoter methodology is on the steady growth rate since 2004; a mortal blow to my presumption. There is something problematic about the Net Promoter methodology that I’d like to talk about: on one hand an indicator of an outstanding business delivery, on the other a possibly dangerous framework for workforce assessment. This article decomposes the NPS algorithm, reviews its criticism, and tests its validity from the probability perspective. I have based the scenario and the probability computation on an excellent talk delivered by professor Mark Whitehorn. If you happen to be a manager, a person whose performance is scored with NPS, you are into probability computations, or simply you like debunking managerial fads then this is a tale for you.
Summary: Intro | A case of tl;dr | Where was the graph police? | A quick fix
Today’s “from scratch” example with D3 is a must-have element of any data visualisation portfolio: a line chart. Line charts are great of visualizing changes in data over time. Just as in the previous posts in the series, my visualisation is a variation of a piece of code I found on the web. I started with a basic template created by Mike Bostock and then re-worked some of its elements to boost its usability & readability. As with the previous examples, all code can be downloaded, reused, adjusted, and it scales up and down to include extra data series or to remove one.
Summary: Intro | A Simple Bar Chart | A Multi-Series Bar Chart
If this post was a painting, it would probably be one of Mark Ryden’s works: it seems I have just gone and done a one detailed blog post. The funny thing is that it’s about bar charts, and everything has already been said about bar charts. In fact a bar chart is a graph so simple, this post should never have been written: yet, the simpleness of a bar chart is actually it’s most dangerous trap. It’s very easy to overdo, and with so few elements it’s tempting to tweak or enhance at least some of them. So this blog post is, above all, about resistance. I will look at what – and why – constitutes as a good bar chart, what are the best practices, and how to fight the horror vacui of a simple plot. We will use D3.js and the blank canvas we have built with zero coding skills in the last post to create a reusable template of a simple bar graph, and then of a multi-series bar graph. This is part of a data visualisation with D3 series, throughout which we will create a set of graphics that can be easily re-purposed for data visualisation projects.
Summary: Intro | About D3.js | Initial Setup & Python Server | Canvas Setup
In the following series I will cover the basics of data visualisation. There are many data visualisation tools available (free & paid versions) on the market, so for an everyday analyst the knowledge of how to build graphs from scratch is not essential. However, most (if not all) of these pre-built tools fall short as soon as any customisation is required: it could be a graph type that is not supported, or the design that cannot be adjusted to follow the company branding guidelines. Therefore, there are cases when the knowledge of how to build something yourself is essential.