More often than not, geographical data visualisation is performed on a single country or a cluster of countries rather than on all 195 of them. Just as typically, acquired datasets have more features than the analysis needs. While D3.js allows for filtering datasets so that we have full control over the visualisation’s output, the size of the original datasets can slow down your website’s load times. To reduce this impact, datasets can be cropped beforehand. This post explains how to shrink a standard Eurostat geographical dataset to just a handful of countries with OGR2OGR.
For my upcoming project I need a cluster of neighbouring European countries. I downloaded the world boundary data in GeoJSON from Eurostat. Eurostat is an official publishing body of geopolitical data, and the datasets they offer come in different formats and in varying levels of detail. I chose the 1:10M scale, which is about 30MB and matches the level of detail I was after. For comparison, the 1:1M scale is incredibly detailed but takes over 300MB of disk space.
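Before cropping, it can help to check which properties the downloaded features carry and which country codes they contain. Below is a minimal Python sketch using only the standard library. The function names are hypothetical, and it assumes the file is a GeoJSON FeatureCollection that is small enough to load into memory.

```python
import json
from collections import Counter


def summarise_properties(collection):
    """Count how often each property key appears across all features."""
    counts = Counter()
    for feature in collection.get("features", []):
        # "properties" may be null in GeoJSON, so fall back to an empty dict.
        counts.update((feature.get("properties") or {}).keys())
    return counts


def property_values(collection, key):
    """Return the sorted distinct values of one property (e.g. country codes)."""
    values = {
        (feature.get("properties") or {}).get(key)
        for feature in collection.get("features", [])
    }
    values.discard(None)  # ignore features that lack the property
    return sorted(values)


def load_geojson(path):
    """Read a GeoJSON file from disk into a Python dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Example usage (assumes the Eurostat file sits in the current folder):
# data = load_geojson("CNTR_RG_10M_2016_4326.geojson")
# print(summarise_properties(data))
# print(property_values(data, "FID"))
```

Running the commented lines against the downloaded file should list the available property names and the identifiers you can later filter on.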
To extract the countries, we will use a command-line utility, OGR2OGR, which is part of the GDAL toolkit. There are probably similar libraries available in Python or R, but OGR2OGR has some prebuilt functions that make working with any sort of geographical data pretty simple. The University of California published a great guide on how to make OGR2OGR work on Windows. Besides downloading the right packages, the installation involves setting up some environment variables, so it’s worth taking a look.
Once you get OGR2OGR working, selecting your set of countries can be done in one command.
Let’s extract Portugal. Open the command line, navigate to the folder with the original file, and run:
ogr2ogr -where "FID='PT'" pt.geojson CNTR_RG_10M_2016_4326.geojson
Here, FID is the property of a feature (country) we’ll base our extract on, PT is Portugal’s identifier in the dataset, pt.geojson is the output file, and CNTR_RG_10M_2016_4326.geojson is the original file from Eurostat.
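Conceptually, the filter keeps only the features whose property matches a given value. For readers curious how that looks in code, here is a rough Python equivalent using only the standard library. It is a sketch, not how OGR2OGR works internally: the function names are hypothetical, and it assumes a GeoJSON FeatureCollection that fits in memory.

```python
import json


def filter_features(collection, key, wanted):
    """Return a new FeatureCollection keeping only features whose
    property `key` has one of the values in `wanted`."""
    wanted = set(wanted)
    kept = [
        feature
        for feature in collection.get("features", [])
        # "properties" may be null in GeoJSON, so fall back to an empty dict.
        if (feature.get("properties") or {}).get(key) in wanted
    ]
    # Copy the other top-level members (such as "type" or "crs") unchanged.
    result = {k: v for k, v in collection.items() if k != "features"}
    result["features"] = kept
    return result


def extract(src_path, dst_path, key, wanted):
    """Read a GeoJSON file, filter it, and write the result to a new file."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(filter_features(data, key, wanted), dst)


# Example usage (file names taken from the command above):
# extract("CNTR_RG_10M_2016_4326.geojson", "pt.geojson", "FID", ["PT"])
```

Because `wanted` accepts any collection of values, the same function covers one country or several. OGR2OGR remains the more convenient option, as it also handles other formats and reprojection.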
Here we’ll extract Portugal and Spain. In CMD, run the following:
ogr2ogr -where "FID in ('PT','ES')" ptes.geojson CNTR_RG_10M_2016_4326.geojson
You’ll notice the syntax is a tad different: the list of country codes goes inside the parentheses of an IN clause.
Geographical data analysis is fascinating, but the initial processing of datasets can be discouraging. Extracting a single feature or multiple features is one of the basic preparatory tasks, and OGR2OGR is an excellent tool to ease that process.