Articles in category Tutorial

Posted on Aug 8, 2020 in data science d3 tutorial

Handling Raster Map Backgrounds in D3.js

One of the most popular posts on this blog is a tutorial on making a map in D3.js. I thought it was time to revisit that post and propose some visual improvements, starting with realistic backgrounds. My all-time favorite interactive article from the NYT is its 2014 piece How the Air Campaign Against ISIS Grew. The subtle use of terrain formations in the map’s background not only makes the graphic more visually interesting but also places the depicted countries in a real geographical context. That’s a more fitting visual vehicle for geopolitical events than plain, almost cartoonish, vector-based maps. I thought my original map would benefit from this treatment too: in the following sections I will take you through generating a raster terrain background for my base vector map.

Continue reading →

Posted on Nov 1, 2019 in d3 data science tutorial

Making an Interactive Line Chart in D3.js v.5

Static graphs are a big improvement over no graphs, but we can all agree that static information is not particularly engaging. On the web there is no presenter to talk over a picture: it is the visualisation’s job to grab the reader’s attention and get its point across. Making a graph interactive is a good step towards making it easier to understand. This post is an addendum to the previous tutorial on how to make a line chart, and it explores two techniques for making that project interactive.

Continue reading →

Posted on Oct 28, 2019 in data science d3 tutorial

Making a Line Chart in D3.js v.5

The time has come to step up our game and create a line chart from scratch. And not just any line chart: a multi-series graph that can accommodate any number of lines. Besides handling multiple lines, we will work with time and linear scales, axes, and labels – or rather, have them work for us. There is plenty to do, so I suggest you fire up your D3 server and let’s get cracking.
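One common way to make a line chart handle any number of lines is to reshape a "wide" CSV into one object per series. A wide CSV has one date column plus one column per series. Each resulting object then gets its own path. The sketch below assumes the rows arrive as objects with string values, which is how d3.csv delivers them. The function name toSeries and the date/A/B column names are hypothetical:

```javascript
// Reshape wide rows ({date, A, B, ...}) into one series per non-x column.
// Values are coerced to numbers because CSV parsing yields strings.
function toSeries(rows, xKey) {
  const names = Object.keys(rows[0]).filter((key) => key !== xKey);
  return names.map((name) => ({
    name,
    values: rows.map((row) => ({ x: row[xKey], y: +row[name] })),
  }));
}

// Hypothetical input shaped like parsed CSV rows.
const rows = [
  { date: "2019-01", A: "1", B: "2" },
  { date: "2019-02", A: "3", B: "4" },
];

const series = toSeries(rows, "date");
const names = series.map((s) => s.name); // ["A", "B"]
```

In D3, each entry in series would typically be bound to its own <path> and drawn with d3.line(). The x values would also need parsing into Date objects, for instance with d3.timeParse, before going into a time scale.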

Continue reading →

Posted on Oct 20, 2019 in data science d3 tutorial

Advanced Bar Chart in D3.js v.5

Or should I say: more advanced than the chart from the previous post. This part of the tutorial covers scales and axes. Let the fun begin!
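At its core, a linear scale is linear interpolation from a data domain onto a pixel range. The sketch below shows only that mapping. It is not D3's implementation: d3.scaleLinear().domain([...]).range([...]) adds clamping, ticks, inversion and more. The name linearScale is hypothetical:

```javascript
// Map a value from the domain [d0, d1] onto the range [r0, r1].
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// SVG's y axis grows downwards, so the range is flipped: the largest
// value lands at the top of a 400px-tall chart.
const y = linearScale([0, 100], [400, 0]);

y(0);   // 400
y(25);  // 300
y(100); // 0
```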

Continue reading →

Posted on Oct 12, 2019 in data science d3 tutorial

Simple Bar Chart in D3.js v.5

This is actually happening! I’ve pulled myself together (the key to more time is less Netflix, people) and written up a couple of examples in D3.js version 5 (yes, version 5!) that should get people started on the transition to the tricky number 5. The guide assumes that you know some D3 basics (you have an idea about SVG, DOM, HTML, and CSS), or better yet, that you are coming from an earlier version. In this chapter we’ll create a simple bar chart. The objectives of the day are loading data from a CSV file, setting up the data format, and drawing the data. As basic as that! Next time we will tackle scales and grids.

Make sure to check out my library for more fun examples!

Continue reading →

Posted on Jul 8, 2019 in data science d3 geo tutorial

Merging Historical Maps in D3.js v.5

Spoiling you as usual, I have another exciting D3 example for today: merging historical maps! I’ve been meaning to cover this topic ever since I developed a similar project for my Master’s thesis 3 years ago. Merging maps is a worthy challenge for every D3 enthusiast, as it requires a number of things to line up. The data format must be compatible with D3.js, and the maps must use the same projection. They must also cover the same time period, since country and regional boundaries are far from static. I will demonstrate the idea by mashing up two maps: a digitised map of the Second Polish Republic from 1934 and European boundaries from 1939.

Continue reading →

Posted on Apr 19, 2019 in data science d3 tutorial

Creating fun shapes in D3.js

I’m happy to announce that more SVG fun is coming! I’ve been blown away by the stats on my previous D3-related posts, and they really motivated me to keep going with this series. I’ve fallen in love with D3.js for the way it transforms storytelling. I want to get better at advanced D3 graphics, so I figured I would start by getting the basics right. So today you will see me doodling around with some basic SVG elements. The goal is to create a canvas and add a rectangle, a line, and a radial shape to it.

Continue reading →

Posted on Apr 14, 2019 in data science d3 tutorial

Drawing radial shapes in D3.js

This post demystifies one of the most feared vector functions available in D3.js: the radial line, or d3.radialLine(). Radial lines are built from only two attributes: an angle and a radius. The function still produces a line, but unlike the basic line function, it takes no x and y coordinates. I fundamentally misunderstood the radial line logic the first time I used it – in fact, I had to bring in my boyfriend late one Thursday evening to help me get it right. This guide should help you avoid my mistakes.
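The conversion a radial line performs can be done by hand. D3's documentation describes the angle in radians, with 0 at 12 o'clock and increasing clockwise. A point at angle a and radius r therefore lands at x = r·sin(a), y = -r·cos(a).

The sketch below applies that conversion and joins the points with straight segments. This is roughly what a radial line with the default linear curve draws. The helper names are hypothetical, and the rounding is my own addition, so D3's actual path string will differ in its decimals:

```javascript
// Polar -> Cartesian using D3's radial convention: angle in radians,
// 0 at 12 o'clock, increasing clockwise; SVG's y axis points down.
function radialToCartesian(angle, radius) {
  return [radius * Math.sin(angle), -radius * Math.cos(angle)];
}

// Round to 2 decimals so floating-point noise such as 6e-15 prints as 0.
const round = (v) => Math.round(v * 100) / 100;

// Join the converted points with straight segments (M = move to, L = line to).
function radialPath(points) {
  return points
    .map(([angle, radius], i) => {
      const [x, y] = radialToCartesian(angle, radius);
      return `${i === 0 ? "M" : "L"}${round(x)},${round(y)}`;
    })
    .join("");
}

// Five [angle, radius] pairs, one every quarter turn at radius 50:
// a closed diamond centred on the origin.
const diamond = [0, 1, 2, 3, 4].map((i) => [(i * Math.PI) / 2, 50]);
const path = radialPath(diamond); // "M0,-50L50,0L0,50L-50,0L0,-50"
```

Because the shape is centred on (0, 0) and SVG's origin is the top-left corner, the path usually goes inside a <g> translated to the centre of the drawing area. The same applies to a path produced by D3's radial line.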

Continue reading →

Posted on Oct 28, 2018 in data science d3 geo tutorial

Making a Map in D3.js v.5

A pretty specific title, huh? The versioning is key in this map-making how-to. D3.js version 5 has gotten serious with the Promise class. This resulted in some subtle syntax changes that have proven big enough to confuse both D3.js old dogs and newcomers. This post guides you through creating a simple map in this specific version of the library. If you’d rather dive deeper into the art of making maps in D3, try the classic guides produced by Mike Bostock.

Continue reading →

Posted on Sep 1, 2018 in data science geo tutorial

R | Point-in-polygon, a mathematical cookie-cutter

Point-in-polygon is a textbook problem in geographical analysis: given a list of geographic coordinates, return those that fall within the boundary of an area. You could feed the algorithm a list of cities across the globe, and it would recognise which of them belong to Sri Lanka and which fall inside a completely arbitrary shape you drew on planet Earth. It applies to many scenarios: analyses that aren’t based on administrative boundaries, situations in which polygons change over time, or problems that aren’t geographical at all, like computer graphics. Not so long ago, I turned to point-in-polygon to generate a set of towns and villages to plot on a map of Poland from 1933. No such list was available on the web, and I wasn’t super keen on typing out thousands of locations. Instead, I used that mathematical cookie-cutter to extract only those locations from today’s Poland, Ukraine, Belarus, and Russia that fell within the boundaries of interwar Poland. In this post I will show how to perform a point-in-polygon analysis in R and how it can potentially automate a significant chunk of the data preparation for map visualisations.
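The post itself does this in R. To stay in the language of the D3 posts, here is a JavaScript sketch of one classic technique, the ray-casting (even-odd) test. A ray is cast from the point, and each crossing with a polygon edge is counted; an odd count means the point is inside.

The sketch treats longitude/latitude as flat x/y coordinates. That assumption is fine for small areas but ignores the Earth's curvature. The boundary and locations below are made up. D3 itself ships a planar version of this test as d3.polygonContains:

```javascript
// Even-odd ray casting: cast a horizontal ray from the point towards +x and
// flip `inside` every time it crosses a polygon edge (j -> i).
function pointInPolygon([x, y], polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const [xi, yi] = polygon[i];
    const [xj, yj] = polygon[j];
    // The edge straddles the ray's y value...
    const straddles = (yi > y) !== (yj > y);
    // ...and the crossing point lies to the right of the point being tested.
    if (straddles && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

// Hypothetical boundary and locations, as [x, y] (e.g. [lon, lat]) pairs.
const boundary = [[0, 0], [10, 0], [10, 10], [0, 10]];
const locations = [
  { name: "inside", coords: [5, 5] },
  { name: "outside", coords: [15, 5] },
];

// The cookie-cutter step: keep only the locations within the boundary.
const kept = locations.filter((loc) => pointInPolygon(loc.coords, boundary));
// kept -> [{ name: "inside", coords: [5, 5] }]
```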

Continue reading →

Posted on Aug 23, 2018 in data science geo tutorial

Changing dataset projection with OGR2OGR

Previously we used OGR2OGR to extract a couple of features from a large geographical dataset. OGR2OGR can do so much more – today we’ll look at its reprojection capabilities. Reprojection is the mathematical transformation of a dataset’s coordinates from one coordinate reference system to another, for example from Albers to Mercator. Sometimes the geographical data we receive has to be reprojected to match our other datasets before further use. My first encounter with mismatched projections came while working on my Master’s thesis project. I struggled with a point-in-polygon function that was supposed to filter a set of points based on a geographical boundary but stubbornly returned nothing. I soon found out that my map had been digitised in the EPSG:3857 projection (also known as Web Mercator, the one used by Google Maps), while my village coordinates used the WGS84 coordinate system. That’s how OGR2OGR and I met.
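The numbers show why two such datasets refuse to overlap. WGS84 longitude/latitude (EPSG:4326) is stored in degrees, while EPSG:3857 stores metres on a spherical Mercator projection.

The sketch below implements the standard spherical Web Mercator formulas in JavaScript, using a sphere radius of 6378137 m. It only illustrates what a reprojection computes and is not how OGR2OGR works internally. For whole files, ogr2ogr's -s_srs and -t_srs options are the tool for the job:

```javascript
// Sphere radius used by EPSG:3857 (the WGS84 semi-major axis, in metres).
const R = 6378137;
const toRadians = (deg) => (deg * Math.PI) / 180;
const toDegrees = (rad) => (rad * 180) / Math.PI;

// WGS84 [lon, lat] in degrees -> Web Mercator [x, y] in metres.
function toWebMercator([lon, lat]) {
  const x = R * toRadians(lon);
  const y = R * Math.log(Math.tan(Math.PI / 4 + toRadians(lat) / 2));
  return [x, y];
}

// Web Mercator [x, y] in metres -> WGS84 [lon, lat] in degrees.
function fromWebMercator([x, y]) {
  const lon = toDegrees(x / R);
  const lat = toDegrees(2 * Math.atan(Math.exp(y / R)) - Math.PI / 2);
  return [lon, lat];
}

// [20, 50] in degrees becomes roughly [2.2 million, 6.4 million] in metres,
// so a boundary in one system and points in the other never intersect.
const projected = toWebMercator([20, 50]);
const back = fromWebMercator(projected);
```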

Continue reading →

Posted on Aug 19, 2018 in data science geo tutorial

Extracting countries from GeoJSON with OGR2OGR

More often than not, geographical data visualisation focuses on a single country or a cluster of countries rather than on all 195 of them. Just as typically, acquired datasets contain more features than the analysis needs. D3.js allows filtering the datasets so that we have full control over the visualisation’s output. Even so, the size of the original datasets can slow down your website’s load times. To reduce this impact, datasets can be cropped beforehand. This post explains how to shrink a standard Eurostat geographical dataset to just a handful of countries with OGR2OGR.
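For comparison, here is what filtering in the browser looks like. The whole file still has to be downloaded before the unwanted features are dropped, which is exactly the cost that cropping with OGR2OGR beforehand avoids. Its -where option applies a similar attribute filter to the file itself.

This is a minimal sketch. The property name CNTR_ID is an assumption about the dataset's attribute table, so check yours first:

```javascript
// Keep only the features whose country code is listed in `codes`.
// `key` names the attribute holding the code (assumed to be CNTR_ID here).
function pickCountries(collection, codes, key = "CNTR_ID") {
  const wanted = new Set(codes);
  return {
    type: "FeatureCollection",
    features: collection.features.filter((f) => wanted.has(f.properties[key])),
  };
}

// Tiny hypothetical FeatureCollection; geometries omitted (null is valid GeoJSON).
const europe = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { CNTR_ID: "PL" }, geometry: null },
    { type: "Feature", properties: { CNTR_ID: "DE" }, geometry: null },
    { type: "Feature", properties: { CNTR_ID: "FR" }, geometry: null },
  ],
};

const cropped = pickCountries(europe, ["PL", "DE"]);
```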

Continue reading →

Posted on Aug 15, 2018 in data science d3 tutorial

D3.js v5: Promise syntax & examples

Here is my take on promises, the latest addition to D3.js syntax. The other week I attempted to brush up on my D3.js skills and got stuck at the most basic task of printing CSV data on my HTML webpage. This is how I learnt that version 5 of D3.js replaced asynchronous callbacks with promises – and permanently changed the way we work with datasets. Getting your head around promises can take time, especially if you – like me – aren’t a JavaScript pro. In this post I’ll share the lessons I’ve learnt and provide some guidance for anyone lost in the world of promises.
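The core shift can be shown without D3 at all. In version 5, d3.csv(url) returns a Promise instead of taking a callback. The data is therefore handled inside .then(), and several files can be awaited together with Promise.all.

The sketch below mimics that shape with a hypothetical loadCsv that "fetches" from an in-memory object. It also uses a deliberately naive CSV parser with no quoted fields. Neither is D3's API:

```javascript
// Naive CSV parser: the header row supplies the keys and values stay strings
// (like d3.csv without a row conversion function). No quoted-field support.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split("\n");
  const keys = header.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    return Object.fromEntries(keys.map((key, i) => [key, values[i]]));
  });
}

// Hypothetical loader: resolves asynchronously with parsed rows, or rejects
// if the "file" is missing, mimicking a network request.
function loadCsv(name, files) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (name in files) resolve(parseCsv(files[name]));
      else reject(new Error(`Not found: ${name}`));
    }, 0);
  });
}

const files = {
  "a.csv": "name,value\nA,1\nB,2",
  "b.csv": "name,value\nC,3",
};

// One file: the rows only exist inside .then(), not on the next line.
loadCsv("a.csv", files)
  .then((rows) => console.log(rows.length))
  .catch((err) => console.error(err.message));

// Several files: Promise.all resolves once every load has finished.
Promise.all([loadCsv("a.csv", files), loadCsv("b.csv", files)])
  .then(([a, b]) => console.log(a.length + b.length))
  .catch((err) => console.error(err.message));
```

With D3 v5 the same pattern reads d3.csv("data.csv").then(data => ...). A map plus its data becomes Promise.all([d3.json(...), d3.csv(...)]).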

Continue reading →

Posted on Jul 28, 2018 in data science tutorial

Automate the Boring Stuff with CMD

25 years after being introduced into the Windows toolkit, CMD still has it. This post collects a couple of everyday file manipulation scenarios that can be accomplished with the command-line interpreter.

The Windows command prompt is a command-line interface...

Continue reading →

Posted on Jan 20, 2017 in linux tutorial

One Magical Configuration Proved to Solve Every VM’s Proxy Problems

Summary: Intro | Obtaining the proxy server address | Configuring proxy settings – checklist | Other tools

*Fedora / RHEL / CentOS

Initially, I wanted to title this post Welcome to Proxy Hell, because – at least at first – getting the proxy settings right on a VM can feel like a nightmare. Especially if you have no idea where to start or, more depressingly, when none of your attempts to fix the problem seem to succeed. If you work in an office, you have almost inevitably come across proxies. It has become standard practice for companies to guard their network traffic with a proxy server. The idea is that the server acts as an intermediary between the private company network and the internet. This both hides the web traffic from outside eyes and can serve as a base for implementing access authentication and bandwidth control.

Continue reading →

Posted on Jan 15, 2017 in data science hadoop tutorial

Your First DIY Hadoop Cluster

Summary: Intro | Linux VM Setup | VM Networking | Extending a Hadoop Cluster

At times I wish I had started my journey with Big Data earlier so that I could have entered the market in 2008-2009. Though Hadoopmania is still going strong in IT, those years were a golden era for Hadoop professionals. With any sort of Hadoop experience you could be considered for an £80,000 position. There was such a shortage of Hadoop skills in the job market that even a complete beginner could land a wonderfully overpaid job. Today you can’t just wing it at the interview; the market has matured and there are many talented and qualified people pursuing careers in Big Data. That said, years on, the demand for Hadoop knowledge is still rising, making it a profitable career choice for the foreseeable future.

Continue reading →

Posted on Jan 8, 2017 in data science hadoop tutorial

My Computer AKA My First Big Data Machine

Summary: Intro | Virtualisation Software | Cloudera’s QuickStart VM | Importing a VM

In this post, I will introduce Virtual Machines: the core platform of every data scientist. If, like me, you get to experiment with different technologies at work, you are familiar with Virtual Machines. VMs are the best way to test something out without having to install it on your computer and risk messing up your working environment. In essence, a VM is like a mini (virtual!) computer you put on your computer. That computer has its own environment, like Windows, Linux or MacOS. It usually comes with a bunch of pre-installed and configured tools, so that you don’t have to worry about any (or much) setup. So you might have a Windows machine installed on your actual Windows machine, and while the two share computing resources and space, they are separate instances of Windows. Plus, you can delete or change the virtual machine as you please, and you can have many of them; by definition, none of this affects your original working environment.

Continue reading →