Static graphs are a big improvement over no graphs but we can all agree that static information is not particularly engaging. On the web there is no presenter to talk over a picture. It is the role of a visualisation to grab the reader’s attention and get its point across. Making a graph interactive is a good step towards increasing its understandability. This post in an addendum to the previous tutorial on how to make a line chart. It will explore two techniques of making the previous project interactive.
The time has come to step up our game and create a line chart from scratch. And not just any line chart: a multi-series graph that can accommodate any number of lines. Besides handling multiple lines, we will work with time and linear scales, axes, and labels – or rather, have them work for us. There is plenty to do, so I suggest you fire off your D3 server and let’s get cracking.
Or should I say more advanced than the construction from the previous post. This part of the tutorial will cover scales and axes. Let the fun begin!
This is actually happening! I’ve put myself together (the key to more time is less Netflix, people) and wrote up a couple of examples in D3.js version 5 (yes, version 5!) that should get people started in the transition over to the tricky number 5. The guide assumes that you have some basics in D3 (you have an idea about SVG, DOM, HTML, and CSS), or better yet that you come from an earlier version. In this chapter we’ll create a simple bar chart. The objectives of the day are: data upload from a csv, data format setup, and drawing the data. As basic as this! Next time we will tackle scales and grids.
Make sure to check out my library for more fun examples!
Spoiling you as usual, I have another exciting D3 example for today: merging historical maps! I’ve been meaning to cover this topic ever since I developed a similar project for my Master’s thesis 3 years ago. Merging maps is challenge-worthy for every D3 enthusiast as it requires a number of things to be aligned: the data format should be compatible with D3.js, the maps should be drawn in the same projection, and cover the same time period as country or regional boundaries are far from static. I will demonstrate the idea by mashing up two maps: a digitalised map of II Polish Republic from 1934 with European boundaries from 1939.
I’m happy to announce that more SVG fun is coming! I’ve been blown away by the stats on my previous D3-related posts and it really motivated me to keep going with this series. I’ve fell in love with D3.js for the way it transforms storytelling. I want to get better with advanced D3 graphics so I figured I will start by getting the basics right. So today you will see me doodling around with some basic SVG elements. The goal is to create a canvas and add onto it a rectangle, a line, and a radial shape.
This post demystifies one of the most feared vector functions available in D3.js: the radial line, or d3.radialLine(). Radial lines are constructed with only two attributes: an angle and a radius. The product of the function is a line, but unlike the basic line function, there are no x and y co-ordinates. I fundamentally misunderstood the radial line logic the first time I used it – in fact I had to bring in my boyfriend one late Thursday evening to help me get it right. This guide should help you avoid my mistakes.
A pretty specific title, huh? The versioning is key in this map-making how-to. D3.js version 5 has gotten serious with the Promise class which resulted in some subtle syntax changes that proven big enough to cause confusion among the D3.js old dogs and the newcomers. This post guides you through creating a simple map in this specific version of the library. If you’d rather dive deeper into the art of making maps in D3 try the classic guides produced by Mike Bostock.
Point-in-polygon is a textbook problem in geographical analysis: given a list of geocoordinates return those that fall within a boundary of an area. You could feed the algorithm a list of cities across the globe and it will recognise which of them belong to Sri Lanka and which to a completely random shape you drew on planet Earth. It applies to many scenarios: analyses that aren’t based on administrative boundaries, situations in which polygons change over time, or problems that aren’t geographical at all, like computer graphics. Not so long ago, I turned to point-in-polygon to generate a set of towns and villages to plot on a map of Poland from 1933. Such list has not been made available on the web and I wasn’t super keen on typing out thousands of locations. Instead, I used that mathematical cookie-cutter to extract only those locations from today’s Poland, Ukraine, Belarus, and Russia that were present within the interwar Poland boundaries. In this post I will show how to perform a point-in-polygon analysis in R and possibly automate a significant chunk of data preparation for map visualisations.
Previously we used OGR2OGR to extract a couple of features from a large geographical dataset. OGR2OGR can do so much more – today we’ll look at its reprojecting capabilities. Reprojection is a mathematical translation of a dataset’s coordinate reference system to another one, like Albers to Mercator. Sometimes the geographical data we receive has to be reprojected to conform to our other datasets before further use. My first encounter with mismatching projections came about during the works on my master thesis project. I struggled with a point-in-polygon function that was supposed to filter a set of points based on a geographical boundary and stubbornly just wouldn’t return anything. I soon found out that my map was digitalised in EPSG:3857 (AKA web mercator used by Google maps) projection and my village coordinates used WGS84 coordinate system. That’s how OGR2OGR and me met.
More often than not geographical data visualisation is performed on a a single country or a cluster of countries rather than on all 195 of them. Just as typically, acquired datasets have more features than what’s needed for the analysis. While D3.js allows for filtering the datasets so that we have full control over the visualisation’s output, the size of original datasets can slow down your website load times. To reduce this impact, datasets can be cropped beforehand. This post will explain how to shrink a standard Eurostat geographical dataset to just a handful of countries with OGR2OGR.
25 years after being introduced to the Windows’ toolkit, CMD still has it. This post collects a couple of every day file manipulation scenarios that can be accomplished with the command-line interpreter.
Windows’ command prompt is a command-line interface...
Summary: Intro | Obtaining the proxy server address | Configuring proxy settings – checklist | Other tools
*Fedora / RHEL / CentOS
Initially, I wanted to title this post as Welcome to Proxy Hell, because – at least at first – getting the proxy settings right on a VM can feel like a nightmare. Especially, if you have no idea about were to start or, more depressingly, when none of your attempts to fix the problem seem to be successful. Nearly inevitably, if working in an office, you have come across proxies. It has become a standard for companies to guard their network traffic with a proxy server. The idea is that the server acts as an intermediary between the private company network and the internet, which both hides the web traffic from the outside eyes and can serve as a base for implementing access authentication and bandwidth control.
Summary: Intro | Linux VM Setup | VM Networking | Extending a Hadoop Cluster
At times I wish I had started my journey with Big Data earlier so that I could enter the market in 2008-2009. Though Hadoopmania is still going strong in IT, these years were a gold era for Hadoop professionals. With any sort of Hadoop experience you could be considered for a £80,000 position. There was such shortage of Hadoop skills in the job market that even a complete beginner could land a wonderfully overpaid job. Today you can’t just wing it at the interview; the market has matured and there are many talented and qualified people pursuing careers in Big Data. That said, after years, the demand for Hadoop knowledge is still on the rise, making it a profitable career choice for the foreseeable future.
Summary: Intro | Virtualisation Software | Cloudera’s QuickStart VM | Importing a VM
In this post, I will introduce Virtual Machines: the core platform of every data scientist. If you, like me, get to experiment with different technologies at work, you are familiar with Virtual Machines. VMs are the best way of getting to test something out without having to install it on your computer and risking messing up your working environment. In its essence, a VM is like a mini (virtual!) computer you put on your computer; that computer has its own environment, like Windows, Linux or MacOS, and it would usually come with a bunch of pre-installed and configured tools, so that you don’t have too worry about any (or much) setup. So you might have a Windows machine installed on your actual Windows machine, and while these two share computing resources and space, they are separate instances of Windows. Plus, the virtual machine you can delete or change as you please, you can have many and, by definition, this has no impact on your original working environment.