Eve the Analyst's Adventures in the Data Wonderland

Posted on Jan 23, 2019 in data science data security event

Recordings: Data Security Webinar Series for Protegrity

A couple of posts back I shared a sneak peek of my webinar on data security for data monetization initiatives. That was one of many sessions we ran for APAC as Protegrity entered that market in 2018. All sessions were recorded, and I’m happy to say we’ve now published excerpts of those videos on YouTube. There are bits on Big Data protection, best security practices for analytical workflows, sessions on hybrid- and multi-cloud environments, as well as spotlights on specific technologies (AWS S3, OneDrive, Salesforce, Elastic MapReduce, Tableau). The videos are 2 to 7 minutes long and go straight to the demo. I do sympathise with those of you who don’t have the time or patience to sit through a one-hour webinar – I hope you like this compact format. Check out the playlist below and get to know the superpowers of Protegrity tech!

Continue reading →

Posted on Dec 22, 2018 in opinion

Professional sum up of 2018

I have many thoughts running through my head as I sit in my sun-clad apartment and look at the empty suitcase in front of me. There’s only a week left of 2018 and my flight to Poland is leaving in a couple of hours. Outside, the Andalusian bees go about their flower business. Unbeknownst to them, the rest of Europe is scrubbing snow off their cars and porches. The setup is perfect to reflect on this year’s happenings.

Continue reading →

Posted on Oct 28, 2018 in data science event

Recording: Introduction to Hadoop for Glolent community meetup

Last Thursday evening I had the opportunity to talk about Hadoop at a Glolent Global Talent community virtual meetup. Glolent connects remote IT workers across the globe and facilitates skill-sharing sessions that any member can join or present at...

Continue reading →

Posted on Oct 28, 2018 in data science d3 geo tutorial

Making a Map in D3.js v.5

A pretty specific title, huh? The versioning is key in this map-making how-to. D3.js version 5 has gotten serious with the Promise class, which resulted in some subtle syntax changes that proved big enough to cause confusion among D3.js old dogs and newcomers alike. This post guides you through creating a simple map in this specific version of the library. If you’d rather dive deeper into the art of making maps in D3, try the classic guides produced by Mike Bostock.

Continue reading →

Posted on Sep 1, 2018 in data science geo tutorial

R | Point-in-polygon, a mathematical cookie-cutter

Point-in-polygon is a textbook problem in geographical analysis: given a list of geocoordinates, return those that fall within the boundary of an area. You could feed the algorithm a list of cities across the globe and it will recognise which of them belong to Sri Lanka and which to a completely random shape you drew on planet Earth. It applies to many scenarios: analyses that aren’t based on administrative boundaries, situations in which polygons change over time, or problems that aren’t geographical at all, like computer graphics. Not so long ago, I turned to point-in-polygon to generate a set of towns and villages to plot on a map of Poland from 1933. Such a list has not been made available on the web and I wasn’t super keen on typing out thousands of locations. Instead, I used that mathematical cookie-cutter to extract only those locations from today’s Poland, Ukraine, Belarus, and Russia that lay within the boundaries of interwar Poland. In this post I will show how to perform a point-in-polygon analysis in R and possibly automate a significant chunk of data preparation for map visualisations.
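To make the cookie-cutter idea concrete, here is a minimal sketch of one common way to solve it, the ray-casting test. It is written in Python rather than the R used in the full post, and it is not the method from that post. The boundary and the place names are made-up illustration data.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: cast a ray to the right of the point and count edge crossings.

    An odd number of crossings means the point is inside the polygon.
    Points lying exactly on an edge are not treated specially here.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical data: a rectangular "country" and three invented places
boundary: List[Point] = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 5.0)]
places: Dict[str, Point] = {"A": (2.0, 3.0), "B": (12.0, 1.0), "C": (9.5, 4.5)}

inside_places = [name for name, pt in places.items() if point_in_polygon(pt, boundary)]
print(inside_places)
```

The test only gives meaningful results when the points and the polygon share the same coordinate reference system.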

Continue reading →

Posted on Aug 23, 2018 in data science geo tutorial

Changing dataset projection with OGR2OGR

Previously we used OGR2OGR to extract a couple of features from a large geographical dataset. OGR2OGR can do so much more – today we’ll look at its reprojecting capabilities. Reprojection is a mathematical translation of a dataset from one coordinate reference system to another, like Albers to Mercator. Sometimes the geographical data we receive has to be reprojected to conform to our other datasets before further use. My first encounter with mismatching projections came about while working on my master’s thesis project. I struggled with a point-in-polygon function that was supposed to filter a set of points based on a geographical boundary and stubbornly just wouldn’t return anything. I soon found out that my map had been digitised in the EPSG:3857 projection (AKA Web Mercator, used by Google Maps) while my village coordinates used the WGS84 coordinate system. That’s how OGR2OGR and I met.
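To show what such a translation involves, here is a small sketch of the standard spherical formula that takes WGS84 longitude and latitude in degrees to EPSG:3857 coordinates in metres. It is written in Python purely for illustration and is not how OGR2OGR is invoked. The function name and the sample coordinates are my own.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS84 semi-major axis), in metres
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Convert WGS84 lon/lat in degrees to Web Mercator x/y in metres.

    Valid for latitudes strictly between -90 and 90 degrees; Web Mercator
    is normally only used up to roughly +/-85.05 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates of Warsaw, used only as sample input
x, y = wgs84_to_web_mercator(21.01, 52.23)
print(f"x = {x:.0f} m, y = {y:.0f} m")
```

The output is in the millions of metres while the input is in tens of degrees. That is why a point-in-polygon test mixing the two systems finds no matches.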

Continue reading →

Posted on Aug 19, 2018 in data science geo tutorial

Extracting countries from GeoJSON with OGR2OGR

More often than not, geographical data visualisation is performed on a single country or a cluster of countries rather than on all 195 of them. Just as typically, acquired datasets have more features than are needed for the analysis. While D3.js allows for filtering the datasets so that we have full control over the visualisation’s output, the size of the original datasets can slow down your website’s load times. To reduce this impact, datasets can be cropped beforehand. This post will explain how to shrink a standard Eurostat geographical dataset to just a handful of countries with OGR2OGR.
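The cropping idea can be sketched in plain Python with no OGR2OGR involved. The snippet keeps only the features whose country code is on a wanted list. The property name `CNTR_ID` and the tiny sample collection are assumptions for illustration, so check the property names in your own file before reusing it.

```python
import json
from typing import Iterable


def keep_countries(collection: dict, codes: Iterable[str], key: str = "CNTR_ID") -> dict:
    """Return a copy of a GeoJSON FeatureCollection limited to the given country codes.

    `key` is the feature property that holds the country code (assumed name).
    """
    wanted = set(codes)
    kept = [
        feature
        for feature in collection.get("features", [])
        if feature.get("properties", {}).get(key) in wanted
    ]
    # Copy top-level members (type, crs, ...) and swap in the filtered features
    return {**collection, "features": kept}


# Hypothetical miniature dataset; geometries are left empty for brevity
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"CNTR_ID": "PL"}, "geometry": None},
        {"type": "Feature", "properties": {"CNTR_ID": "DE"}, "geometry": None},
        {"type": "Feature", "properties": {"CNTR_ID": "ES"}, "geometry": None},
    ],
}

cropped = keep_countries(sample, ["PL", "ES"])
print(json.dumps(cropped, indent=2))
```

For a real file you would read it with `json.load`, filter it, and write the result with `json.dump`. The smaller file is what the web page would then load.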

Continue reading →

Posted on Aug 15, 2018 in data science d3 tutorial

D3.js v5: Promise syntax & examples

Here is my take on promises, the latest addition to D3.js syntax. The other week I attempted to brush up on my D3.js skills and got stuck at the most basic task of printing CSV data on my HTML webpage. This is how I learnt that version 5 of D3.js replaced asynchronous callbacks with promises – and irreversibly changed the way we work with data sets. Getting your head around promises can take time, especially if you – like me – aren’t a JavaScript programming pro. In this post I’ll share my lessons learnt and provide some guidance for those lost in the world of promises.

Continue reading →

Posted on Jul 28, 2018 in data science tutorial

Automate the Boring Stuff with CMD

25 years after being introduced to the Windows toolkit, CMD still has it. This post collects a couple of everyday file manipulation scenarios that can be accomplished with the command-line interpreter.

Windows’ command prompt is a command-line interface...

Continue reading →

Posted on Jun 4, 2018 in data science data security opinion

GDPR in 10 Steps: a Guide for Small Businesses

By now every business owner in Europe will have heard about GDPR: if it didn’t reach them through the news or social circles, the swarm of pop-ups and emails announcing policy updates would have been telling enough. GDPR awareness might be mainstream, but it comes a tad too late to believe its practice is correspondingly widespread. Timing aside, putting GDPR into action proves confusing, as the regulators provide little guidance on GDPR’s practical application. Among the most puzzled are small companies. GDPR dictates they bear the same responsibilities as governments or corporations, pressuring them to make do with less subject-matter knowledge and a smaller budget for the lawyers to get their heads round the regulation.

This checklist summarises the principles behind GDPR from which each business can derive its data protection strategy. I should note that I am not a lawyer but a data security consultant; nevertheless, it is my belief that abiding by these principles should guarantee that a business operates legally and securely.

Continue reading →