More often than not, geographical data visualisation is performed on a single country or a cluster of countries rather than on all 195 of them. Just as typically, acquired datasets have more features than the analysis needs. While D3.js allows for filtering the datasets so that we have full control over the visualisation’s output, the size of the original datasets can slow down your website’s load times. To reduce this impact, datasets can be cropped beforehand. This post will explain how to shrink a standard Eurostat geographical dataset to just a handful of countries with OGR2OGR.
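To make the idea of cropping a dataset beforehand concrete, here is a minimal sketch in plain Python rather than OGR2OGR (the tool the post itself covers). It keeps only the features whose country code is in a chosen set. The property name `CNTR_ID`, the function name `filter_countries` and the sample data are assumptions made for illustration; check which property your own file uses for country codes.

```python
import json


def filter_countries(collection, keep, key="CNTR_ID"):
    """Return a copy of a GeoJSON FeatureCollection with only the wanted countries.

    `key` is the feature property holding the country code. "CNTR_ID" is an
    assumption here, not something confirmed for every Eurostat file.
    """
    wanted = set(keep)
    features = [
        feature
        for feature in collection.get("features", [])
        # GeoJSON allows "properties" to be null, hence the `or {}` fallback.
        if (feature.get("properties") or {}).get(key) in wanted
    ]
    # Keep every other top-level member (type, crs, ...) untouched.
    cropped = {k: v for k, v in collection.items() if k != "features"}
    cropped["features"] = features
    return cropped


# Tiny hypothetical stand-in for a real dataset (geometries omitted).
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"CNTR_ID": "PL"}, "geometry": None},
        {"type": "Feature", "properties": {"CNTR_ID": "FR"}, "geometry": None},
        {"type": "Feature", "properties": {"CNTR_ID": "DE"}, "geometry": None},
    ],
}

cropped = filter_countries(sample, {"PL", "DE"})
print(json.dumps(cropped))
```

With a real file you would load the collection with `json.load` and write the result back with `json.dump`; the smaller file is then what the website serves.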
Protegrity webinar: Building secure analytical models – Registration Link
Wednesday 6th August, 10AM CET / 4PM Singapore time
The replay will be available shortly after the session.
25 years after being introduced to the Windows toolkit, CMD still has it. This post collects a couple of everyday file manipulation scenarios that can be accomplished with the command-line interpreter.
Windows’ command prompt is a command-line interface for file and process management on Windows. A big deal in the ’90s, today the tool is not overwhelmingly popular among data scientists, or any Windows user for that matter. But this old-school tool still proves useful for basic file manipulation. It might not have the capabilities of, say, Python, but in a situation where you cannot use a programming language or you are looking for a challenge, CMD will always be there for you. Recently, I helped a friend automate some tedious copy-and-paste operations and reduced his workload by days. Our collaboration is documented below, along with some code snippets.
*The title is a reference to the awesome Automate the Boring Stuff with Python by Al Sweigart.
By now every business owner in Europe will have heard about GDPR: if it didn’t hit them on the news or through social circles, the swarm of pop-ups and emails announcing policy updates would have been telling enough. GDPR awareness might be mainstream, but it comes a tad too late to believe its practice is correspondingly widespread. Timing aside, putting GDPR into action proves confusing, as the regulators provide little guidance on GDPR’s practical application. Among the most puzzled are small companies. GDPR dictates that they bear the same responsibilities as governments or corporations, pressuring them to make do with less subject-matter knowledge and a smaller budget for the lawyers to get their heads round the regulation.
This checklist summarises the principles behind GDPR from which each business can derive its data protection strategy. I should note that I am not a lawyer but a data security consultant: nevertheless, it is my belief that abiding by these principles should guarantee that a business operates legally and securely.
Earlier this month I had a chance to speak with the Polish magazine Przegląd about today’s data economy, the evolution of marketing over the last couple of decades (from the invention of the database to machine learning), and how it all relates to the Cambridge Analytica scandal from March. The article is available in Polish on Przegląd’s website (behind a paywall) and is loosely based on an article I previously published in English on the blog.
I’m just back from Scotland where – besides having a lovely time hanging out with friends – I attended the annual Lands of Loyal (LOL) Data Science conference in Alyth. The event is an informal get-together for the Data Science & Business Intelligence graduates from Dundee University. This time my talk was among those selected for the day: I decided to rework my last blog post on Personally Identifiable Data (and not!) into a 20-minute presentation and packed it with questions (some of which I cannot answer) regarding our identity on the web.
Summary: Protected by law (*when there is a law) | Many faces of PII | Here be dragons: data outside the PII realm
There is a silver lining in the Cambridge Analytica + Facebook scandal in that it started a debate about our privacy rights online. Our virtual house was invaded: the government came in and took our identities away. Putting aside the question of whether it was us who invited the aggressor*, today we will examine the core of the scandal: the idea of identity on the web. What is it exactly that bugs us about this case? What is it that we are standing for by deleting Facebook? To channel our outrage, let’s review what constitutes personal data in the eyes of the law, what slipped past regulation, and whether our online footprint should have us worried.
* Watch this Level1 podcast to know the answer
The conversation around the Right to Explanation reminded me of the Mandela Effect. Just as Mandela’s death is believed by many to have happened before his real time of death, the Right to Explanation is falsely attributed to GDPR’s collection of laws. An offshoot from early GDPR conversations, the rule has now developed its own literature on the internet. Posts suggesting that the law threatens Artificial Intelligence have flooded Google (examples here, here, and here), while uncertainty-fueled paranoia has taken over LinkedIn. Is it internet misinformation at its finest, or is there more to the discussion? I suggest we review what a Right to Explanation is and why an absent law is causing such a stir on the World Wide Web.
Oh, the annual Christmas epiphany: shopping for people we’d like to think we know is hard. To help those looking for a perfect gift for their Data Scientist friend, or those wanting to treat themselves to a good read, here is a roundup of books that would let no data geek down. The selection is subjective: I deliberately left out the classics and focused on less obvious choices that are guaranteed to entertain and enlighten.