I have many thoughts running through my head as I sit in my sun-clad apartment and look at the empty suitcase in front of me. There’s only a week left of 2018 and my flight to Poland is leaving in a couple of hours. Outside, the Andalusian bees go about their flower business. Unbeknownst to them, the rest of Europe is scrubbing snow off their cars and porches. The setup is perfect to reflect on this year’s happenings.
By now every business owner in Europe will have heard about GDPR: if it didn’t reach them through the news or social circles, the swarm of pop-ups and emails announcing policy updates would have been telling enough. GDPR awareness might be mainstream, but it comes a tad too late to believe its practice is correspondingly widespread. Timing aside, putting GDPR into action proves confusing, as the regulators provide little guidance on its practical application. Among the most puzzled are small companies. GDPR dictates that they bear the same responsibilities as governments or corporations, pressuring them to make do with less subject-matter knowledge and a smaller budget for lawyers to get their heads round the regulation.
This checklist summarises the principles behind GDPR from which each business can derive its data protection strategy. I should note that I am not a lawyer but a data security consultant; nevertheless, it is my belief that abiding by these principles should guarantee that a business operates legally and securely.
Earlier this month I had a chance to speak with the Polish magazine Przegląd about today’s data economy, the evolution of marketing over the last couple of decades, from the invention of the database to machine learning, and how it all relates to the Cambridge Analytica scandal from March. The article is available in Polish on Przegląd’s website (behind a paywall) and is loosely based on an article I previously published in English on the blog.
Summary: Protected by law (*when there is a law) | Many faces of PII | Here be dragons: data outside the PII realm
There is a silver lining in the Cambridge Analytica + Facebook scandal in that it started a debate about our privacy rights online. Our virtual house was invaded: the government came in and took our identities away. Putting aside the question of whether it was us who invited the aggressor*, today we will examine the core of the scandal: the idea of identity on the web. What is it exactly that bugs us about this case? What is it that we are standing for by deleting Facebook? To channel our outrage, let’s review what constitutes personal data in the eyes of the law, what slipped past regulation, and whether our online footprint should have us worried.
The conversation around the Right to Explanation reminded me of the Mandela Effect. Just as Mandela’s death is believed by many to have happened before his real time of death, the Right to Explanation is falsely attributed to GDPR’s collection of laws. An offshoot of early GDPR conversations, the rule has now developed its own literature on the internet. Posts suggesting that the law threatens Artificial Intelligence have flooded Google (examples here, here, and here), while uncertainty-fuelled paranoia has taken over LinkedIn. Is it internet misinformation at its finest, or is there more to the discussion? I suggest we review what a Right to Explanation is and why an absent law is causing such a stir on the world wide web.
Oh, the yearly Christmas epiphany: shopping for people we’d like to think we know is hard. To help those looking for a perfect gift for their Data Scientist friend, or those wanting to treat themselves to a good read, here is a roundup of books that would let no data geek down. The selection is subjective: I deliberately left out the classics and focused on less obvious choices that are guaranteed to entertain and enlighten.
The democratisation of data visualisation tools brought us two major advancements: we can make great analytical products, faster. We can deceive more easily, too.
Previously a domain ruled by statisticians and IT departments, analytics has now opened up to anyone with a laptop. For marketers and managers alone, BI apps such as Tableau, QlikView, or MS Excel have become a commodity. Tools have matured too: programming fluency has been replaced by drag-and-drop interfaces. A visually stunning chart is literally a click away. The software intelligently picks the graph type and the colour scheme for us. For the more ambitious users there are plenty of adjustment options, albeit within the range of pre-programmed configurations. While some of these visual endeavours lead to great analytical products, some result in colourful nonsense.
Summary: Business instinct | When sums add up | Data-driven decision patching
This is a story about companies that like aggregations a bit too much. Data-driven decision making seems to be the new holy grail in management, but can the numbers always be trusted? What is key in data-savvy businesses: the people, the right technology, or – spoiler alert – is it something more fundamental? These questions become particularly urgent in the new economy, as failing to embrace data can be a major growth impediment or, worse, a death sentence for the business.
John Oliver goes John Oliver on forensic science and it’s a must-watch. Forensic science brought statistics to the courts, but it is itself only rarely exposed to scientific scrutiny. The justice system’s failure to question forensic methodology, together with a common fear of scientific jargon, has led to innocent people being convicted of crimes they haven’t committed. Courts have given life sentences based solely on bite marks, partial fingerprints, and hair collected at the crime scene, despite there being no proof that these methods are infallible.
Summary: Wanted: Data Scientist | A bird in the hand is worth two in the bush | A little stir | The infallible art of taking steps back
In this article I will look at how organisations can engineer their own Data Science team without losing their minds in the process or spending big money. As more and more companies want to be data-driven, they join the frantic search for the right staff to fuel these initiatives. Finding a data-fluent resource is not easy: according to McKinsey, “by 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” The hunt is on for a skill set that is still relatively new to the market and is only starting to be taught at universities. My belief is that fishing for a Data Scientist ‘superstar’ is often counterproductive and inevitably leads to the realisation that one person cannot do it all. Instead, investing in appropriate training for the current staff can lead to long-lasting benefits for the company.
Summary: Intro | Measuring the importance of gossip | Too good to be true | Arbitrary measures produce arbitrary results | tl;dr
Disclaimer: The probability computation in this article is a recap of a talk delivered by Professor Mark Whitehorn at the University of Dundee in 2015, and at the PASS Business Analytics Conference in San Jose, CA in 2014. Opinions expressed are my own.
Aren’t we post-NPS hype yet? Such was my thinking until a random article came up on my feed: as one of its core objectives, a tech giant was planning to improve its Net Promoter Score by 2020. A quick internet search told me there are some companies very excited about increasing their NPS. Google Trends suggests that interest in the Net Promoter methodology has been growing steadily since 2004; a mortal blow to my presumption. There is something problematic about the Net Promoter methodology that I’d like to talk about: on one hand it is an indicator of outstanding business delivery, on the other a possibly dangerous framework for workforce assessment. This article decomposes the NPS algorithm, reviews its criticism, and tests its validity from the probability perspective. I have based the scenario and the probability computation on an excellent talk delivered by Professor Mark Whitehorn. If you happen to be a manager or a person whose performance is scored with NPS, if you are into probability computations, or if you simply like debunking managerial fads, then this is a tale for you.
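For readers who have not met the score before, here is a minimal sketch of the commonly published Net Promoter calculation: the share of promoters (ratings of 9 or 10) minus the share of detractors (ratings of 0 to 6), expressed in percentage points. It is not the probability computation from the talk; the function name `net_promoter_score` and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Return the NPS (between -100 and 100) for a list of 0-10 ratings.

    Assumes the usual published cut-offs: 9-10 are promoters,
    0-6 are detractors, 7-8 are passives and only count in the total.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be between 0 and 10")

    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey answers, invented for this example.
sample = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
print(net_promoter_score(sample))
```

In this sketch a 0 and a 6 weigh exactly the same, and a 7 or an 8 only dilutes the result, which is one way to see how much depends on where the cut-offs are placed.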
It’s said that the majority of every analytical project’s time is taken up by data preparation tasks. But is this estimate trustworthy?
Summary: Intro | A case of tl;dr | Where was the graph police? | A quick fix
This is a short story about a graph that could have been done better and an article that has gone awry. The Organisation for Economic Co-operation and Development (OECD), one of the most powerful research bodies there is, has published an excellent report on the influence of robotics on the job market, the conclusion of which was misinterpreted in an influential Polish newspaper, Krytyka Polityczna, and in other media. In this post I will analyse both the article and the report, and then theorise on what has gone wrong and who (or what) is to blame.
A traditional first post, then!
I want this space to become a journal of my wanderings in the world of data analysis. While I’ve been working as a consultant for a few years now, there is a multitude of topics I have never tackled and technologies I know nothing about. The idea for this learning space is to tackle a different problem every week or two.
I am no writer, and my previous experience is in creating functional specifications, where every term had to be precise and the sentences kept short and simple. In my Business Analyst beginnings the documentation I produced was poor, but with time I saw the quality of my writing go up. With that in mind, I expect ‘blogging’ to be a learning curve (but I am keeping a positive outlook).