I’m just back from Scotland where – besides having some lovely time hanging out with friends – I attended the annual Lands of Loyal (LOL) Data Science conference in Alyth. The event is an informal get together for the Data Science & Business Intelligence graduates from the Dundee University. This time my talk was among those selected for the day: I decided to rework my last blog post on Personally Identifiable Data (and not!) to a 20-minute presentation and packed it with questions (some of which I cannot answer) in regard to our identity on the web.
Below is a short recap of the talk by Mary Whitehorn:
“We are data points: identity on the web and beyond, was a discussion of personal identifiable information (PII) from data that most of us cannot help but expose (name, address, etc) to geospatial data broadcast by our devices which not only indicates where we are but where we aren’t. Legal protection of PII only goes so far in preventing identification: given sufficient non-PII data, it may well be possible to identify an individual. Certain data we can change with varying degrees of effort, but some, like biometric data, we cannot. This thought-provoking talk was a great start to the day.”
Summary: Protected by law (*when there is a law) | Many faces of PII | Here be dragons: data outside the PII realm
There is a silver lining in the Cambridge Analytica + Facebook scandal in that it started a debate about our privacy rights online. Our virtual house was invaded: the government came in and took our identities away. Putting aside the question whether it was us who invited the aggressor*, today we will examine the core of the scandal: the idea of identity on the web. What is it exactly that bugs us about this case? What is it that we are standing for by deleting Facebook? To channel our outrage, let’s review what constitutes personal data in the light of law, what slipped regulation, and if our online footprint should have us worried.
The conversation around the Right to Explanation reminded me of Mandela Effect. Just as Mandela’s death is believed by many to have happened before his real time of death, Right to Explanation is falsely attributed to GDPR’s collection of laws. An offshoot from early GDPR conversations, the rule has now developed its own literature on the internet. Posts suggesting that the law threatens Artificial Intelligence have flooded Google (examples here, here, and here), while uncertainty-fueled paranoia has taken over LinkedIn. Is it misinformation spread on the internet in its finest or is there more to the discussion? I suggest we review what a Right to Explanation is and why an absent law is causing so much stir on the world wide web.
Oh, the every year’s Christmas epiphany: shopping for people we’d like to think we know is hard. To help those looking for a perfect gift for their befriended Data Scientist, or those wanting to indulge themselves with a good read, here is a roundup of books that would let no data geek down. The selection is subjective: I deliberately missed out the classics, and focused on less-obvious choices that are guaranteed to entertain and enlighten.
The democratisation of data visualisation tools brought us two major advancements: we can make great analytical products, faster. We can deceive easier, too.
Previously a domain ruled by statisticians and IT departments, analytics have now opened up to anyone with a laptop. For marketers and managers alone, BI apps such as Tableau, QlikView, or MS Excel have become a commodity. Tools have matured too: programming fluency was overruled by drag-and-drop interfaces. A visually stunning chart is literally a click away. The software intelligently picks the graph type and the colour scheme for us. For the more ambitious users the adjustment options are plenty, although within the range of pre-programmed configurations. While some of these visual endeavours lead to great analytical products, some result in colourful nonsense.
Summary: Business instinct | When sums add up | Data-driven decision patching
This is a story about companies who like aggregations a bit too much. Data-driven decision making seems to be the new holy grail in management, but can the numbers always be trusted? What is key in data-savvy businesses: the people, the right technology, or – spoiler alert – is it something more fundamental? These questions become particularly urgent in the new economy as failing to embrace data can be a major growth impediment or worse, a dead sentence to the business.
John Oliver goes John Oliver on forensic science and it’s a must watch. Forensic science brought statistics to courts, but itself it’s only rarely exposed to scientific scrutiny. Justice system’s failure to question forensics methodology and common fear of scientific jargon has led to convicting innocent people for crimes they haven’t commited. Courts have given life sentences solely based on bite marks, partial fingerprints, and hair collected at crime scene despite of no proof these methods are infallible.
Summary: Wanted: Data Scientist | A bird in the hand is worth two in the bush | A little stir | The infallible art of taking steps back
In this article I will look at how organisations can engineer their own Data Science team without loosing their mind in the process nor spending big money. As more and more companies want to be data-driven, they join the frantic search for the right staff to fuel these initiatives. Finding a data-fluent resource is not easy: according to McKinsey, “by 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” The hunt is on for a skill set that is still relatively new to the market, and it is only starting to be taught at the universities. My belief is that fishing for a Data Scientist ‘superstar’ is often counterproductive and inevitably leads to a realisation that one person cannot do it all. Instead, investing in appropriate training of the current staff can lead to long-lasting benefits for the company.
Summary: Intro | Measuring the importance of gossip | Too good to be true | Arbitrary measures produce arbitrary results | tl;dr
Disclaimer: The probability computation of the article is a recap of a talk delivered by Professor Mark Whitehorn at the University of Dundee in 2015, and at PASS Business Analytics Conference in San Jose, CA in 2014. Opinions expressed are my own.
Aren’t we post-NPS hype yet? Such was my thinking until a random article came up on my feed: as one of its core objectives, a tech giant was planning to improve its Net Promoter Score by 2020. A quick internet search told me there are some companies very excited about increasing their NPS. Google Trends suggests the Net Promoter methodology is on the steady growth rate since 2004; a mortal blow to my presumption. There is something problematic about the Net Promoter methodology that I’d like to talk about: on one hand an indicator of an outstanding business delivery, on the other a possibly dangerous framework for workforce assessment. This article decomposes the NPS algorithm, reviews its criticism, and tests its validity from the probability perspective. I have based the scenario and the probability computation on an excellent talk delivered by professor Mark Whitehorn. If you happen to be a manager, a person whose performance is scored with NPS, you are into probability computations, or simply you like debunking managerial fads then this is a tale for you.
I will be at Oracle Code conference on the 6th of June in Brussels, redefining fun with Python on Oracle database. The idea is to live demo working with tables, handling JSON files, (new) spatial data queries, data visualisation, and a couple of good practices of database application development with Python.
There will be snakes, ridiculous maps, and misbehaving queries. Come if you’re in town, it’s gonna be #phun