We are data points: identity on the web post-Cambridge Analytica scandal

Summary: Protected by law (*when there is a law) | Many faces of PII | Here be dragons: data outside the PII realm

There is a silver lining in the Cambridge Analytica + Facebook scandal: it started a debate about our privacy rights online. Our virtual house was invaded: the government came in and took our identities away. Putting aside the question of whether it was us who invited the aggressor*, today we will examine the core of the scandal: the idea of identity on the web. What exactly is it that bugs us about this case? What are we standing for by deleting Facebook? To channel our outrage, let’s review what constitutes personal data in the eyes of the law, what slipped past regulation, and whether our online footprint should have us worried.

*Watch this Level1 podcast to know the answer

Protected by law (*when there is a law)

The most comprehensive definition of personal data comes from information security, under the label of Personally Identifiable Information. PII refers to information that can identify a specific person. This definition encompasses such pointers as any identification number (passport, social security number, driver’s license), location, contact details, face, fingerprint, and genetics. You will notice that the concept of “identifiable” is narrower than the idea of “personal”: our personality, behaviour, and beliefs (not necessarily religious) are not shielded by the PII umbrella.

In the shadow of PII hides its uglier brother: non-specific information that still points to a specific person. For example, a person’s gender is not particularly revealing on its own, as a few billion other people share it. However, if you’re the only woman in a male-dominated workplace, even this seemingly trivial piece of information will give your identity away.
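The workplace example can be made concrete by counting how many people share each combination of "harmless" attributes. The snippet below is only a minimal sketch: the staff records and the helper names (group_sizes, singled_out) are made up for illustration, and real re-identification checks are more involved.

```python
from collections import Counter

# Hypothetical staff records: no names or ID numbers, only "harmless" attributes.
staff = [
    {"gender": "M", "department": "engineering"},
    {"gender": "M", "department": "engineering"},
    {"gender": "M", "department": "engineering"},
    {"gender": "F", "department": "engineering"},
    {"gender": "M", "department": "sales"},
]


def group_sizes(records, attributes):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(r[a] for a in attributes) for r in records)


def singled_out(records, attributes):
    """Return the attribute combinations that match exactly one record."""
    sizes = group_sizes(records, attributes)
    return [combo for combo, count in sizes.items() if count == 1]


# Gender alone already isolates one person in this small workplace.
print(singled_out(staff, ["gender"]))
# Adding a second attribute isolates even more people.
print(singled_out(staff, ["gender", "department"]))
```

Any combination that matches a single record works as an identifier in practice, even though none of the attributes is PII on its own.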

PII is a prevalent term in data privacy regulations; however, PII-centered legislation differs depending on each country’s interpretation of it. Personal data that is not strictly PII is often ignored, or little guidance is provided for ensuring its protection. The Cambridge Analytica scandal (and previously exposed NSA practices) highlighted how weakly regulated non-PII data is in the US. As Americans are waking up in the bed they made, EU citizens are cheering for the upcoming General Data Protection Regulation. On the other side of the globe, China is embracing PII and non-PII the way only China would: by experimenting with rating systems for its citizens. African countries, most of which lack any regulation, have their citizens’ data freely exposed – and you bet they have their own Cambridge Analytica story.

Many faces of PII

The notion of protecting PII is present across global legislation, but in reality it’s not aggressively enforced. The European Union’s GDPR is the strongest attempt to date to control PII misuse: it makes every organisation that stores EU citizens’ data its safekeeper. Firms will be fined should they fail to comply. This is a significant collective effort for Europeans, as personal information comes in many forms: besides name, age, and bank information, companies have to secure all location information and any traces of biometric data.

PII leaks are the powerhouse of illicit impersonation: criminals use our stolen identities to apply for loans, access our bank accounts, or stalk us. Location data alone is a Pandora’s box in terms of the information it can reveal. Imagine Apple leaked the daily routes meticulously stored on our iPhones. Where you go, live, work, and hang out becomes exposed. Worse, your absence can be predicted and your house potentially robbed. Location is not only the domain of GPS devices: every modern marketing effort involves building a repository of customer location data. These are not administrative regions, which – being strictly regulatory – fail to represent the economic status and interests of their population. Modern marketing regions are defined by sensors, beacons, IP translation, and Facebook check-ins: call it data-driven geography. Biometric data – data that uniquely distinguishes our biological traits – is a similarly ubiquitous asset in contemporary information collection. The entities obliged to do their share of the security work range from makers of voice-controlled devices and social media servers that keep our pictures and videos to physical shop owners operating CCTV.

Finally, a word about genomics and PII. The Human Genome Project succeeded in sequencing and mapping all of the human genes, opening the doors to breathtaking advances in molecular biology and medicine. Genomic data is unlike any other in the PII context: while other PII attributes can be changed, a DNA set is forever unique. Once it’s out in the public, it’s out; the harm is irreversible. It’s more challenging to regulate, too. Unlike other data types, it cannot be successfully anonymised or pseudonymised without losing its value. It needs to be available for analysis in its entirety. This calls for additional, specialised regulation before genomic treatment becomes mainstream.
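For contrast, here is what pseudonymising a conventional identifier can look like. This is a rough sketch using a keyed hash from Python's standard library; the key, the record, and the pseudonymise helper are hypothetical, and the approach is one common technique rather than anything GDPR prescribes.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-real-secret"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256) of itself."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical record: the passport number is swapped for a token, the rest is kept.
record = {"passport": "X1234567", "city": "Lisbon"}
safe_record = {**record, "passport": pseudonymise(record["passport"])}

print(safe_record)
```

The passport number can be swapped for a token because its analytical value lies in linking records, not in its digits. A genome offers no such shortcut: the sequence itself is the data being analysed.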

Here be dragons: data outside the PII realm

If PII is a regulatory headache, how should we handle the data outside the PII definition? Here be dragons: non-PII information (other than financial data) is territory largely unexplored by law. Modern legislation protects us only as far as our social security number, address, and genes go, leaving our behavioural data outside its scope. Stealing PII is criminal – stealing non-PII data is (let’s agree) ethically not cool, but nobody will be jailed for it.

We are what we do, and marketers know it best. Netflix’s business model is based on behavioural data: a show’s length is optimised for its target audience so that we don’t lose interest; cliffhangers are dictated by conversion rates rather than solely by their artistic value. A web cookie follows us on our internet adventures as we feed it our interests, shopping preferences, and average decision time – then it reports back to data brokers. We keep learning that behavioural information is superior to our self-reported interests. Netflix followed our clicks and knows we don’t love foreign movies and documentaries as much as we say we do. Christian Rudder’s Dataclysm dedicated chapters to our lies on the web, as exposed by the OKCupid data.

Each of these collections of clicks and behavioural patterns is an offshoot of a human profile, but it’s not specific enough to single out a person. It’s therefore – by definition – outside the PII paradigm. The collection and analysis of behavioural data is precisely the backbone of Cambridge Analytica’s data-hoarding exercise, and we just learnt how much it hurts to have it leaked.

The data economy is built on our personal behavioural data. Our clicks (and lack of clicks) are monetized, sold and re-sold, and analyzed by participants of that economy. As reported by Alec Ross in The Industries of the Future, “private companies now collect and sell 75,000 individual data points about the average American consumer.” Our data feeds marketing efforts and enables the launch of new services. It feels almost dehumanizing to see our lives quantified in bits. But paranoia aside, don’t we actually love it when Amazon suggests great deals to us based on our interests?


The multidimensionality of personal data creates multiple issues: for us, for regulators, and for the whole economy. We are identified not only by our name and address, but also by our biology and behaviour. We are repositories of what we think, what we do, and what we think but don’t do, and all of this is being uploaded to the internet. The Cambridge Analytica goof emphasized how little awareness we have of our online footprint, and how inadequate our laws are. Regulators need to move on from the pre-internet mentality in which our thoughts and doings were known only to limited circles of family and friends. GDPR is a fantastic step towards a more conscious handling of our personal information, and it has the potential to inspire other countries. We have very little control over our data, as we depend not only on what we share, but also on what our friends, the services we use, our doctors, workplaces, and governments manage to keep secret. We need laws to protect us.

Follow me on Twitter for more data stories!