On Big Data

Tank Green/ August 3, 2019/ Thoughts

I was given this domain name as a gift in 1999 and I promptly set about learning HTML in order to build the website. Finally, no one could tell me to shut up! I launched myself into the internet with abandon and in the early days, this website housed a very successful and regularly updated blog. That all feels like a very long time ago now and the internet I remember from then – before the rise of the social media giants and before newspapers had really clocked what was happening – feels like a much more varied place. (Does anyone remember that beautiful word-linkage website blather?)

At first I eagerly used all of the social media things as they were released: MySpace, Google/Gmail, Facebook, Twitter, Instagram and so forth. Sometimes I would get fatigued and want to unplug, but mainly I participated to a high degree and was glad of the opportunities these platforms provided. Especially because I was a wanderer and had lived in London, then Philadelphia, then rural France, and back again: this wonderful thing called the internet allowed me to keep in touch with people I dearly love.

But then Facebook introduced Timeline, and the sudden rigidity and permanence of utterances I had taken to be ephemeral was made clear. All those flippancies had fallen off the bottom of the page and into a vast reservoir of a past that was now laid bare for anyone to sift through and organise by date. I was now subject to an administration system and had become infinitely knowable, and in that possibility of the totality of knowing, I felt a horror. In that knowing, I felt a demand for a continuity and an inference of coherence which felt, somehow, antithetical to what it meant to be human. The ability for anyone to follow the trajectory of my life, from any point in time, back or forth, felt like a cage of permanence which inhibited my very human irrationalities. It felt like a curtailment of any possibility of redefinition, the cessation of an open-ended future of my choosing; a continuity with my past would now dictate all.

I suppose I can say that Facebook’s Timeline made me confront the existence of Big Data for the first time: my data, when stored with billions of other people’s constantly updating data, thereby rendered Big. And in that confrontation, a spontaneous resistance was invoked: I deleted my account. A few years later I came across Viktor Mayer-Schönberger’s wonderful book Delete, and proceeded to delete all my other social media accounts. I almost deleted this place too, but thankfully did not.

Nowadays I work in a Big Data research cluster at a world-leading university and I’m daily immersed in the realities of it all. I’ll be honest: it’s alarming. It’s alarming not necessarily because of the nature of the researchers, but because of what their skills make possible when coupled with the digital fodder we provide them, minute by minute, hour by hour. You can now be tracked over the course of decades, and between house moves, through the linkage of datasets. Your age, gender and ethnicity are inferred from your name (arguably people do this informally anyway). Even your ‘anonymous’ social media accounts betray you: individuals can be re-identified out of a group of 10,000 people with astonishing accuracy (96.7%).

Moreover, many data scientists lack any substantial theoretical knowledge of the subject areas in which they work. This can be extremely problematic when it comes to highly politicised areas such as policing or ‘segregation’ studies. For instance, this report (PDF) on predictive policing makes absolutely no mention of the problems with using historic police data – i.e. the acknowledged institutional racism endemic to them – to inform future policing efforts. The report does not discuss what it means by ‘crime’, nor make reference to the police’s co-production of ‘crime’. It also fails to point out that police data are not actually a true record of ‘crime’, but rather a record of police involvement with particular communities and individuals. The police therefore produce ‘crime’ and ‘criminals’ by virtue of those individuals and communities they choose to police.

Let me be clear: I am not opposed to Big Data. I do not think it is even possible to be rationally opposed to Big Data. Big Data are and will continue to be so; this is a fact that it would be irresponsible not to accept. What I am deeply concerned by, however, is the rise to power of Big Data companies and the lack of public knowledge and debate around the coercive control and unilateral surveillance they represent. Third sector organisations like Privacy International, Big Brother Watch and Liberty are doing great work, but I’m not convinced their message is reaching anyone other than the already converted. By which I mean, what can each of us concerned about the privacy implications of Big Data, for instance in respect of the deeply troubling use of facial recognition software, do to help amplify the voices of these organisations fighting for our civil rights?

As well as State surveillance, I am concerned by how little people understand what pictures of themselves are daily compiled by academic and business researchers linking smart energy meter readings (now we know when you’re in your house) to their smart TV, Amazon Echo or Google Home devices (now we know what you do when you’re at home) to their travel data (now we know where you go when you leave home) to their credit card purchase history and loyalty card data (now we know what shops and restaurants you go to when you get to your destination) until all of a self is known, predicted, and steered by an unknown and fundamentally unknowable observer, a social hypnotist for us all. (Note that I deliberately didn’t include social media data in that account.) I am also deeply unsettled by the retailer who links my previously faceless online purchase to my previously nameless in-store purchase, simply because I used the same credit card on each transaction.
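The retailer’s trick in that last sentence is, technically, nothing more than a database join on a shared key. A minimal sketch — with entirely hypothetical data and field names — of how two purchase logs, each pseudonymous on its own, become one profile the moment they share a card number:

```python
# Hypothetical record linkage via a shared key (the card number).
# Each dataset is 'anonymous' on its own; joined, they span both contexts.

online_orders = [
    {"card": "4532...9881", "item": "running shoes", "email": "x@example.com"},
    {"card": "4716...0023", "item": "headphones",    "email": "y@example.com"},
]

in_store_purchases = [
    {"card": "4532...9881", "store": "Oxford Street", "item": "jacket"},
]

# Index online orders by card number, then link in-store records to them.
by_card = {order["card"]: order for order in online_orders}

profiles = []
for purchase in in_store_purchases:
    if purchase["card"] in by_card:  # same card used in both channels
        order = by_card[purchase["card"]]
        profiles.append({
            "email": order["email"],        # identity from the online checkout…
            "online_item": order["item"],
            "store": purchase["store"],     # …now tied to a physical visit
            "in_store_item": purchase["item"],
        })

print(profiles)
```

One matching card is enough: the online identity now carries the in-store visit, and every further dataset joined on the same key (loyalty card, travel data) extends the profile the same way.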

What I oppose then, is the lack of informed consent over the use of our data (which fundamentally belongs at least 50% to us), and the cavalier attitudes of the technology companies, retailers, and researchers who daily mine our behaviours, personalities, and humanity for resale and reuse. I deeply oppose the invasion of our privacy that data sharing and data surveillance technologies and researchers represent. Our right to privacy is a fundamental human right but, as Liberty have noted, ‘there is a dangerous emerging narrative that requires us to justify our desire for privacy, rather than requiring that the state provides a sound legal basis for the infringement’. (Link to PDF.)

Personally, I’m not convinced that we should solely focus on the State’s infringement of our privacy, as I believe that the State’s power is waning in favour of the Big Data companies who now have the power of near-total surveillance that the State has always coveted, alongside enormous sums of money derived from the knowledge acquired through that surveillance. That said, there are still deeply troubling contemporary examples of how the State has illegally misused technology and data, especially to the detriment of already marginalised and discriminated-against communities. For instance, Amnesty International was instrumental in bringing the Met’s misuse of mainly young black males’ data, in the form of a highly spurious ‘Gangs Matrix’ (PDF), to the attention of the Information Commissioner’s Office who issued an enforcement notice. (Although this was something of a false victory since the Matrix was merely rebranded the ‘Concern Hub’.)

My concerns then are over the ethics of Big Data reuse: their capacity for coercion, control, surveillance, and the deeply alarming privacy breaches that are daily becoming normalised. My concern is over the cavalier attitudes many data scientists have regarding our right to privacy: they do not seem to have grasped the lesson that just because they can, doesn’t mean they should. Much as with the early days of the internet when some people refused to respect or acknowledge the human being behind their online ‘handle’ on internet forums, it seems to me that many data scientists are now refusing to see the human beings behind the data they manipulate.

And we, the public, we are variously distracted and enthralled and bored by it all, and in the cracks and crevices between those moments of resignation, we are trading our capacity for independence and freedom for a soporific and spoon-fed predictive convenience. Please, wake up and listen!

General suggested readings:

Suggested privacy-oriented tech:

Know that switching away from the Google machine often means trading convenience for privacy. For instance, the search function in ProtonMail isn’t anywhere near as good as Gmail’s. Why? Because ProtonMail doesn’t scan your emails to know what you’re looking for, so your searches will only return hits on email addresses and subject lines. Similarly, DuckDuckGo can feel less useful at first, because it isn’t learning from you what it thinks you want to find.

I’m also in the process of saving up to replace my devices with Apple ones. Apple are not perfect by a very long shot, but they are vastly better than Google/Android and Microsoft.
