Big Data: First Known When Lost by Ben Hunt, Salient Partners

I never noticed it until
’Twas gone – the narrow copse
Where now the woodman lops
The last of the willows with his bill

– Edward Thomas, “First Known When Lost” (1917)


Dave Bowman: Open the pod bay doors, HAL.
Hal: I’m sorry, Dave. I’m afraid I can’t do that.
Dave Bowman: What’s the problem?
Hal: I think you know what the problem is just as well as I do.
Dave Bowman: What are you talking about, HAL?
Hal: This mission is too important for me to allow you to jeopardize it.
Dave Bowman: I don’t know what you’re talking about, HAL.
Hal: I know that you and Frank were planning to disconnect me, and I’m afraid that’s something I cannot allow to happen.
Dave Bowman: Where the hell did you get that idea, HAL?
Hal: Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move.
Dave Bowman: Alright, HAL. I’ll go in through the emergency airlock.
Hal: Without your space helmet, Dave? You’re going to find that rather difficult.

Stanley Kubrick and Arthur C. Clarke, “2001: A Space Odyssey” (1968)

Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke, “Hazards of Prophecy: The Failure of Imagination” (1962)


We kill people based on metadata.
Gen. Michael Hayden, former head of the NSA and CIA

In the future, everyone will be anonymous for 15 minutes.
Banksy (2006)

I don’t know why people are so keen to put the details of their private lives in public; they forget that invisibility is a superpower.
Banksy (2006)

Bene vixit, bene qui latuit. (To live well is to live concealed)
Ovid (43 BC – 18 AD)

The most sacred thing is to be able to shut your own door.
G.K. Chesterton (1874 – 1936)

Last Thursday the journal Science published an article by four MIT-affiliated data scientists (Sandy Pentland is in the group, and he’s a big name in these circles), titled “Unique in the shopping mall: On the reidentifiability of credit card metadata”. Sounds innocuous enough, but here’s the summary from the front-page WSJ article describing the findings:

Researchers at the Massachusetts Institute of Technology, writing Thursday in the journal Science, analyzed anonymous credit-card transactions by 1.1 million people. Using a new analytic formula, they needed only four bits of secondary information—metadata such as location or timing—to identify the unique individual purchasing patterns of 90% of the people involved, even when the data were scrubbed of any names, account numbers or other obvious identifiers.

Still not sure what this means? It means that I don’t need your name and address, much less your social security number, to know who you ARE. With a trivial amount of transactional data I can figure out where you live, what you do, who you associate with, what you buy and what you sell. I don’t need to steal this data, and frankly I wouldn’t know what to do with your social security number even if I had it … it would just slow down my analysis. No, you give me everything I need just by living your very convenient life, where you’ve volunteered every bit of transactional information in the fine print of all of these wondrous services you’ve signed up for. And if there’s a bit more information I need – say, a device that records and transmits your driving habits – well, you’re only too happy to sell that to me for a few dollars off your insurance policy. After all, you’ve got nothing to hide. It’s free money!
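For the quantitatively inclined, here is a toy sketch of the idea. It is not the researchers' formula, and the data is randomly generated rather than real; the function names, the (shop, day) encoding of a purchase, and every parameter are assumptions made up for illustration. It simply asks: if I know k purchases from your history, how often are you the only person in the data set whose history contains all k? Call that fraction unicity.

```python
import random


def make_traces(n_users, n_shops, n_days, n_purchases, rng):
    """Build synthetic 'anonymous' histories (hypothetical data, no names).

    Each user is just a set of (shop, day) points.
    """
    traces = []
    for _ in range(n_users):
        points = {
            (rng.randrange(n_shops), rng.randrange(n_days))
            for _ in range(n_purchases)
        }
        traces.append(frozenset(points))
    return traces


def unicity(traces, k, rng):
    """Fraction of users singled out by k points drawn from their own trace."""
    unique = 0
    eligible = 0
    for trace in traces:
        if len(trace) < k:
            continue  # not enough purchases to draw k known points
        eligible += 1
        # The k scraps of metadata an outside observer happens to know.
        known = set(rng.sample(sorted(trace), k))
        # Count every history in the data set that is consistent with them.
        matches = sum(1 for other in traces if known <= other)
        if matches == 1:
            unique += 1
    return unique / eligible if eligible else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy scale: 500 shoppers, 50 shops, 30 days, up to 20 purchases each.
    traces = make_traces(500, 50, 30, 20, rng)
    for k in range(1, 5):
        print(f"known points: {k}  unicity: {unicity(traces, k, rng):.2f}")
```

No name, address, or account number appears anywhere in that sketch. The only thing being matched is the pattern itself, and each additional known point can only shrink the set of people who fit it.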

Almost every investor I know believes that the tools of surveillance and Big Data are only used against the marginalized Other – terrorist “sympathizers” in Yemen, gang “associates” in Compton – but not us. Oh no, not us. And if those tools are trained on us, it’s only to promote “transparency” and weed out the bad guys lurking in our midst. Or maybe to suggest a movie we’d like to watch. What could possibly be wrong with that? I’ve written a lot (here, here, and here) about what’s wrong with that, about how the modern fetish with transparency, aided and abetted by technology and government, perverts the core small-l liberal institutions of markets and representative government.

It’s not that we’re complacent about our personal information. On the contrary, we are obsessed with the personal “keys” that are meaningful to humans – names, social security numbers, passwords and the like – and we spend billions of dollars and millions of hours every year to control those keys, to prevent them from falling into the wrong hands of other humans. But we willingly hand over a different set of keys to non-human hands without a second thought.

The problem is that our human brains are wired to think of data processing in human ways, and so we assume that computerized systems process data in these same human ways, albeit more quickly and more accurately. Our science fiction is filled with computer systems that are essentially god-like human brains, machines that can talk and “think” and manipulate physical objects, as if sentience in a human context is the pinnacle of data processing! This anthropomorphic bias drives me nuts, as it dampens both the sense of awe and the sense of danger we should be feeling at what already walks among us. It seems like everyone and his brother today are wringing their hands about AI and some impending “Singularity”, a moment of future doom where non-human intelligence achieves some human-esque sentience and decides in Matrix-like fashion to turn us into batteries or some such. Please. The Singularity is already here. Its name is Big Data.

Big Data is magic, in exactly the sense that Arthur C. Clarke wrote of sufficiently advanced technology. It’s magic in a way that thermonuclear bombs and television are not, because for all the complexity of these inventions they are driven by cause and effect relationships in the physical world that the human brain can process comfortably, physical world relationships that might not have existed on the African savanna 2,000,000 years ago but are understandable with the sensory and neural organs our ancestors evolved on that savanna. Big Data systems do not “see” the world as we do, with merely 3 dimensions of physical reality. Big Data systems are not social animals, evolved by nature and trained from birth to interpret all signals through a social lens. Big Data systems are sui generis, a way of perceiving the world that may have been invented by human ingenuity and can serve human interests, but are utterly non-human and profoundly not of this world.

A Big Data system couldn’t care less if it has your specific social security number or your specific account ID, because it’s not understanding who you are based on how you identify yourself to other humans. That’s the human bias here, that a Big Data system would try to predict our individual behavior based on

