Author Seth Stephens-Davidowitz on what the internet can tell us about ourselves.

In this digitally connected era, all of us produce enormous numbers of data points every day. What we search. How we search it. What we buy, and what we read. What we like and dislike, whom we choose to associate with, and so much more: a steady stream of data that can be quantified, sifted and analyzed en masse with the data from everyone else to reveal patterns previously hidden, sometimes things we’re not even aware of about ourselves.

That data may offer us as a society a better way to truly understand who people really are, a theory that author Seth Stephens-Davidowitz submits for our consideration in his new book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are. A former Google data scientist who is also a visiting lecturer at Wharton, Stephens-Davidowitz joined the Knowledge@Wharton Show on Sirius XM channel 111 to talk about what properly analyzed big data can reveal about our political views, our health, our biases and more.

An edited transcript of the conversation follows.

Knowledge@Wharton: There’s not much doubt that our digital footprints say a lot about who we are, but I get the sense that people, to a degree, still scoff at the idea that so much can be gleaned from all of this information.

Seth Stephens-Davidowitz: Yes. Some people have this traditional notion of what data is. They think of it like a representative survey: You have clear questions with check boxes that people can answer very clearly. I think they get a little uncomfortable with the wild world of the internet, where data tends to be more unstructured and a little bit different than they’re used to.

Knowledge@Wharton: Does it feel like people still believe they have a higher level of data security than they really do?

Stephens-Davidowitz: I think there are definitely concerns about the power of big data. Because data is so predictive, companies can potentially use it to really take advantage of people. I talk about it in the book. One example is if you apply for a loan, companies can predict whether you’ll pay back the loan just based on the words you use in your loan application. For example, if you use the word “God” in a loan request, you’re 2.2 times more likely to default, 2.2 times more likely not to pay it back. So a company could save money by not giving loans to people who end their requests with “God bless you,” which is pretty scary.

Knowledge@Wharton: Throughout the book, you tackle some of the bigger issues that we have in society, like racism and child abuse. And there are all kinds of data points which will lean one way or another in these areas.

Stephens-Davidowitz: Right. There’s just so much information now from the web. And there are certain sources, such as Google, which I focus a lot on. People are just really honest and tell Google things they may not tell anyone else. So when it comes to really important areas like the ones you mentioned, we can get really new insights into who we are.

Knowledge@Wharton: One of the areas you look at is sex.

Stephens-Davidowitz: I like to say that big data is so powerful that it turned me into a sex expert, because it wasn’t a natural area of expertise for me. There’s obviously a lot of lying around sex because it’s an uncomfortable, taboo area. I think we can learn a lot more from Google searches about what people like.

Knowledge@Wharton: You also looked at racism, and you talk about how racism actually surfaced more, not during the presidential race in 2008, but in the immediate aftermath of President Obama being elected.

Stephens-Davidowitz: There is a disturbing element to this data. If, in general, people lie to make themselves look good, then we’re going to have an overly optimistic perception of who people are. But if we know the truth, in many areas, unfortunately, we’re going to learn darker things about people, and racism is one of the areas. It’s shocking. One of the most surprising things I found right away in this data was the shocking number of racist searches people make, basically looking for jokes mocking African-Americans. And yes, this was a big theme — really nasty searches about Obama as soon as he was elected.

Knowledge@Wharton: One of the long-held beliefs about this was that racism is more of a Southern phenomenon, but your data showed that is not necessarily the case.

Stephens-Davidowitz: Yes. If you ask in surveys or go by conventional wisdom, racism is considered a Southern issue. But I think that may be because in the South, there’s just less need to hide that racism. If you look at the Google search data, which is more honest, you see many of the areas with the highest racism are Northern places: western Pennsylvania, eastern Ohio, upstate New York, industrial Michigan. The real divide in racism these days is not South versus North, it’s East versus West.

Knowledge@Wharton: If people or companies were able to use this data in a more coherent, more effective manner, what do you think the impact in general would be for the country, or for society?

Stephens-Davidowitz: Well, there’s an optimistic scenario and a pessimistic scenario. I don’t know which one will come true. The pessimistic scenario is that companies would use this to take advantage of people, to get them to spend more money that they don’t have, or spend more time on their websites even though they don’t need to be on those websites. The optimistic scenario is that we would have insights into really, really important areas — health, racism, sexuality — and really learn how to improve society.

Knowledge@Wharton: The health angle of it is very interesting. The idea that we would be able to glean information that might lead to cures for diseases, or be able to take a more effective preventive approach and catch diseases before they become worse: those things would have an incredible impact both on the people in this country and on the economics surrounding health care.

Stephens-Davidowitz: Yes. In one of my favorite studies, they used search data and found people who made searches such as “just diagnosed with pancreatic cancer.” And you know when someone makes a search like that, they probably just got diagnosed with pancreatic cancer. Then you compare those people to similar people who never were diagnosed with pancreatic cancer, and you look at what symptoms they were searching for in the prior months. And they found really, really subtle patterns that are predictors of eventually getting a pancreatic cancer diagnosis.

For example, if you searched “indigestion” followed by “abdominal pain,” that’s a risk factor in pancreatic cancer. Whereas searching “indigestion” by itself is not a risk factor. That’s

