Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are is a nonfiction book by Seth Stephens-Davidowitz, published in 2017. The book starts from the observation that while people lie to one another, they tend to be shockingly honest when searching the internet. Stephens-Davidowitz argues that this search data yields important insights into how people really react to circumstances and deal with stress, compared with their official, public-facing responses.
Stephens-Davidowitz begins with the 2016 presidential election in the United States, in which polls showed Donald Trump as a heavy underdog right up to Election Day. He notes that clues to Trump's victory were embedded in Big Data from the internet. He had found his first hint that Big Data could expose the lies people tell in 2008, when Barack Obama became the first African American president and many pundits declared that the country had become a post-racial society. Studying Google Trends led Stephens-Davidowitz to realize that people do not so much search the internet as confide in it, revealing what they are really thinking and feeling. In the aftermath of Obama's 2008 victory, racist internet searches spiked, contradicting the pundits' assumptions, which had been based on people's public statements.
Returning to the 2016 election, Stephens-Davidowitz notes that there is a strong geographical correlation between areas of the country that voted for Donald Trump and internet queries involving racial epithets.
Stephens-Davidowitz next turns to data science, explaining that most of its methods are intuitive and make sense even to laypeople; the results of those methods, however, and how they are interpreted, are frequently counterintuitive and unexpected. In other words, while our gut instincts give us some insight into how the world works, we need data to test and refine those assumptions. As an example, he examines the seemingly obvious assumption that most NBA players come from poor backgrounds. The data do not support it; if anything, they suggest that players from middle-class backgrounds are overrepresented.
Stephens-Davidowitz then discusses specific studies and their often surprising results, from which he draws several conclusions. First, the internet age is generating wholly new types of data that did not exist in the past. Second, while people often lie, both in person and even when responding anonymously to polls and surveys, they are far more honest when typing a search into Google. Third, Big Data allows researchers to zoom in on very small subsets of people. Finally, the sheer volume of available data means that data scientists can run a large number of causal experiments, such as A/B tests, without having to seek funding or design complex studies. This makes Big Data research flexible and extremely fast.
Stephens-Davidowitz explores how these factors are changing the field. Take racehorses, which are typically bought by the very wealthy, who have traditionally relied on a horse's pedigree, or family tree, to judge whether it is likely to be a winner. In 2013, however, a data scientist named Jeff Seder, drawing on his analysis of racehorse data, reached a surprising conclusion: the size of a horse's left ventricle was the most important predictor of its ability to win races. Seder singled out American Pharoah as the best prospect in that year's auction, and the horse went on to win the Triple Crown in 2015.
Stephens-Davidowitz then discusses some of Big Data's limitations, noting, for example, that when a huge number of variables is tested against a limited number of observations, some will correlate purely by chance, making it easy to arrive at incorrect conclusions. He also digs into the ethical concerns surrounding Big Data, especially the fact that large corporations now control so much of it and have insights into people that they never had before, insights they can use to manipulate people. Even before Big Data, Kodak insisted that the models in its advertisements always be smiling. In this way, it linked photographs with happy moments and encouraged people to take far more photos than they had in the past. Stephens-Davidowitz notes how much easier that sort of manipulation could become for companies with access to Big Data.
Stephens-Davidowitz concludes the book with a warning. While Big Data is clearly a powerful tool that can be used to better understand our world and to improve our lives, it can also be used against us. As a result, we must be vigilant in watching how companies and political organizations—or even foreign governments, which are clearly hoovering up data for geopolitical advantage—use that data. We must also be aware of Big Data’s limitations so we are not fooled by false promises of perfect predictions and other problems.