Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are

by Seth Stephens-Davidowitz

Description

A former Google data scientist presents an insider's look at what the vast, instantly available amounts of information from the Internet can reveal about human civilization and society.

Recommendations

Member Recommendations

alco261: Everybody Lies leans a bit optimistic, Weapons of Math Destruction leans a bit pessimistic. Together they do a great job of providing a balanced understanding of big data issues.

Member Reviews

51 reviews
In Everybody Lies, Seth Stephens-Davidowitz explores the idea behind social desirability bias and how internet searches are helping Big Data paint a clearer picture about society. In short:

Many people under-report embarrassing behaviors and thoughts on surveys. They want to look good, even though most surveys are anonymous.

Stephens-Davidowitz posits that while people may lie on anonymous surveys, they tend to type their true feelings and intentions into Google searches. It is this vast sum of new data that will allow researchers to make better predictions, and it offers brand-new tools that give insight into all aspects of human behavior in ways direct questioning never could. It's a fascinating idea, and the book provides plenty of food for thought.

The new age of Big Data is starting to show how wrong many of our assumptions about society are. The book ranges from how Google searches predicted Donald Trump's victory, to common body anxieties, to why people root for specific sports teams, to the value of attending an elite high school, to zooming in on health data and how it could change the way we receive care. It's eerie and a bit creepy when you stop and think about what people type into an internet search box, how much of that data is being captured, and just what that data is starting to say about society. On the flip side, the author notes that Big Data has many pitfalls and that it is a fairly new science still in its infancy.
What a fantastic book. It's a Freakonomics, but with Big Data studies. The author asks himself a thousand questions and answers them using big data, mainly through detailed analysis of the searches people make on Google (and on other sites, such as PornHub).
The origin of the book is fantastic:
[Nathan] Silver found that the single factor that best correlated with Donald Trump’s support in the Republican primaries was that measure I had discovered four years earlier. Areas that supported Trump in the largest numbers were those that made the most Google searches for “nigger.”


This leads the author to make a statement of intent:

I am now convinced that Google searches are the most important dataset ever collected on the human psyche. [...] In fact, at the risk of sounding grandiose, I have come to believe that the new data increasingly available in our digital age will radically expand our understanding of humankind.


The author takes us on a journey through a host of interesting topics. One of my favorites is the old, very old question: What percentage of the population is gay? An old study from the 1960s-70s said that one in 10 people is gay. But the author lowers it to 1 in 20 and explains how he reached that conclusion so well that he convinced me completely. It's a figure that seemed impossible to know precisely, and the guy just goes and does it.

There is a lot of data, such as our likes and preferences on Facebook, that is also useful for some things, but not for everything. Our Google searches are always sincere. Our Facebook likes are not. Of two magazines with the same circulation, one a gossip magazine and the other literary, on FB the literary one had more than twice as many likes as the gossip one. The same thing happens with surveys:
Many people underreport embarrassing behaviors and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias.


More interesting facts:
After making their decision—either to reproduce (or adopt) or not—people sometimes confess to Google that they rue their choice. This may come as something of a shock but post-decision, the numbers are reversed. Adults with children are 3.6 times more likely to tell Google they regret their decision than are adults without children.

About 28 percent of girls are overweight, while 35 percent of boys are. Even though scales measure more overweight boys than girls, parents see—or worry about—overweight girls much more frequently than overweight boys.

On weekends with a popular violent movie, the economists found, crime dropped.

Students who were taught fractions via a game tested worse than those who learned fractions in a more standard way.


The author uses big data for self-help, when he says that we shouldn't trust everything people post on Instagram:
In fact, I think Big Data can give a twenty-first-century update to a famous self-help quote: “Never compare your insides to everyone else’s outsides.” A Big Data update may be: “Never compare your Google searches to everyone else’s social media posts.”


He also drops gems of humor:
February 27, 2000, started as an ordinary day on Google’s Mountain View campus. The sun was shining, the bikers were pedaling, the masseuses were massaging, the employees were hydrating with cucumber water.


And he shows signs of philosophical depth:
Milan Kundera, the Czech-born writer, has a pithy quote about this in his novel The Unbearable Lightness of Being: “Human life occurs only once, and the reason we cannot determine which of our decisions are good and which bad is that in a given situation we can make only one decision; we are not granted a second, third or fourth life in which to compare various decisions.”



There is much, much more. What differences in life can we expect between the last student admitted and the first one not admitted to a prestigious school? Which words in a loan application are a clear indicator that the person is less likely to repay it? Is it legitimate to use this knowledge to deny loans?

The book goes on and on. It has a ton of footnotes, all placed at the end (almost a third of the book [!!]), and it closes with a plea in favor of big data:

The days of academics devoting months to recruiting a small number of undergrads to perform a single test will come to an end. Instead, academics will utilize digital data to test a few hundred or a few thousand ideas in just a few seconds. We’ll be able to learn a lot more in a lot less time. [...] How do ideas spread? How do new words form? How do words disappear? How do jokes form?


Very interesting. Fun. Instructive. Fantastic. Essential.
Big Data has become the kind of cliché you normally associate with Dilbert cartoons (the insights! the disruption! the leveraged synergies! - just add Big Data!), so it's nice to see someone drag the buzzword out of its reputational bog, clean it up for an audience, and show off its practical uses. While marketing departments and academics have always hungered for data to draw inferences from, the simultaneous rise of large data sets, cheap processing power, advanced statistical techniques, and readily available consumer data has created a seemingly endless new frontier in analyzing previously opaque human behavior. Survey data is notoriously unreliable: for example, the average number of self-reported sexual partners is famously higher for straight men than for straight women even though mathematically they must be equal (since everyone must go home with someone, this is charmingly known as the "high school prom theorem"). But since data can be collected in other ways that are harder to fake, like search history, browsing behavior, click rates, or app usage, an intelligent researcher can cut through the noise to shed new light on these problems. Stephens-Davidowitz comes up with all kinds of insights, from the discovery that lesbian porn is surprisingly popular with straight women, to the recommendation that wives should spend more time wondering if their husbands are alcoholic and less if they're gay, to the depressing conclusion that child abuse is going increasingly unreported. Under no circumstances should mindless number-crunching take the place of rational thought, but Big Data used properly is a valuable new tool to learn about ourselves, so this book comes off like a humbler yet more useful Freakonomics.
Fascinating and also a little horrifying. An important read for understanding how data is used to influence behaviors. I appreciated that Stephens-Davidowitz acknowledged the myriad ethical implications to consider when using and collecting data. If anything, this book reaffirmed my position that STEM careers cannot exist in a vacuum--we need the humanities alongside STEM to remind us that while data might help us make sense of our world, we aren't robots. Nuance and ethics are still important to our survival as a species.
Absolutely delightful journey through the insights big data affords, particularly when contrasted with the long tradition of survey-based research (oh yeah, like we all tell the truth when people ask us questions) using small samples (of college students). Stephens-Davidowitz clearly loves his work, and wants us to love the insights enormous troves of data can give about everything. Admittedly, his desire for Everybody Lies to be the Freakonomics of his age is, oh, grandiose. But you know . . . .
Believe the hype. This is not a perfect book, but it's fun, enlightening, ground-breaking, and important. Too many people don't know the potential power of the new methodologies of data analytics, and too few of the people who think they do know that power know the limitations. SethSD does, and he shares a lot of what he knows with us.

This is good science for arm-chair science consumers like me, and a good read for those who just like to dabble in non-fiction. It's both concise and rich. Documented with notes, an index, and the author's own website, which he promises has lots more hard info.

It may turn out to be a four-star book as more on the topic gets published. But right now I urge everyone to read it. Next, I do hope to read Seth's next book, and more on the subject. Yes, Seth, I did read right to the end, and still I'm glad you didn't keep struggling to say anything for the ages in your conclusion... imo, you ended it perfectly.

On a personal note, one of the key points from the intro and one of the key points from the conclusion are amazingly relevant. Here's the thing. Our youngest is looking for a school to transfer up to, at the same time we're looking for our first post-retirement community. We're hoping to find a college & town all three of us would like, and a particular field of study for our kid. In the beginning of this book are two maps, one that reveals Trump supporters, and one that reveals pockets of closet racists (as exposed by their Google searches)... which is obviously relevant data for us as we choose what part of the country to move to. And at the end of the book, Seth tells my geeky son what studies to focus on:

"I hope there is some young person reading this right now who is a bit confused on what she wants to do with her life. If you have a bit of statistical skill, an abundance of creativity, and curiosity, enter the data analytics business."

(Well, my young person has been listening to me read bits from the book, but otherwise that could have been directly tailored for him.)

Read the book. Don't be fooled by my long review; I'm only sharing a bit of what I learned from it.

Other book darts:

"[P]laces with the highest racist search rates included upstate New York, western Pennsylvania, eastern Ohio, industrial Michigan and rural Illinois, along with West Virginia... The true divide... was not South versus North; it was East versus West. You don't get this sort of thing much west of the Mississippi. And racism was not limited to Republicans...."

The 4 powers of Big Data can be summarized:
"Offering up new types of data..."
"Providing honest data..."
"Allowing us to zoom in on small subsets of people..."
"Allowing us to do many causal experiments...."

Now we get to an example of what is not perfect about the book. First, context: Seth is a careful scientist; he knows about sampling errors, biases, correlation not equaling causation, etc. However, sometimes he forgets about alternative explanations and interpretations. That is to say, when the book shows us data, it's fine, but sometimes when Seth interprets the data, he gets trapped by a fallacy. E.g., he says, "[O]f the minority of women who visit PornHub, there is a (25%) subset who search... for rape imagery... sometimes people have fantasies they wish they didn't have and which they may never mention to others." Maybe... or maybe they're victims trying to process, or maybe they're wannabe authors doing research, or they're men lying to present as female.... It looks to me like Seth didn't want to think too hard about this one....

Big data allows researchers to zoom in on subsets of demographic groups and geographical regions.... "But another huge--and still growing--advantage of data from the internet is that it is easy to collect data from around the world.... And data scientists get an opportunity to tiptoe into anthropology."

Big data could really help in the field of healthcare. When I'm done here I'm going to check out the site PatientsLikeMe.com. "Heywood hopes that you can find people of your age and gender, with your history, reporting symptoms similar to yours--and see what has worked for them."

I also want to consider reading Irresistible: The Rise of Addictive Technology and the Business of Keeping Us Hooked by Adam Alter and Super Crunchers: Why Thinking-By-Numbers Is the New Way to Be Smart by Ian Ayres.
This book provides plenty of food for thought and is ambitious in its scope - the author uses Google search data to present theories about why Donald Trump was elected president, the prevalence of racism in American society, the value of attending an elite high school, how to identify a good sports player, and more topics both grand and petty. It's all presented in an engaging manner and helps one make sense of what some of the trends identified mean. It can also be a little creepy if one thinks about how often we interact with internet services (Google, Facebook, Twitter, etc.) and the amount of data that is recorded about our behavior (even if it's largely anonymous data gathering).

Lists

The Hive Recommends
62 works; 2 members

Author Information

14 Works 1,520 Members
Seth Stephens-Davidowitz is a contributing op-ed writer for the New York Times, a lecturer at The Wharton School, and a former Google data scientist. He received a BA from Stanford and a PhD from Harvard. His research has appeared in the Journal of Public Economics and other prestigious publications. He lives in New York City.

Common Knowledge

Canonical title*
La macchina della verità. Come Google e i Big Data ci mostrano chi siamo veramente
Original title
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Original publication date
2017-05-09
Dedication
To Mom and Dad
Quotations
You might think a paper’s owner would have some influence on the slant of its coverage, but as a rule, who owns a paper has less effect than we might think upon its political bias. Note what happens when the same person or company owns papers in different markets. Consider the New York Times Company. It owns what Gentzkow and Shapiro find to be the liberal-leaning New York Times, based in New York City, where roughly 70 percent of the population is Democratic. It also owned, at the time of the study, the conservative-leaning, by their measure, Spartanburg Herald Journal, in Spartanburg, South Carolina, where roughly 70 percent of the population is Republican. There are exceptions, of course: Rupert Murdoch’s News Corporation owns what just about anyone would find to be the conservative New York Post. But, overall, the findings suggest that the market determines newspapers’ slants far more than owners do.

The study has a profound impact on how we think about the news media. Many people, particularly Marxists, have viewed American journalism as controlled by rich people or corporations with the goal of influencing the masses, perhaps to push people toward their political views. Gentzkow and Shapiro’s paper suggests, however, that this is not the predominant motivation of owners. The owners of the American press, instead, are primarily giving the masses what they want so that the owners can become even richer.

Oh, and one more question – a big, controversial, and perhaps even more provocative question. Do the American news media, on average, slant left or right? Are the media on average liberal or conservative?

Gentzkow and Shapiro found that newspapers slant left. The average newspaper is more similar, in the words it uses, to a Democratic congressperson than it is to a Republican congressperson.

“Aha!” conservative readers may be ready to scream, “I told you so!” Many conservatives have long suspected newspapers have been biased to try to manipulate the masses to support left-wing viewpoints.

Not so, say the authors. In fact, the liberal bias is well calibrated to what newspaper readers want. Newspaper readership, on average, tilts a bit left. (They have data on that.) And newspapers, on average, tilt a bit left to give their readers the viewpoints they demand.

There is no grand conspiracy. There is just capitalism.
Canonical DDC/MDS
006.312
Canonical LCC
QA76.9.D343
*Some information comes from Common Knowledge in other languages.

Classifications

Genres
Sociology, Technology, General Nonfiction, Nonfiction, Science & Nature
DDC/MDS
006.312: Computer science, information & general works > Computer science, knowledge & systems > Special computer methods (AI, barcoding, VR, web design, social media) > Artificial Intelligence > Machine Learning > Data mining
LCC
QA76.9 .D343: Science > Mathematics > Mathematics > Instruments and machines > Calculating machines > Electronic computers. Computer science

Statistics

Members
1,362
Popularity
17,387
Reviews
47
Rating
3.74
Languages
6 — Chinese, Czech, English, Italian, Portuguese, Spanish
Media
Paper, Audiobook, Ebook
ISBNs
24
ASINs
7