
Imagine you're a king or queen, holding court as soothsayers arrive hoping to impress you with their wisdom. Two of them step forward, each claiming to be able to predict the future.

The first announces, "Your Majesty, I can accurately predict a trend that will unfold six months from now." The second kneels and declares, with total confidence, that she can predict, with absolute certainty, an event that will take place on a particular Saturday in April of the year 3000, nearly a millennium away.

Who do you trust more? Your gut might tell you to favor the shorter time frame; a lot can change in nearly a thousand years. But it depends entirely on what each oracle is predicting, and on the kind of uncertainty each one is grappling with.

Here's the key. What if I told you that the first prediction was about American economic growth exceeding 3 percent in six months, and the second was about a total eclipse occurring on that Saturday in the year 3000? Suddenly the eclipse prediction seems far more reliable. I'd bet on the eclipse any day. The growth rate? Not a chance.

We say "It's not rocket science" when we mean something isn't that difficult. But it would make more sense to say "It's not social science" when describing something truly hard. There are geniuses working in both fields, yet I suspect rocket scientists would admit that predicting the stable, regular behavior of planets and moons is a piece of cake compared with making correct long-term predictions about eight billion intertwined human beings.

Still, much of our world is shaped by our imperfect understanding of how humans work. We allocate budgets and set tax rates based on economic forecasts that are rarely accurate beyond the short term. We go to war, or don't, based on subjective risk assessments that often turn out to be wrong. Businesses invest billions on the strength of speculative predictions.

We've already seen that the world works in ways quite different from how we imagine it does. That false image of reality persists because it's reflected back at us in flawed social research. Many of our modern oracles in economics, political science, and sociology reinforce our storybook version of reality: the neat and tidy myths that write off the important flukes of life as mere "noise."

Our understanding of ourselves starts from the incorrect assumption that regular cause-and-effect patterns are stable across time and space. Our search for understanding becomes a search for whether X causes Y, which systematically ignores the role of chance and complexity. But if the storybook version of reality used in most research is misleading, how do we represent reality in a way that captures those random accidents and takes them seriously as drivers of change?

There's a saying: "All models are wrong, but some are useful." We forget that far too often, conflating the map with the territory and assuming our simplified representations accurately depict the world. How many times have you read "A new forecast says..." or "A recent study discovered that..." and accepted it without examining the assumptions or the methods behind it? Social research is our best tool for navigating an uncertain world, and it can be enormously helpful. But if we want to avoid costly mistakes, the really damaging ones, we need a more accurate sense of what we *can* and *can't* understand about ourselves as we navigate a world swayed by the random, the arbitrary, and the accidental. It's time we were honest about how little we really know. To get there, we need to dive into the world of social research and see how the sausage gets made.

We can break the problem into two parts, which I call the Easy Problem of Social Research and the Hard Problem of Social Research. The Easy Problem stems from flawed methods, and it can be fixed. The Hard Problem is probably unsolvable, not because of human error, but because some of the uncertainty surrounding human behavior is absolute and irreducible.

So, let's look at what's easy and what's hard.

Some years ago, a prominent social psychologist decided to test whether precognition, or ESP, was real. He was no crank: he had studied physics, earned a PhD, and taught at Harvard, Stanford, and Cornell. Using standard research methods, he ran a series of experiments. In one, participants were shown two curtains on a screen and asked to guess which one concealed an erotic image. They guessed correctly more often than random chance would predict. Stranger still, their apparent predictive powers vanished when the photos weren't erotic. The results were verified using standard statistical measures.

He had no good explanation, or really any explanation, for this supposed supernatural ability. But when he ran the numbers, he believed he had confirmed it: some people could "feel the future," as the title of his article put it. The research passed peer review and was published in a top journal. It made a splash. The press ate it up, and he appeared on television.

Not everyone was convinced. Some researchers tried to replicate the results, and when they ran the same experiments, nobody in *their* studies could "feel the future." That seemed like strong evidence that the original findings weren't as real as claimed. But when the skeptics tried to publish their challenge, they struggled. They were told it was old news: why repeat something that had already been studied? Eventually their paper was sent out for peer review, in which other academics evaluate the work anonymously. One reviewer loved it. The other rejected it, killing its chances. Want to guess who that second reviewer was? The original researcher himself.

The challenge to the original research did eventually get published, and it contributed to an overdue reckoning in social research, especially in social psychology, known as the "replication crisis." When researchers repeated earlier studies and experiments, including findings that had been accepted as true, they often got different results. In one effort, researchers tried to replicate the findings of a hundred influential experiments published in major psychology journals. Only about a third of them held up. Bold claims were invalidated. Much of what we thought we knew was wrong. The crisis shook our faith in accepted truths and raised an unsettling question: what else are we wrong about?

To make a point about how broken the system had become, some researchers set out to get obviously false claims published. In one case, they produced seemingly valid statistical results showing that listening to the Beatles song "When I'm Sixty-Four" actually made people younger, not just made them feel younger. Another study showed that women were more likely to vote for Barack Obama if they were ovulating when they cast their ballots. These "findings" followed accepted methods and passed the standard thresholds for publication. What was going on?

Social researchers are, unfortunately, sometimes using flawed methods, or even deliberately gaming the system. This might seem like inside baseball, something only a social scientist would care about. But we all have a stake in it, because social research often shapes the decisions that societies and leaders make. Airing the dirty laundry is useful: it corrects our storybook version of reality, that imagined world in which X always causes Y and flukes don't matter. Understanding these flaws will give you the tools to evaluate new "findings" with a healthy dose of skepticism.

I need to get into the weeds here, but bear with me; it's important to understand why we get things wrong so often. Most studies in political science, economics, sociology, and psychology produce something called a P-value, the measure social researchers use to decide whether a finding is real or an artifact of chance. When the P-value is low enough, it's taken as evidence that the finding is likely real, or "statistically significant." The research community has mostly settled on a publication threshold of a P-value below 0.05. In practice, a study with a P-value of 0.051 won't get published, but one with 0.049 will. If that dreaded 0.051 pops up, researchers can often massage the P-value down below the threshold. Data can be sliced and diced in many ways, and researchers may simply pick the option that yields the lowest P-value. As the saying goes, "If you torture the data long enough, it will confess."
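To make the mechanics concrete, here is a minimal simulation sketch in Python. The data and the subgroup "slices" are invented for illustration, not drawn from any study discussed here: there is no real effect at all, yet reporting only the best-looking slice crosses the 5 percent threshold far more often than 5 percent of the time.

```python
# Hypothetical sketch: a world with NO true effect, a "researcher" who tests
# several arbitrary data slices, and a report of only the smallest P-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def best_p_after_slicing(n=400, n_slices=8):
    group = rng.integers(0, 2, size=n)      # "treated" vs "control" labels; no real effect
    outcome = rng.normal(size=n)            # the outcome is pure noise
    p_values = [stats.ttest_ind(outcome[group == 1], outcome[group == 0]).pvalue]
    for _ in range(n_slices):               # arbitrary subgroups: "under 40", "urban", ...
        mask = rng.normal(size=n) > 0
        a = outcome[mask & (group == 1)]
        b = outcome[mask & (group == 0)]
        p_values.append(stats.ttest_ind(a, b).pvalue)
    return min(p_values)                    # report only the best-looking slice

share = np.mean([best_p_after_slicing() < 0.05 for _ in range(2000)])
print(f"No-effect studies that still clear P < 0.05 after slicing: ~{share:.0%}")
```

With one honest, pre-registered test, only about 5 percent of no-effect studies would clear the bar; report the best of nine slices and, in simulations like this one, the share climbs several-fold.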

This builds a terrible incentive into research, because getting published is tied to promotions, grants, and careers. When researchers tweak their data to hit that low P-value and secure publication, it's called P-hacking, and it's a major problem in modern research, one that distorts our understanding of the world. But how much of it is really going on?

In one analysis of articles in top journals, researchers found a suspicious spike in the number of reported P-values sitting just below the threshold: strong evidence that the system is being skewed. The replication crisis, triggered in part by those discredited ESP studies, blew the lid off P-hacking, but it didn't stop it. Years later, economists examined data published in economics journals and found that up to a quarter of the results produced with certain methods showed misleading data interpretations and potential evidence of P-hacking. That's a substantial chunk of the research that shapes how we see the world. These bogus studies often claim to show cause and effect, reinforcing the idea that we can ignore the flukes, because reality looks neater and more ordered when you contort it with P-hacking. X causes Y, and we've got the low P-value to prove it.

Bad research also gets published because of something called the file drawer problem. Say I ask you to flip a coin ten times. There's a small chance, roughly 5 percent, that you'll get at least eight heads. If you repeat the ten-flip exercise twenty times in a row, you'll probably hit eight heads at least once. Now imagine you keep flipping until you get that streak. When you do, you brag to a friend: "I flipped a coin ten times and got eight heads! Amazing!" You don't mention how many times you tried and failed first.

Now imagine the same dynamic with researchers trying to prove ESP. Nineteen researchers run experiments and find nothing. No result, no publication; they quietly stick their findings in a file drawer. Then, purely by chance, the twentieth researcher "discovers" something astonishing that passes all the statistical tests and rushes to publish it. The nineteen failed experiments are invisible, collecting dust in the drawer. The one "successful" experiment is visible, convincing people the effect is real. That's the file drawer problem.
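The arithmetic behind both the coin story and the file drawer is easy to check. A quick sketch, using the 5 percent threshold and the twenty researchers from the example above:

```python
# Check the coin-flip odds, then apply the same logic to twenty null studies.
from math import comb

# P(at least 8 heads in 10 fair flips)
p_streak = sum(comb(10, k) for k in (8, 9, 10)) / 2**10
print(f"P(>= 8 heads in 10 flips)             = {p_streak:.3f}")   # about 0.055

# If 20 researchers each run a true-null study at the 5 percent threshold,
# the chance that at least one gets a publishable "discovery" by luck alone:
p_false_discovery = 1 - (1 - 0.05) ** 20
print(f"P(>= 1 of 20 null studies 'succeeds') = {p_false_discovery:.2f}")  # about 0.64
```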

If you knew that nineteen out of twenty researchers had found nothing, you'd question the "discovery." But those studies aren't published, so you're blind to them. The file drawer problem makes our understanding of reality seem more ordered than it really is, and it pushes researchers toward "positive" results rather than toward demonstrating that no relationship exists, or debunking bad research, work that is just as important. Some researchers who made bold claims later proven wrong are still famous; few people have heard of the ones who did the debunking.

Bad research is often just as influential as good research. One study found that research that failed to replicate is cited at roughly the same rate as research that has been verified. And the flaws are often obvious. In one study, experts were asked to read papers and bet on which ones would be confirmed through replication; they were right most of the time, spotting what was too good to be true almost immediately. The defense research agency has even developed a "bullshit detector" for social research, with some success. Yet despite being easy to spot, plenty of bad research still gets produced. And peer review is a broken system. In one study, researchers deliberately planted flaws in research articles to see how many reviewers would catch them. They caught about one in four.

These issues come on top of others tied to our storybook version of reality. For example, research continues to assume that we live in a linear world in which the size of a cause matches the size of its effect, as if everything could be mapped onto a straight line. That's wrong, yet quantitative models still imagine that world. Why? Because quantitative social science emerged when computing power was expensive and primitive, and that way of seeing the world stuck around through arbitrary lock-in, even though we're capable of much more now.
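As a small illustration of what that linear lens misses, consider a toy system, entirely made up for this sketch, that behaves gently until it hits a tipping point. A straight-line model reports a modest average slope and says nothing about the cliff:

```python
# Toy data: small, steady effects until a threshold, then a sudden jump.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)                       # some pressure on the system
y = 0.2 * x + np.where(x < 7, 0.0, 5.0)           # a discontinuous leap at x = 7
y += rng.normal(0, 0.3, x.size)                   # measurement noise

slope, intercept = np.polyfit(x, y, 1)            # the storybook: one straight line
print(f"Fitted line: y = {slope:.2f}x + {intercept:.2f}")
print("Actual jump at the threshold: about 5.0, which the line smooths away.")
```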

Complexity science, and the researchers who use complex adaptive systems to understand the world, still make up only a tiny sliver of research. We pretend the world works one way when we know it works another, and that causes serious errors in how we run things.

Someone might wrongly conclude that I'm saying social research is pointless. I'm not. We navigate our world better than we used to because of advances across research fields. Social science students are now warned about P-hacking, and some journals are trying to address the file drawer problem. Transparency has increased. The fact that economists or political scientists sometimes get it wrong doesn't mean we should ditch economics or political science. We should work hard to solve the Easy Problem of Social Research. And it *can* be solved.

But the Hard Problem? That can't be solved.

This is where things get truly confusing, and where it becomes clear that the seemingly random "noise" matters more than we think. A few years ago, social scientists tried something new: they crowdsourced research to answer a question that had divided scholars and the public. As more immigrants arrive in a country, do voters become less supportive of the social safety net? Does immigration create a backlash against social spending from voters who see such programs as illegitimate "handouts"? It's an important question, but the evidence has been mixed; some studies say yes, others no. What would happen if researchers were given the same data and asked the same question? Would they reach the same answers?

Seventy-six research teams took part. They didn't communicate with one another, and each took its own approach to finding patterns in the numbers. By the time the study was over, the teams had produced more than a thousand mathematical models estimating the effect of immigration on support for social welfare programs. No two models were the same.

What they found was extraordinary: thoroughly mixed results. A little more than half the models found no clear link between immigration and public support. The rest split almost down the middle, some finding that immigration decreased support, others finding the opposite. Roughly a quarter of the models said yes, a quarter said no, and half said "nothing to see here."

Trying to understand what had happened, the organizers examined each team's methodological choices. But those choices explained only a small share of the variation in the findings; most of it remained unexplained. Nobody could account for it. The researchers concluded that even the smallest methodological decisions could push results in different directions, creating unavoidable challenges that can't be wished away or solved with better math. Part of the Hard Problem is that we live in a "universe of uncertainty."
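A stylized sketch can show the mechanism, even though it can't capture the deeper puzzle that most of the real variation stayed unexplained. Below, one synthetic dataset (all variable names and effect sizes invented for illustration) is analyzed under four defensible specifications, and the estimated "effect of immigration" comes out positive, negative, or near zero depending on which controls a team happens to include:

```python
# One dataset, four reasonable model specifications, three different "answers."
import numpy as np

rng = np.random.default_rng(7)
n = 5000
urban_share = rng.normal(size=n)                      # invented confounder #1
econ_anxiety = rng.normal(size=n)                     # invented confounder #2
immigration = 0.7 * urban_share + 0.7 * econ_anxiety + rng.normal(size=n)
support = 0.8 * urban_share - 0.8 * econ_anxiety + rng.normal(size=n)  # no direct effect

def effect_of_immigration(controls):
    X = np.column_stack([np.ones(n), immigration] + controls)
    beta = np.linalg.lstsq(X, support, rcond=None)[0]
    return beta[1]                                    # coefficient on immigration

specs = {
    "no controls":                  [],
    "control for urban share":      [urban_share],
    "control for economic anxiety": [econ_anxiety],
    "control for both":             [urban_share, econ_anxiety],
}
for name, controls in specs.items():
    print(f"{name:>30}: estimated effect = {effect_of_immigration(controls):+.2f}")
```

Each specification is defensible on its own terms, yet a team that happened to control for one confounder but not the other would publish a confident effect in one direction or the other.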

Most of the time, research teams aren't assigned to answer a specific question; usually a single researcher, or a small group, takes it on. Imagine if this question had been posed and answered by just one researcher. A study might have been published showing that immigration decreases support, or one showing that it increases support, and either would have been equally likely. That single study might have received press coverage and shifted public views. Yet it would have been a toss-up.

Now imagine research that is open-ended, in which each team can pick and choose whatever data it wants. All bets would be off. And that's how research normally works. This is another part of the Hard Problem: we can't agree on what's going on even when we're working on the same question with the same data.

Unfortunately, that's not all there is to the Hard Problem. What if the world we're trying to understand is itself constantly changing? Take the study of dictatorships. Political scientists once developed the concept of "authoritarian durability" to describe regimes that survive for a long time, no matter what. The theory made sense. The data supported it. There were vivid examples, terrible tyrants such as Gaddafi, Ben Ali, and Mubarak. Books were written about why their regimes were so stable. It became accepted wisdom: dictators may be ruthless, but they produce stability.

Then a vegetable vendor in Tunisia set himself on fire, and soon the theory lay in ruins. The dictators were toppled, their palaces ransacked. Ben Ali fled, Mubarak was arrested, and Gaddafi was killed. Authoritarian durability had been badly wrong, and its proponents seemed deeply mistaken. The upheaval had taken everyone by surprise. When I was working on my doctorate, before I went to Tunisia, I remember seeing a poster on a professor's wall: a "political risk map" of the Middle East, produced by people who were paid to navigate risk. The safe, stable countries were shaded green. As I looked at the map, every single green area was on fire, in the midst of revolution or war.

Here's the crucial, unanswerable question: Was the original theory wrong, or did the world change?

It's possible that Gaddafi's and Mubarak's regimes were fragile all along and we simply misread them. But maybe the Arab Spring changed how Middle Eastern dictatorships work: what was once resilient became brittle. We accept such shifts in the physical world. Liquid water struck with a hammer absorbs the blow; freeze it, and the same blow leaves visible damage. The water has changed, so the theory of its properties must change too. Maybe the theory of Middle Eastern dictatorships was right, at least until around 2010, and then the world became fundamentally different. Who knows? It's impossible to say for sure. Theories don't come with an expiry date.

Yet when we decide that a social theory got something wrong, many people assume it was wrong all along. That's a mistake. Social theories aren't like theories in chemistry. If cavemen had mixed baking soda and vinegar, they'd have gotten the same fizz we do. That kind of stability doesn't exist in social dynamics. A pattern of cause and effect may hold in one context for a time, until the social world changes and the pattern disappears. In human society, some forms of causality shift. Yet we keep imagining there's a fixed truth we're on the verge of discovering, failing to recognize that the truth about our social systems is constantly changing, eluding our understanding.

Things get even more baffling when you consider that we inhabit only one possible world. If you accept the idea of multiple possible realities, then our world is the product of countless potential paths that, but for some small change, we might have followed. But we have only one Earth to observe, and that makes it impossible to know what's probable and what's improbable, especially for rare, important events.

On September 10, 2001, there was some unknowable probability that the attacks planned for the next day would succeed. Maybe the terrorists had only a small chance of pulling it off; maybe their success was nearly assured. Once 9/11 happened, we couldn't replay history to find out which it was, because we have only a single data point: it happened.

Low-probability events sometimes happen, and so do high-probability ones, but when an event occurs only once, it's hard to tell whether it was inevitable or a fluke. You can keep flipping a coin to understand how it behaves; you can't keep rerunning history. We can't know whether our world is a representative sample of all possible worlds or a bizarre outlier. With just one Earth to observe, there are some things we may never know.

Think back to that forecast for the 2016 election, which gave Hillary Clinton a 71.4 percent chance of winning. Models like that one combine polls with data on past patterns. They're good at judging whether polls are accurate and at assembling a model from that data. But they're no better than the rest of us at anticipating events such as whether a foreign government will hack a political server, or whether some politician's unrelated computer files will prompt the FBI director to reopen a federal investigation days before the election. The analysis looks like hard science because it's statistically sophisticated, running thousands of simulations. But there aren't thousands of elections; there's just one, and it's uncertain. We don't know whether the outcome we experienced, Trump winning, was typical, an extreme outlier, or something in between, because we can't rerun history. You can establish that a coin comes up heads about 50 percent of the time by flipping it over and over. But can you tell whether a coin is fair or biased if you flip it once and it comes up tails? Obviously not. Yet for unique events, we routinely try to make exactly that judgment, and fail.
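Here's a minimal way to see how little one observation tells you, sketched as a Bayesian update on a coin (the numbers are illustrative, not a model of the election):

```python
# Start from a flat prior on the coin's tails-probability, then update it
# after one flip versus after a thousand flips.
from scipy import stats

after_one_flip = stats.beta(1 + 1, 1 + 0)        # observed: 1 tails, 0 heads
after_1000_flips = stats.beta(1 + 520, 1 + 480)  # observed: 520 tails, 480 heads

for label, posterior in [("after 1 flip", after_one_flip),
                         ("after 1000 flips", after_1000_flips)]:
    lo, hi = posterior.interval(0.95)
    print(f"{label:>16}: 95% credible interval for P(tails) = [{lo:.2f}, {hi:.2f}]")
```

After a single flip, nearly any bias remains plausible; only repetition narrows the interval, and history doesn't repeat on demand.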

When Clinton lost, the forecasters pointed to their model: 71.4 percent isn't 100 percent! There was a chance of Clinton losing, so the model wasn't wrong; this was simply one of those times. If you say they were wrong, you don't understand the math. But could that model ever be "wrong" about that election? When a model assigns a low probability to something and it happens anyway, that's chalked up to the world being weird, not to the model being incorrect. It's impossible to disprove. And when claims can't be disproven, we get stuck in ruts, and our misunderstandings about the world compound.

Why does this old storybook worldview, with its order and linearity, persist even though it's so wrong? If it were so wrong, wouldn't it have been replaced by something better?

Consider the difference between basketball and rowing. A basketball team with one exceptional star can win even if another player is useless; you can afford a weak link as long as your strongest link is strong enough. Plenty of things work this way. Spotify can host a million terrible songs and still make you happy, so long as it carries the songs you love most. To improve a system like this, you can ignore the bad and focus on making the best even better.

Rowing is the opposite. Speed depends on synchronization, balance, and timing. If even one rower is off, the boat lurches, the oars clatter, and the crew loses. A crew is only as good as its worst athlete: a weak-link problem. Food safety is a weak-link problem; you don't want to eat the one thing that will kill you. A car engine is a weak-link problem: it doesn't matter how good your spark plugs are if your transmission is shot. To fix a weak-link problem, you can't focus on the best parts. You have to eliminate the weakest links.
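A toy way to formalize the distinction (my framing for this sketch, with made-up ratings): a strong-link system is judged by its best element, a weak-link system by its worst.

```python
# Strong-link vs. weak-link in one line each: max vs. min.
playlist_song_ratings = [1, 2, 2, 9, 10]   # plenty of junk, a couple of favorites
rowing_crew_ratings   = [9, 9, 9, 9, 2]    # one weak rower

print("Strong-link value (playlist):", max(playlist_song_ratings))  # the hits carry it
print("Weak-link value (rowing crew):", min(rowing_crew_ratings))   # the worst drags it down
```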

Science is a strong-link problem. The best discoveries are what change society, and it matters little if a pile of bogus sludge clogs the academic journals. People once had foolish ideas about splitting the atom. It didn't matter, because all that was needed was the one idea that worked.

Science tests theories. At some point a theory simply stops working, and we can conclude it has been proven wrong. Some people still insist the Earth is flat, but that doesn't hinder our ability to explore space, because it's the strong link that matters. Science is an engine of progress because it pairs a strong-link structure with evolutionary pressure, which makes the strong links stronger over time. Weak ideas eventually die; strong ideas survive and drive human progress.

Social theories should, in principle, work the same way. They don't. Bad ideas stick around. In physics, even a tiny discrepancy is enough to reject an idea; that's not true of social theories. Economic models can't predict recessions accurately, yet the same models continue to dominate. Even theories with terrible track records, such as the claim that cutting taxes for the richest people produces economic growth, persist for decades. Social theories are hard to prove wrong, and getting it right occasionally is enough to keep people believing in them. That makes it harder to tell the gold from the garbage.

Even when a theory seems to fail, it's nearly impossible to conclude that it has been disproven. Maybe that country was an outlier. Maybe the economy dipped for some other reason. Social complexity and ideology shield social research from the pruning that strengthens the strong links. The survival of incorrect social theories is made worse by the fact that everyone feels like an expert on society, which is not the case for quantum mechanics. What should be a strong-link problem gets distorted, and weak links can hold sway, even become the norm. Few social theories are ever disproven as decisively as the idea that the sun revolves around the Earth. As a result, we get a distorted view of reality every time we read about how societies are supposed to work.

There's one more reason our modern intellectuals underestimate the importance of small changes as drivers of change. In recent decades, social research has become overwhelmingly quantitative; we've turned to math to understand nearly everything. Computers made that possible, making it far easier to analyze data and find patterns. To understand ourselves, we turned to equations.

Using numbers isn't bad. Plenty of intellectuals hate math; I'm not one of them. Math governs everything, and our world is full of mathematical relationships. But sometimes the equations that govern a system are too complex to represent accurately: theoretically possible to write down, but impractical.

One way to think about complexity is to ask: how long would the equation have to be? A scientist can write a short equation relating energy and mass (E = mc²). But how would you write an equation for a mouse?

It's not that an equation describing a mouse doesn't exist; it's that it would be impossibly long. Human society is too complex in the same way. That hasn't stopped us from trying, and failing, to capture complex systems with short equations, which is why we're wrong so often. We use brief equations to describe complex systems that can pivot on the tiniest detail. We're trying to describe a mouse in a few lines. It can't be done.

There's a phrase, "truthiness": if a claim felt true, it was true, no matter the facts. It inspired a label for a parallel flaw in economics research, "mathiness": the use of math to hide assumptions and dress up flimsy results.

Modern attempts to understand ourselves produce equations that verge on nonsense. I often look at quantitative research in social science journals and shake my head. Imagine a civil war has broken out and you wonder, "Will my friend Peter grab a gun and join the rebels?" Wouldn't it be nice if there were an easy way to figure that out? Well, here's the formula:

(insert confusing equation here)

These are the emperor's new equations: absurd, but nobody says so. The flukes of life are dismissed as the "error term," to be ignored. The equations also conjure a world in which the "noise" is irrelevant, because what matters are the "signals," big shifts in obvious variables. Worse, when researchers notice unusual data points, they "clean" the data by eliminating outliers. The goal is to show a clear pattern, and you can't let the equation be swayed by something unusual that happened only once. In pursuit of tidy findings in an untidy world, you get rid of whatever doesn't fit. Find the signal, delete the noise.

That's madness. In complex systems, the outliers are often the most important data points. It's like proclaiming the Titanic's voyage a success because most of the journey went well. Nonetheless, outliers get deleted to produce a neater equation, one that reflects the world we think we're looking at.
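A small sketch, with made-up numbers, of what "cleaning" does: a series of mostly small daily changes with one catastrophic day. A standard three-standard-deviation outlier rule removes exactly the day that mattered, and the summary statistics come out looking calm:

```python
# "Find the signal, delete the noise": removing the outlier removes the event.
import numpy as np

rng = np.random.default_rng(3)
daily_change = np.append(rng.normal(0.0, 0.5, 999), -40.0)    # one crash day

rule = np.abs(daily_change - daily_change.mean()) < 3 * daily_change.std()
cleaned = daily_change[rule]                                  # standard outlier "cleaning"

print(f"Raw data:      mean = {daily_change.mean():+.2f}, worst day = {daily_change.min():+.1f}")
print(f"Cleaned data:  mean = {cleaned.mean():+.2f}, worst day = {cleaned.min():+.1f}")
```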

The problem is compounded because the rise of numbers in social research means that few social researchers ever come face-to-face with the human dynamics they're trying to understand. Study actual people and you see how complex things are; study only data and you don't. Imagine attending a talk by an expert on elephants who has never observed an elephant. It would be ludicrous. For those who study humans, it has become the norm.

You might object: "Data is king. Looking at one elephant is fine, but you need to understand the herd." Sometimes that's true. The problem is that we don't understand the human herd, either. Understanding ourselves is most useful when it helps us solve problems and make the world better; that's what social research is for. To do that, we need to be able to predict what will happen if we lower tax rates, invade a country, or try to rehabilitate criminals. Yet social science barely even attempts prediction. When researchers examined academic journals across a range of disciplines, they found that very few articles tried to make predictions.

Instead, social researchers chase the Holy Grail of Causality. In one sense that's good; we all know that correlation isn't causation. But like the Holy Grail, the Holy Grail of Causality is elusive. We crave clear evidence that one thing causes another, and so long as the quest continues, we believe we might just find it. In a complex world, though, the search is mostly futile. It's still worth trying to figure out which causes matter most, even when they're not the only causes, but that's a question of usefulness, not causality. And once we start chasing usefulness instead of causality, we'd be better off putting rival theories through a test to see which ones survive by predicting outcomes. Prediction for its own sake isn't useful, but it can transform our lives if it helps us get better at improving outcomes and avoiding disaster.

How good are we at predicting social outcomes? One research challenge set out to study families, to gauge just how predictable our lives can be. Data was collected on the same children at ages one, three, five, nine, fifteen, and twenty-two: detailed data, not just numbers but interviews as well. After the data from the children who had turned fifteen was collected, it wasn't released. Instead, the organizers held a competition, giving competing teams of scientists access to the data from the children's earlier years. The challenge was to see who could best predict life outcomes for the children now that they were fifteen. Because the organizers already had the real-world outcomes, they could measure how well each team had done. The teams used machine learning, the most powerful predictive tool we have.

The results surprised the organizers. They had assumed that some teams would miss badly but that a few would nail it. Instead, every team did poorly. On almost every metric, even the best performed about as well as a model making random guesses. There are two lessons here. First, if we want to understand ourselves, we need to make predictions, so that we can learn from our failures and develop new tools to predict better. The challenge will be a catalyst for innovation in social research; we will get better at it, and new tools will overcome many of the obstacles that stem from the Easy Problem.
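The spirit of that kind of benchmark is easy to reproduce in miniature. In this hypothetical sketch (toy data, not the actual challenge, and using a guess-the-average baseline as a stand-in for random guessing), the outcome depends mostly on things the measured features never capture, so a fitted model barely improves on guessing the average for everyone:

```python
# Out-of-sample prediction vs. a guess-the-average baseline, on toy data.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
features = rng.normal(size=(n, 20))                    # everything we measured
outcome = 0.15 * features[:, 0] + rng.normal(size=n)   # mostly driven by what we didn't

train, test = slice(0, 1000), slice(1000, None)
coef, *_ = np.linalg.lstsq(features[train], outcome[train], rcond=None)
predictions = features[test] @ coef

baseline_error = np.mean((outcome[test] - outcome[train].mean()) ** 2)
model_error = np.mean((outcome[test] - predictions) ** 2)
print(f"Guess-the-average error: {baseline_error:.3f}")
print(f"Fitted-model error:      {model_error:.3f}   (barely different)")
```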

But the second lesson is that our lives and the future of our societies are extraordinarily hard to predict. Compared to this, rocket science is easy. That's why the Hard Problem will endure, blocking humans from ever fully understanding ourselves. In our complex world, some uncertainty can never be eliminated. No matter how hard we try, the flukes of life will continue to confound us.
