When Can We Trust Brains to Predict Abilities?
A new mega-study finds that many neuroscience studies are far too small.
Posted March 17, 2022 Reviewed by Michelle Quirk
Key points
- A new study examined combined data from about 50,000 brains to estimate how many people are needed to study typical brain-behavior connections.
- Typical studies of this type use about 25 people, but the authors concluded that thousands are needed to get reliable results.
- Studies comparing brain states for the same person may be okay with fewer participants.
There are a few consistent themes that stick out in modern science reform. One is that psychologists have been massively underestimating how much data we need to draw firm conclusions. It’s not like we haven’t had fair warning: Jacob Cohen, one of the most famous quantitative psychologists in the history of the discipline, wrote about this problem starting in the 1960s. But the message has still taken a long time to sink in. Now, a new neuroscience study shows how we often do research that has too few brains in it to draw reasonable conclusions.

The new study looks at connections between individual differences in brain structure or function and behaviors or abilities. In other words, how much does something we learn from a brain scan tell us about something people can do? For example, can we relate someone’s IQ score or mental health symptoms to cortical thickness (how thick the grey matter in their brain is) or to how certain networks in the brain activate? To get precise estimates of how strong these kinds of associations are, the researchers looked at huge neuroscience databases totaling around 50,000 people. This is a big improvement over the usual sample size in studies like this, which is around 25 people.
Results suggest that, as in many other research areas that have been re-examined in recent years, typical studies in this area are much too small to provide reliable information. In contrast to a typical 25-person study, the authors concluded that studies looking at how individual differences in behavior or ability are related to differences in the brain should include “thousands” of individuals.
Errors With Small Studies
Small studies can lead to many types of errors. They include the following:
- Missing out on real effects: Analyses the authors ran suggest approximately 12,000 people are needed to have a good chance (95 percent) of finding these types of behavior–brain relationships.
- Believing relationships are much larger than they really are: For example, they show that, until a study reaches around 700 participants, it has a substantial risk of showing an effect size that’s 200 percent larger than the true value. Until a study reaches around 2,800 people, it has a risk of showing an effect size that’s 50 percent larger than it should be.
- Getting opposite results: For example, it might be that activity of one brain network is positively related to cognitive ability, meaning that strong activity in that network predicts better performance on that type of testing. But a small study might mistakenly conclude the opposite: Strong activity in the network means worse performance. If proper thresholds are not used when looking for effects, then opposite results are found more than 20 percent of the time even when around 2,800 people are studied. With better thresholding, the rate of opposite results can fall to a fraction of 1 percent when 2,800 participants are studied.
- Not being able to replicate a finding in comparable data: The researchers tested this by cutting the data in half, and seeing if a result seen in one half was also seen in the other. Even when using around 2,000 people, results only replicated about 20 percent of the time.
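To get a feel for why small samples produce these errors, here is a minimal simulation sketch. It is not the authors' analysis. It assumes a single brain measure and a single behavior score that are normally distributed with a true correlation of 0.1, and it uses a simple Fisher z test at the conventional 0.05 level as the "significance" filter. The function names, the number of simulated studies, and the sample sizes are all illustrative choices of mine.

```python
import numpy as np

def simulate_correlations(true_r, n, n_studies=500, seed=0):
    """Simulate n_studies studies of n people each; return each study's sample r."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_studies, n))          # hypothetical brain measure
    noise = rng.standard_normal((n_studies, n))
    # Build a behavior score whose true correlation with x is true_r (assumption).
    y = true_r * x + np.sqrt(1.0 - true_r**2) * noise
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

def summarize(true_r, n, z_crit=1.96, **kwargs):
    """Summarize how often simulated studies miss, flip, or inflate the effect."""
    r = simulate_correlations(true_r, n, **kwargs)
    # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly standard normal under r = 0.
    significant = np.abs(np.arctanh(r) * np.sqrt(n - 3)) > z_crit
    inflated = float(np.abs(r[significant]).mean()) if significant.any() else float("nan")
    return {
        "n": n,
        "power": float(significant.mean()),               # share of studies that "find" the effect
        "wrong_sign": float(np.mean(r * true_r < 0)),     # share pointing the opposite way
        "mean_abs_r_if_significant": inflated,            # typical size of a "significant" result
    }

if __name__ == "__main__":
    for n in (25, 700, 2800):
        print(summarize(0.1, n))
```

Under these assumptions, a 25-person study can only reach significance if its sample correlation is several times larger than the true value of 0.1, so any small study that "finds" the effect is overstating it. The exact percentages will not match the paper's, which is based on real brain data and stricter thresholds, but the pattern of missed, flipped, and inflated results is the same one the list above describes.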
Long story short, the standard approach to connecting brains to symptoms and abilities doesn’t work. Studies of 25, 100, or even 1,000 people present a high risk of missing important results, presenting hugely inflated results, presenting results that are the exact opposite of the truth, and presenting results that don’t show up when the same study is repeated. Funding and publishing studies of this size is a waste of resources. They can’t give us reliable answers to the questions we’re asking.
Why Do We Need So Many Brains?
The short answer is that brain scans don’t predict what kind of person you are very well. The strongest 1 percent of results showed a correlation of about r = 0.1 (on a scale where 0 means no relationship and 1.0 means a perfect one). Squaring the correlation gives the share of variability explained, so this means that, at best, a single brain measure can explain about 1 percent of the variability in abilities and symptoms. Using more complex techniques that weight and combine many different aspects of brain functioning, the researchers found they could explain up to around 16 percent of the variability in scores on cognitive ability tasks. (Results were closer to 1 percent for predicting mental health symptoms.) That suggests we may be able to create slightly better predictive models, but, overall, there’s a lot about behavior that brains just can’t explain.
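The arithmetic behind those percentages is simple enough to check yourself. The sketch below squares a correlation to get the share of variability explained, and takes a square root to go the other way. It assumes the 16 percent figure can be read as a squared correlation, which is my simplification rather than something the paper states in these terms, and the function names are my own.

```python
import math

def variance_explained(r):
    """Share of variability accounted for by a correlation of size r (r squared)."""
    return r ** 2

def correlation_for_variance(share):
    """Correlation size that corresponds to a given share of variability explained."""
    return math.sqrt(share)

print(f"{variance_explained(0.1):.0%}")          # r = 0.1 -> 1%
print(f"{correlation_for_variance(0.16):.1f}")   # 16% explained -> r = 0.4
```

So even the best combined models in the study correspond to a correlation of roughly 0.4, which leaves most of the differences between people unexplained.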
The Other Surprising Point
Looking at all the figures in the paper and their supplementary materials, I kept being struck by one fact: Brains predict scores on cognitive tests (like IQ and memory tests) way better than they predict mental health (like depression, anxiety, aggression, etc.). This stuck out particularly when the more sophisticated modeling techniques were used, which combined lots of measures of brain functioning. Brains seem to predict testing more than twice as well as mental health. This suggests (to me) that social factors, like living in a bad environment or in poverty, might be more important for understanding mental health, as compared to brains.
Caveat
These results focus specifically on neuroscience studies that look at individual differences in brains and abilities. They don’t speak to studies that look at one person’s brain changing as they do different activities. Studies like that, where an individual’s brain might be measured when they are looking at happy, funny images versus sad, depressing images, need fewer people. So researchers probably don’t need thousands of people to reliably estimate changes within brains. The warning only applies to differences between brains.

Conclusion
Similar reckonings have come for studies of genetics and psychology in recent years. At Slate Star Codex, Scott Alexander wrote an excellent post showing how hundreds of studies on a particular gene thought to increase the risk for depression (called 5-HTTLPR) were basically wrong. Aggregation of the research shows that the gene has essentially no effect. Neuroscientist Dorothy Bishop posted a “back of the envelope” calculation of the amount of money wasted on researching this particular gene: $1 billion from the U.S. National Institutes of Health. Social psychologists, who regularly used to run studies with just 30 or 40 people, have been repeatedly shown that this type of work does not hold up.
This new manuscript shows that the same lessons psychologists and behavioral geneticists have learned need to be applied to neuroscience. The way forward is fewer, bigger studies we can trust. Small studies aren’t reliable. If you’re reading a neuroscience study that claims to predict someone’s abilities or mental health symptoms, and fewer than 1,000 participants were used, don’t believe it. And if you’re a scientist, don’t cite it.
References
Marek, S., Tervo-Clemmens, B., Calabro, F.J. et al. Reproducible brain-wide association studies require thousands of individuals. Nature (2022). https://doi.org/10.1038/s41586-022-04492-9