DEC 09, 2024 5:55 AM PST

AI Outperforms Humans in Predicting Study Results

WRITTEN BY: Annie Lennon

In a new study, large language models (LLMs) predicted neuroscience study outcomes more accurately than human experts. The corresponding study was published in Nature Human Behaviour.

"Since the advent of generative AI like ChatGPT, much research has focused on LLMs' question-answering capabilities, showcasing their remarkable skill in summarising knowledge from extensive training data," said lead author of the study, Dr. Ken Luo, Research Fellow in the Division of Psychology and Language Sciences at University College London in a press release

"However, rather than emphasising their backward-looking ability to retrieve past information, we explored whether LLMs could synthesise knowledge to predict future outcomes," he added. 

For the current study, the researchers developed BrainBench, a tool to evaluate how well LLMs can predict neuroscience results. BrainBench contains pairs of neuroscience study abstracts across five domains, including behavioral/cognitive, systems/circuits, and neurobiology of disease.

The two abstracts in each pair are identical in background and methods but differ in their results: one reports the real study results, while the other reports results that domain experts have altered to a plausible but false outcome.
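In other words, each BrainBench item is a two-alternative forced choice: given two otherwise identical abstracts, pick the one with the genuine results. As a rough illustration of how such a comparison could be scored automatically, the Python sketch below assigns each version a language-model perplexity and selects the version the model finds more likely; the particular model (GPT-2), function names, and toy abstracts are assumptions for demonstration, not details taken from the study.

```python
# Illustrative sketch only: a two-alternative forced-choice check in the spirit
# of BrainBench, scoring each abstract version by language-model perplexity and
# choosing the version the model finds more likely. The model (GPT-2), function
# names, and toy abstracts are assumptions, not the study's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity for a passage (lower = more plausible to the model)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids yields the mean cross-entropy loss over tokens
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

def pick_real_abstract(version_a: str, version_b: str) -> str:
    """Choose the abstract version the model judges more likely (lower perplexity)."""
    return version_a if perplexity(version_a) < perplexity(version_b) else version_b

# Toy example with stand-in result sentences (not real BrainBench items)
real = "Participants showed increased hippocampal activity during memory recall."
altered = "Participants showed decreased hippocampal activity during memory recall."
print(pick_real_abstract(real, altered))
```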

Altogether, the researchers tested 15 general-purpose LLMs and 171 neuroscience experts on BrainBench to see which group was better at identifying the real study results. Ultimately, the LLMs outperformed the human experts, averaging 81% accuracy versus the neuroscientists' 63%.

The LLMs continued to outperform humans even when the comparison was restricted to experts with the highest self-reported expertise in each neuroscience domain. And like the human experts, the LLMs were more likely to be correct when they were more confident in their decisions.
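One informal way to picture that calibration result is to bin predictions by confidence and check accuracy within each bin. The short sketch below does exactly that on toy data; the confidence measure (here, the gap between the two versions' perplexities) and the numbers are illustrative assumptions, not the study's analysis.

```python
# Illustrative sketch only: checking whether higher-confidence predictions are
# more often correct, in the spirit of the calibration result described above.
# Confidence measure and data are assumptions for demonstration.
import numpy as np

def accuracy_by_confidence(confidences, correct, n_bins=5):
    """Bin predictions from least to most confident and report (mean confidence, accuracy) per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(confidences)          # indices sorted by confidence
    bins = np.array_split(order, n_bins)     # roughly equal-count bins
    return [(confidences[b].mean(), correct[b].mean()) for b in bins]

# Toy data: confidence = gap between the two abstracts' perplexities; 1 = correct choice
conf = [0.1, 0.4, 0.2, 1.3, 0.9, 2.1, 0.05, 1.8, 0.7, 2.5]
hit  = [0,   1,   0,   1,   1,   1,   0,    1,   1,   1]
for mean_conf, acc in accuracy_by_confidence(conf, hit, n_bins=3):
    print(f"mean confidence {mean_conf:.2f} -> accuracy {acc:.2f}")
```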

The researchers also tested BrainGPT, a new LLM fine-tuned on the neuroscience literature. The specialized model proved more accurate than the general-purpose LLMs, reaching 86% accuracy.

"In light of our results, we suspect it won't be long before scientists are using AI tools to design the most effective experiment for their question," senior author of the study, Professor Bradley Love of the Division of Psychology and Language Sciences at University College London said in a press release

Love noted that although the approach focused on neuroscience, it is universal and should apply to other areas of science as well. Dr. Luo added that the team is now developing AI tools to assist researchers.

"We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design," he concluded. 


Sources: Neuroscience News, Nature Human Behaviour

About the Author
Annie Lennon is a writer whose work also appears in Medical News Today, Psych Central, Psychology Today, and other outlets.