Human researchers are superior to large language models in writing a medical systematic review in a comparative multitask assessment / Sollini, Martina; Pini, Cristiano; Lazar, Alexandra; Gelardi, Fabrizia; Ninatti, Gaia; Bauckneht, Matteo; Chiti, Arturo; Kirienko, Margarita. - In: SCIENTIFIC REPORTS. - ISSN 2045-2322. - 16:1(2026). [10.1038/s41598-025-28993-5]

Human researchers are superior to large language models in writing a medical systematic review in a comparative multitask assessment

Sollini, Martina (first author); Lazar, Alexandra; Chiti, Arturo (penultimate author)

2026-01-01

Abstract

The capability of Large Language Models (LLMs) to support and facilitate research activities has sparked growing interest in their integration into scientific workflows. This paper aims to evaluate the performance of 6 different LLMs in conducting the tasks necessary to produce a systematic literature review and to compare it against that of human researchers. The evaluation of the 6 LLMs was split into 3 tasks: literature search, article screening and selection (task 1); data extraction and analysis (task 2); and final paper drafting (task 3). Their results were compared with a human-produced systematic review on the same topic, which served as the reference standard. The evaluation was repeated over two rounds to assess between-version changes and improvements of LLMs over time. Of the 18 scientific articles to be retrieved from the literature in task 1, the best LLM identified 13. Data extraction and analysis in task 2 were only partially accurate and cumbersome. The full papers generated by LLMs in task 3 were short and uninspiring, and often did not fully adhere to the standard PRISMA 2020 template for a systematic review. Currently, LLMs are not capable of conducting a scientific systematic review in the medical domain without prompt-engineering strategies. However, their capabilities are advancing rapidly, and, with appropriate supervision, they can provide valuable support throughout the review process.
2026
Artificial intelligence
Evidence-based medicine
Generative artificial intelligence
Large language models
Scientific writing
Systematic review
Files in this record:

File: s41598-025-28993-5.pdf
Access: open access
Type: Publisher's PDF (published version)
License: Creative Commons
Size: 1.37 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11768/197076
Citations
  • PubMed Central: 1
  • Scopus: 0
  • Web of Science (ISI): 0