62nd National Congress of the Italian Society of Rheumatology
Vol. 77 No. s1 (2025): Abstract book of the 62nd Conference of the Italian Society for Rheumatology, Rimini, 26-29 November 2025

PO:38:283 | Accuracy of ChatGPT-4 omni in providing clinical recommendations in musculoskeletal care: a valuable support for clinicians and patients?

Dario Taborelli1, Silvia Negro1, Federico Padovani2, Tiziano Innocenti1,3, Stefano Salvioli1 | 1DINOGMI, Università di Genova, Savona; 2Dipartimento di Neuroscienze e Riabilitazione, Università di Ferrara; 3GIMBE Foundation, Bologna, Italy

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
Published: 26 November 2025

Background: Generative artificial intelligence models such as ChatGPT-4 omni are gaining popularity in healthcare and could support the clinical management of highly prevalent conditions. However, their accuracy in providing up-to-date clinical recommendations remains underexplored, particularly for osteoarthritis (OA), one of the most common conditions, with a high socio-economic impact, for which recommendations must be current, clear and consistent with scientific evidence.

Objective: to evaluate the accuracy and consistency of the clinical recommendations generated by ChatGPT-4 omni against the most recent international guidelines for OA (NICE, EULAR, OARSI) and to compare them with the responses given by Italian physiotherapists.

Materials and methods: a cross-sectional study based on 24 clinical questions derived from the OARSI, NICE and EULAR guidelines. The questions were presented to ChatGPT-4 omni (ChatGPT-4o) as ‘classic’ and ‘medical’ prompts, in Italian and English, for a total of nine administrations each. A complex clinical case was also evaluated. Accuracy, internal reliability (standard deviation, SD; coefficient of variation, CV%) and concordance (Cohen's kappa, K) were calculated.
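
The reliability and concordance statistics named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the sample scores and the correct/incorrect labels are hypothetical, and the abstract does not state whether the sample or population SD was used (the sample SD is assumed here).

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return CV% = 100 * sample SD / mean for repeated accuracy scores."""
    return 100.0 * stdev(scores) / mean(scores)


def cohen_kappa(rater_a, rater_b):
    """Return Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportion, per label.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical accuracy (%) over nine repeated administrations of one prompt set.
    runs = [70.8, 75.0, 66.7, 79.2, 75.0, 70.8, 75.0, 79.2, 70.8]
    print(f"CV% = {coefficient_of_variation(runs):.1f}")

    # Hypothetical per-question outcomes (1 = correct, 0 = incorrect).
    italian = [1, 1, 0, 1, 0, 1, 1, 0]
    english = [1, 1, 0, 1, 1, 1, 1, 0]
    print(f"kappa = {cohen_kappa(italian, english):.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why two sets of answers can agree on many items and still yield a low K.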

Results: ChatGPT-4o showed good internal consistency (CV < 25%) and moderate to high agreement between Italian and English (K up to 0.902). Accuracy ranged from 66.7% to 79.2% and was higher with classic and English prompts. Agreement with Italian physiotherapists was low (K = 0.106). In the clinical case, mean accuracy was 81.4%, but answers were only 38.2% complete.

Conclusion: ChatGPT-4o proved potentially useful for supporting clinical practice in the musculoskeletal field. However, its usefulness in daily practice depends on its integration with human clinical reasoning and on critical supervision by practitioners.

Keywords: Artificial Intelligence (AI), Deep Learning (DL), Large Language Model (LLM), Large Multimodal Model (LMM), Clinical Practice Guidelines (CPG), Osteoarthritis (OA), knee, hip.

How to Cite

1. PO:38:283 | Accuracy of ChatGPT-4 omni in providing clinical recommendations in musculoskeletal care: a valuable support for clinicians and patients? Dario Taborelli1, Silvia Negro1, Federico Padovani2, Tiziano Innocenti1,3, Stefano Salvioli1 | 1DINOGMI, Università di Genova, Savona; 2Dipartimento di Neuroscienze e Riabilitazione, Università di Ferrara; 3GIMBE Foundation, Bologna, Italy. Reumatismo [Internet]. 2025 Nov 26 [cited 2026 Jan 23];77(s1). Available from: https://www.reumatismo.org/reuma/article/view/2226