ChatGPT vs rheumatologists: cross-sectional study on accuracy and patient perception of AI-generated information for psoriatic arthritis / Forte, Giulio; Mauro, Daniele; Raimondi, Maura; Pantano, Ilenia; Gandolfo, Saviana; Cauli, Alberto; Guggino, Giuliana; Lubrano, Ennio; Guiducci, Serena; Chimenti, Maria Sole; Peluso, Giusy; D'Agostino, Maria Antonietta; Ramonda, Roberta; Caso, Francesco; Costa, Luisa; Ruscitti, Piero; Maioli, Gabriella; Lopalco, Giuseppe; Tirri, Enrico; Caporali, Roberto; Ciccia, Francesco. - In: ANNALS OF THE RHEUMATIC DISEASES. - ISSN 0003-4967. - Electronic. - (2025), pp. 0-0. [10.1016/j.ard.2025.11.012]

ChatGPT vs rheumatologists: cross-sectional study on accuracy and patient perception of AI-generated information for psoriatic arthritis

2025

Abstract

Objectives: Patients with rheumatic diseases frequently turn to online sources for medical information. Large language models, such as ChatGPT, may offer an accessible alternative to conventional patient-education resources; however, their reliability remains poorly explored. We conducted an exploratory, descriptive comparison to examine whether ChatGPT-4 might provide responses comparable to those of experts.

Methods: Seventy-six psoriatic arthritis (PsA) patients generated 32 questions (296 selections) grouped into 6 themes. Each question was answered by ChatGPT-4 and by 12 Italian PsA specialists, each of whom drafted 2-3 answers. Fourteen clinicians rated the accuracy (1-5 Likert scale) and completeness (1-3 scale) of the AI- and human-generated answers. Interrater reliability was calculated, and mixed-effects ordinal logistic models were used to compare sources. In a separate arm, 67 PsA patients reviewed 16 randomly selected answer pairs and indicated their preference. Readability was assessed. No formal sample size calculation was performed; P values were descriptive and interpreted alongside effect sizes and 95% CIs.

Results: Patients most frequently sought information on prognosis/comorbidities (54/76, 71.1%), therapy strategy (48/76, 63.2%), and treatment risks (38/76, 50.0%). Accuracy appeared comparable between ChatGPT and experts, but ChatGPT scored lower in completeness. Accuracy was lower in the pregnancy/fertility domain, with no clearly relevant differences in the other domains. ChatGPT answers were chosen 491/998 times (49.2%), clinician answers 343/998 times (34.4%), and no preference was expressed 164/998 times (16.4%; P < .001), with a relative preference for ChatGPT responses in the prognosis and therapy themes. ChatGPT responses were, on average, more readable across indices.

Conclusions: In this exploratory study, ChatGPT-4 appeared able to generate accurate and readable responses to PsA-related questions and was often preferred by patients.
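As an illustration of the descriptive figures above, the sketch below re-derives the reported preference percentages and P value and shows one plausible way the supporting statistics could be computed. It is a minimal reconstruction, not the authors' analysis code: the uniform-expectation chi-square test, the quadratic-weighted kappa, the textstat library, and all example inputs are assumptions, since the abstract does not name the exact tests, coefficients, or tools used.

# Minimal sketch (not the authors' code): re-derives the reported
# preference counts and illustrates, under stated assumptions, how the
# supporting statistics could be computed.
from scipy.stats import chisquare
from sklearn.metrics import cohen_kappa_score
import textstat

# Patient preferences over 998 answer-pair evaluations (reported counts).
counts = {"ChatGPT": 491, "clinician": 343, "no preference": 164}
total = sum(counts.values())  # 998
for source, n in counts.items():
    print(f"{source}: {n}/{total} ({100 * n / total:.1f}%)")

# Goodness-of-fit against equal preference for the three options;
# scipy's default expected frequencies are uniform. Yields P << .001,
# consistent with the abstract (exact test used there is unspecified).
chi2, p = chisquare(list(counts.values()))
print(f"chi2 = {chi2:.1f}, P = {p:.2e}")

# Interrater reliability between two raters on the 1-5 accuracy scale
# (hypothetical ratings; quadratic-weighted kappa is one common choice).
rater_a = [5, 4, 4, 3, 5, 2, 4, 5]
rater_b = [5, 4, 3, 3, 5, 3, 4, 4]
print("weighted kappa:", cohen_kappa_score(rater_a, rater_b, weights="quadratic"))

# Readability of an answer via common indices (hypothetical answer text).
answer = ("Psoriatic arthritis is a long-term condition that causes "
          "pain and swelling in the joints. Early treatment helps.")
print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(answer))
print("Gunning Fog index:", textstat.gunning_fog(answer))

Lower grade-level scores correspond to easier text, which is the direction in which the abstract reports ChatGPT's advantage across readability indices.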
Files in this item:
There are no files associated with this item.

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2158/1452177
Citations
  • PMC: 1
  • Scopus: ND
  • Web of Science (ISI): ND