Medical Information Extraction with NLP-Powered QABots: a Real-World Scenario / Claudio Crema, Federico Verde, Pietro Tiraboschi, Camillo Marra, Andrea Arighi, Silvia Fostinelli, Guido Maria Giuffre, Vera Pacoova Dal Maschio, Federica L'Abbate, Federica Solca, Barbara Poletti, Vincenzo Silani, Emanuela Rotondo, Vittoria Borracci, Roberto Vimercati, Valeria Crepaldi, Emanuela Inguscio, Massimo Filippi, Francesca Caso, Alessandra Maria Rosati, Davide Quaranta, Giuliano Binetti, Ilaria Pagnoni, Manuela Morreale, Francesca Burgio, Michelangelo Stanzani Maserati, Sabina Capellari, Matteo Pardini, Nicola Girtler, Federica Piras, Fabrizio Piras, Stefania Lalli, Elena Perdixi, Gemma Lombardi, Sonia Di Tella, Alfredo Costa, Marco Capelli, Cira Fundaro, Marina Manera, Cristina Muscio, Elisa Pellencin, Raffaele Lodi, Fabrizio Tagliavini, Alberto Redolfi. - In: IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS. - ISSN 2168-2194. - Electronic. - (2024), pp. 0-0. [10.1109/JBHI.2024.3450118]

Medical Information Extraction with NLP-Powered QABots: a Real-World Scenario

Gemma Lombardi
2024

Abstract

The advent of computerized medical recording systems in healthcare facilities has made data retrieval easier than with manual recording. Nevertheless, the potential of the information contained within medical records remains largely untapped, mostly due to the time and effort required to extract data from unstructured documents. Natural Language Processing (NLP) represents a promising solution to this challenge, as it enables the use of automated text-mining tools for clinical practitioners. In this work, we present the architecture of the Virtual Dementia Institute (IVD), a consortium of sixteen Italian hospitals, using the NLP Extraction and Management Tool (NEMT), a (semi-)automated end-to-end pipeline that extracts relevant information from clinical documents and stores it in a centralized REDCap database. After defining a common Case Report Form (CRF) across the IVD hospitals, we implemented NEMT, the core of which is a Question Answering Bot (QABot) based on a modern NLP model. This QABot is fine-tuned on thousands of examples from IVD centers. Detailed descriptions of the process to define a common minimum dataset, the Inter-Annotator Agreement calculated on clinical documents, and the NEMT results are provided. The best QABot performance shows an Exact Match score (EM) of 78.1%, an F1-score of 84.7%, a Lenient Accuracy (LAcc) of 0.834, and a Mean Reciprocal Rank (MRR) of 0.810. EM and F1 scores outperform the same metrics obtained with ChatGPTv3.5 (68.9% and 52.5%, respectively). With NEMT, the IVD has been able to populate a database that will contain data from thousands of Italian patients, all screened with the same procedure. NEMT represents an efficient tool that paves the way for medical information extraction and exploitation for new research studies.
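For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Exact Match, token-level F1, and reciprocal rank are commonly computed for extractive question answering. It assumes the widely used SQuAD-style definitions (lowercasing, punctuation stripping, token overlap); the abstract does not state the exact normalization used for NEMT, and Lenient Accuracy is omitted because its definition is not given here. The function names and the example strings are hypothetical and are not taken from the IVD data.

```python
from __future__ import annotations

import re
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, split on whitespace.

    Assumption: a SQuAD-style normalization, not necessarily the paper's.
    """
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def reciprocal_rank(ranked_predictions: list[str], reference: str) -> float:
    """1/rank of the first candidate that exactly matches, else 0.0."""
    for rank, candidate in enumerate(ranked_predictions, start=1):
        if exact_match(candidate, reference):
            return 1.0 / rank
    return 0.0


# Invented examples, for illustration only.
em = exact_match("MMSE 24/30.", "mmse 24/30")
f1 = token_f1("24/30", "MMSE 24/30")
rr = reciprocal_rank(
    ["Parkinson's disease", "Alzheimer's disease"], "Alzheimer's disease"
)
```

Corpus-level EM and F1 are then the averages of these per-question scores, and MRR is the average reciprocal rank over all questions. A partially correct answer such as the second example scores zero on EM but is credited by F1, which is why F1 is usually the higher of the two figures.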
Files in this record:
File: Crema_2024.pdf
Access: Closed
Type: Publisher's PDF (Version of record)
License: Read-only
Size: 1.33 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1397416