Classification of cancer pathology reports with Deep Learning methods / Martina, Stefano. - (2020).

Classification of cancer pathology reports with Deep Learning methods

Martina, Stefano
2020

Abstract

Natural Language Processing (NLP) is a discipline that involves the design of methods that process text. Deep learning, and Machine Learning (ML) in general, is the discipline that studies and implements methods that learn to make predictions from data. In recent years, many different ML methods have been presented in the context of NLP. In this work we focused in particular on text classification methods. Cancer registries collect pathology reports from clinical data sources and combine them with administrative data sources to identify cancer diagnoses in a specific area. Here we present a large-scale study on deep learning methods applied to cancer pathology reports written in Italian. In this study we developed several classifiers to predict topography and morphology ICD-O codes. We compared classic machine learning approaches, i.e. Support Vector Machine (SVM), with recent deep learning techniques, i.e. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Furthermore, we compared recent attention-based and hierarchical techniques, e.g. Bidirectional Encoder Representations from Transformers (BERT), with a simpler hard attention method, showing that the latter is sufficient to perform slightly better in this specific domain.
Paolo Frasconi
Italy
Goal 3: Good health and well-being for people
Goal 9: Industry, Innovation, and Infrastructure
Files in this item:
thesis.pdf (open access)
Description: Doctoral thesis
Type: Doctoral thesis
License: Open Access
Size: 1.19 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1187936