Connecting the DOCS: a graph-based approach to document understanding / Andrea Gemelli. - (2024).

Connecting the DOCS: a graph-based approach to document understanding

Andrea Gemelli
2024

Abstract

Graphs are a natural representation of the patterns we observe in the world. Data, as we receive it from nature, is not only a set of objects but also a set of informative interactions among them. Thanks to this expressiveness, graphs have gained prominence well beyond mathematics and have permeated various scientific domains, including Document Understanding. Documents have a precise and rather complex structure: the objects found within them can take on different meanings depending on their position and/or their mutual relationships. For these reasons, graphs are an adequate framework for leveraging structural information from documents, given their inherent power to encode the object components (or semantic entities) and their pairwise relationships. The recent success of Geometric Deep Learning and Graph Neural Networks has enabled the development of state-of-the-art methods based on these architectures, helping to bridge the gap between theoretical foundations and practical applications. Through a graph-based approach, our work tackles several Document Understanding tasks, addressing some of the limitations we found in this context and contributing novel frameworks, data collections and augmentation techniques. This dissertation aims to connect our publications under one consistent narrative that supports our hypotheses and, in particular, to connect graphs and documents under a common definition that we refer to as graph document representation. Starting from a general overview of how documents met graph theory, we delve into the specific implementations that address our research questions, both for structured objects, such as tables, and for whole document pages.
Simone Marinai
Italy
Files in this record:
File: PhD_Thesis_Gemelli_Corretta.pdf
Access: open access
Type: Publisher's PDF (Version of record)
License: Creative Commons
Size: 26.17 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1353891