Deep Learning Methods for Document Image Understanding / Samuele Capobianco. - (2020).

Deep Learning Methods for Document Image Understanding

Samuele Capobianco
2020

Abstract

Document image understanding involves several tasks, including, among others, the layout analysis of historical handwritten documents and the recognition of symbols in graphical documents. Understanding a document image involves two complex processes: analysis and recognition. Moreover, each application domain has its own information structure, which further increases the complexity of the understanding process. In recent years, many machine learning approaches have been proposed to address document image understanding. In this research, we present a series of deep learning methods for two application domains: the understanding of historical handwritten documents and of graphical documents. We describe the difficulties encountered when applying these techniques and the solutions proposed for each domain. In particular, we tackle the fact that supervised deep networks require large training datasets, and for each domain we show several ways to prevent the over-fitting caused by the scarcity of labeled data. First, we present our contributions to the layout analysis of historical handwritten documents. We propose a toolkit that generates structured synthetic documents by emulating the actual document production process; these synthetic documents can be used to train layout analysis systems. Then, we study the use of deep networks for counting the number of records on each page of a historical handwritten document. Furthermore, we present a novel approach for extracting text lines from handwritten documents, in which another deep network labels document image patches as text lines or separators. For page segmentation, we propose a fully convolutional network, trained with a domain-specific loss, that classifies pixels in order to segment semantic regions on handwritten pages.
Second, for the graphical symbol understanding problem, we propose a novel interactive annotation system that helps users label symbols at the pixel level. The system improves the annotation results and reduces the time needed to label data. Using it, we built a new floor plan image dataset for object detection, and we report preliminary results obtained with state-of-the-art deep networks that detect symbols on this dataset. Finally, we provide an extensive discussion of each task addressed, presenting the results obtained and proposing future work.
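The patch-labelling idea behind the text-line extraction can be illustrated with a minimal sketch. It assumes the following:

- A binarized page is stored as nested lists of 0/1 pixels.
- Patches are non-overlapping and square.
- A hypothetical `ink_density_classifier` thresholds the fraction of ink pixels in each patch. This placeholder stands in for the deep network described in the thesis and is not the author's model.

The names `extract_patches`, `label_patches`, and `line_starts` are likewise illustrative.

```python
from typing import Callable, List

# Toy binarized page: 1 = ink pixel, 0 = background.
Page = List[List[int]]


def extract_patches(page: Page, size: int) -> List[List[Page]]:
    """Split the page into a grid of non-overlapping size x size patches.

    Pixels at the right and bottom borders that do not fill a whole patch
    are dropped to keep the sketch simple.
    """
    n_rows, n_cols = len(page) // size, len(page[0]) // size
    return [
        [
            [row[c * size:(c + 1) * size] for row in page[r * size:(r + 1) * size]]
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


def ink_density_classifier(patch: Page, threshold: float = 0.2) -> str:
    """Hypothetical stand-in for the deep network: label a patch as 'text'
    when its fraction of ink pixels exceeds threshold, else 'separator'."""
    total = sum(len(row) for row in patch)
    ink = sum(sum(row) for row in patch)
    return "text" if ink / total > threshold else "separator"


def label_patches(page: Page, size: int,
                  classify: Callable[[Page], str]) -> List[List[str]]:
    """Run the patch classifier on every patch and return a grid of labels."""
    return [[classify(p) for p in row] for row in extract_patches(page, size)]


def line_starts(labels: List[List[str]]) -> List[int]:
    """Treat each run of consecutive patch rows where most patches are 'text'
    as one text line; return the index of the first patch row of each run."""
    starts: List[int] = []
    inside = False
    for r, row in enumerate(labels):
        is_text = row.count("text") > len(row) / 2
        if is_text and not inside:
            starts.append(r)
        inside = is_text
    return starts


if __name__ == "__main__":
    # 8 x 8 page with two horizontal ink bands (pixel rows 0-1 and 4-5).
    page = [[1] * 8 if (r // 2) % 2 == 0 else [0] * 8 for r in range(8)]
    labels = label_patches(page, 2, ink_density_classifier)
    print(line_starts(labels))  # [0, 2]
```

Each returned value is a patch-row index, so multiplying it by the patch size gives an approximate pixel row for the start of a line. In a real system a trained patch classifier would replace the density threshold, which here serves only as a placeholder.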
Supervisors: Simone Marinai, Kimbal Marriott
Country: Italy
Files in this item:
thesis.pdf
Open Access from 01/01/2022
Description: thesis
Type: Doctoral thesis
License: Creative Commons
Size: 30.01 MB
Format: Adobe PDF

Use this identifier to cite or link to this item: https://hdl.handle.net/2158/1182536