Deep Learning Methods for Document Image Understanding

Capobianco, Samuele

Document image understanding involves several tasks, including layout analysis of historical handwritten documents and symbol recognition in graphical documents. Understanding a document image involves two processes, analysis and recognition, both of which are complex tasks. Moreover, each application domain has its own information structure, which increases the complexity of the understanding process. In recent years, many machine learning approaches have been proposed to address document image understanding. In this research, we present a series of deep learning methods for two application domains: historical handwritten documents and graphical documents. We describe the difficulties encountered when applying these techniques and the solutions proposed for each application domain. In particular, supervised deep networks require a large dataset for training. We address the over-fitting caused by the scarcity of labeled data and show several solutions to prevent it in these application domains. First, we present our contributions to layout analysis of historical handwritten documents. We propose a toolkit to generate structured synthetic documents that emulates the actual document production process. These synthetic documents can be used to train systems to perform layout analysis. Then, we study the use of deep networks for counting the number of records on each page of a historical handwritten document. Furthermore, we present a novel approach for extracting text lines in handwritten documents, using another deep network to label document image patches as text lines or separators. For page segmentation, we propose a fully convolutional network trained with a domain-specific loss that classifies pixels to segment semantic regions on handwritten pages.
Second, we propose a novel interactive annotation system that helps users label symbols at the pixel level for the graphical symbol understanding problem. With the proposed system we can improve the annotation results and reduce the time needed to label data. Using this system, we built a novel floor plan image dataset for object detection. We show preliminary results obtained with state-of-the-art deep networks for detecting symbols on this dataset. Finally, we provide an extensive discussion of each task addressed, presenting the results obtained and proposing future work.

2020