Learning Algorithms for Document Layout Analysis / Simone Marinai. - PRINT. - (2013), pp. 400-419. [10.1016/B978-0-444-53859-8.00016-3]

Learning Algorithms for Document Layout Analysis

MARINAI, SIMONE
2013

Abstract

In this chapter we describe several approaches that use learning algorithms to analyze the layout of digitized documents. Layout analysis encompasses the techniques used to infer the organization of the page layout of document images. From a physical point of view, the layout can be described as composed of blocks, in most cases rectangular, that are arranged on the page and contain homogeneous content, such as text, vector graphics, or illustrations. From a logical point of view, text blocks can have different meanings depending on their content and their position on the page. For instance, in technical papers, blocks can correspond to the title, author, or abstract of the paper. The learning algorithms adopted in this domain are often supervised classifiers, used at various processing levels to label the objects in the document image according to physical or logical categories. Classification can be performed on individual pixels, on regions, or even on whole pages. This chapter analyzes the different approaches to using supervised classifiers in layout analysis.
Year: 2013
ISBN: 9780444538598
Published in: Handbook of Statistics - Machine Learning: Theory and Applications
Pages: 400-419
Author: Simone Marinai
Files in this item:
File: 16136_10016.pdf
Access: Closed access
Type: Publisher's PDF (Version of record)
License: All rights reserved
Size: 701.41 kB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/880325
Citations
  • PMC: n/a
  • Scopus: 7
  • ISI: 5