Abstract: In this chapter we describe several approaches that have been proposed for using learning algorithms to analyze the layout of digitized documents. Layout analysis encompasses all the techniques used to infer the organization of the page layout of document images. From a physical point of view, the layout can be described as composed of blocks, in most cases rectangular, that are arranged on the page and contain homogeneous content, such as text, vector graphics, or illustrations. From a logical point of view, text blocks can have different meanings depending on their content and their position on the page. For instance, in technical papers, blocks can correspond to the title, author, or abstract of the paper. The learning algorithms adopted in this domain are often based on supervised classifiers that are used at various processing levels to label the objects in the document image according to physical or logical categories. The classification can be performed for individual pixels, for regions, or even for whole pages. This chapter analyzes the different approaches adopted for using supervised classifiers in layout analysis.
Learning Algorithms for Document Layout Analysis / Simone Marinai. - Print. - (2013), pp. 400-419. [10.1016/B978-0-444-53859-8.00016-3]
Learning Algorithms for Document Layout Analysis
MARINAI, SIMONE
2013
File | Size | Format | Access | Type | License
---|---|---|---|---|---
16136_10016.pdf | 701.41 kB | Adobe PDF | Closed access | Publisher's PDF (Version of record) | All rights reserved
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.