Fine-grained document genre classification using first order random graphs / Bagdanov, Andrew D.; Worring, Marcel. - PRINT. - (2001), pp. 79-83. (Paper presented at the 6th International Conference on Document Analysis and Recognition, ICDAR 2001, held in the USA in 2001) [10.1109/ICDAR.2001.953759].

Fine-grained document genre classification using first order random graphs

BAGDANOV, ANDREW DAVID
2001

Abstract

We address the general problem of classifying machine-printed documents into genres. Layout is a critical factor in recognizing fine-grained genres, because the content features of documents in closely related genres are similar. Document genre is determined from the layout structure detected in scanned binary images of the document pages, using no OCR results and minimal a priori knowledge of document logical structure. Our method uses attributed relational graphs (ARGs) to represent the layout structure of document instances, and first order random graphs (FORGs) to represent document genres. In this paper we develop our FORG-based genre classification method and present a comparative evaluation of our technique against a variety of statistical pattern classifiers. FORGs are capable of modeling the common layout structure within a document genre and are shown to significantly outperform traditional pattern classification techniques when fine-grained genre distinctions must be drawn.
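
To make the two representations concrete, the following is a minimal sketch and not the method developed in the paper. It assumes a hypothetical schema in which each page is an ARG whose vertices are layout zones carrying one discrete attribute and whose edges carry one discrete spatial relation. The genre model is a toy stand-in for a FORG: it treats every vertex and edge attribute as independent and estimates each distribution from counts. The names `ARG`, `FORG`, `classify`, the zone identifiers, the attribute values, and the smoothing parameters are all assumptions made for illustration. The sketch also assumes that zone identifiers already correspond across pages, so it skips graph matching entirely and ignores the probability that a vertex or edge is absent.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph for one page (hypothetical schema)."""
    vertices: dict  # zone id -> discrete attribute, e.g. "wide"
    edges: dict     # (zone id, zone id) -> discrete relation, e.g. "above"


@dataclass
class FORG:
    """Toy genre model: one independent distribution per vertex and per edge."""
    vertex_counts: dict = field(default_factory=dict)
    edge_counts: dict = field(default_factory=dict)
    n: int = 0  # number of training graphs seen

    def fit(self, graphs):
        # Assumes all graphs share zone ids, so no graph matching is needed.
        for g in graphs:
            self.n += 1
            for table, items in ((self.vertex_counts, g.vertices),
                                 (self.edge_counts, g.edges)):
                for key, value in items.items():
                    counts = table.setdefault(key, {})
                    counts[value] = counts.get(value, 0) + 1
        return self

    def log_likelihood(self, g, smoothing=1.0, n_values=4):
        # Laplace-smoothed log probability of the observed attributes,
        # summed over vertices and edges under the independence assumption.
        total = 0.0
        for table, items in ((self.vertex_counts, g.vertices),
                             (self.edge_counts, g.edges)):
            for key, value in items.items():
                count = table.get(key, {}).get(value, 0)
                total += math.log((count + smoothing)
                                  / (self.n + smoothing * n_values))
        return total


def classify(g, models):
    """Return the genre whose model assigns the page the highest log-likelihood."""
    return max(models, key=lambda genre: models[genre].log_likelihood(g))


# Two made-up genres that differ only in layout, not in content.
journal_a = [ARG({"z1": "wide", "z2": "narrow"}, {("z1", "z2"): "above"})] * 3
journal_b = [ARG({"z1": "narrow", "z2": "narrow"}, {("z1", "z2"): "left_of"})] * 3
models = {"journal_a": FORG().fit(journal_a), "journal_b": FORG().fit(journal_b)}

page = ARG({"z1": "wide", "z2": "narrow"}, {("z1", "z2"): "above"})
print(classify(page, models))  # journal_a
```

The point of the sketch is only that a genre is modeled as a distribution over graphs and that classification picks the genre under which the observed layout graph is most probable. A realistic system would also have to find the vertex correspondence between a page and each genre model.
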
Publication year: 2001
Published in: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Conference: 6th International Conference on Document Analysis and Recognition, ICDAR 2001
Conference location: USA
Conference year: 2001
Authors: Bagdanov, Andrew D.; Worring, Marcel
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1020582
Citations
  • PMC: ND
  • Scopus: 40
  • ISI: 29