Tree clustering for layout-based document image retrieval / S. Marinai; E. Marino; G. Soda. - PRINT. - (2006), pp. 243-251. (Paper presented at the conference Document Image Analysis for Libraries, held in Lyon (France) in April 2006) [10.1109/DIAL.2006.44].
Tree clustering for layout-based document image retrieval
MARINAI, SIMONE; SODA, GIOVANNI
2006
Abstract
We describe a system that retrieves document images from digital library collections on the basis of layout similarity. Layout regions are extracted and represented with an XY tree. The proposed indexing method combines a new tree clustering algorithm, based on self-organizing maps, with principal component analysis. Together, these techniques allow the most similar pages to be retrieved from large collections without directly comparing the query page with each indexed document.
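The idea of answering a query without scanning the whole collection can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: each page layout is taken to be already encoded as a fixed-length numeric vector (the record does not say how XY trees are encoded), a plain vector self-organizing map stands in for the paper's tree clustering algorithm, and PCA is applied before clustering, an ordering chosen here only for simplicity. All function names (`fit_pca`, `train_som`, `build_index`, `query`) and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Return the mean and the top-k principal axes of X (rows = pages)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]


def project(X, mean, axes):
    """Map one vector or a matrix of vectors into the reduced space."""
    return (X - mean) @ axes.T


def train_som(Z, n_units=2, epochs=20, lr0=0.5, seed=0):
    """Train a 1-D self-organizing map on the reduced vectors Z."""
    rng = np.random.default_rng(seed)
    grid = np.arange(n_units)
    # Linear initialization: spread the units along the first principal axis.
    W = np.zeros((n_units, Z.shape[1]))
    W[:, 0] = np.linspace(Z[:, 0].min(), Z[:, 0].max(), n_units)
    for t in range(epochs):
        decay = 1.0 - t / epochs
        lr = lr0 * decay                         # shrinking learning rate
        sigma = max(0.5 * n_units * decay, 0.1)  # shrinking neighborhood
        for i in rng.permutation(len(Z)):
            bmu = np.argmin(np.linalg.norm(W - Z[i], axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma**2))
            W += lr * h[:, None] * (Z[i] - W)
    return W


def best_unit(W, z):
    """Index of the map unit closest to z."""
    return int(np.argmin(np.linalg.norm(W - z, axis=1)))


def build_index(Z, W):
    """Group page indices by their best-matching map unit."""
    buckets = {}
    for i, z in enumerate(Z):
        buckets.setdefault(best_unit(W, z), []).append(i)
    return buckets


def query(x, mean, axes, W, Z, buckets, top=3):
    """Rank only the pages that share the query's map unit."""
    z = project(x, mean, axes)
    candidates = buckets.get(best_unit(W, z), [])
    ranked = sorted(candidates, key=lambda i: np.linalg.norm(Z[i] - z))
    return ranked[:top], len(candidates)


# Synthetic stand-in for layout vectors: two well-separated groups of pages.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(10.0, 1.0, (20, 5))])

mean, axes = fit_pca(X, k=2)
Z = project(X, mean, axes)
W = train_som(Z)
buckets = build_index(Z, W)

hits, compared = query(X[3], mean, axes, W, Z, buckets)
print("top hits:", hits, "| compared", compared, "of", len(X), "pages")
```

In this sketch the query is compared only with the pages assigned to its own map unit, which is what keeps the search cost below a full scan. The price is that a relevant page assigned to a neighboring unit would be missed unless adjacent units are also searched.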