In this paper we compare three clustering methods for word image indexing: the Self-Organizing Map (SOM), the Growing Hierarchical Self-Organizing Map (GHSOM), and Spectral Clustering. We test these methods on a real data set of word images extracted from a 19th-century encyclopedia. The word images are grouped by each clustering method and subsequently retrieved by identifying the clusters closest to a query word. The accuracy of the methods is compared by evaluating the performance of the word retrieval algorithm. From the experimental results we conclude that methods designed to automatically determine the number and structure of clusters, such as GHSOM, are particularly suitable for the kind of data represented by our data set.
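The cluster-then-retrieve scheme described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: word images are taken to be already encoded as fixed-length feature vectors, a plain k-means with a simplistic deterministic initialization stands in for the clustering step (it is not SOM, GHSOM, or Spectral Clustering), and all function names and the toy data are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iters=20):
    """Plain k-means, used here only as a stand-in clustering method."""
    # Simplistic deterministic initialization: k evenly spaced samples.
    step = max(1, len(vectors) // k)
    centroids = [list(v) for v in vectors[::step][:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: dist(v, centroids[c]))
            groups[nearest].append(v)
        # Recompute each centroid as the mean of its group;
        # an empty group keeps its previous centroid.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def build_index(vectors, centroids):
    """Map each cluster id to the indices of the vectors assigned to it."""
    index = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(index, key=lambda c: dist(v, centroids[c]))
        index[nearest].append(i)
    return index


def retrieve(query, vectors, centroids, index, n_clusters=1, top=5):
    """Rank only the items stored in the clusters closest to the query."""
    closest = sorted(index, key=lambda c: dist(query, centroids[c]))[:n_clusters]
    candidates = [i for c in closest for i in index[c]]
    return sorted(candidates, key=lambda i: dist(query, vectors[i]))[:top]


# Toy 2-D "feature vectors" (hypothetical data, two well-separated groups).
vectors = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids = kmeans(vectors, k=2)
index = build_index(vectors, centroids)
print(retrieve((5.0, 5.1), vectors, centroids, index, n_clusters=1, top=3))
```

In this sketch, raising `n_clusters` widens the search to more neighbouring clusters, so more candidates are compared against the query.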
A Comparison of Clustering Methods for Word Image Indexing / S. Marinai; E. Marino; G. Soda. - Print. - (2008), pp. 671-676. (Paper presented at DAS '08, the Eighth IAPR International Workshop on Document Analysis Systems, held in Nara (Japan), 16-19 Sept. 2008) [10.1109/DAS.2008.85].
A Comparison of Clustering Methods for Word Image Indexing
MARINAI, SIMONE; SODA, GIOVANNI
2008