A Comparison of Clustering Methods for Word Image Indexing / S. Marinai; E. Marino; G. Soda. - PRINT. - (2008), pp. 671-676. (Paper presented at the conference DAS '08. Eighth IAPR International Workshop on Document Analysis Systems, held in Nara (Japan), 16-19 Sept. 2008) [10.1109/DAS.2008.85].

A Comparison of Clustering Methods for Word Image Indexing

MARINAI, SIMONE; SODA, GIOVANNI
2008

Abstract

In this paper we compare three clustering methods for word image indexing: the Self-Organizing Map (SOM), the Growing Hierarchical Self-Organizing Map (GHSOM), and Spectral Clustering. We test these methods on a real data set composed of word images extracted from a 19th-century encyclopedia. The word images are grouped by each clustering method and subsequently retrieved by identifying the clusters closest to a query word. The accuracy of the methods is compared by evaluating the performance of the word retrieval algorithm. From the experimental results we conclude that methods designed to automatically determine the number and the structure of clusters, such as GHSOM, are particularly suitable for data sets like ours.
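
The retrieval step summarized above (group word images into clusters, then search only the clusters closest to a query) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_index` and `retrieve`, the 2-D toy descriptors, and the fixed centroids are all assumptions made for illustration, and the centroids stand in for whatever SOM, GHSOM, or Spectral Clustering would produce.

```python
import numpy as np

def build_index(features, centroids):
    """Assign each word-image feature vector to its nearest cluster centroid."""
    # Pairwise squared Euclidean distances, shape (n_words, n_clusters).
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    # Index: cluster id -> array of word ids stored in that cluster.
    return {c: np.flatnonzero(labels == c) for c in range(len(centroids))}

def retrieve(query, features, centroids, index, n_clusters=1, top_k=5):
    """Rank only the words stored in the n_clusters centroids closest to the query."""
    centroid_dist = ((centroids - query) ** 2).sum(axis=1)
    closest = np.argsort(centroid_dist)[:n_clusters]
    candidates = np.concatenate([index[c] for c in closest])
    word_dist = ((features[candidates] - query) ** 2).sum(axis=1)
    return candidates[np.argsort(word_dist)[:top_k]]

# Toy data (hypothetical): six 2-D descriptors and two assumed centroids.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
                     [5.0, 5.0], [5.1, 5.0], [4.9, 5.2]])
centroids = np.array([[0.0, 0.1], [5.0, 5.1]])
index = build_index(features, centroids)
result = retrieve(np.array([5.0, 5.0]), features, centroids, index, top_k=2)
```

Restricting the ranking to the closest clusters is what makes the clustering act as an index: raising `n_clusters` trades speed for recall, since more candidate words are compared with the query.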
2008
The Eighth IAPR International Workshop on Document Analysis Systems
DAS '08. Eighth IAPR International Workshop on Document Analysis Systems
Nara (Japan)
16-19 Sept. 2008
S. Marinai; E. Marino; G. Soda
Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/351128