Bag of Characters and SOM Clustering for Script Recognition and Writer Identification / S. MARINAI; B. MIOTTI; G. SODA. - PRINT. - (2010), pp. 2182-2185. (Paper presented at the International Conference on Pattern Recognition, held in Istanbul (Turkey)) [10.1109/ICPR.2010.534].

Bag of Characters and SOM Clustering for Script Recognition and Writer Identification

MARINAI, SIMONE;MIOTTI, BEATRICE;SODA, GIOVANNI
2010

Abstract

In this paper, we describe a general approach to script (and language) recognition in printed documents and to writer identification in handwritten documents. The method is based on a bag-of-visual-words strategy in which the visual words correspond to characters and the clustering is obtained by means of Self-Organizing Maps (SOM). Unknown pages (words, in the case of script recognition) are classified by comparing their vector representations with those of a training set using cosine similarity. The comparison is improved by a similarity score that takes into account the SOM organization of the cluster centroids. Promising results are presented for both printed documents and handwritten musical scores.
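The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that each character has already been assigned to a SOM cluster index, so a page is just a list of integers; the function names (`bag_of_characters`, `cosine_similarity`, `classify`), the nearest-neighbour decision rule, and the toy data are illustrative assumptions, not taken from the paper. The SOM-aware similarity score mentioned in the abstract is not reproduced here, since the abstract does not define it.

```python
import math
from collections import Counter


def bag_of_characters(cluster_ids, n_clusters):
    """Turn a list of cluster indices into a fixed-length count vector."""
    counts = Counter(cluster_ids)
    return [counts.get(k, 0) for k in range(n_clusters)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(query_vec, training_set):
    """Return the label of the training vector most similar to the query.

    training_set is a list of (label, vector) pairs. Picking the single
    best match is an assumption made for this sketch.
    """
    best_label, _ = max(
        training_set,
        key=lambda item: cosine_similarity(query_vec, item[1]),
    )
    return best_label


# Hypothetical toy data: 4 clusters, two training pages, one unknown page.
N_CLUSTERS = 4
training = [
    ("writer_A", bag_of_characters([0, 0, 1, 0, 2], N_CLUSTERS)),
    ("writer_B", bag_of_characters([3, 3, 2, 3, 1], N_CLUSTERS)),
]
query = bag_of_characters([0, 1, 0, 0], N_CLUSTERS)
predicted = classify(query, training)
print(predicted)
```

Because cosine similarity normalizes by vector length, a short unknown page can still be compared fairly against a longer training page, which is one plausible reason for preferring it to a raw count distance in this setting.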
2010
20th International Conference on Pattern Recognition
International Conference on Pattern Recognition
Istanbul (Turkey)
S. MARINAI; B. MIOTTI; G. SODA
Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/397146