Recognition of Concordances for Indexing in Digital Libraries / Marinai, Simone; Capobianco, Samuele; Ziran, Zahra; Giuntini, Andrea; Mansueto, Pierluigi. - ELECTRONIC. - 1177:(2020), pp. 135-147. (Paper presented at the Italian Research Conference on Digital Libraries) [10.1007/978-3-030-39905-4_14].
Recognition of Concordances for Indexing in Digital Libraries
Marinai, Simone; Capobianco, Samuele; Ziran, Zahra; Mansueto, Pierluigi
2020
Abstract
We describe a system for the automatic transcription of books with concordances. Although the recognition of printed text with OCR tools is nearly solved for high-quality documents, the recognition of structured text, where dictionaries and other linguistic tools are of little help, remains a difficult task. In this work, we propose several techniques for correcting the imperfect text produced by the OCR software, taking into account both the physical features of the documents and the redundancy of information implicit in concordances.