Abstract - In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract basic metadata from these documents. The package is used in combination with a digital library software suite to easily build personal digital libraries. The proposed software is based on a suitable combination of several techniques, including PDF parsing, low-level document image processing, and layout analysis. In addition, we use information gathered from a widely known citation database (DBLP) to assist the tool in the difficult task of author identification. The system is tested on paper collections selected from recent conference proceedings.
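The abstract names layout analysis and a DBLP-assisted author identification step but does not say how they work. The sketch below illustrates one possible way to combine them, and it is not the method used in the paper. Everything in it is an assumption: the `TextBlock` structure stands in for whatever a PDF parser reports, the title heuristic ("largest font on the first page") is a common rule of thumb, `known_names` stands in for author names obtained from DBLP, and fuzzy matching with `difflib.SequenceMatcher` (with a hypothetical 0.85 threshold) is a swapped-in technique. The sample names are invented.

```python
import re
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TextBlock:
    """Hypothetical text block, as a PDF parser might report it for page 1."""
    text: str
    font_size: float
    top: float  # distance from the top of the page; smaller means higher up


def guess_title(blocks: list[TextBlock]) -> str:
    """Layout heuristic (assumed): the title is the text in the largest font."""
    largest = max(b.font_size for b in blocks)
    title_blocks = sorted((b for b in blocks if b.font_size == largest),
                          key=lambda b: b.top)  # keep reading order
    return " ".join(b.text.strip() for b in title_blocks)


def normalize(name: str) -> str:
    """Lower-case, drop accents, punctuation and digits, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters = re.sub(r"[^a-z ]", "", no_accents.lower())
    return " ".join(letters.split())


def split_authors(line: str) -> list[str]:
    """Split an author line on commas, semicolons and the word 'and'."""
    parts = re.split(r",|;|\band\b", line)
    return [p.strip() for p in parts if p.strip()]


def match_authors(line: str, known_names: set[str],
                  threshold: float = 0.85) -> list[str]:
    """Keep each candidate whose best fuzzy match in known_names
    (standing in for DBLP author names) reaches the threshold."""
    index = {normalize(n): n for n in known_names}
    matches = []
    for candidate in split_authors(line):
        key = normalize(candidate)
        best, best_score = None, 0.0
        for norm, original in sorted(index.items()):  # deterministic order
            score = SequenceMatcher(None, key, norm).ratio()
            if score > best_score:
                best, best_score = original, score
        if best is not None and best_score >= threshold:
            matches.append(best)
    return matches


if __name__ == "__main__":
    blocks = [
        TextBlock("Proceedings of a Hypothetical Workshop", 9.0, 20.0),
        TextBlock("Metadata Extraction from", 18.0, 80.0),
        TextBlock("PDF Papers", 18.0, 100.0),
        TextBlock("Alice Rossi, Bob Bianchi and C. Verdi", 11.0, 130.0),
    ]
    known = {"Alice Rossi", "Bob Bianchi", "Carla Neri"}
    print(guess_title(blocks))                    # Metadata Extraction from PDF Papers
    print(match_authors(blocks[-1].text, known))  # ['Alice Rossi', 'Bob Bianchi']
```

In this toy example "C. Verdi" is rejected because no known name is close enough. This reflects the general idea of using an external name list to filter author candidates that layout analysis alone cannot confirm.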
Metadata Extraction from PDF Papers for Digital Library Ingest / S. Marinai. - Print. - (2009), pp. 251-255. (Paper presented at the 10th International Conference on Document Analysis and Recognition, ICDAR 2009, held in Barcelona (Spain), 26-29 July 2009) [10.1109/ICDAR.2009.232].
Metadata Extraction from PDF Papers for Digital Library Ingest
MARINAI, SIMONE
2009
Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.