Abstract - In this paper we report on our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package designed to extract basic metadata from these documents. The package is used together with a digital library software suite to easily build personal digital libraries. The software combines several techniques, including PDF parsing, low-level document image processing, and layout analysis. In addition, we use information gathered from a widely known citation database (DBLP) to assist the tool in the difficult task of author identification. The system is tested on paper collections selected from recent conference proceedings.

Metadata Extraction from PDF Papers for Digital Library Ingest / S. Marinai. - PRINT. - IEEE Computer Society:(2009), pp. 251-255. (Paper presented at the 10th International Conference on Document Analysis and Recognition, ICDAR 2009, held in Barcelona (Spain), 26-29 July 2009) [10.1109/ICDAR.2009.232].

Metadata Extraction from PDF Papers for Digital Library Ingest

MARINAI, SIMONE
2009

10th International Conference on Document Analysis and Recognition, ICDAR 2009
Barcelona (Spain)
26-29 July, 2009
S. Marinai
Files in this item:
File: ICDAR09.pdf
Access: Closed access
Type: Final refereed version (Postprint, Accepted manuscript)
License: All rights reserved
Size: 150.99 kB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/373559
Citations
  • Scopus 33