
Ge(o)Lo(cator): Geographic Information Extraction from Unstructured Text Data and Web Documents / P. Nesi; G. Pantaleo; M. Tenti. - Print. - (2014), pp. 60-65. (Paper presented at the 9th International Workshop on Semantic and Social Media Adaptation and Personalization, Nov 2014) [10.1109/SMAP.2014.27].

Ge(o)Lo(cator): Geographic Information Extraction from Unstructured Text Data and Web Documents

Nesi, Paolo; Pantaleo, Gianni
2014

Abstract

The constantly growing number of websites, web pages, documents, and textual (Big) Data populating the Internet represents a massive source of information and knowledge for a wide range of interests and across many different domains. However, the sheer volume and complexity of unstructured natural-language text make it difficult for end users to find specific, desired pieces of information. With the widespread uptake of social networks and media, the automatic extraction and retrieval of geographic information is becoming a field of great interest. This paper presents the GeLo system, which extracts the addresses and geographical coordinates of companies and organizations from their web domains. The information extraction process relies on NLP techniques, specifically Part-of-Speech (POS) tagging, pattern recognition, and annotation. The overall system performance has been evaluated manually against a substantial subset of the database of extracted URLs.
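The abstract names POS tagging, pattern recognition, and annotation but does not describe the actual rules. As a rough, non-authoritative illustration of the pattern-recognition idea only, the sketch below swaps in plain regular expressions (no POS tagging) to pull address-like spans and coordinate pairs out of page text. Everything specific here is an assumption, not GeLo's method: the Italian-style address layout (street type, street name, house number, 5-digit postal code, city), the street-type keyword list, the decimal "lat, lon" coordinate format, and the names `Address`, `extract_addresses`, and `extract_coordinates`.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# ASSUMPTION: hypothetical street-type keywords for Italian-style addresses.
STREET_TYPES = r"(?:Viale|Via|Piazza|Corso|Largo|Vicolo)"
# A street-name word: starts with a letter, may contain letters, digits, ' or .
NAME_WORD = r"[^\W\d_][\w'.]*"

# ASSUMPTION: "<street type> <name words> <number>, <5-digit postcode> <City>".
ADDRESS_RE = re.compile(
    r"\b(?P<street>" + STREET_TYPES + r"(?:\s+" + NAME_WORD + r")+)"
    r",?\s+(?P<number>\d+[A-Za-z]?)"
    r",?\s+(?P<postcode>\d{5})"
    r"\s+(?P<city>[A-Z][^\W\d_]*(?:\s+[A-Z][^\W\d_]*)*)"
)

# ASSUMPTION: decimal-degree pairs written as "lat, lon" (e.g. "43.77, 11.25").
# The lookbehind stops a match from starting in the middle of a longer number.
COORD_RE = re.compile(
    r"(?<![\w.])(?P<lat>[-+]?\d{1,2}\.\d+)\s*,\s*(?P<lon>[-+]?\d{1,3}\.\d+)"
)


@dataclass
class Address:
    street: str
    number: str
    postcode: str
    city: str


def extract_addresses(text: str) -> List[Address]:
    """Return every address-like span matched by the hypothetical pattern."""
    return [Address(**m.groupdict()) for m in ADDRESS_RE.finditer(text)]


def extract_coordinates(text: str) -> List[Tuple[float, float]]:
    """Return (lat, lon) pairs, keeping only values in valid geographic ranges."""
    coords = []
    for m in COORD_RE.finditer(text):
        lat, lon = float(m.group("lat")), float(m.group("lon"))
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            coords.append((lat, lon))
    return coords


if __name__ == "__main__":
    page = "Contact us at Via Santa Marta 3, 50139 Firenze. Map: 43.7985, 11.2535."
    print(extract_addresses(page))
    print(extract_coordinates(page))
```

A regex-only matcher like this is brittle (lowercase street types, "3/A" house numbers, or addresses split across HTML elements all slip through), which is presumably why a real system would add linguistic cues such as POS tags and an annotation step on top of raw patterns.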
2014 9th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP)

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2158/956908
Citations
  • PubMed Central: not available
  • Scopus: 13
  • Web of Science: 11