Search results clustering (SRC) is a challenging algorithmic problem that requires grouping together the results returned by one or more search engines in topically coherent clusters, and labeling the clusters with meaningful phrases describing the topics of the results included in them. In this paper we propose to solve SRC via an innovative approach that consists of modeling the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia-pages identified by means of recently proposed topic annotators [9, 11, 16, 20] applied to the search results, and the edges denote the relatedness among these topics computed by taking into account the linkage of the Wikipedia-graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (CLUSTY and LINGO3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.

Topical clustering of search results / U. Scaiella; P. Ferragina; A. Marino; M. Ciaramita. - ELETTRONICO. - (2012), pp. 223-232. (Intervento presentato al convegno Fifth ACM International Conference on Web Search Data Mining, WSDM 2012 tenutosi a Seattle, Washington nel February 8-12, 2012) [10.1145/2124295.2124324].

Topical clustering of search results

MARINO, ANDREA;
2012

Abstract

Search results clustering (SRC) is a challenging algorithmic problem that requires grouping together the results returned by one or more search engines in topically coherent clusters, and labeling the clusters with meaningful phrases describing the topics of the results included in them. In this paper we propose to solve SRC via an innovative approach that consists of modeling the problem as the labeled clustering of the nodes of a newly introduced graph of topics. The topics are Wikipedia-pages identified by means of recently proposed topic annotators [9, 11, 16, 20] applied to the search results, and the edges denote the relatedness among these topics computed by taking into account the linkage of the Wikipedia-graph. We tackle this problem by designing a novel algorithm that exploits the spectral properties and the labels of that graph of topics. We show the superiority of our approach with respect to academic state-of-the-art work [6] and well-known commercial systems (CLUSTY and LINGO3G) by performing an extensive set of experiments on standard datasets and user studies via Amazon Mechanical Turk. We test several standard measures for evaluating the performance of all systems and show a relative improvement of up to 20%.
2012
Proceedings of the Fifth International Conference on Web Search and Web Data Mining, WSDM 2012
Fifth ACM International Conference on Web Search Data Mining, WSDM 2012
Seattle, Washington
February 8-12, 2012
U. Scaiella; P. Ferragina; A. Marino; M. Ciaramita
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificatore per citare o creare un link a questa risorsa: https://hdl.handle.net/2158/558889
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 89
  • ???jsp.display-item.citation.isi??? ND
social impact