Extracting knowledge from text news: A systematic evaluation of network-based topic detection

Carla Galluccio
2023

Abstract

In the context of textual analysis, network-based procedures for topic detection are gaining attention, including as an alternative to classical topic models. These procedures are based on the idea that documents can be represented as word co-occurrence networks, in which topics are defined as groups of strongly connected words. Although many studies have used network-based procedures for topic detection, there is a lack of systematic analysis of how different design choices, such as how the word co-occurrence matrix is built and which community detection algorithm is selected, affect the final results in terms of detected topics. Another unexplored question concerns the relationship between network-based topic detection and classical topic models such as the Latent Dirichlet Allocation (LDA) model. This thesis addresses these questions by developing a deeper understanding of optimal design choices for network-based topic detection, showing how and to what extent the choices made during the design phase affect the results, and comparing these procedures with classical topic models.
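The pipeline described in the abstract (documents, then a word co-occurrence network, then communities of strongly connected words read as topics) can be illustrated with a minimal, standard-library-only sketch. It is not the thesis's own implementation. The specific choices are assumptions made for illustration: co-occurrence is counted once per document (or within an optional token `window`), weak edges are dropped with a `min_count` threshold, and a deterministic variant of weighted label propagation stands in for the community detection step. The function names `cooccurrence_counts`, `build_graph`, `label_propagation` and the toy `docs` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(docs, window=None):
    """Count, for each unordered word pair, the documents in which it co-occurs.

    window=None: any two distinct words in the same document co-occur.
    window=k:    two distinct words co-occur only if at most k tokens apart.
    Each pair is counted at most once per document (an illustrative choice).
    """
    counts = Counter()
    for tokens in docs:
        pairs = set()
        if window is None:
            pairs.update(combinations(sorted(set(tokens)), 2))
        else:
            for i, w in enumerate(tokens):
                for v in tokens[i + 1 : i + 1 + window]:
                    if v != w:
                        pairs.add(tuple(sorted((w, v))))
        counts.update(pairs)
    return counts


def build_graph(counts, min_count=1):
    """Weighted adjacency dict; pairs seen in fewer than min_count documents
    are dropped, and words left without edges disappear from the graph."""
    graph = defaultdict(dict)
    for (u, v), c in counts.items():
        if c >= min_count:
            graph[u][v] = c
            graph[v][u] = c
    return dict(graph)


def label_propagation(graph, max_iter=100):
    """Deterministic weighted label propagation: nodes are visited in sorted
    order and adopt the label with the largest total neighbour weight
    (ties go to the smallest label). Returns sorted lists of words."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            best = max(scores.values())
            new_label = min(lab for lab, s in scores.items() if s == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy corpus with two themes (sport and finance).
docs = [
    ["team", "match", "goal"],
    ["player", "team", "goal"],
    ["match", "player", "goal"],
    ["market", "stock", "price"],
    ["bank", "stock", "market"],
    ["price", "bank", "market"],
]

topics = label_propagation(build_graph(cooccurrence_counts(docs)))
print(topics)
# [['bank', 'market', 'price', 'stock'], ['goal', 'match', 'player', 'team']]
```

On this deliberately clean toy corpus the two themes end up in separate communities. On real news text, changing `window`, `min_count`, or swapping label propagation for another algorithm (for example Louvain) can change which words are grouped together. That sensitivity to design choices is the question the thesis examines.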
Alessandra Petrucci
File in this record: Tesi_Galluccio.pdf (open access; type: publisher's PDF, version of record; licence: Open Access; 13.01 MB, Adobe PDF)

Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1319771