Stop N-gram Removal and Topic-Similarity to Improve Topic Modeling / Mohamad Almgerbi. - (2022).

Stop N-gram Removal and Topic-Similarity to Improve Topic Modeling

Mohamad Almgerbi
2022

Abstract

To improve the performance of topic modeling algorithms and to determine the optimal number of topics, this thesis presents two approaches: Stop N-gram Removal, a novel preprocessing procedure that eliminates a dynamic number of repeated words from text documents, and Topic-Similarity, a new method for determining the optimal number of topics that automatically measures the similarity between the meanings of the words in each topic.
Prof. Valentina Poggioni
Libya

Type: Doctoral thesis

Use this identifier to cite or link to this document: http://hdl.handle.net/2158/1272669