Stop N-gram Removal and Topic-Similarity to Improve Topic Modeling / Mohamad Almgerbi. - (2022).
Stop N-gram Removal and Topic-Similarity to Improve Topic Modeling
Mohamad Almgerbi
2022
Abstract
To improve the performance of topic modeling algorithms and determine the optimal number of topics, this thesis presents two approaches: Stop N-gram Removal, a novel preprocessing procedure that eliminates a dynamic number of repeated words from text documents; and Topic-Similarity, a new method for determining the optimal number of topics that automatically measures the similarity between the meanings of the words within each topic.
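The abstract does not describe either procedure in detail, so the following is only a minimal Python sketch of one possible reading, not the thesis's implementation. It assumes that Stop N-gram Removal drops n-grams whose document frequency exceeds a corpus-dependent cutoff (here, mean plus one standard deviation, which makes the number of removed n-grams "dynamic"), and that Topic-Similarity scores a topic by the mean pairwise cosine similarity of its words' vectors, choosing the number of topics with the highest average score. The function names, the cutoff rule, the scoring rule, and the toy 2-d vectors are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def stop_ngram_removal(docs, n=1):
    """Remove n-grams that occur in unusually many documents.

    ASSUMPTION: the "dynamic" cutoff is mean + 1 population standard
    deviation of the document frequencies, so how many n-grams are removed
    depends on the corpus rather than on a fixed stop list.
    """
    df = Counter(g for doc in docs for g in set(ngrams(doc, n)))
    if not df:
        return [list(doc) for doc in docs], set()
    freqs = list(df.values())
    cutoff = mean(freqs) + pstdev(freqs)
    stop = {g for g, count in df.items() if count > cutoff}
    cleaned = []
    for doc in docs:
        kept, i = [], 0
        while i < len(doc):
            if tuple(doc[i:i + n]) in stop:
                i += n  # skip every token of the stop n-gram
            else:
                kept.append(doc[i])
                i += 1
        cleaned.append(kept)
    return cleaned, stop


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_similarity(topics, vectors):
    """ASSUMPTION: score = mean pairwise cosine similarity of the word
    vectors inside each topic, averaged over all topics."""
    scores = []
    for words in topics:
        known = [w for w in words if w in vectors]  # skip out-of-vocabulary words
        pairs = list(combinations(known, 2))
        if pairs:
            scores.append(mean(cosine(vectors[a], vectors[b]) for a, b in pairs))
    return mean(scores) if scores else 0.0


def best_num_topics(candidates, vectors):
    """Return the topic count k whose topics have the highest score."""
    return max(candidates, key=lambda k: topic_similarity(candidates[k], vectors))


# Toy usage with made-up documents and 2-d vectors (hypothetical data).
docs = [["the", "model", "learns", "topics"],
        ["the", "corpus", "has", "the", "words"],
        ["the", "topics", "are", "words"]]
cleaned, stop = stop_ngram_removal(docs, n=1)
print(stop)     # {('the',)}
print(cleaned)  # [['model', 'learns', 'topics'], ['corpus', 'has', 'words'], ['topics', 'are', 'words']]

vectors = {"cat": (1.0, 0.0), "dog": (0.9, 0.1), "car": (0.0, 1.0), "bus": (0.1, 0.9)}
candidates = {1: [["cat", "dog", "car", "bus"]],
              2: [["cat", "dog"], ["car", "bus"]]}
print(best_num_topics(candidates, vectors))  # 2
```

In practice the toy vectors would be replaced by trained word embeddings and the candidate topics by the top words of topic models fitted with several values of k; the abstract alone does not say which embeddings or scoring rule the thesis uses.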
File | Access | Type | License | Size | Format
---|---|---|---|---|---
VERSIONE FINALE DELLA TESI.pdf | Closed access | Doctoral thesis | Open Access | 1.63 MB | Adobe PDF
Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.