What Is Waiting for Us at the End? Inherent Biases of Game Story Endings in Large Language Models / Taveekitworachai P.; Abdullah F.; Gursesli M.C.; Dewantoro M.F.; Chen S.; Lanata A.; Guazzini A.; Thawonmas R. - ELECTRONIC. - 14384 LNCS (2023), pp. 274-284. (Paper presented at the 16th International Conference on Interactive Digital Storytelling) [10.1007/978-3-031-47658-7_26].
What Is Waiting for Us at the End? Inherent Biases of Game Story Endings in Large Language Models
Gursesli M.C.; Lanata A.; Guazzini A.
2023
Abstract
This study investigates biases present in large language models (LLMs) when used for narrative tasks, specifically game story generation and story ending classification. Our experiments use popular LLMs, including GPT-3.5, GPT-4, and Llama 2, to generate game stories and classify their endings into three categories: positive, negative, and neutral. Our analysis reveals a notable bias towards positive-ending stories in the LLMs under examination. Moreover, we observe that GPT-4 and Llama 2 tend to classify stories into uninstructed categories, underscoring the importance of carefully designing downstream systems that consume LLM-generated outputs. These findings provide groundwork for developing systems that incorporate LLMs in game story generation and classification, and they emphasize the need for vigilance in addressing biases and improving system performance. By acknowledging and rectifying these biases, we can create fairer and more accurate applications of LLMs in narrative-based tasks.

| File | Size | Format | |
|---|---|---|---|
| 978-3-031-47658-7_26.pdf (Closed access; license: Open Access) | 2.02 MB | Adobe PDF | Request a copy |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.