LogGenST: A Framework for Synthetic Log Generation Using LLMs for Smart-Troubleshooting / Partovian S., Flammini F., Bucaioni A., Thornadtsson J. - ELECTRONIC. - 542:(2025), pp. 64-83. [10.1007/978-3-031-84913-8_3]

LogGenST: A Framework for Synthetic Log Generation Using LLMs for Smart-Troubleshooting

Flammini F.
2025

Abstract

Log files are essential for monitoring, diagnosing, and troubleshooting smart systems because they capture critical operational events. However, privacy concerns and limited access to real-world log datasets hinder the development and evaluation of anomaly detection techniques. Synthetic log generation has emerged as a viable solution to this challenge, enabling researchers to create diverse datasets that include both normal and fault-related logs. In this paper, we introduce novel methodologies for generating synthetic log files using Generative Adversarial Networks (GANs) and Large Language Models (LLMs). First, we propose a GAN-based approach, leveraging different GAN implementations to produce high-quality synthetic logs. A comprehensive evaluation reveals that CTGAN outperforms the other models in generating realistic and varied log entries. Building on these findings, we present LogGenST, an innovative synthetic log generation framework that employs three LLMs in an adversarial setup. Unlike traditional GAN-based methods, LogGenST features a Prompt Engineer LLM that refines prompts based on feedback from the generator and discriminator LLMs. This approach ensures temporal consistency, logical coherence, and domain-specific patterns without requiring extensive model training. Comparative analysis shows that LogGenST significantly enhances log authenticity, pattern consistency, and fault representation, supporting advanced smart-troubleshooting experimentation in industrial cyber-physical systems and the Internet of Things.
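The three-role adversarial setup described in the abstract can be pictured with a minimal Python sketch. The abstract does not specify the loop's details, so everything here is an assumption made for illustration: the names `refine_logs`, `Feedback`, and `ORDER_RULE` are hypothetical, the stopping rule is invented, and the three roles are deterministic stub functions standing in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Discriminator verdict (hypothetical structure, not from the paper)."""
    realistic: bool
    issues: List[str] = field(default_factory=list)


def refine_logs(
    prompt: str,
    generate: Callable[[str], List[str]],
    discriminate: Callable[[List[str]], Feedback],
    refine: Callable[[str, Feedback], str],
    max_rounds: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate generation, critique, and prompt refinement."""
    logs: List[str] = []
    for _ in range(max_rounds):
        logs = generate(prompt)           # generator role
        feedback = discriminate(logs)     # discriminator role
        if feedback.realistic:
            break
        prompt = refine(prompt, feedback)  # prompt engineer role
    return prompt, logs


# Deterministic stubs; a real setup would call LLMs in their place.
ORDER_RULE = "Timestamps must be non-decreasing."


def stub_generate(prompt: str) -> List[str]:
    stamps = ["2025-01-01T00:00:02", "2025-01-01T00:00:01", "2025-01-01T00:00:03"]
    if ORDER_RULE in prompt:  # the stub "obeys" the rule only when prompted
        stamps = sorted(stamps)
    return [f"{t} INFO pump_1 heartbeat" for t in stamps]


def stub_discriminate(logs: List[str]) -> Feedback:
    stamps = [line.split()[0] for line in logs]
    if stamps != sorted(stamps):  # temporal-consistency check
        return Feedback(False, [ORDER_RULE])
    return Feedback(True)


def stub_refine(prompt: str, feedback: Feedback) -> str:
    # Fold each reported issue back into the prompt as an explicit constraint.
    return prompt + " " + " ".join(feedback.issues)


final_prompt, final_logs = refine_logs(
    "Generate three pump logs.", stub_generate, stub_discriminate, stub_refine
)
```

In this sketch no model weights are updated; only the prompt text changes between rounds, which mirrors the abstract's point that the approach does not require extensive model training.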
Year: 2025
Series: Lecture Notes in Business Information Processing
Pages: 64-83
Authors: Partovian S.; Flammini F.; Bucaioni A.; Thornadtsson J.

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1453449