A methodology to determine the optimal train-set size for autoencoders applied to energy systems / Piero Danti, Alessandro Innocenti. - In: ADVANCED ENGINEERING INFORMATICS. - ISSN 1474-0346. - ELECTRONIC. - 58:(2023), pp. 0-0. [10.1016/j.aei.2023.102139]

A methodology to determine the optimal train-set size for autoencoders applied to energy systems

Danti P.; 2023

Abstract

In recent years, deep learning has been widely used to tackle problems that classical approaches have not solved. In particular, the autoencoder is a popular unsupervised artificial neural network that learns efficient data representations (encodings) by training the network to ignore features that carry little information. Although autoencoders outperform classical techniques in several applications, such as anomaly detection, dimensionality reduction, feature denoising, and missing-value imputation, the literature does not provide a commonly accepted methodology for defining the optimal amount of data needed to train the model. This paper proposes a procedure to determine the train-set size that minimizes the reconstruction error of an autoencoder with a predefined structure and hyper-parameters, trained to encode the normal behavior of energy generation systems. The procedure relies on learning curves, a tool for tracking an algorithm's performance as the train-set size varies. The procedure is then applied to three real case studies in which two types of autoencoders are trained to learn the normal behavior of a YANMAR combined heat and power unit, with the aim of detecting incoming anomalies. Finally, the outcomes of the procedure are explained and, under the constraint of a daily retraining frequency, six weeks is identified as the optimal train-set size for both autoencoders.
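As a rough illustration of the learning-curve idea described in the abstract, the sketch below fits a model on progressively larger train sets, records the reconstruction error on a fixed validation set, and picks the smallest train-set size whose error is close to the best one observed. It is not the authors' procedure: the model is a one-unit linear autoencoder (equivalent to the first principal component) used as a stand-in for a neural autoencoder, the data are synthetic, and the names `make_synthetic`, `fit_linear_autoencoder`, `learning_curve`, `smallest_adequate_size`, and the 5% tolerance are all hypothetical choices made for this example.

```python
import math
import random


def make_synthetic(n, seed):
    """Synthetic 'normal behavior': three correlated signals plus small noise (assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        rows.append([t + rng.gauss(0, 0.1), 2 * t + rng.gauss(0, 0.1), -t + rng.gauss(0, 0.1)])
    return rows


def fit_linear_autoencoder(rows, iters=100):
    """Fit a one-unit linear autoencoder via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    w = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        v = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        w = [x / norm for x in v]
    return mean, w


def reconstruction_error(model, rows):
    """Mean squared reconstruction error of the encode/decode round trip."""
    mean, w = model
    total = 0.0
    for r in rows:
        c = [x - m for x, m in zip(r, mean)]
        z = sum(ci * wi for ci, wi in zip(c, w))                  # encode to one latent value
        total += sum((ci - z * wi) ** 2 for ci, wi in zip(c, w))  # decode and compare
    return total / len(rows)


def learning_curve(train, valid, sizes):
    """Validation error for each train-set size, using the most recent samples."""
    return [(s, reconstruction_error(fit_linear_autoencoder(train[-s:]), valid)) for s in sizes]


def smallest_adequate_size(curve, tolerance=0.05):
    """Smallest size whose error is within `tolerance` (relative) of the best error."""
    best = min(err for _, err in curve)
    for size, err in sorted(curve):
        if err <= best * (1 + tolerance):
            return size


train = make_synthetic(400, seed=0)   # assumed to be in chronological order
valid = make_synthetic(200, seed=1)
sizes = [5, 10, 20, 50, 100, 200, 400]
curve = learning_curve(train, valid, sizes)
chosen = smallest_adequate_size(curve)
for size, err in curve:
    print(f"train size {size:4d}: validation error {err:.4f}")
print("selected train-set size:", chosen)
```

In a setting like the one in the abstract, the sizes would be expressed in days or weeks of history rather than sample counts, and the stand-in model would be replaced by the actual autoencoder with its fixed structure and hyper-parameters.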
Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1328428