A methodology to determine the optimal train-set size for autoencoders applied to energy systems

Danti, P.; Innocenti, A.

doi:10.1016/j.aei.2023.102139

In recent years, deep learning has been widely used to tackle problems that classical approaches have not solved. In particular, an autoencoder is a popular unsupervised artificial neural network that learns efficient data representations (encodings) by training the network to ignore features with low information content. Even though autoencoders outperform classical techniques in several applications, such as anomaly detection, dimensionality reduction, feature denoising, and missing-value imputation, the literature does not provide a commonly accepted methodology for defining the optimal amount of data needed to train the model. This paper proposes a procedure to determine the train-set size that minimizes the reconstruction error of an autoencoder with a predefined structure and hyper-parameters, trained to encode the normal behavior of energy generation systems. The procedure exploits learning curves, a powerful tool for tracking an algorithm's performance as the train-set size varies. The procedure is then applied to three real case studies in which two types of autoencoders are trained to learn the normal behavior of a YANMAR combined heat and power unit, with the aim of detecting incoming anomalies. Finally, the outcomes of the procedure are discussed and, under the constraint of a daily retraining frequency, a train-set size of 6 weeks is identified as optimal for both autoencoders.
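The abstract does not spell out the procedure. The following Python sketch illustrates only the general learning-curve idea and is not the authors' method. It trains a model on increasingly large train sets, records the validation reconstruction error for each size, and picks the size at which the curve flattens. Several assumptions apply. A tied-weight linear autoencoder with a one-dimensional code stands in for the paper's autoencoders. The data are synthetic. The selection rule in `select_size` is hypothetical: it returns the smallest size whose error falls within a relative tolerance `tol` of the best error. All function names are illustrative.

```python
import random


def make_data(n, seed):
    # Hypothetical synthetic "normal behavior": 3 correlated channels driven by one latent signal plus noise.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        rows.append([z + rng.gauss(0, 0.1), 0.5 * z + rng.gauss(0, 0.1), -z + rng.gauss(0, 0.1)])
    return rows


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_ae(X, epochs=50, lr=0.002, seed=0):
    # Stand-in model: tied-weight linear autoencoder, code = w.x, reconstruction = code * w,
    # trained by per-sample gradient descent on the squared reconstruction error.
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(X[0]))]
    for _ in range(epochs):
        for x in X:
            code = dot(w, x)
            r = [xi - code * wi for xi, wi in zip(x, w)]
            rw = dot(r, w)
            # Gradient of ||r||^2 w.r.t. w is -2 * (code * r + (r.w) * x); step against it.
            w = [wi + lr * 2 * (code * ri + rw * xi) for wi, ri, xi in zip(w, r, x)]
    return w


def reconstruction_error(w, X):
    # Mean squared reconstruction error over the samples in X.
    total = 0.0
    for x in X:
        code = dot(w, x)
        total += sum((xi - code * wi) ** 2 for xi, wi in zip(x, w))
    return total / len(X)


def learning_curve(train_pool, val, sizes, repeats=3):
    # For each candidate size, train on the most recent n samples (a sliding window)
    # and average the validation error over a few random initializations.
    curve = []
    for n in sizes:
        subset = train_pool[-n:]
        errs = [reconstruction_error(train_linear_ae(subset, seed=s), val) for s in range(repeats)]
        curve.append((n, sum(errs) / repeats))
    return curve


def select_size(curve, tol=0.05):
    # Hypothetical rule: smallest size whose error is within tol (relative) of the best error.
    best = min(err for _, err in curve)
    for n, err in sorted(curve):
        if err <= best * (1 + tol):
            return n


if __name__ == "__main__":
    pool = make_data(400, seed=1)
    val = make_data(200, seed=2)
    curve = learning_curve(pool, val, sizes=[5, 10, 25, 50, 100, 200, 400])
    for n, err in curve:
        print(f"{n:4d} samples -> validation MSE {err:.4f}")
    print("selected train-set size:", select_size(curve))
```

In a real setting, the samples would be replaced by operating data, for example days of sensor readings, and the stand-in model by the actual autoencoder. The chosen size would also have to be compatible with how often the model is retrained.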

A methodology to determine the optimal train-set size for autoencoders applied to energy systems / Piero Danti, Alessandro Innocenti. In: Advanced Engineering Informatics, ISSN 1474-0346, electronic, vol. 58 (2023). doi:10.1016/j.aei.2023.102139