Recent years have seen a growing involvement of researchers and practitioners in crafting Deep Neural Networks (DNNs) that seem to outperform existing machine learning approaches for solving classification problems as anomaly-based error and intrusion detection. Undoubtedly, classifiers may be very diverse among themselves, and choosing one or another is typically due to the specific task and target system. Designing and training the optimal tabular data classifier requires extensive experimentation, sensitivity analyses, big datasets, and domain-specific knowledge that may not be available at will or considered a non-strategical asset by many companies and stakeholders. This paper compares, using a total of 23 public datasets: i) traditional (tree-based, statistical) supervised classifiers, ii) DNNs that are specifically designed for classifying tabular data, iii) DNNs for image classification that are applied to tabular data after converting data points into images, alone and as ensembles. Experimental results and related discussions show clear advantages in adopting tree-based classifiers for anomaly-based error and intrusion detection in tabular data as they outperform their competitors, including DNNs. Then, individual classifiers are compared against ensembles using different combinations of the classifiers considered in this study as base-learners, providing a unified final response through many meta-learning strategies. Results show that there is no benefit in building ensembles instead of using a tree-based classifier as Random Forests, eXtreme Gradient Boosting or Extra Trees. The paper concludes that anomaly-based error and intrusion detectors for critical systems should use the old (but gold) tree-based classifiers, which are also easier to fine-tune, and understand; plus, they require less time and resources to learn their model.

Anomaly-based error and intrusion detection in tabular data: No DNN outperforms tree-based classifiers / Zoppi T.; Gazzini S.; Ceccarelli A.. - In: FUTURE GENERATION COMPUTER SYSTEMS. - ISSN 0167-739X. - ELETTRONICO. - 160:(2024), pp. 951-965. [10.1016/j.future.2024.06.051]

Anomaly-based error and intrusion detection in tabular data: No DNN outperforms tree-based classifiers

Zoppi T.;Ceccarelli A.
2024

Abstract

Recent years have seen a growing involvement of researchers and practitioners in crafting Deep Neural Networks (DNNs) that seem to outperform existing machine learning approaches for solving classification problems as anomaly-based error and intrusion detection. Undoubtedly, classifiers may be very diverse among themselves, and choosing one or another is typically due to the specific task and target system. Designing and training the optimal tabular data classifier requires extensive experimentation, sensitivity analyses, big datasets, and domain-specific knowledge that may not be available at will or considered a non-strategical asset by many companies and stakeholders. This paper compares, using a total of 23 public datasets: i) traditional (tree-based, statistical) supervised classifiers, ii) DNNs that are specifically designed for classifying tabular data, iii) DNNs for image classification that are applied to tabular data after converting data points into images, alone and as ensembles. Experimental results and related discussions show clear advantages in adopting tree-based classifiers for anomaly-based error and intrusion detection in tabular data as they outperform their competitors, including DNNs. Then, individual classifiers are compared against ensembles using different combinations of the classifiers considered in this study as base-learners, providing a unified final response through many meta-learning strategies. Results show that there is no benefit in building ensembles instead of using a tree-based classifier as Random Forests, eXtreme Gradient Boosting or Extra Trees. The paper concludes that anomaly-based error and intrusion detectors for critical systems should use the old (but gold) tree-based classifiers, which are also easier to fine-tune, and understand; plus, they require less time and resources to learn their model.
2024
160
951
965
Zoppi T.; Gazzini S.; Ceccarelli A.
File in questo prodotto:
File Dimensione Formato  
1-s2.0-S0167739X24003510-main.pdf

accesso aperto

Tipologia: Pdf editoriale (Version of record)
Licenza: Open Access
Dimensione 3.51 MB
Formato Adobe PDF
3.51 MB Adobe PDF

I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificatore per citare o creare un link a questa risorsa: https://hdl.handle.net/2158/1385112
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact