An evolutionary estimation procedure for Generalized Semilinear Regression Trees / Giulia Vannucci; Anna Gottard. - In: COMPUTATIONAL STATISTICS. - ISSN 0943-4062. - Electronic. - (In press), pp. 1-23. [10.1007/s00180-022-01302-8]

An evolutionary estimation procedure for Generalized Semilinear Regression Trees

Anna Gottard
In press

Abstract

In many applications, interactions or even mild non-linearities can affect inference and predictions. For that reason, we suggest a class of models lying between statistics and machine learning, and we propose a learning procedure for them. The models combine a linear part with a tree component selected via an evolutionary algorithm, and they can be adopted for any kind of response, such as continuous, categorical, or ordinal responses and survival times. They are inherently interpretable but more flexible than standard regression models, as they easily capture non-linear and interaction effects. The proposed genetic-like learning algorithm avoids a greedy search for the tree component. In a simulation study, we show that the proposed approach performs comparably with other machine learning algorithms, with a substantial gain in interpretability and transparency, and we illustrate the method on a real data set.
Pages: 1-23
Giulia Vannucci; Anna Gottard
Files in this item:
File: s00180-022-01302-8.pdf
Access: Closed access
Type: Final refereed version (Postprint, Accepted manuscript)
License: DRM not defined
Size: 1.75 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2158/1288864