An evolutionary estimation procedure for Generalized Semilinear Regression Trees

In many applications, the presence of interactions or even mild non-linearities can affect inference and predictions. For that reason, we suggest the use of a class of models lying between statistics and machine learning, and we propose a learning procedure for them. The models combine a linear part and a tree component that is selected via an evolutionary algorithm, and they can be adopted for any kind of response, such as continuous, categorical, and ordinal responses, and survival times. They are inherently interpretable but more flexible than standard regression models, as they easily capture non-linear and interaction effects. The proposed genetic-like learning algorithm avoids a greedy search for the tree component. In a simulation study, we show that the proposed approach performs comparably to other machine learning algorithms, with a substantial gain in interpretability and transparency, and we illustrate the method on a real data set.
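The idea of pairing a linear part with a tree component found by a non-greedy, evolutionary search can be illustrated with a toy sketch. The code below is not the authors' algorithm: it assumes a Gaussian response, a tree reduced to a single split on one covariate, and a deliberately simple mutate-and-select loop. All names (`rss_semilinear`, `evolve_threshold`) and the synthetic data are hypothetical and exist only for illustration.

```python
import numpy as np


def rss_semilinear(x, z, y, threshold):
    """Residual sum of squares of a least-squares fit with an intercept,
    a linear term in x, and a one-split 'tree' indicator on z."""
    design = np.column_stack([np.ones_like(x), x, (z > threshold).astype(float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def evolve_threshold(x, z, y, pop_size=20, generations=30, sigma=0.1, seed=0):
    """Toy evolutionary search over the split point (assumed scheme):
    keep the better half of the population, add mutated copies, repeat."""
    rng = np.random.default_rng(seed)
    population = rng.uniform(z.min(), z.max(), size=pop_size)
    for _ in range(generations):
        fitness = np.array([rss_semilinear(x, z, y, t) for t in population])
        survivors = population[np.argsort(fitness)[: pop_size // 2]]
        # Mutation: perturb each survivor with Gaussian noise, stay in range.
        children = survivors + rng.normal(0.0, sigma, size=survivors.size)
        children = np.clip(children, z.min(), z.max())
        population = np.concatenate([survivors, children])
    fitness = np.array([rss_semilinear(x, z, y, t) for t in population])
    best = int(np.argmin(fitness))
    return float(population[best]), float(fitness[best])


# Synthetic data (assumption): a linear effect of x plus a jump in z at 0.3.
data_rng = np.random.default_rng(1)
n = 400
x = data_rng.normal(size=n)
z = data_rng.uniform(-1.0, 1.0, size=n)
y = 1.5 * x + 2.0 * (z > 0.3) + data_rng.normal(scale=0.3, size=n)

best_threshold, best_rss = evolve_threshold(x, z, y)

# Baseline for comparison: the purely linear model without the tree part.
linear_design = np.column_stack([np.ones(n), x])
linear_coef, *_ = np.linalg.lstsq(linear_design, y, rcond=None)
linear_resid = y - linear_design @ linear_coef
linear_rss = float(linear_resid @ linear_resid)

print(f"selected split: {best_threshold:.3f}")
print(f"RSS semilinear: {best_rss:.1f}, RSS linear only: {linear_rss:.1f}")
```

In this sketch the fitted coefficients stay directly readable (a slope for `x`, a step for the split on `z`), which mirrors the interpretability argument above. A fuller version would evolve whole tree structures and replace least squares with a likelihood suited to the response type.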

An evolutionary estimation procedure for Generalized Semilinear Regression Trees / Giulia Vannucci; Anna Gottard. - In: COMPUTATIONAL STATISTICS. - ISSN 0943-4062. - Electronic. - (2022), pp. 1-23. [10.1007/s00180-022-01302-8]


Giulia Vannucci (Member of the Collaboration Group); Anna Gottard (Member of the Collaboration Group)

2022



Publication date: 2022
Journal: COMPUTATIONAL STATISTICS
First page: 1
Last page: 23
All authors: Giulia Vannucci; Anna Gottard
Appears in types: 1a - Journal article

Files in this item:

File	Size	Format
s00180-022-01302-8.pdf (closed access; type: final refereed version (Postprint, Accepted manuscript); license: all rights reserved)	1.75 MB	Adobe PDF

Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1288864
