Interpretable semilinear regression trees / Giulia Vannucci. - (2019).
Interpretable semilinear regression trees
Giulia Vannucci
2019
Abstract
Tree-based methods are a class of predictive models widely used in many scientific areas. Regression trees partition the space of the explanatory variables into a set of hyper-rectangles and fit a model within each of them. They are conceptually simple, apparently easy to interpret, and capable of dealing with nonlinearities and interactions. Random forests are ensembles of regression trees built on subsamples of the statistical units and on randomly selected subsets of the explanatory variables; the forest's prediction combines the predictions of these trees. Despite the loss in interpretability, random forests have achieved great success thanks to their high predictive performance. The aim of this thesis is to propose a class of models that combine a linear component and a tree and that are able to discover the relevant variables directly influencing a response. The proposal is a semilinear model that handles both linear and nonlinear dependencies and maintains good predictive performance, while ensuring a simple and intuitive interpretation in a generative-model sense. Moreover, two estimation algorithms are proposed: a two-stage procedure based on a backfitting algorithm, and a procedure based on evolutionary algorithms.
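The abstract does not spell out the estimator, so the following is only a generic sketch of the backfitting idea, not the thesis's two-stage procedure. It assumes an additive semilinear form y = b0 + X_lin b + f(X_tree) + error, with the split of covariates into a linear block and a tree block fixed in advance, and with f estimated by a small hand-written least-squares (CART-style) tree. The function names `fit_tree`, `predict_tree`, `backfit_semilinear` and `predict_semilinear`, and the parameters `max_depth`, `min_leaf` and `n_iter`, are hypothetical choices made for illustration.

```python
import numpy as np

def fit_tree(X, r, max_depth=2, min_leaf=5):
    """Greedy least-squares regression tree (CART-style) fitted to the vector r."""
    node = {"value": float(np.mean(r))}
    n = len(r)
    if max_depth == 0 or n < 2 * min_leaf:
        return node
    best_sse, best_j, best_thr = np.sum((r - r.mean()) ** 2), None, None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j], kind="stable")
        xs, rs = X[order, j], r[order]
        s1, s2 = np.cumsum(rs), np.cumsum(rs ** 2)
        for i in range(min_leaf, n - min_leaf + 1):  # i points go to the left child
            if xs[i - 1] == xs[i]:
                continue  # tied values cannot be separated by a threshold
            sse_left = s2[i - 1] - s1[i - 1] ** 2 / i
            sse_right = (s2[-1] - s2[i - 1]) - (s1[-1] - s1[i - 1]) ** 2 / (n - i)
            if sse_left + sse_right < best_sse - 1e-12:
                best_sse, best_j = sse_left + sse_right, j
                best_thr = 0.5 * (xs[i - 1] + xs[i])
    if best_j is None:
        return node  # no split reduces the residual sum of squares
    mask = X[:, best_j] <= best_thr
    node.update(feature=best_j, threshold=best_thr,
                left=fit_tree(X[mask], r[mask], max_depth - 1, min_leaf),
                right=fit_tree(X[~mask], r[~mask], max_depth - 1, min_leaf))
    return node

def predict_tree(node, X):
    """Send each row down the tree and return the leaf means."""
    out = np.empty(X.shape[0])
    for k, x in enumerate(X):
        leaf = node
        while "feature" in leaf:
            leaf = leaf["left"] if x[leaf["feature"]] <= leaf["threshold"] else leaf["right"]
        out[k] = leaf["value"]
    return out

def backfit_semilinear(X_lin, X_tree, y, n_iter=50, max_depth=2, min_leaf=5, tol=1e-8):
    """Backfitting for y ~ b0 + X_lin @ b + f(X_tree), where f is a regression tree."""
    A = np.column_stack([np.ones(len(y)), X_lin])  # the intercept lives in the linear part
    tree_fit = np.zeros(len(y))
    for _ in range(n_iter):
        # Step 1: OLS for the linear component on the partial residual y - f(X_tree).
        beta, *_ = np.linalg.lstsq(A, y - tree_fit, rcond=None)
        # Step 2: refit the tree on the partial residual y - A @ beta. Since the OLS step
        # has an intercept and tree_fit starts at zero, this residual has mean zero,
        # so the constant stays in beta[0].
        tree = fit_tree(X_tree, y - A @ beta, max_depth, min_leaf)
        new_fit = predict_tree(tree, X_tree)
        converged = np.max(np.abs(new_fit - tree_fit)) < tol
        tree_fit = new_fit
        if converged:
            break
    return beta, tree

def predict_semilinear(beta, tree, X_lin, X_tree):
    """Prediction = linear component + tree component."""
    A = np.column_stack([np.ones(X_tree.shape[0]), X_lin])
    return A @ beta + predict_tree(tree, X_tree)
```

As a usage example, one can simulate y = 1 + 2 x1 + 3·1{x2 > 0} + noise and call `backfit_semilinear(x1, x2, y)` with `x1` and `x2` as one-column arrays: the slope `beta[1]` should come out close to 2 and the root split of the tree should fall near x2 = 0. The linear coefficients then read as ordinary regression effects, while the tree isolates the step-shaped dependence, which is the kind of interpretability the semilinear model aims for.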
File: Thesis_Phd_GiuliaVannucci.pdf (open access). Type: PhD thesis. License: Open Access. Size: 2.39 MB. Format: Adobe PDF.
Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.