

Neural architecture search by growing internal computational units

Vincenzo Laveglia
2019

Abstract

Choosing the right neural network architecture for a given learning task is still an open problem. Practitioners know well that it often requires several trials and, consequently, a great deal of time. It is an active research area, and several solutions have been proposed in recent years. Nevertheless, the common practice today is to fix an architecture and then update its configuration (the network weights) in order to achieve good performance. Moreover, beyond the search for the right number of neurons and layers, other components, such as the activation functions, must also be considered when choosing a neural architecture. The aim of this research is to move beyond the classical notion of a learning model in which the architecture is static and performance is evaluated only at the level of the whole model. The idea is to design a learning model that can autonomously define its own internal structure, and that can discover which specific components of its architecture need to be strengthened, thereby avoiding changes to the whole model configuration. The steps taken towards this goal correspond to three milestones, which also reflect the organization of the Thesis: Target Propagation, Depth-Growing Neural Networks, and Downward-Growing Neural Architectures.

The Depth-Growing Neural Network (DGNN) framework has two main features. First, its architecture is dynamic: it evolves during the learning process in order to autonomously find the internal structure that provides the required computational power. Second, the evolution process is driven by an evaluation of the individual components of the architecture, the so-called metaneurons: their performance is rated with respect to their expected outcomes, and the worst-performing metaneurons are selected for upgrading. Ad-hoc algorithms, referred to as Target Propagation techniques, have been developed to estimate metaneuron outcomes. The evolution process itself consists in transforming a single internal neuron, or a set of them, into a more complex structure made of interconnected neurons, thus obtaining more powerful computational units; we call these structures subnets. The internal neurons of a subnet can evolve in turn, yielding a recursive process that can lead to deep architectures. The new structures are fitted with a classical gradient-descent approach, by back-propagating the errors with respect to the estimated outcomes of the subnets. As mentioned above, a few techniques have been developed for estimating the metaneuron (and subnet) outcomes: they propagate the network's target outputs to its internal layers, defining layer-specific targets and thereby allowing a layer-specific, and potentially neuron-specific, loss function to be formalized. Although DGNNs represent a step towards autonomously defined architectures, empirical evidence highlighted some limitations, above all the difficulty that subnets have in learning the tasks assigned to them, which was largely a consequence of how the DGNN architecture was conceived. The solutions designed to overcome these limitations led to the Downward-Growing Neural Architecture (DGNA) framework.
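Before turning to DGNA, the DGNN growth step just described can be illustrated with a minimal sketch. This is not the thesis code: it assumes per-neuron targets have already been estimated by some Target Propagation routine, treats a metaneuron as a single hidden unit scored by its squared error against those targets, and uses an arbitrary subnet width and PyTorch purely for illustration.

```python
# Illustrative sketch of one DGNN-style growth step (not the thesis implementation).
# Assumptions: hidden_targets come from a Target Propagation estimate, a "metaneuron"
# is one hidden unit, and the replacement subnet width (4) is arbitrary.
import torch
import torch.nn as nn

def metaneuron_losses(hidden_activations: torch.Tensor,
                      hidden_targets: torch.Tensor) -> torch.Tensor:
    """Per-unit squared error between hidden activations and their estimated
    layer-specific targets, averaged over the batch (one score per metaneuron)."""
    return ((hidden_activations - hidden_targets) ** 2).mean(dim=0)

def grow_worst_metaneuron(per_unit_loss: torch.Tensor,
                          fan_in: int) -> tuple[int, nn.Module]:
    """Select the worst-performing metaneuron and build a subnet intended to
    replace it: same fan-in, a single scalar output, one extra hidden layer."""
    worst = int(per_unit_loss.argmax())
    subnet = nn.Sequential(nn.Linear(fan_in, 4), nn.Tanh(),
                           nn.Linear(4, 1), nn.Tanh())
    return worst, subnet

# Example: score 8 metaneurons on a batch of 32 samples, then grow the worst one.
h     = torch.randn(32, 8)   # hidden-layer activations
h_hat = torch.randn(32, 8)   # targets propagated to the hidden layer
worst, subnet = grow_worst_metaneuron(metaneuron_losses(h, h_hat), fan_in=5)
# The new subnet would then be trained by gradient descent against h_hat[:, worst].
```

Because a subnet contains internal neurons of its own, repeating this step on the subnets gives the recursive deepening behaviour described above.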
The DGNA framework indirectly shares the goal and philosophy of its ancestor, but implements a completely different growing strategy. Here the evolution of the architecture is a consequence of another objective: improving the performance of the model by defining increasingly complex decision regions. This is achieved by replacing the computational units that define a given decision region with more powerful computational components. An in-depth analysis of this topic is carried out in the Thesis, and the resulting solution is the replacement of the hidden neurons and their input connections with brand-new subnets, one for each hidden neuron. Unlike the previous model, this approach entails a substantial modification of the first layer of the network. Experimental results validate this technique.
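As a rough illustration of the DGNA idea (again a sketch under assumed names and sizes, not the thesis code), the first layer can be rebuilt so that each original hidden neuron and its input connections are replaced by an independent subnet that reads the raw inputs directly; concatenating the subnets' scalar outputs keeps the interface to the upper layers unchanged.

```python
# Illustrative sketch of a DGNA-style grown first layer (not the thesis implementation).
# Assumption: each original hidden neuron is replaced by a small two-layer subnet of
# arbitrary width (8) that consumes the raw network inputs directly.
import torch
import torch.nn as nn

class GrownFirstLayer(nn.Module):
    """One independent subnet per original hidden unit; the scalar outputs are
    concatenated so the upper layers of the network see the same interface."""
    def __init__(self, n_inputs: int, n_hidden: int, subnet_width: int = 8):
        super().__init__()
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(n_inputs, subnet_width), nn.Tanh(),
                          nn.Linear(subnet_width, 1), nn.Tanh())
            for _ in range(n_hidden)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each subnet can carve out a more complex decision region than the
        # single neuron it replaces.
        return torch.cat([subnet(x) for subnet in self.subnets], dim=1)

# Example: replace a 6-unit first layer of a network with 10 inputs.
layer = GrownFirstLayer(n_inputs=10, n_hidden=6)
out = layer(torch.randn(32, 10))   # shape (32, 6), the same as before growing
```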
Supervisor: Edmondo Trentin
File: VL_PhD_Thesis.pdf (doctoral thesis, open access, Adobe PDF, 1.7 MB)
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1303131