Novel neural networks for structured data / Daniele Baracchi. - (2018).

Novel neural networks for structured data

BARACCHI, DANIELE
2018

Abstract

Complex relational structures are used to represent data in many scientific fields, such as chemistry, bioinformatics, natural language processing and social network analysis. It is often desirable to classify these complex objects, a problem that is increasingly addressed with machine learning approaches. While a number of algorithms have been shown to be effective at this task for graphs of moderate size, large structures still pose significant challenges because existing techniques scale poorly. In this thesis we introduce a framework for supervised learning problems on structured data that extends the R-convolution concept used in graph kernels. We represent a graph (or, more generally, a relational structure) as a hierarchy of objects and we define how to unroll a template neural network on it. This approach outperforms state-of-the-art methods on large social network datasets, while remaining competitive on small chemobiological datasets. We also introduce a lossless compression algorithm for the hierarchical decompositions that improves the time complexity of our approach by exploiting symmetries in the input data. Another contribution of this thesis is an application of the aforementioned framework to the context-dependent claim detection task. Claim detection is the assessment of whether a sentence contains a claim, i.e. the thesis, or conclusion, of an argument; in particular, we focus on context-dependent claims, where the context (i.e. the topic of the argument) is a determining factor in classifying a sentence. We show how our framework can take advantage of contextual information in a straightforward way, and we present preliminary results indicating that this approach is viable on real-world datasets.
A third contribution is a machine learning approach to aortic size normalcy assessment. The definition of normalcy is crucial when dealing with the thoracic aorta, as a dilatation of its diameter often precedes serious disease. We build a new estimator based on a one-class SVM (OC-SVM) fitted on a cohort of 1024 healthy individuals aged 5 to 89 years, and we compare its results to those obtained on the same set of subjects by an approach based on linear regression. As a further novelty, we also build a second estimator that combines the diameters measured at multiple levels in order to assess the normalcy of the overall shape of the aorta.
2018
Paolo Frasconi
Italy
Daniele Baracchi
Files in this item:
File: tesi_daniele.pdf (open access)
Description: Doctoral thesis
Type: Doctoral thesis
License: Open Access
Size: 1.73 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1113665