Enhancing cache content management in a data lake architecture using Reinforcement Learning / Mirco Tracolli, Marco Baioletti, Valentina Poggioni, Daniele Spiga. - (2021).

Enhancing cache content management in a data lake architecture using Reinforcement Learning

Mirco Tracolli (Writing – Review & Editing); Valentina Poggioni (Supervision)
2021

Abstract

In the past few years, the size and complexity of data have grown so much that a dedicated field has emerged: Big Data. Besides the sheer size of the data, which keeps growing in every sector from business to science, the advent of the IoT (Internet of Things) and of sensor data introduces a large volume of information that is difficult to manage and from which extracting valuable knowledge is not straightforward. The process of extracting useful information and value from such data consists mainly of two phases: first the processing, and then the data access. One of the main requirements for data access is a fast response time, whose order of magnitude can vary widely depending on the type of processing and on the processing patterns. Therefore, besides optimizing the algorithms and software processes themselves, several aspects at the infrastructure level of the analysis environment could be enhanced. From this point of view, optimizing the access layer becomes increasingly important in a geographically distributed environment, where data must be retrieved from the remote servers of a Data Lake. From the infrastructural perspective, caching systems are used to mitigate latency and to serve popular data more efficiently. The cache therefore plays a key role in effective and efficient data access. In this thesis, we explore how to make a cache autonomous and adaptable in order to improve the data management performance of a system, with the aim of reducing cache costs, such as the amount of data written to and the amount of data read from the cache memory.
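As a rough illustration of the two cache costs mentioned above (data written to and data read from the cache), the following minimal sketch replays a request trace against a plain LRU cache and tallies both quantities. The class name `LRUCacheSimulator`, the trace format (file name, size in bytes) and the choice of LRU as the policy are assumptions for illustration only; this is a simple baseline, not the Reinforcement Learning approach developed in the thesis.

```python
from collections import OrderedDict


class LRUCacheSimulator:
    """Hypothetical LRU file cache that tallies two cost metrics:
    bytes written into the cache (on misses) and bytes read from it (on hits)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        # name -> size; least recently used entry first
        self.files: "OrderedDict[str, int]" = OrderedDict()
        self.written = 0  # bytes stored into the cache
        self.read = 0     # bytes served from the cache
        self.hits = 0
        self.misses = 0

    def request(self, name: str, size: int) -> bool:
        """Serve one file request; return True on a cache hit."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            self.read += self.files[name]
            self.hits += 1
            return True
        self.misses += 1
        if size > self.capacity:
            return False  # too large: fetched remotely, never stored
        # Evict least recently used files until the new file fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
        self.files[name] = size
        self.used += size
        self.written += size
        return False


# Usage: replay a small, made-up trace against a 100-byte cache.
cache = LRUCacheSimulator(capacity_bytes=100)
trace = [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40)]
for file_name, file_size in trace:
    cache.request(file_name, file_size)
print(cache.hits, cache.misses, cache.written, cache.read)
```

On this trace every miss stores a new file, so `written` grows with each miss while `read` grows only with hits; a smarter admission or eviction policy would aim to lower `written` without sacrificing hits.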
Marco Baioletti, Valentina Poggioni
Italy
Files in this item:
tracolli_mirco_dt25923.pdf (Open Access from 01/01/2022)
Description: PhD thesis in PDF format
Type: PhD thesis
License: Open Access
Size: 23.57 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2158/1237482