
Deep learning methods for safety-critical driving events analysis / Fabio Schoen. - (2022).

Deep learning methods for safety-critical driving events analysis

Fabio Schoen
Supervision
2022

Abstract

In this thesis, we propose to study the data of crash and near-crash events, collectively called safety-critical driving events. Such data include footage of the event, acquired by a camera mounted inside the vehicle, and the data from a GPS/IMU module, i.e., speed, acceleration and angular velocity. First, we introduce a novel problem, which we call unsafe maneuver classification, that aims to classify safety-critical driving events according to the maneuver that leads to the unsafe situation, and we propose a two-stream neural architecture based on Convolutional Neural Networks that performs sensor fusion and addresses the classification task. Then, we propose to integrate the output of an object detector into the classification task, to provide the network with explicit knowledge of the entities in the scene. We design a specific architecture that leverages a tracking algorithm to extract information about a single real-world object over time, and then uses attention to ground the prediction on one object (or a few), i.e., the dangerous ones or those in danger, through a solution that we call the Spatio-Temporal Attention Selector (STAS). Finally, we propose to address video captioning of safety-critical events, with the goal of describing the dangerous situation in a human-understandable form.
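The abstract does not detail how STAS computes its attention. Purely as an illustration of the general idea of grounding a prediction on tracked objects, the Python sketch below applies generic scaled dot-product attention pooling over object tracks. It assumes that each track is a non-empty list of per-frame feature vectors for one real-world object and that a query vector is available; the function names (`temporal_mean`, `attend_over_tracks`) and the mean-over-time pooling are hypothetical choices, not the design used in the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple


def temporal_mean(track: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical pooling: average a track's per-frame features into one vector."""
    n = len(track)
    dim = len(track[0])
    return [sum(frame[d] for frame in track) / n for d in range(dim)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_tracks(
    tracks: Dict[int, Sequence[Sequence[float]]],
    query: Sequence[float],
) -> Tuple[List[float], Dict[int, float]]:
    """Generic attention over tracked objects (illustrative, not STAS itself).

    Returns the attention-weighted context vector and the weight assigned
    to each track id; a high weight marks the object the prediction is
    grounded on.
    """
    ids = sorted(tracks)
    pooled = [temporal_mean(tracks[i]) for i in ids]
    scale = math.sqrt(len(query))  # scaled dot-product, as in standard attention
    scores = [sum(q * f for q, f in zip(query, feat)) / scale for feat in pooled]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * feat[d] for w, feat in zip(weights, pooled)) for d in range(dim)]
    return context, dict(zip(ids, weights))


if __name__ == "__main__":
    # Two hypothetical tracks (ids 7 and 12), each with two frames of 2-D features.
    tracks = {7: [[1.0, 0.0], [3.0, 0.0]], 12: [[0.0, 1.0], [0.0, 1.0]]}
    context, weights = attend_over_tracks(tracks, query=[1.0, 0.0])
    print(max(weights, key=weights.get))  # id of the most attended object
```

With this query, track 7 aligns best and receives most of the attention mass. Softmax keeps a weight for every object, so the prediction can still draw on a few objects rather than exactly one.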
Italy
Files in this item:
PhD_Thesis.pdf (open access)
Description: Doctoral thesis
Type: Publisher's PDF (Version of record)
License: Open Access
Size: 15.83 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1260238