Deep learning methods for safety-critical driving events analysis / Fabio Schoen. - (2022).
Deep learning methods for safety-critical driving events analysis
Fabio Schoen (Supervision)
2022
Abstract
In this thesis, we study data from crash and near-crash events, collectively called safety-critical driving events. Such data include footage of the event, acquired from a camera mounted inside the vehicle, and data from a GPS/IMU module, i.e., speed, acceleration, and angular velocity. First, we introduce a novel problem, which we call unsafe maneuver classification, that aims to classify safety-critical driving events according to the maneuver that leads to the unsafe situation, and we propose a two-stream neural architecture based on Convolutional Neural Networks that performs sensor fusion and addresses this classification task. Then, we propose to integrate the output of an object detector into the classification task, to give the network explicit knowledge of the entities in the scene. We design a specific architecture that leverages a tracking algorithm to extract information about individual real-world objects over time, and then uses attention to ground the prediction on a single object (or a few objects), i.e., the dangerous or endangered ones; we call this solution the Spatio-Temporal Attention Selector (STAS). Finally, we address video captioning of safety-critical events, with the goal of describing the dangerous situation in a human-understandable form.

File | Description | Type | License | Size | Format
---|---|---|---|---|---
PhD_Thesis.pdf (open access) | PhD thesis | Publisher's PDF (Version of record) | Open Access | 15.83 MB | Adobe PDF
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.