Road scene perception under camera viewpoint shifts / Aurel Pjetri. - (2026).

Road scene perception under camera viewpoint shifts

Aurel Pjetri
2026

Abstract

Road accidents are among the leading causes of death worldwide, which has prompted legislators in all major jurisdictions to adopt road-safety initiatives and to mandate Advanced Driver Assistance Systems (ADAS) in new vehicles. While the success of ADAS is largely due to impressive advances in computer vision, many challenges remain that limit their potential impact on road safety. In real-world applications such as smart dashcams, viewpoint shifts pose a significant challenge: vehicles of different sizes, combined with cameras installed in various positions and orientations, require computer vision models to handle highly diverse viewpoints. In this PhD thesis, we systematically study the effects of viewpoint shifts on different perception tasks, including depth estimation and bird's-eye view (BEV) representation. We first collect a real-world depth dataset from dashcams installed in different positions on the windshield. Instead of relying on expensive lidar sensors, we devise a new ground-truth generation strategy based on homographies and object detection. The collected data enables us to quantitatively measure the effects of different viewpoints on self-supervised monocular depth estimators and to identify deformed scale perception as the main cause of performance degradation. Benchmarks of large, complex foundation models highlight their generalisation capabilities but also their high computational requirements, so we apply knowledge distillation to transfer this ability to smaller models. In particular, to disentangle viewpoint generalisation from data acquisition, we propose a distillation method based on simulated rotations. Quantitative and qualitative results show the effectiveness of our distillation approach. Moreover, we develop a new synthetic dataset for studying viewpoint shifts in more complex tasks, such as BEV semantic segmentation.
Our experiments show that specific camera orientations have a large impact on model performance and that including different viewpoints during training is beneficial. Finally, we tackle the problem of accident anticipation with a more holistic approach that does not explicitly model viewpoint-shift effects. Instead, we devise a self-supervised loss function that enables training on a large private dataset covering different vehicles and camera positions, with satisfactory results.
Andrew David Bagdanov, Marco Bertini, Stefano Caprasecca
Italy
Files in this item:
PhD_Thesis_Pjetri.pdf (Adobe PDF, 32.25 MB)
Embargoed until 17/04/2027
License: Open Access

Documents in FLORE are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2158/1469454