Road scene perception under camera viewpoint shifts / Aurel Pjetri. - (2026).
Road scene perception under camera viewpoint shifts
Aurel Pjetri
2026
Abstract
Road accidents are one of the leading causes of death in the world, which has prompted all major legislators to adopt safety initiatives and mandate Advanced Driving Assistance Systems (ADAS) in new vehicles. While the success of ADAS is largely due to impressive advances in computer vision, many challenges remain, hindering their potential impact on road safety. In real-world applications, such as smart dashcams, viewpoint shifts pose a significant challenge. Vehicles of different sizes, combined with cameras installed in various positions and orientations, require computer vision models to handle highly diverse viewpoints. In this PhD thesis we systematically study the effects of viewpoint shifts on different perception tasks, including depth estimation and bird's eye view representation. We first collect a real-world depth dataset from dashcams installed in different positions on the windshield. Instead of using expensive lidar sensors, we devise a new ground-truth strategy based on homographies and object detection. The collected data enables us to quantitatively measure the effects of the different viewpoints on self-supervised monocular depth estimators and to identify the main cause of performance degradation in deformed scale perception. Benchmarks on complex and large foundation models highlight their generalisation capabilities, but also their high computational requirements, so we apply knowledge distillation to transfer this ability to smaller models. In particular, with the aim of disentangling viewpoint generalisation from data acquisition, we propose a distillation method based on simulated rotations. Experimental and qualitative results highlight the effectiveness of our distillation. Moreover, we develop a new synthetic dataset for the study of viewpoint shifts on more complex tasks, such as bird's eye view semantic segmentation. 
Our experiments show the large impact of specific camera orientations on the model and highlight the positive effect of including different viewpoints during training. Finally, we tackle the problem of accident anticipation with a more holistic approach that does not directly consider viewpoint shift effects. Instead, we devise a self-supervised loss function that enables training on a large private dataset with different vehicles and camera positions, with satisfactory results.

| File | Size | Format | License |
|---|---|---|---|
| PhD_Thesis_Pjetri.pdf (embargoed until 17 April 2027) | 32.25 MB | Adobe PDF | Open Access |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.



