
Robust Recognition of Objects in the Safety Critical Systems: A Case Study of Traffic Sign Recognition / Atif. - (2023).

Robust Recognition of Objects in the Safety Critical Systems: A Case Study of Traffic Sign Recognition

Atif
2023

Abstract

Computer vision enables the automatic detection and recognition of objects in images. Nowadays, timely detection of relevant events and efficient recognition of objects in an environment is a critical activity for many Cyber-Physical Systems (CPSs). In particular, Traffic Sign Detection and Recognition (TSDR) from images has been and is still being investigated, as it heavily impacts the behaviour of (semi-)autonomous vehicles. TSDR provides drivers with critical traffic sign information, constituting an enabling condition for autonomous driving and for the safe circulation of road vehicles. Misclassifying even a single sign may constitute a severe hazard to the environment, infrastructures, and human lives. In recent decades, researchers, practitioners, and companies have worked to devise more efficient and accurate Traffic Sign Recognition (TSR) subsystems or components to be integrated into CPSs. TSR mostly relies on the same main building blocks, namely: i) dataset creation/identification and pre-processing (e.g., histogram equalization to improve contrast), ii) feature extraction, i.e., keypoint detection and feature description, and iii) model learning through non-deep or Deep Neural Network (DNN) classifiers. Unfortunately, although many classifiers and feature extraction strategies for images sampled by vehicle-mounted sensors have been developed over the years, those efforts have not resulted in a clear benchmark or in a comparison of the most common techniques. The main target of this thesis is to improve the robustness and efficiency of TSR systems. Improving the efficiency of a TSR system means achieving better classification performance (classification accuracy) on publicly available datasets, while the robustness of an image classifier is defined as sustaining the performance of the model under various image corruptions or alterations, which in our case are due to visual camera malfunctions.
Although TSDR embraces both detection and recognition of traffic signs, here we focus on the latter aspect, recognition; in the literature, many researchers have already proposed techniques for detecting traffic signs in a full-scene image. Therefore, this thesis starts by providing a comprehensive quantitative comparison of non-deep Machine Learning (ML) algorithms with different feature sets, and of DNNs, for the recognition of traffic signs from three publicly available datasets. Afterward, we propose a TSR system that analyses a sliding window of images instead of considering individual images sampled by sensors on a vehicle. Such a TSR system processes the last image together with recently sampled images through ML algorithms that take advantage of this additional information. In particular, we focused on (i) Long Short-Term Memory (LSTM) networks and (ii) Stacking Meta-Learners, which efficiently combine base-level classification episodes into a unified and improved meta-level classification. Experimental results on publicly available datasets show that Stacking Meta-Learners dramatically reduce misclassifications of traffic signs and achieve perfect classification on all three considered datasets. This shows the potential of our novel sliding-window approach as an efficient solution for TSR. Furthermore, we consider failures of the visual cameras installed on vehicles, which may compromise the correct acquisition of images and deliver corrupted images to the TSR system. After reviewing the most common camera failures, we artificially injected 13 different types of visual camera failure into each image contained in the three traffic sign datasets. Then, we train three DNNs to classify a single image and compare them to our TSR system that uses a sequence (i.e., a sliding window) of images. Experimental results show that sliding windows significantly improve the robustness of the TSR system against altered images.
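The sliding-window idea described above can be sketched in a few lines. This is a minimal, self-contained illustration: the toy base classifier and the score-summing combiner are hypothetical stand-ins for the thesis's DNN/ML base learners and its trained Stacking Meta-Learner, and the class/score shapes are invented for the example.

```python
from collections import deque

def base_classifier(image):
    """Toy base learner: emits a one-hot score vector over 3 classes.

    Stand-in for a real per-image classifier (e.g., a DNN); here the
    'class' is just the pixel sum modulo 3, purely for illustration.
    """
    total = sum(image)
    return [1 if total % 3 == c else 0 for c in range(3)]

class SlidingWindowTSR:
    """Classify using the last k frames instead of a single image.

    A trained stacking meta-learner would map the concatenated base-level
    outputs to a final label; a simple score sum stands in for it here.
    """
    def __init__(self, k=3):
        self.window = deque(maxlen=k)  # holds base-level score vectors

    def classify(self, image):
        self.window.append(base_classifier(image))
        n_classes = len(self.window[0])
        # meta-level combination: accumulate scores over the window, argmax
        summed = [sum(scores[c] for scores in self.window)
                  for c in range(n_classes)]
        return max(range(n_classes), key=summed.__getitem__)
```

One noisy frame (e.g., a corrupted image that the base learner mislabels) is outvoted by the rest of the window, which is the intuition behind the robustness gains reported above.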
Further, we dig into the results using LIME, a toolbox for explainable Artificial Intelligence (AI). Explainable AI allows an understanding of how a classifier uses the input image to derive its output; it lets us confirm our observations and understand why different classifiers achieve different TSR performance in the presence of visual camera failures. Visual camera failures have a negative impact on TSR systems as they may lead to altered images: it is therefore of utmost importance to build image classifiers that are robust to those failures. As such, this part of the thesis explores techniques to make TSR systems robust to visual camera failures such as a broken lens, blur, excessive brightness, dead pixels, or missing noise reduction by the image signal processor. In particular, we discuss to what extent training image classifiers with images altered by camera failures can improve the robustness of the whole TSR system. Results show that augmenting the training set with altered images significantly improves the overall classification performance of DNN image classifiers and makes them robust against the majority of visual camera failures. In addition, we found that the no-noise-reduction and brightness camera failures have a major impact on image classification; we discuss how image classifiers trained on altered images achieve better accuracy than classifiers trained only on original images, even in the presence of such failures. Ultimately, we further improve the robustness of the TSR system by crafting a camera failure detector component that works in conjunction with the image classifiers trained on altered images. We trained different ML-based camera failure detectors (binary classifiers) that operate in sequence with the DNN to check whether images are altered beyond a certain level of failure severity.
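The augmentation strategy discussed above can be sketched as follows. This is a hypothetical illustration, not the thesis's actual injection code: images are flat lists of pixel values, `inject_brightness_failure` models just one of the 13 failure types, and the `delta` and `fraction` parameters are invented for the example.

```python
import random

def inject_brightness_failure(image, delta=80, max_val=255):
    """Simulate a brightness camera failure by shifting pixel values up,
    clipping at the sensor maximum (parameters are illustrative)."""
    return [min(max_val, p + delta) for p in image]

def augment_training_set(images, labels, failure_fns, fraction=0.5, seed=0):
    """Append failure-altered copies of a fraction of the training images.

    Altered copies keep their original labels, so the classifier learns
    to recognize the sign despite the corruption.
    """
    rng = random.Random(seed)
    aug_images, aug_labels = list(images), list(labels)
    for img, lab in zip(images, labels):
        if rng.random() < fraction:
            for fail in failure_fns:
                aug_images.append(fail(img))
                aug_labels.append(lab)
    return aug_images, aug_labels
```

Training a DNN on the augmented pairs (rather than on the originals alone) is the robustness-oriented training regime whose benefits are reported above.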
Based on the output of the failure detector, the image is either passed to the DNN for classification, or the user is alerted that the image is so severely degraded by a visual camera failure that the DNN may not be able to classify it correctly. Experimental results reveal that the failure detector component, in conjunction with image classifiers trained on altered images, enhances the performance of the TSR system compared to using those classifiers alone; however, it reduces the availability of the system, which is 100% when the classifiers trained on altered images are used without the camera failure detector.
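The gating logic and the resulting availability trade-off can be sketched as below. The detector and classifier are toy lambdas and the severity threshold is an invented parameter; the thesis's detectors are trained ML binary classifiers, not hand-written scores.

```python
def gated_tsr(image, failure_detector, classifier, severity_threshold=0.5):
    """Route an image based on the camera-failure detector's output.

    If the detector deems the degradation severe, raise an alert instead
    of classifying; otherwise pass the image on to the (DNN) classifier.
    """
    severity = failure_detector(image)
    if severity >= severity_threshold:
        return ("alert", None)          # too degraded: no classification
    return ("classified", classifier(image))

def availability(results):
    """Fraction of images for which a classification was produced;
    this is what drops below 100% once the detector starts alerting."""
    served = sum(1 for status, _ in results if status == "classified")
    return served / len(results)
```

Every alert avoids a likely misclassification but also withholds an answer, which is exactly the performance-versus-availability trade-off reported in the experimental results.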
Prof. Andrea Bondavalli
Files in this item:
Atif_Final_Thesis.pdf (description: thesis; type: doctoral thesis; license: Open Access; format: Adobe PDF; size: 4.07 MB)

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1300518