One-for-Many: A flexible adversarial attack on different DNN-based systems / ALINA ELENA BAIA. - (2023).
One-for-Many: A flexible adversarial attack on different DNN-based systems
ALINA ELENA BAIA
2023
Abstract
Deep Neural Networks (DNNs) have become the de facto standard technology in most computer vision applications due to the exceptional performance and versatile applicability they have demonstrated in recent years. However, studies have shown that DNNs are remarkably vulnerable to adversarial examples: input data intentionally modified with perturbations specifically crafted to mislead a model into making wrong predictions. Most of the proposed works on adversarial examples focus on finding small image perturbations bounded by an Lp-norm distance measure, known as restricted attacks, which force the adversarial example to be as similar as possible to the original data. Given the urgency of taking countermeasures, several defense techniques have been introduced to overcome such vulnerabilities. Nowadays, most restricted perturbations can be defended against with adversarial training or with input denoising and restoration. To overcome the limitations of restricted attacks, unrestricted methods that allow large unbounded perturbations (e.g. geometric image transformations or color manipulation) have recently been proposed. To learn image manipulation filters, the majority of such methods require full access to the target model, which is not always feasible, especially in real-world applications. For this reason, algorithms that work in a black-box fashion have recently been introduced. Nevertheless, crafting unrestricted perturbations in a black-box setup, where the adversary has no knowledge of the target model, often results in suspicious, unnatural images that do not deceive the human eye. Moreover, some methods require additional resources (e.g. image segmentation models) to extract prior information about the image and modify its colors in accordance with human perception. In this case, the naturalness of an image and its non-suspiciousness depend heavily on the performance of the segmentation network.
Although some works on unrestricted adversarial attacks have been proposed, the area of image filtering attacks and colorization-based methods is still under-explored. This motivated us to conduct this study, to introduce novel approaches that address the limitations of existing methods, and to expand the landscape of this research topic. In this work, we propose the One-for-Many attack, a black-box method that generates unrestricted adversarial perturbations by optimizing multiple Instagram-inspired image filters that manipulate specific image characteristics such as saturation, contrast, and brightness, perform edge enhancement, or apply a light gradient. By using well-known image manipulation filters that are available in several image processing libraries and modern cameras and are widely used on social media (e.g. Instagram, Facebook), we aim to reduce noticeability and to produce natural-looking adversarial examples without relying on additional resources. Moreover, combining filters helps to generate more reliable and transferable perturbations and to create images with a wide range of visual effects, including soft warm looks and vibrant colors. The proposed method generates the adversarial perturbations with a two-step nested evolutionary algorithm: given a set of parameterized image filters, the outer optimization step determines the sequence of filters to apply to an image, while the inner step optimizes the parameters of each filter selected in the outer step. The algorithm is flexible and can be easily customized for many computer vision tasks. It also supports different attack strategies and combinations of multiple objectives, such as image-specific or universal attacks, and single- or multi-objective optimization. We validate the proposed adversarial attack on state-of-the-art image classifiers, object detectors, and a newly proposed multimodal explanation model for activity recognition.
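The two-step nested search described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the three filters, the `toy_model` black box, the (1+1) evolution strategy in the inner step, the truncation selection and single-point mutation in the outer step, and all function names and hyperparameters are assumptions chosen only to make the structure concrete.

```python
import numpy as np

# Hypothetical parameterized filters on float RGB images in [0, 1].
def brightness(img, p):
    return np.clip(img + p, 0.0, 1.0)  # p shifts intensity

def contrast(img, p):
    mean = img.mean()
    return np.clip((img - mean) * (1.0 + p) + mean, 0.0, 1.0)

def saturation(img, p):
    gray = img.mean(axis=2, keepdims=True)
    return np.clip(gray + (img - gray) * (1.0 + p), 0.0, 1.0)

FILTERS = [brightness, contrast, saturation]

def apply_sequence(img, seq, params):
    """Apply the filters indexed by seq, in order, with one parameter each."""
    for idx, p in zip(seq, params):
        img = FILTERS[idx](img, p)
    return img

def toy_model(img):
    """Stand-in black box (assumption): confidence in the true class."""
    return 1.0 / (1.0 + np.exp(-10.0 * (img.mean() - 0.5)))

def inner_search(img, seq, query, rng, steps=20, sigma=0.2):
    """Inner step: (1+1) evolution strategy over the filter parameters."""
    best = np.zeros(len(seq))  # start from the unmodified image
    best_fit = query(apply_sequence(img, seq, best))
    for _ in range(steps):
        cand = np.clip(best + rng.normal(0.0, sigma, size=len(seq)), -1.0, 1.0)
        fit = query(apply_sequence(img, seq, cand))
        if fit < best_fit:  # lower true-class confidence is better
            best, best_fit = cand, fit
    return best, best_fit

def outer_search(img, query, rng, seq_len=3, pop=6, gens=5):
    """Outer step: evolve filter sequences; each is scored by the inner step."""
    population = [rng.integers(0, len(FILTERS), size=seq_len) for _ in range(pop)]
    best = None
    for _ in range(gens):
        scored = []
        for seq in population:
            params, fit = inner_search(img, seq, query, rng)
            scored.append((fit, seq, params))
        scored.sort(key=lambda t: t[0])
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        parents = [s for _, s, _ in scored[: pop // 2]]  # truncation selection
        children = []
        for s in parents:
            child = s.copy()
            child[rng.integers(seq_len)] = rng.integers(len(FILTERS))  # mutate one slot
            children.append(child)
        population = parents + children
    return best  # (fitness, filter sequence, filter parameters)

rng = np.random.default_rng(0)
image = rng.uniform(0.4, 0.8, size=(8, 8, 3))
clean_conf = toy_model(image)
adv_conf, best_seq, best_params = outer_search(image, toy_model, rng)
adversarial = apply_sequence(image, best_seq, best_params)
```

The attack only ever calls `query` on filtered images, which is what makes the sketch black-box: no gradients or model internals are used. Swapping `toy_model` for a different scoring function is one way the same loop could target another task or objective.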
The experimental results show that the method generates high-quality, natural-looking adversarial images that can effectively fool the above-mentioned systems. In the case of image classification, our method generates more transferable, more robust, and more deceptive adversarial perturbations than a similar state-of-the-art method. The proposed attack also greatly decreases the task performance of object detection models while maintaining good transferability. These results indicate that more effort is needed to increase the robustness of deep neural networks to common image editing techniques. On the other hand, by leveraging this vulnerability, our method could be employed to develop privacy protection tools that apply customized image filters to defend the user’s privacy from unauthorized automatic information extraction on social media platforms. Attacks on multimodal explainable systems are among the newest emerging trends. To the best of our knowledge, no other work has explicitly studied the robustness of such systems in a black-box setting. In particular, we focus on a novel explanation model that takes an image as input, predicts an activity label, and generates a textual and a visual explanation. The attack is easily adapted to consider objective functions for different types of data (i.e. image and text data) and successfully breaks the correlation between the activity prediction and its explanations under two scenarios: keeping the activity the same while changing the textual explanation, and vice versa. The results are encouraging and open up a line of research in which our method could be used to develop a system-independent explanation evaluation metric that enables comparative analysis of different vision-language explanation systems, which the literature currently lacks.
We hope that our work will inspire further research on the susceptibility of deep neural models to image filtering attacks, and that our findings will help deepen the understanding of the implications of such attacks and encourage the development of robust defenses.

File | Size | Format |
---|---|---|
Baia_Alina_Elena_Adversarial_attacks_2023.pdf (open access; description: doctoral thesis; type: doctoral thesis; license: Creative Commons) | 21.58 MB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.