Recent advances in text-based image editing have enabled finegrained manipulation of visual content guided by natural language. However, such methods are susceptible to adversarial attacks. In this work, we propose a novel attack that targets the visual component of editing methods. We introduce Attention Attack, which disrupts the cross-attention between a textual prompt and the visual representation of the image by using an automatically generated caption of the source image as a proxy for the edit prompt. This breaks the alignment between the contents of the image and their textual description, without requiring knowledge of the editing method or the editing prompt. Reflecting on the reliability of existing metrics for immunization success, we propose two novel evaluation strategies: Caption Similarity, which quantifies semantic consistency between original and adversarial edits, and semantic Intersection over Union (IoU), which measures spatial layout disruption via segmentation masks. Experiments conducted on the TEDBench++ benchmark demonstrate that our attack significantly degrades editing performance while remaining imperceptible.

Immunizing Images from Text to Image Editing via Adversarial Cross-Attention / Trippodo, Matteo; Becattini, Federico; Seidenari, Lorenzo. - ELETTRONICO. - (2025), pp. 10535-10543. (Intervento presentato al convegno Multimedia) [10.1145/3746027.3755755].

Immunizing Images from Text to Image Editing via Adversarial Cross-Attention

Trippodo, Matteo;Becattini, Federico;Seidenari, Lorenzo
2025

Abstract

Recent advances in text-based image editing have enabled finegrained manipulation of visual content guided by natural language. However, such methods are susceptible to adversarial attacks. In this work, we propose a novel attack that targets the visual component of editing methods. We introduce Attention Attack, which disrupts the cross-attention between a textual prompt and the visual representation of the image by using an automatically generated caption of the source image as a proxy for the edit prompt. This breaks the alignment between the contents of the image and their textual description, without requiring knowledge of the editing method or the editing prompt. Reflecting on the reliability of existing metrics for immunization success, we propose two novel evaluation strategies: Caption Similarity, which quantifies semantic consistency between original and adversarial edits, and semantic Intersection over Union (IoU), which measures spatial layout disruption via segmentation masks. Experiments conducted on the TEDBench++ benchmark demonstrate that our attack significantly degrades editing performance while remaining imperceptible.
2025
Multimedia
Multimedia
Trippodo, Matteo; Becattini, Federico; Seidenari, Lorenzo
File in questo prodotto:
File Dimensione Formato  
3746027.3755755.pdf

accesso aperto

Tipologia: Pdf editoriale (Version of record)
Licenza: Tutti i diritti riservati
Dimensione 23.69 MB
Formato Adobe PDF
23.69 MB Adobe PDF

I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificatore per citare o creare un link a questa risorsa: https://hdl.handle.net/2158/1439409
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact