Lund University Publications

Transferable universal adversarial perturbations using generative models

Hashemi, Atiye Sadat (LU); Bär, Andreas; Mozaffari, Saeed and Fingscheidt, Tim (2020)
Abstract
Deep neural networks tend to be vulnerable to adversarial perturbations, which, when added to a natural image, can fool a respective model with high confidence. Recently, the existence of image-agnostic perturbations, also known as universal adversarial perturbations (UAPs), was discovered. However, existing UAPs still lack a sufficiently high fooling rate when applied to an unknown target model. In this paper, we propose a novel deep learning technique for generating more transferable UAPs. We utilize a perturbation generator and some given pretrained networks, so-called source models, to generate UAPs using the ImageNet dataset. Due to the similar feature representations of various model architectures in the first layer, we propose a loss formulation that focuses on the adversarial energy only in the respective first layer of the source models. This supports the transferability of our generated UAPs to any other target model. We further empirically analyze our generated UAPs and demonstrate that these perturbations generalize very well across different target models. Surpassing the current state of the art in both fooling rate and model transferability, we show the superiority of our proposed approach. Using our generated non-targeted UAPs, we obtain an average fooling rate of 93.36% on the source models (state of the art: 82.16%). Generating our UAPs on the deep ResNet-152, we obtain about a 12% absolute fooling rate advantage over cutting-edge methods on the VGG-16 and VGG-19 target models.
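
The abstract outlines a training recipe: a generator maps a fixed latent code to an image-agnostic perturbation, and the loss concentrates the adversarial energy in the first layer of one or more pretrained source models. The following minimal PyTorch sketch illustrates that idea under our own assumptions; the generator architecture, the L-infinity budget, and the concrete loss (L2 distance between clean and perturbed first-layer feature maps) are illustrative stand-ins, not the authors' implementation.

import torch
import torch.nn as nn
import torchvision.models as models
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"
epsilon = 10.0 / 255.0  # assumed L-inf budget for the UAP

# Pretrained source model; only its first conv layer enters the loss.
source = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).to(device).eval()
for p in source.parameters():
    p.requires_grad_(False)
first_layer = source.features[0]

# Tiny illustrative generator: fixed noise -> image-sized perturbation.
uap_generator = nn.Sequential(
    nn.ConvTranspose2d(100, 64, 7), nn.ReLU(),
    nn.Upsample(size=(224, 224), mode="bilinear", align_corners=False),
    nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
).to(device)
z = torch.randn(1, 100, 1, 1, device=device)  # fixed latent code
opt = torch.optim.Adam(uap_generator.parameters(), lr=1e-4)

# Stand-in for an ImageNet loader (input normalization omitted for brevity).
train_loader = DataLoader(
    TensorDataset(torch.rand(8, 3, 224, 224), torch.zeros(8)), batch_size=4)

def first_layer_energy_loss(x_clean, x_adv):
    # Assumed loss: maximize the distance between clean and perturbed
    # first-layer feature maps, i.e. the adversarial energy in layer 1.
    return -torch.norm(first_layer(x_adv) - first_layer(x_clean), p=2)

for x, _ in train_loader:
    x = x.to(device)
    delta = epsilon * uap_generator(z)          # image-agnostic perturbation
    x_adv = torch.clamp(x + delta, 0.0, 1.0)    # keep a valid image range
    loss = first_layer_energy_loss(x, x_adv)
    opt.zero_grad()
    loss.backward()
    opt.step()

In this reading, the fooling rate reported in the abstract would be the fraction of validation images whose top-1 prediction changes once the trained perturbation is added, and transferability is measured by evaluating that same perturbation against target models (e.g. VGG-16, VGG-19) that were not used during training.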
Please use this url to cite or link to this publication:
author
Hashemi, Atiye Sadat; Bär, Andreas; Mozaffari, Saeed and Fingscheidt, Tim
publishing date
2020
type
Working paper/Preprint
publication status
published
publisher
arXiv.org
DOI
10.48550/arXiv.2010.14919
language
English
LU publication?
no
id
7669c10f-4545-42ae-91c4-296de97cf7a0
date added to LUP
2025-01-31 14:20:14
date last changed
2025-02-03 10:04:50
@misc{7669c10f-4545-42ae-91c4-296de97cf7a0,
  abstract     = {{Deep neural networks tend to be vulnerable to adversarial perturbations, which by adding to a natural image can fool a respective model with high confidence. Recently, the existence of image-agnostic perturbations, also known as universal adversarial perturbations (UAPs), were discovered. However, existing UAPs still lack a sufficiently high fooling rate, when being applied to an unknown target model. In this paper, we propose a novel deep learning technique for generating more transferable UAPs. We utilize a perturbation generator and some given pretrained networks so-called source models to generate UAPs using the ImageNet dataset. Due to the similar feature representation of various model architectures in the first layer, we propose a loss formulation that focuses on the adversarial energy only in the respective first layer of the source models. This supports the transferability of our generated UAPs to any other target model. We further empirically analyze our generated UAPs and demonstrate that these perturbations generalize very well towards different target models. Surpassing the current state of the art in both, fooling rate and model-transferability, we can show the superiority of our proposed approach. Using our generated non-targeted UAPs, we obtain an average fooling rate of 93.36% on the source models (state of the art: 82.16%). Generating our UAPs on the deep ResNet-152, we obtain about a 12% absolute fooling rate advantage vs. cutting-edge methods on VGG-16 and VGG-19 target models.}},
  author       = {{Hashemi, Atiye Sadat and Bär, Andreas and Mozaffari, Saeed and Fingscheidt, Tim}},
  language     = {{eng}},
  note         = {{Preprint}},
  publisher    = {{arXiv.org}},
  title        = {{Transferable universal adversarial perturbations using generative models}},
  url          = {{http://dx.doi.org/10.48550/arXiv.2010.14919}},
  doi          = {{10.48550/arXiv.2010.14919}},
  year         = {{2020}},
}