Binary thresholding defense against adversarial attacks

Yutong Wang, Wenwen Zhang, Tianyu Shen, Hui Yu, Fei-Yue Wang

Research output: Contribution to journal › Article › peer-review

Abstract

Convolutional neural networks are vulnerable to adversarial attacks. In recent research, Projected Gradient Descent (PGD) has been recognized as the most effective attack method, and adversarial training on adversarial examples generated by the PGD attack as the most reliable defense. However, adversarial training requires a large amount of computation time. In this paper, we propose a fast, simple and strong defense method that achieves the best speed-accuracy trade-off. We first compare the feature maps of a naturally trained model with those of an adversarially trained model of the same architecture, and find that the key to the adversarially trained model lies in the binary thresholding its convolutional layers perform. Inspired by this, we apply binary thresholding to preprocess the input image and defend against the PGD attack. On MNIST, our defense achieves 99.0% accuracy on clean images and 91.2% on white-box adversarial images. This performance is slightly better than adversarial training, and our method largely saves the computation time required for retraining. On Fashion-MNIST and CIFAR-10, we train a new model on binarized images and use this model to defend against attack. Although its performance is not as good as that of adversarial training, it achieves the best speed-accuracy trade-off.
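The preprocessing step described above can be sketched as a simple pixel-wise operation. The following is a minimal NumPy illustration, not the authors' implementation; the threshold value of 0.5 and the `binarize` helper name are assumptions, since the abstract does not specify them:

```python
import numpy as np

def binarize(images, threshold=0.5):
    """Binary-threshold pixel intensities assumed to lie in [0, 1].

    Pixels above the threshold become 1.0, the rest 0.0, so small
    adversarial perturbations that do not cross the threshold are erased.
    The threshold of 0.5 is an assumed value for illustration.
    """
    return (images > threshold).astype(np.float32)

# Example: a clean pixel at 0.9 and an adversarially perturbed copy at 0.85
# map to the same binarized value, removing the perturbation.
clean = np.array([[0.1, 0.9]])
perturbed = np.array([[0.15, 0.85]])
print(binarize(clean))      # [[0. 1.]]
print(binarize(perturbed))  # [[0. 1.]]
```

In the defense, such a binarized image (rather than the raw input) would be fed to the classifier, which is why retraining can be avoided on MNIST, where images are nearly binary to begin with.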
Original language: English
Journal: Neurocomputing
Publication status: Accepted for publication - 11 Mar 2021

Keywords

  • binary thresholding
  • defense
  • adversarial training
  • adversarial attack
