
Binary thresholding defense against adversarial attacks

Research output: Contribution to journal › Article › peer-review

Convolutional neural networks are known to be vulnerable to adversarial attacks. In recent research, Projected Gradient Descent (PGD) has been recognized as the most effective attack method, and adversarial training on adversarial examples generated by PGD attacks as the most reliable defense. However, adversarial training requires a large amount of computation time. In this paper, we propose a fast, simple and strong defense method that achieves the best speed-accuracy trade-off. We first compare the feature maps of a naturally trained model with those of an adversarially trained model of the same architecture, and find that the key to the adversarially trained model lies in the binary thresholding its convolutional layers perform. Inspired by this, we apply binary thresholding to preprocess the input image and defend against the PGD attack. On MNIST, our defense achieves 99.0% accuracy on clean images and 91.2% on white-box adversarial images. This performance is slightly better than adversarial training, and our method saves most of the computation time required for retraining. On Fashion-MNIST and CIFAR-10, we train a new model on binarized images and use this model to defend against attacks. Although its performance is not as good as that of adversarial training, it achieves the best speed-accuracy trade-off.
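As a rough illustration of the preprocessing step the abstract describes, the following minimal PyTorch sketch binarizes input images before they reach the classifier. The threshold value of 0.5, the function name, and the [0, 1] pixel convention are illustrative assumptions, not details taken from the paper.

```python
import torch

def binarize(images: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # Map each pixel to 1.0 if it exceeds the threshold, else 0.0.
    # Pixels are assumed to be normalized to [0, 1]; the 0.5 cutoff is an assumption.
    return (images > threshold).float()

# Illustrative usage: preprocess a batch before the forward pass,
# e.g. logits = model(binarize(batch)), where `model` is a hypothetical classifier.
```

Because the thresholding step is non-differentiable and removes small pixel-level perturbations, it can blunt gradient-based attacks such as PGD without any retraining of the MNIST model.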
Original language: English
Journal: Neurocomputing
Publication status: Accepted for publication - 11 Mar 2021

Documents

  • Binary Thresholding Defense against Adversarial Attacks_pp

Rights statement: The embargo end date of 2050 is a temporary measure until the publication date is known. Once the publication date is known, the full text of this article will be available to view shortly afterwards.

    Accepted author manuscript (Post-print), 1.04 MB, PDF document

    Due to publisher’s copyright restrictions, this document is not freely available to download from this website until: 1/01/50

    Licence: CC BY-NC-ND


ID: 26920182