Binary thresholding defense against adversarial attacks

Yutong Wang, Wenwen Zhang, Tianyu Shen, Hui Yu, Fei-Yue Wang

    Research output: Contribution to journal › Article › peer-review


    Abstract

    Convolutional neural networks are vulnerable to adversarial attacks. In recent research, Projected Gradient Descent (PGD) has been recognized as the most effective attack method, and adversarial training on adversarial examples generated by the PGD attack as the most reliable defense. However, adversarial training requires a large amount of computation time. In this paper, we propose a fast, simple and strong defense method that achieves the best speed-accuracy trade-off. We first compare the feature maps of a naturally trained model with those of an adversarially trained model of the same architecture, and find that the key to the adversarially trained model lies in the binary thresholding performed by its convolutional layers. Inspired by this, we apply binary thresholding to preprocess the input image and defend against the PGD attack. On MNIST, our defense achieves 99.0% accuracy on clean images and 91.2% on white-box adversarial images. This performance is slightly better than that of adversarial training, and our method largely avoids the computation time required for retraining. On Fashion-MNIST and CIFAR-10, we train a new model on binarized images and use it to defend against the attack. Although its performance is not as good as that of adversarial training, it achieves the best speed-accuracy trade-off.
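
    A minimal sketch of the preprocessing idea described in the abstract, assuming a PyTorch setting; the threshold value, function names, and helper structure are illustrative assumptions and are not taken from the paper. Pixels above a fixed threshold are set to 1 and the rest to 0 before the image is passed to an ordinarily trained classifier.

        import torch

        def binarize(images, threshold=0.5):
            # Binary thresholding: pixels above `threshold` become 1, the rest 0.
            # The value 0.5 is an assumed default, not the paper's reported setting.
            return (images > threshold).float()

        def defended_predict(model, images, threshold=0.5):
            # Hypothetical helper: binarize inputs so that small adversarial
            # perturbations below the threshold are erased, then classify
            # with a model trained in the usual way.
            with torch.no_grad():
                return model(binarize(images, threshold)).argmax(dim=1)
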
    Original language: English
    Pages (from-to): 61-71
    Number of pages: 11
    Journal: Neurocomputing
    Volume: 445
    Early online date: 1 Apr 2021
    DOIs
    Publication status: Published - 1 Jul 2021

    Keywords

    • binary thresholding
    • defense
    • adversarial training
    • adversarial attack
