AEAED: Attention-Enhanced Autoencoder for Adversarial Example Detection with Multi-Scale Feature Learning
DOI: https://doi.org/10.62517/jike.202404313
Author(s)
Ming He1,2, Mengyao Cui1,2, Yanbing Liang1,2,*, Hanqi Liu1,2
Affiliation(s)
1Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, China
2College of Science, North China University of Science and Technology, Tangshan, Hebei, China
*Corresponding Author
Abstract
Deep learning models face significant security threats from adversarial attacks, which can mislead model predictions by introducing subtle perturbations to input images. Existing adversarial example detection methods often suffer from limited detection accuracy and poor generalization. To address these challenges, we propose AEAED (Attention-Enhanced Autoencoder for Adversarial Example Detection with Multi-Scale Feature Learning), a novel detection model that integrates the Transformer's multi-head self-attention mechanism into an autoencoder framework, significantly enhancing the model's ability to capture both global and local image features. AEAED comprises three key components: (1) a multi-scale attention encoder that combines the local feature extraction of convolutional layers with self-attention for global dependency modeling, improving sensitivity to adversarial perturbations; (2) an adaptive reconstruction decoder that leverages multi-head self-attention to achieve high-quality image reconstruction; and (3) a comprehensive reconstruction error measure that integrates pixel-level errors with feature-layer discrepancies, coupled with an adaptive threshold strategy for flagging adversarial examples. Experimental results on the CIFAR-10 dataset demonstrate that AEAED significantly outperforms existing baseline models, including MagNet, DAGMM, and SafetyNet, across multiple evaluation metrics. Notably, when detecting FGSM adversarial examples generated against the Swin Transformer, the model achieves an accuracy of 87.10%, a recall of 85.40%, and an AUC-ROC of 0.89, while reducing the false positive rate to 0.08. These results validate the effectiveness and superiority of the proposed method for adversarial example detection.
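To make the pipeline described above concrete, the following is a minimal sketch (not the authors' released code) of the three components, assuming a PyTorch implementation: a convolution-plus-self-attention encoder stage, a combined pixel- and feature-level reconstruction error, and a simple adaptive threshold. The class and function names, the weighting factor lam, and the mean-plus-k-standard-deviations threshold rule are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn

class AttentionEncoderBlock(nn.Module):
    """Hypothetical encoder stage: convolutional local feature extraction
    followed by multi-head self-attention for global dependency modeling."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)                          # local features (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)     # to token sequence (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)     # residual connection + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

def reconstruction_error(x, x_hat, feats, feats_hat, lam=0.5):
    """Combined score: pixel-level MSE plus feature-layer discrepancy.
    x, x_hat: images (B, C, H, W); feats, feats_hat: encoder features (B, D).
    lam is an assumed weighting hyperparameter."""
    pixel_err = torch.mean((x - x_hat) ** 2, dim=(1, 2, 3))
    feat_err = torch.mean((feats - feats_hat) ** 2, dim=1)
    return pixel_err + lam * feat_err

def adaptive_threshold(clean_scores: torch.Tensor, k: float = 2.0) -> torch.Tensor:
    """One common adaptive rule (an assumption, not necessarily AEAED's exact
    strategy): mean + k standard deviations of scores on held-out clean data."""
    return clean_scores.mean() + k * clean_scores.std()

At test time, an input whose combined reconstruction error exceeds the threshold estimated on clean data (is_adversarial = reconstruction_error(...) > adaptive_threshold(clean_scores)) is flagged as adversarial; clean inputs reconstruct well and fall below it.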
Keywords
Deep Learning Security, Adversarial Example Detection, Autoencoder, Multi-head Self-attention Mechanism, Image Reconstruction
References
[1] C. Szegedy et al., "Intriguing properties of neural networks," in Proceedings of the 2nd International Conference on Learning Representations (ICLR), Banff, AB, Canada, Apr. 14–16, 2014.
[2] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in Proceedings of the International Conference on Learning Representations (ICLR), 2015.
[3] A. Kurakin, I. J. Goodfellow, and S. Bengio, "Adversarial examples in the physical world," in Artificial Intelligence Safety and Security, Chapman and Hall/CRC, 2018, pp. 99–112.
[4] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in 2017 IEEE Symposium on Security and Privacy (SP), IEEE, 2017, pp. 39–57.
[5] H. Kim, W. Lee, S. Lee, and J. Lee, "Bridged adversarial training," Neural Networks, vol. 167, pp. 266–282, 2023.
[6] C. Shao, W. Li, J. Huo, Z. Feng, and Y. Gao, "Attention-based investigation and solution to the trade-off issue of adversarial training," Neural Networks, vol. 174, p. 106224, 2024.
[7] L. He, Q. Ai, X. Yang, Y. Ren, Q. Wang, and Z. Xu, "Boosting adversarial robustness via self-paced adversarial training," Neural Networks, vol. 167, pp. 706–714, 2023.
[8] X. Chen, J. Weng, X. Deng, W. Luo, Y. Lan, and Q. Tian, "Feature distillation in deep attention network against adversarial examples," IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 7, pp. 3691–3705, 2021.
[9] C. Chen, D. Ye, Y. He, L. Tang, and Y. Xu, "Improving adversarial robustness with adversarial augmentations," IEEE Internet of Things Journal, 2023.
[10] M. Ren, Y. Zhu, Y. Wang, and Z. Sun, "Perturbation inactivation based adversarial defense for face recognition," IEEE Transactions on Information Forensics and Security, vol. 17, pp. 2947–2962, 2022.
[11] S.-H. Choi, J. Shin, P. Liu, and Y.-H. Choi, "EEJE: Two-step input transformation for robust DNN against adversarial examples," IEEE Transactions on Network Science and Engineering, vol. 8, no. 2, pp. 908–920, 2020.
[12] Z.-H. Niu and Y.-B. Yang, "Defense against adversarial attacks with efficient frequency-adaptive compression and reconstruction," Pattern Recognition, vol. 138, p. 109382, 2023.
[13] S.-H. Choi, T. Bahk, S. Ahn, and Y.-H. Choi, "Clustering approach for detecting multiple types of adversarial examples," Sensors, vol. 22, no. 10, p. 3826, 2022.
[14] J. Lu, T. Issaranon, and D. Forsyth, "SafetyNet: Detecting and rejecting adversarial examples robustly," in 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017, pp. 446–454.
[15] S. Halim, A. Anandi, V. Devin, A. A. S. Gunawan, and K. E. Setiawan, "Automated adversarial-attack removal with SafetyNet using ADGIT," in 2024 10th International Conference on Smart Computing and Communication (ICSCC), IEEE, 2024, pp. 22–27.
[16] D. Meng and H. Chen, "MagNet: A two-pronged defense against adversarial examples," in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 135–147.
[17] J. H. Metzen, T. Genewein, V. Fischer, and B. Bischoff, "On detecting adversarial perturbations," in International Conference on Learning Representations, 2017.
[18] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=YicbFdNTTy
[19] Z. Liu et al., "Swin Transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
[20] Q. Fan, H. Huang, M. Chen, H. Liu, and R. He, "RMT: Retentive networks meet vision transformers," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 5641–5651.
[21] Y. Sun et al., "Retentive network: A successor to Transformer for large language models," arXiv preprint arXiv:2307.08621, 2023.