STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Enhancing Multi-Scale Feature Fusion in YOLOv5 with ASFF and SimAM for Robust Defect Detection
DOI: https://doi.org/10.62517/jbdc.202501216
Author(s)
Yubo Liu, Zihao Yan, Zhaoyan Ma, Jingjing Hou, Zhengxin Liu, Yang Yang
Affiliation(s)
Xi’an Mingde Institute of Technology, Xi’an, Shaanxi, China
Abstract
In complex object detection tasks, especially those involving irregular and multi-scale visual patterns, conventional recognition algorithms often fall short because they rely on low-level features. To address this limitation, this study proposes an enhanced detection framework based on the You Only Look Once version 5 (YOLOv5) model. Two key components are integrated: Adaptive Spatial Feature Fusion (ASFF) and the Simple, Parameter-Free Attention Module (SimAM). ASFF improves the consistency and semantic alignment of feature maps across multiple scales, while SimAM helps the model focus on salient information by suppressing background noise through a parameter-free attention mechanism. We evaluate the proposed model on the NEU-DET dataset for steel surface defect detection and find significant improvements in mean Average Precision (mAP), accuracy, and robustness over the baseline YOLOv5. Despite a slight increase in computational cost, the model retains real-time inference capability, making it suitable for applications such as automated infrastructure inspection and road surface monitoring. These results highlight the effectiveness of combining multi-scale feature fusion with lightweight attention strategies to improve detection performance in visually complex environments.
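The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single-channel feature map stored as a nested list, feature levels that have already been resized to a common resolution, and fusion logits supplied directly as inputs (in ASFF they would come from learned convolutions). The function names `simam` and `asff_fuse` and the default `e_lambda` are illustrative choices made for this example.

```python
import math


def simam(channel, e_lambda=1e-4):
    """SimAM-style reweighting of one channel (2D list of floats).

    Each value is scaled by sigmoid(d / (4 * (var + e_lambda)) + 0.5),
    where d is its squared deviation from the channel mean. No learned
    parameters are involved. The channel needs at least two values.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1
    mu = sum(values) / len(values)
    sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))
            out_row.append(v * weight)
        out.append(out_row)
    return out


def asff_fuse(levels, logits):
    """Fuse same-sized feature maps with per-pixel softmax weights.

    levels: list of 2D lists, assumed already resized to one resolution.
    logits: list of 2D lists of the same shape, one per level (assumed
            given here; a trained model would predict them).
    """
    height, width = len(levels[0]), len(levels[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Softmax over levels at this pixel, shifted for stability.
            peak = max(logit[i][j] for logit in logits)
            exps = [math.exp(logit[i][j] - peak) for logit in logits]
            total = sum(exps)
            fused[i][j] = sum(
                (e / total) * level[i][j] for e, level in zip(exps, levels)
            )
    return fused
```

Because the fusion weights at every pixel sum to one, equal logits reduce the fusion to a plain average, while a dominant logit lets a single scale take over at that location. In the SimAM sketch, values far from the channel mean receive weights closer to one, and values near the mean are damped more strongly.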
Keywords
Object Detection; Feature Fusion; Attention Mechanism; Model Optimization; Defect Recognition