STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Application and Optimization of NLP Pre-trained Models in Image Recognition
DOI: https://doi.org/10.62517/jsse.202508405
Author(s)
Xiujun Bai*
Affiliation(s)
Intelligent City Research Institute, China United Network Communications Group Co., Ltd., Beijing, China *Corresponding Author
Abstract
In the era of digitalization and intelligent manufacturing, ensuring workplace safety and optimizing operational efficiency are crucial. However, traditional manual inspection methods are inefficient and cannot meet real-time monitoring needs, while current computer vision techniques have limitations in data processing and multimodal integration. This paper presents a novel approach that uses natural language processing (NLP) to convert image recognition tasks into text classification problems via large multimodal models. Specifically, we propose HRG-BERT (Hybrid Representation Graph BERT), a lightweight model with a 4-layer Transformer architecture that undergoes domain-specific pre-training and training optimizations such as dynamic masking and Chinese-term-level masking. Experimental results show that HRG-BERT outperforms BERT-base in both accuracy and F1-score on tasks such as safety helmet detection, work uniform verification, and employee misconduct monitoring, while requiring fewer computational resources. This research provides an effective intelligent solution for industrial safety management, demonstrating the feasibility and advantages of integrating NLP pre-trained models into image recognition to enhance safety and efficiency.
Keywords
Natural Language Processing; Pre-trained Models; Image Recognition; Industrial Safety; Multimodal Integration; Lightweight Model
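The abstract mentions combining dynamic masking with Chinese-term-level masking during domain-specific pre-training. A minimal sketch of how such a strategy could work is shown below; the function name, the example term list, and the masking rate are illustrative assumptions, not details taken from the paper. The key ideas are that multi-character domain terms are masked as whole units, and that the mask pattern is re-sampled on every call (i.e., every epoch), rather than fixed once at preprocessing time.

```python
import random

def term_level_dynamic_mask(tokens, terms, mask_token="[MASK]",
                            mask_prob=0.15, rng=None):
    """Mask whole domain terms (or single characters) at random.

    tokens : list of single-character tokens, e.g. list("工人未戴安全头盔")
    terms  : set of multi-token domain terms, e.g. {("安", "全"), ("头", "盔")}

    Because the random pattern is drawn on every call, repeated calls over
    the same corpus yield fresh masking patterns each epoch (dynamic masking).
    """
    rng = rng or random.Random()
    out = list(tokens)

    # First pass: segment the sequence into spans, preferring the longest
    # matching domain term at each position (greedy longest-match).
    spans, i = [], 0
    while i < len(out):
        matched = None
        for t in terms:
            if tuple(out[i:i + len(t)]) == t:
                if matched is None or len(t) > len(matched):
                    matched = t
        width = len(matched) if matched else 1
        spans.append((i, i + width))
        i += width

    # Second pass: mask each span as a unit, so a multi-character term is
    # never only partially masked.
    for lo, hi in spans:
        if rng.random() < mask_prob:
            for j in range(lo, hi):
                out[j] = mask_token
    return out
```

With a seeded generator the behavior is reproducible, e.g. masking the sentence "工人未戴安全头盔" with the terms above either leaves a term fully visible ("安全") or masks it entirely ("头盔"), never half of it. A production version would additionally follow BERT's 80/10/10 replace/random/keep split for selected positions; that detail is omitted here for brevity.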
Copyright © 2020-2035 STEMM Institute Press. All Rights Reserved.