Research on Low Significance Text Detection Algorithm in Text Translation
DOI: https://doi.org/10.62517/jike.202404312
Author(s)
Jianyu Li
Affiliation(s)
Shaanxi University Engineering Research Center for Artificial Intelligence Translation, Xi'an Fanyi University, Xi'an, Shaanxi, China
Abstract
To address the weak feature response that arises when text colours closely resemble the background during text translation, a low-saliency text detection algorithm is proposed. First, a channel-spatial attention module is incorporated into the network; it attends to the text target along both the channel and spatial dimensions simultaneously, strengthening the features of the text region. Second, a multi-scale feature fusion module is designed that links different levels of information in the fused feature map through skip connections, producing fused features with comprehensive multi-scale information. Experimental results show that the proposed algorithm detects low-saliency text effectively and can directly output the enhanced image, which improves the quality of text translation.
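To make the two modules concrete, the sketch below gives a minimal PyTorch rendering of a channel-spatial attention block (in the spirit of CBAM) followed by a skip-connection fusion of two feature scales. All class names, channel counts, the reduction ratio, and the 7x7 spatial kernel are illustrative assumptions, not the paper's exact architecture.

# Minimal sketch of the two modules described in the abstract.
# Names and sizes are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention: weight the feature map along the channel
    dimension first, then along the spatial dimension."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel weights from average- and max-pooled descriptors.
        avg = self.channel_mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.channel_mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial weights from per-pixel channel mean and max.
        s = torch.cat([x.mean(1, keepdim=True),
                       x.max(1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))

class MultiScaleFusion(nn.Module):
    """Fuse a deep (low-resolution) and a shallow (high-resolution)
    feature map through a skip connection, so the fused map keeps both
    coarse semantics and fine detail."""
    def __init__(self, deep_ch: int, shallow_ch: int):
        super().__init__()
        self.reduce = nn.Conv2d(deep_ch, shallow_ch, kernel_size=1)

    def forward(self, deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
        up = F.interpolate(self.reduce(deep), size=shallow.shape[-2:],
                           mode="nearest")
        return up + shallow  # skip connection carries fine-grained detail

if __name__ == "__main__":
    attn = ChannelSpatialAttention(256)
    fuse = MultiScaleFusion(deep_ch=512, shallow_ch=256)
    shallow = attn(torch.randn(1, 256, 80, 80))
    deep = torch.randn(1, 512, 40, 40)
    print(fuse(deep, shallow).shape)  # torch.Size([1, 256, 80, 80])

In a full detector, blocks of this kind would sit in the backbone and neck; the skip connection is what lets coarse semantic evidence for low-contrast text be combined with the fine spatial detail needed to localise it.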
Keywords
Attention Mechanism; Text Detection; Machine Translation; Multi-scale Feature Fusion