STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Review of Object Detection
DOI: https://doi.org/10.62517/jike.202604207
Author(s)
Jiwei Ye
Affiliation(s)
School of Intelligent Manufacturing, Hebei Polytechnic University of Industry, Shijiazhuang, China
Abstract
Object detection is a fundamental problem in computer vision with both theoretical significance and broad industrial value. It has been applied in domains including facial recognition, industrial defect detection, and remote sensing. Drawing on a review of the domestic and international literature, this paper surveys object detection techniques. It first outlines the key developmental stages of the technology. It then covers datasets and evaluation metrics, classifies existing object detection algorithms, and compares their performance on these datasets. Finally, it provides an outlook on future research directions in object detection.
Keywords
Object Detection; Convolutional Neural Network; Deep Learning