STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Improved Pose Estimation Network Based on Spatial Registration Model
DOI: https://doi.org/10.62517/jes.202402307
Author(s)
Zexiang Liu1,2, Ziao Dong1,2, Xin Yin1,2, Yanbing Liang1,2,*
Affiliation(s)
1Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, China
2College of Science, North China University of Science and Technology, Tangshan, Hebei, China
*Corresponding Author
Abstract
Position and attitude estimation refers to estimating the distance and attitude between the object to be measured and the sensor device from the input information captured by that device. Traditional computer vision pipelines, which optimize feature vectors produced by hand-crafted feature extraction algorithms, often require substantial resources to optimize the model; deep learning provides a new solution. In this paper, the Mask R-CNN framework is used to recognize and segment the object, obtaining its size and contour features. The VGG-16 network is then used to extract features from the RGB image, and the mask information extracted by Mask R-CNN is fused with these features through convolution and pooling. Finally, the convolved feature map is fed into two fully connected layers that predict the translation matrix and rotation matrix, respectively. In parallel, the fused feature map is upsampled and convolved to produce a feature map with the same size as the original input image, whose output is expected to be consistent with the input mask. By establishing a sensor spatial registration model, the least squares method and the generalized least squares method are used to estimate the systematic error parameters and compensate for them in the sensor system, reducing the error introduced by the equipment's observation data and yielding more accurate position and attitude information. Experimental comparison shows that the traditional PoseCNN predicts the object's position and attitude within an 8 cm error at an average rate of 64.6%, while the position and attitude estimation network studied in this paper achieves an average accuracy of 81.5% and has better generalization ability.
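The two-branch output described in the abstract (mask-gated features pooled into separate translation and rotation heads) can be illustrated with a minimal NumPy sketch. This is a hypothetical illustration, not the authors' implementation: the head weights `w_t`, `w_r`, the global-average-pooling step, and the unit-quaternion parameterization of the rotation are all assumptions made for the example.

```python
import numpy as np

def pose_heads(feature_map, mask, w_t, w_r):
    """Sketch of the two fully connected heads: gate the backbone
    feature map with the segmentation mask, pool globally, then
    predict translation and rotation from the pooled vector."""
    fused = feature_map * mask[None, :, :]   # broadcast mask over channels
    pooled = fused.mean(axis=(1, 2))         # global average pooling -> (C,)
    t = w_t @ pooled                         # translation head: 3-vector
    q = w_r @ pooled                         # rotation head: raw 4-vector
    q = q / np.linalg.norm(q)                # normalize to a unit quaternion
    return t, q

rng = np.random.default_rng(1)
C, H, W = 64, 14, 14
feat = rng.normal(size=(C, H, W))                 # stand-in backbone features
mask = (rng.random((H, W)) > 0.5).astype(float)   # stand-in Mask R-CNN mask
t, q = pose_heads(feat, mask, rng.normal(size=(3, C)), rng.normal(size=(4, C)))
```

In a trained network the two heads would be learned jointly with the upsampling branch; here random weights only demonstrate the data flow and output shapes.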
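The registration step, estimating a systematic sensor error and subtracting it from the observations, can be sketched with generalized least squares. This is a toy example under assumed conditions (a constant additive range bias and a diagonal noise covariance), not the paper's full spatial registration model.

```python
import numpy as np

def gls_estimate(X, y, cov):
    """Generalized least squares: beta = (X^T C^-1 X)^-1 X^T C^-1 y."""
    Ci = np.linalg.inv(cov)
    return np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)

# Toy model: the sensor adds a constant bias b to every range reading.
# The residuals (observed - reference) then satisfy r = b + noise, so the
# design matrix is a column of ones and GLS gives a noise-weighted mean.
rng = np.random.default_rng(0)
true_bias = 0.35
n = 200
residuals = true_bias + rng.normal(0.0, 0.05, n)
X = np.ones((n, 1))
cov = np.diag(np.full(n, 0.05 ** 2))      # diagonal here; general in GLS
bias_hat = gls_estimate(X, residuals, cov)[0]
compensated = residuals - bias_hat        # feed corrected data to the pose network
```

With a non-diagonal covariance the same formula down-weights correlated, noisy sensors, which is the advantage of GLS over ordinary least squares in multi-sensor registration.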
Keywords
Position and Attitude Estimation; System Error; Spatial Registration Model; Target Detection and Segmentation; Deep Learning