A Survey of Deep Learning-Based Multi-Sensor Fusion for 3D Object Detection in Autonomous Driving
Journal of Intelligence and Knowledge Engineering (ISSN: 2959-0620), Vol. 4 No. 2 (JIKE 2026)

DOI: https://doi.org/10.62517/jike.202604219

Author(s)

Yining Liu*

Affiliation(s)

Stony Brook Institute at Anhui University, Anhui University, Hefei, Anhui, China
*Corresponding author.

Abstract

Multi-sensor fusion has become a fundamental framework for robust 3D object detection in autonomous driving because it can mitigate the inherent limitations of single-modal perception under challenging real-world conditions. This paper presents a systematic review of the mainstream cross-modal fusion paradigms for 3D detection (early, mid-level, and late fusion), focusing on their core mechanisms and deployment implications. We examine the critical challenges of coordinate alignment and shared representation interfaces, and highlight the Bird’s-Eye View (BEV) as a promising unified fusion space that balances spatial awareness with compatibility with downstream planning. We also conduct a comprehensive comparison of the three dominant 3D representations (point-based, voxel-based, and BEV-based), revealing their inherent trade-offs in computational cost, memory consumption, spatial structure preservation, and deployment efficiency. Furthermore, we categorize typical deployment-facing failure modes, including domain shift, calibration drift, sensor dropout, and adversarial attacks, and summarize corresponding layered mitigation strategies such as health monitoring, consistency checking, data augmentation, and graceful degradation. This work provides a structured overview of the state of the art in multi-sensor fusion for 3D detection, offering practical insights for both academic research and industrial deployment.
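To make the coordinate-alignment and BEV ideas above concrete, here is a minimal, dependency-free Python sketch. It is not taken from the survey. The frame conventions (LiDAR x forward, y left, z up; camera x right, y down, z forward), the 4x4 extrinsic `T_cam_lidar`, the 3x3 pinhole intrinsic `K`, the grid extents, and the function names `project_to_image` and `rasterize_bev` are all illustrative assumptions. The sketch shows two operations: projecting a LiDAR point into a camera image, which is the alignment step that point-level fusion depends on, and scattering points into a BEV grid, which serves as a shared space where modalities can be combined.

```python
"""Hypothetical sketch: LiDAR-to-camera projection and BEV rasterization.

Assumptions (illustrative only): a 4x4 homogeneous extrinsic T_cam_lidar,
a 3x3 pinhole intrinsic K, and a BEV grid given by x/y ranges in the
LiDAR frame with square cells.
"""
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Point = Tuple[float, float, float]


def transform_point(T: Matrix, p: Point) -> Point:
    """Apply a 4x4 homogeneous rigid transform to a 3D point."""
    x, y, z = p
    return tuple(T[r][0] * x + T[r][1] * y + T[r][2] * z + T[r][3] for r in range(3))


def project_to_image(K: Matrix, T_cam_lidar: Matrix, p: Point,
                     width: int, height: int) -> Optional[Tuple[int, int]]:
    """Project a LiDAR point to an integer pixel (u, v).

    Returns None if the point is behind the camera or falls outside the image.
    """
    xc, yc, zc = transform_point(T_cam_lidar, p)
    if zc <= 0.0:  # behind (or on) the image plane: no valid projection
        return None
    # Pinhole model: [u, v, 1]^T ~ K [xc, yc, zc]^T, then divide by depth zc.
    u = (K[0][0] * xc + K[0][1] * yc + K[0][2] * zc) / zc
    v = (K[1][1] * yc + K[1][2] * zc) / zc
    ui, vi = math.floor(u), math.floor(v)
    if 0 <= ui < width and 0 <= vi < height:
        return ui, vi
    return None


def rasterize_bev(points: Sequence[Point], x_range: Tuple[float, float],
                  y_range: Tuple[float, float], cell: float) -> List[List[int]]:
    """Count LiDAR points per BEV cell (rows index x, columns index y).

    Points outside the grid extents are dropped; height (z) is ignored.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0] * ny for _ in range(nx)]
    for x, y, _z in points:
        i = math.floor((x - x_range[0]) / cell)
        j = math.floor((y - y_range[0]) / cell)
        if 0 <= i < nx and 0 <= j < ny:
            grid[i][j] += 1
    return grid


if __name__ == "__main__":
    # Illustrative calibration: fx = fy = 500, principal point (320, 240),
    # zero translation, and the axis permutation between the assumed frames.
    K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
    T = [[0.0, -1.0, 0.0, 0.0],
         [0.0, 0.0, -1.0, 0.0],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    print(project_to_image(K, T, (10.0, 2.0, -1.0), 640, 480))  # -> (220, 290)
    print(project_to_image(K, T, (-5.0, 0.0, 0.0), 640, 480))   # -> None
```

The projection is exactly where calibration drift enters. A small error in `T_cam_lidar` shifts every projected pixel, so a point-level fusion step would attach image features from the wrong location. A BEV-based pipeline would also need to lift camera features into the same grid (for example, with predicted depth) before combining them with the LiDAR cells. That step is omitted here.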

Keywords

Autonomous Driving; 3D Object Detection; Multi-Sensor Fusion; BEV; LiDAR-Camera; Deep Learning; Transformer