STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
PCA-Integrated LightGBM and XGBoost Model for Pattern Recognition and Interpretability Analysis of Telecom Fraud
DOI: https://doi.org/10.62517/jbdc.202501407
Author(s)
Donghao Li*, Xiaohan Wang, Xiaoyu Lu, Bing He
Affiliation(s)
Henan University of Technology, Zhengzhou, Henan, China
*Corresponding Author
Abstract
This paper proposes an integrated model based on LightGBM and XGBoost to address the increasing complexity of telecom network fraud and the high-dimensional, imbalanced data associated with it. The prediction results of the two models are fused using Principal Component Analysis (PCA), and SHAP values are used to enhance model interpretability. Raw transaction data are first preprocessed and subjected to feature engineering; model parameters are then optimized via cross-validation. Together, these steps form an "identification–interpretation–integration" fraud detection pathway. Experimental results show that the PCA-fused model outperforms the individual models in both detection performance and interpretability, providing an effective solution for accurate telecom fraud detection.
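The abstract does not spell out how PCA fuses the two models' outputs, so the following is only a minimal sketch under one plausible assumption: the absolute loadings of the first principal component of the two score columns, normalised to sum to 1, serve as fusion weights. The function names and the example scores are hypothetical and are not taken from the paper. The sketch uses only the Python standard library; in practice a PCA routine from a numerical library would likely be used instead.

```python
import math


def pca_fusion_weights(p_a, p_b):
    """Fusion weights from the first principal component of two score columns.

    Assumption (not from the paper): weights are the absolute loadings of
    the first principal component, normalised to sum to 1.
    """
    n = len(p_a)
    if n != len(p_b) or n < 2:
        raise ValueError("need two equal-length score lists with at least 2 samples")

    mean_a = sum(p_a) / n
    mean_b = sum(p_b) / n

    # Sample covariance matrix [[var_a, cov], [cov, var_b]] of the two columns.
    var_a = sum((x - mean_a) ** 2 for x in p_a) / (n - 1)
    var_b = sum((y - mean_b) ** 2 for y in p_b) / (n - 1)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(p_a, p_b)) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (var_a + var_b) / 2 + math.sqrt(((var_a - var_b) / 2) ** 2 + cov ** 2)

    # Matching eigenvector; the uncorrelated case is handled separately.
    if abs(cov) > 1e-12:
        v = (cov, lam - var_a)
    else:
        v = (1.0, 0.0) if var_a >= var_b else (0.0, 1.0)

    total = abs(v[0]) + abs(v[1])
    return abs(v[0]) / total, abs(v[1]) / total


def fuse_scores(p_a, p_b):
    """Weighted average of two models' fraud scores using PCA-derived weights."""
    w_a, w_b = pca_fusion_weights(p_a, p_b)
    return [w_a * x + w_b * y for x, y in zip(p_a, p_b)]


# Hypothetical fraud probabilities from two already-trained models.
lgbm_scores = [0.05, 0.10, 0.80, 0.95, 0.20, 0.70]
xgb_scores = [0.10, 0.05, 0.70, 0.90, 0.30, 0.60]

weights = pca_fusion_weights(lgbm_scores, xgb_scores)
fused = fuse_scores(lgbm_scores, xgb_scores)
```

Because the weights are non-negative and sum to 1, each fused score lies between the two input scores for that sample. The model whose scores vary more along the dominant direction receives the larger weight.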
Keywords
Telecom Fraud; Ensemble Learning; PCA Fusion; SHAP Interpretation; LightGBM; XGBoost