STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Optimal NIPT Timing Prediction Based on K-Means Clustering and XGBoost Regression
DOI: https://doi.org/10.62517/jes.202602114
Author(s)
Junnan Yang (1), Hanbing Liu (1), Yichen Ma (1), Wanwan Wang (2)
Affiliation(s)
(1) School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, Henan, China
(2) iFLYTEK Co., Ltd., Hefei, Anhui, China
Abstract
As artificial intelligence sees wider use in smart healthcare, data mining has become a common approach to complex clinical decision-making problems. This study addresses the high failure rate of Non-Invasive Prenatal Testing (NIPT) caused by maternal physical heterogeneity and proposes a data-driven, stratified, feature-augmented prediction framework. First, K-Means clustering stratifies multi-dimensional clinical data without supervision, so that distribution differences among samples are accounted for. Second, a feature-augmented XGBoost ensemble learning model fits the detection baseline of each population by capturing non-linear interactions among physiological features. Finally, Monte Carlo simulation quantifies the uncertainty of the predictions and corrects biases in the historical data. Experimental results show that the framework performs particularly well in the high-risk population with severe obesity. Compared with traditional linear models, it significantly reduces prediction error and greatly improves interpretability. The differentiated sampling recommendation table generated by this study provides a scientific basis for "precision triage" and for optimizing the allocation of medical resources in clinical practice.
Keywords
Clustering Stratification; Ensemble Learning; Stochastic Simulation; Intelligent Triage
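The three stages named in the abstract (stratify, fit a model per stratum, simulate to quantify uncertainty) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the data are synthetic, BMI is used as the only stratification feature, the target is a hypothetical recommended gestational week, and a per-stratum least-squares line stands in for the XGBoost regressor so that the example runs on the Python standard library alone. The helper names (assign, kmeans_1d, fit_line, simulate_week) are hypothetical.

```python
import random
import statistics


def assign(value, centroids):
    """Index of the nearest centroid, used as the stratum label."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars, initialised at evenly spaced quantiles."""
    ordered = sorted(values)
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[assign(v, centroids)].append(v)
        updated = [statistics.fmean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def fit_line(xs, ys):
    """Least-squares line (stand-in for XGBoost); returns slope, intercept, residual sd."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, statistics.pstdev(resid)


def simulate_week(bmi_value, centroids, models, n_draws=2000, bmi_sd=0.5, seed=1):
    """Monte Carlo: perturb the input, re-predict, return mean and a 90% interval."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        noisy = rng.gauss(bmi_value, bmi_sd)  # assumed measurement noise on BMI
        slope, intercept, resid_sd = models[assign(noisy, centroids)]
        draws.append(intercept + slope * noisy + rng.gauss(0.0, resid_sd))
    draws.sort()
    return statistics.fmean(draws), draws[int(0.05 * n_draws)], draws[int(0.95 * n_draws)]


# Synthetic cohort (illustrative only): three BMI groups, later timing at higher BMI.
rng = random.Random(0)
bmi = [rng.gauss(center, 1.5) for center in (22.0, 30.0, 38.0) for _ in range(40)]
week = [10.0 + 0.25 * (b - 20.0) + rng.gauss(0.0, 0.5) for b in bmi]

# Stage 1: stratify. Stage 2: fit one model per stratum.
centroids = kmeans_1d(bmi, k=3)
labels = [assign(b, centroids) for b in bmi]
models = []
for j in range(3):
    xs = [b for b, l in zip(bmi, labels) if l == j]
    ys = [w for w, l in zip(week, labels) if l == j]
    models.append(fit_line(xs, ys))

# Stage 3: simulate to get a per-patient estimate with an uncertainty band.
for b in (22.0, 30.0, 38.0):
    mean, lo, hi = simulate_week(b, centroids, models)
    print(f"BMI {b:.0f}: week {mean:.1f} (90% interval {lo:.1f} to {hi:.1f})")
```

Fitting one model per stratum lets each population have its own baseline, which is the motivation the abstract gives for clustering before regression. With real multi-dimensional data, the scalar K-Means and the linear stand-in would be replaced by a full K-Means implementation and a gradient-boosted regressor.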