Diabetes Prediction Models Based on Machine Learning
DOI: https://doi.org/10.62517/jbdc.202301102
Author(s)
Liu Bobo1, Kang Xiaofei1, Zhang Zongyue2, Hou Liying2,*, Wang Yidan1,*
Affiliation(s)
1College of Science, North China University of Technology, Tangshan, Hebei, China
2School of Public Health, North China University of Technology, Tangshan, Hebei, China
*Corresponding Author
Abstract
Objective: To compare the predictive performance of random forest, BP neural network, gradient boosting tree, and naive Bayes models for the prevalence of diabetes. In practical terms, a model built on basic indicators such as height, weight, and triglycerides can estimate an individual's probability of disease, so that specific indicators can be targeted for improvement as a preventive intervention, offering new ideas for diabetes prevention research.
Methods: Using the 2009 survey data from the China Health and Nutrition Survey (CHNS), the data for men and women were divided into four groups according to the visceral adiposity index (VAI) and analyzed statistically. The processed samples were then split into training and test sets at a ratio of 4:1, and four machine learning models (random forest, BP neural network, gradient boosting tree, and naive Bayes) were constructed. The experiments used five-fold cross-validation, and predictive performance was evaluated with indicators such as sensitivity, accuracy, and AUC.
Results: One-way ANOVA showed statistically significant differences in height, weight, waist circumference, triglycerides, high-density lipoprotein cholesterol, body mass index, fasting blood glucose, and glycosylated hemoglobin among the VAI quartile groups (P<0.05). Across the four models, sensitivity was 75.75%, 90.77%, 76.31%, and 98.57%; accuracy was 74.80%, 87.82%, 74.64%, and 92.00%; AUC was 0.713, 0.716, 0.668, and 0.676; and the Youden index was 0.34, 0.27, 0.22, and 0.21.
Conclusion: On the CHNS 2009 survey data, the BP neural network model showed better predictive performance and stability for diabetes.
Keywords
Visceral Adiposity Index; Diabetes; Random Forest; BP Neural Network; Gradient Boosting Decision Tree; Naive Bayes Model
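The evaluation indicators named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = diabetes, 0 = no diabetes) and uses the standard definition of the Youden index, sensitivity + specificity - 1. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper or from the CHNS data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy and the Youden index.

    Assumes labels are coded 1 (diabetes) and 0 (no diabetes).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden": sensitivity + specificity - 1.0,
    }


# Toy labels invented for illustration only (not CHNS data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because the Youden index also penalizes false positives, a model that labels almost every case as positive can reach very high sensitivity while its Youden index stays low, which is one reason to read the two indicators together.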