Construction of a Prevalence Prediction Model for COVID-19 in Counties of Guizhou Province: Based on the Random Forest Algorithm
DOI: https://doi.org/10.62517/jmhs.202605208
Author(s)
Shouyan Chen1, Li Xi1, Junhua Wang1, Jiangping Zhang1,2,*
Affiliation(s)
1School of Public Health, Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, Guizhou Medical University, Guiyang, Guizhou, China
2Guiyang Public Health Treatment Center, Guiyang, Guizhou, China
*Corresponding Author
Abstract
This study aimed to construct a county-level prediction model for COVID-19 prevalence in Guizhou Province based on the Random Forest algorithm, in order to provide a foundation for regional epidemic risk early warning and the optimal allocation of prevention and control resources. From February to March 2020, a survey on COVID-19 protection knowledge was carried out among 1,987 residents of Guizhou Province through the “Wenjuanxing” platform. Data on demographic characteristics and eight knowledge dimensions were collected. The Random Forest algorithm was used for feature selection and model building, and optimal parameters were determined via 5-fold cross-validation. SHAP values were used for feature interpretation, and the model’s performance was compared with that of XGBoost and Linear Regression. On the test dataset, the Random Forest model yielded an MSE of 0.0930, an RMSE of 0.3049, and an MAE of 0.1841, indicating better predictive performance than the reference models. The five most important features were: Knowledge of mental health intervention (Q8.3), Total knowledge score (point), Basic knowledge score (Q1), Knowledge of transmission routes (Q1.1), and Knowledge of community prevention and control (Q7). SHAP analysis further confirmed the positive impacts of these features on the prediction results. The prediction model constructed with the Random Forest algorithm shows high accuracy and interpretability. It can identify the key factors influencing the risk of COVID-19 prevalence in Guizhou's counties, thereby providing scientific support for formulating differentiated prevention and control strategies.
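The three error metrics reported above can be illustrated with a minimal sketch. It assumes the standard definitions of MSE, RMSE, and MAE; the function name `regression_errors` and the toy values are hypothetical and are not taken from the study's data or code.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MSE, RMSE, MAE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    # Mean squared error: average of squared residuals.
    mse = sum(r * r for r in residuals) / len(residuals)
    # Mean absolute error: average of absolute residuals.
    mae = sum(abs(r) for r in residuals) / len(residuals)
    # Root mean squared error is the square root of the MSE.
    return mse, math.sqrt(mse), mae


# Hypothetical toy values for illustration only, not data from the study.
y_true = [0.0, 1.0, 0.0, 1.0]
y_pred = [0.1, 0.7, 0.0, 0.9]
mse, rmse, mae = regression_errors(y_true, y_pred)
print(f"MSE={mse:.4f} RMSE={rmse:.4f} MAE={mae:.4f}")
```

Because RMSE is by definition the square root of MSE, the reported RMSE of 0.3049 agrees with the reported MSE of 0.0930 up to rounding.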
Keywords
COVID-19; Prediction of Prevalence; Random Forest; Machine Learning