Machine Learning–Based Identification of Depressive Symptoms in Patients with Cancer: Multidimensional Feature Analysis and Model Development
DOI: https://doi.org/10.62517/jmpe.202618201
Author(s)
Shuai Wei, Xingcai Gao*
Affiliation(s)
The Fifth Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan, China
*Corresponding Author
Abstract
This study aimed to identify factors associated with depressive symptoms in patients with cancer and to develop a machine learning model that identifies those at high risk, with SHAP used to support interpretability. Data came from the 2018 wave of the China Health and Retirement Longitudinal Study (CHARLS), and 466 middle-aged and elderly patients with cancer were included. Features were selected with LASSO regression, and the data were randomly split into training (60%) and test (40%) subsets. Eight machine learning models were built and compared. Performance was evaluated with ROC curves, PR curves, the Brier score, and decision curve analysis, and SHAP was applied to interpret model outputs. Seven key features were identified. Among all models, XGBoost achieved the best overall performance in the test set, with an AUC of 0.771, a PR-AUC of 0.790, a Brier score of 0.1596, and a favorable net clinical benefit. SHAP analysis indicated that self-rated health, life satisfaction, and IADL limitation contributed most to model predictions. Subgroup analyses showed stable performance across age and sex strata, with all AUCs above 0.70. Overall, the XGBoost model showed good discrimination and interpretability, suggesting potential as an auxiliary tool for early warning, further screening, and risk stratification of depressive symptoms in patients with cancer.
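As an illustration of three of the evaluation metrics named in the abstract, the following minimal Python sketch computes the Brier score, the ROC AUC (via its rank interpretation), and the decision-curve net benefit at a single threshold. It is not the authors' code: the function names and the toy labels and probabilities are hypothetical and chosen only to show the arithmetic, and none of the values relate to the CHARLS data.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at one threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data (assumption): 1 = depressive symptoms present.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]

print("Brier score:", brier_score(y_true, y_prob))
print("ROC AUC:", roc_auc(y_true, y_prob))
print("Net benefit at pt=0.5:", net_benefit(y_true, y_prob, 0.5))
```

A lower Brier score indicates better-calibrated probabilities, and a higher AUC indicates better discrimination. Net benefit is usually plotted across a range of thresholds and compared with treat-all and treat-none strategies.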
Keywords
Patients with Cancer; Symptoms of Depression; Machine Learning; XGBoost; SHAP