Country Development Level Analysis Based on Multidimensional Feature Clustering and LightGBM Classification
DOI: https://doi.org/10.62517/jike.202504408
Author(s)
Lingyu Zhao, Xiaofeng Li*
Affiliation(s)
College of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, Henan, China
*Corresponding Author
Abstract
As globalization accelerates, countries diverge markedly across multidimensional indicators such as economy, education, health, and social welfare, so traditional single indices cannot comprehensively capture their development patterns. This paper proposes a national development classification framework that combines unsupervised clustering with monotonicity-constrained supervised classification. First, using seven macroeconomic indicators (per capita GDP, urbanization rate, higher education enrollment rate, life expectancy, minimum wage, fertility rate, and CPI change rate), the K-means algorithm groups countries into four development tiers (A/B/C/D), which gives the clustering results an interpretable tier labeling. Second, using the cluster tiers as supervised labels, a LightGBM multiclass model with monotonicity constraints is built for prediction: positive (non-decreasing) constraints are applied to positive indicators and negative (non-increasing) constraints to negative indicators such as fertility rate, so that predictions follow the expected monotonic logic. Third, consistency tests against the United Nations Human Development Index (HDI) categories yield high correlation coefficients, indicating that the cluster-based tiers are broadly monotonically consistent with the HDI categories without being identical to them, which the paper presents as evidence of the clustering method's advantage. The final model shows excellent classification performance on the validation set. The method provides a comprehensive and logically sound scheme for classifying national development levels while balancing interpretability and predictive accuracy.
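The clustering and consistency-check steps described in the abstract can be sketched roughly as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' implementation: the twelve countries, their indicator values, and the `HDI_CATEGORY` codes are invented toy data; only three of the seven indicators are used; the rule that maps clusters to tiers (ranking centroids by a sign-adjusted sum of standardized indicators) is an assumption, since the abstract does not state how tiers are ordered; and Spearman's rank correlation is used as one plausible choice of consistency measure. The supervised step is not shown; in LightGBM, per-feature monotonicity is normally expressed through the `monotone_constraints` parameter.

```python
import math
import statistics

# Hypothetical toy data (invented values, not taken from the paper):
# (GDP per capita, life expectancy, fertility rate) for 12 made-up countries.
DATA = {
    "c01": (60000, 82.0, 1.5), "c02": (55000, 83.0, 1.6), "c03": (58000, 81.0, 1.4),
    "c04": (25000, 77.0, 1.9), "c05": (22000, 76.0, 2.0), "c06": (27000, 78.0, 1.8),
    "c07": (8000, 70.0, 2.8), "c08": (7000, 69.0, 3.0), "c09": (9000, 71.0, 2.7),
    "c10": (1500, 60.0, 4.8), "c11": (1200, 58.0, 5.0), "c12": (1800, 61.0, 4.6),
}
# Assumed indicator directions: +1 means higher is better, -1 means lower is better.
SIGNS = (1, 1, -1)
# Hypothetical HDI categories (4 = very high ... 1 = low), invented for the example.
HDI_CATEGORY = {"c01": 4, "c02": 4, "c03": 4, "c04": 3, "c05": 3, "c06": 4,
                "c07": 2, "c08": 2, "c09": 3, "c10": 1, "c11": 1, "c12": 1}


def zscore(rows):
    """Standardize each column to zero mean and unit (population) variance."""
    cols = list(zip(*rows))
    mu = [statistics.mean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mu, sd)) for r in rows]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(statistics.mean(c) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.mean(rx), statistics.mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx)
                           * sum((b - my) ** 2 for b in ry))


names = list(DATA)
labels, centroids = kmeans(zscore([DATA[n] for n in names]), k=4)

# Assumed tier rule: rank clusters by the sign-adjusted sum of centroid z-scores.
score = [sum(s * v for s, v in zip(SIGNS, c)) for c in centroids]
ranked = sorted(range(4), key=lambda j: score[j], reverse=True)
tier_of_cluster = {j: "ABCD"[pos] for pos, j in enumerate(ranked)}
tiers = {n: tier_of_cluster[lab] for n, lab in zip(names, labels)}

# Consistency check: rank correlation between tiers and the toy HDI categories.
tier_rank = {"A": 4, "B": 3, "C": 2, "D": 1}
rho = spearman([tier_rank[tiers[n]] for n in names],
               [HDI_CATEGORY[n] for n in names])
print(tiers)
print(round(rho, 3))
```

On this toy data the tiers and the hypothetical HDI categories agree in ordering but differ for two countries, so the correlation is high without reaching 1, which mirrors the "consistent but not identical" relationship the abstract reports. With real data, a library implementation such as scikit-learn's `KMeans` would be the more usual choice.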
Keywords
K-means; LightGBM; Country classification; United Nations Human Development Index