The Establishment and Research of Random Forest Quantitative Classification Model
DOI: https://doi.org/10.62517/jbdc.202501120
Author(s)
Junling Sun¹, Gaoping Wang²
Affiliation(s)
¹ College of Computer and Artificial Intelligence, Henan Finance University, Zhengzhou, Henan, China
² Department of Computer, Guangzhou College of Applied Science and Technology, Guangzhou, Guangdong, China
Abstract
As China's financial reforms deepen, research on applying quantitative investment to the A-share market has steadily grown. Exploring how the Random Forest algorithm can be applied to stock price classification is therefore significant for the development of quantitative stock selection theory. This paper proposes a Random Forest based ensemble learning method for the quantitative classification problem. Multiple technical indicators computed from stock data are used to construct the feature vectors, the prediction horizon is set to 5 days, and the prediction target is divided into 6 classes. A Random Forest model is then built for each stock by constructing multiple decision trees and incorporating feature importance evaluation. Experimental results show that the model reduces the risk of overfitting while improving classification accuracy, and that it outperforms traditional classification algorithms on accuracy, recall, and other metrics.
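The labeling step described in the abstract (a 5-day horizon, 6 target classes) could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the 6 classes are defined, so the return cut points in `THRESHOLDS` and the helper names `forward_return`, `label_for`, and `build_labels` are hypothetical, not the authors' method.

```python
from bisect import bisect_right

# Prediction horizon in trading days, as stated in the abstract.
HORIZON = 5

# ASSUMPTION: hypothetical cut points on the 5-day forward return.
# Five cut points split the real line into 6 classes (0..5).
THRESHOLDS = [-0.05, -0.02, 0.0, 0.02, 0.05]


def forward_return(closes, t, horizon=HORIZON):
    """Relative change in closing price from day t to day t + horizon."""
    return closes[t + horizon] / closes[t] - 1.0


def label_for(ret, thresholds=THRESHOLDS):
    """Map a return to a class index between 0 and len(thresholds)."""
    return bisect_right(thresholds, ret)


def build_labels(closes, horizon=HORIZON):
    """One class label per day that has a full horizon of future prices."""
    return [
        label_for(forward_return(closes, t, horizon))
        for t in range(len(closes) - horizon)
    ]
```

Labels built this way would be paired with the technical-indicator feature vectors for the same days and passed to a Random Forest classifier. The last 5 days of a series get no label because their future prices are not yet known.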
Keywords
Random Forest Algorithm; Quantitative Classification; Price Prediction; Feature Selection; Financial Data Analysis