STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Music Video Play Volume Prediction and Influence Factor Analysis Based on Random Forest Model
DOI: https://doi.org/10.62517/jnme.202610209
Author(s)
Xingyu Li
Affiliation(s)
Beijing Normal-Hong Kong Baptist University (BNBU), Zhuhai, Guangdong, China
Abstract
To identify the key variables that determine music video view counts and to predict those counts accurately, this study analyzed 16,939 music video records and systematically examined how audio and dissemination features influence the number of views. Data cleaning and feature engineering were used to create a standardized dataset. A random forest model was then built to predict view counts, and correlation analysis and feature importance evaluation were used to identify the core influencing factors. The findings showed that the random forest model had high predictive accuracy, with an R² of 0.672. Dissemination features accounted for 79.4 percent of the contribution to view counts, with official status, streaming platform, and like rate as the main determinants. Among the audio features, Acousticness, Valence, Energy, and Liveness were also important. This research provides evidence for optimizing content creation and dissemination strategies in the music industry and adds to the empirical literature on digital music dissemination.
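The two headline quantities in the abstract, the R² score and the share of importance attributed to a feature group, can be illustrated with a minimal sketch. This is not the author's code: the function names, the feature names, and every numeric value below are hypothetical and chosen only to show the arithmetic. It assumes that per-feature importances (for example, from a fitted random forest) are already available as a dictionary.

```python
from collections import defaultdict


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def group_importance_share(importances, groups):
    """Sum per-feature importances by group and normalize to shares."""
    totals = defaultdict(float)
    for name, value in importances.items():
        totals[groups[name]] += value
    grand_total = sum(totals.values())
    return {group: value / grand_total for group, value in totals.items()}


# Hypothetical importances, NOT the values reported in the paper.
importances = {
    "official_status": 0.30,
    "platform": 0.25,
    "like_rate": 0.20,
    "acousticness": 0.10,
    "valence": 0.08,
    "energy": 0.07,
}

# Hypothetical assignment of each feature to a group.
groups = {
    "official_status": "dissemination",
    "platform": "dissemination",
    "like_rate": "dissemination",
    "acousticness": "audio",
    "valence": "audio",
    "energy": "audio",
}

shares = group_importance_share(importances, groups)
```

With real data, `y_true` and `y_pred` would be the observed and predicted view counts on a held-out test set, and `shares["dissemination"]` would correspond to the kind of group-level figure the abstract reports.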
Keywords
Music Video; View Count Prediction; Random Forest; Feature Importance; Dissemination Effect