High-Frequency Financial Time Series Return Prediction Oriented Towards Transaction Costs: A Hierarchical Ensemble Learning and Regularized Meta-Learning Framework Incorporating Microstructural Features of Broussonetia Papyrifera

Journal of Statistics and Economics (ISSN: 3005-5733), Vol. 2 No. 6 (JSE 2025)

DOI: https://doi.org/10.62517/jse.202511612

Author(s)

Haoyu You

Affiliation(s)

Statistics, University of British Columbia, Vancouver, BC, V6T1Z1, Canada, *Corresponding Author

Abstract

This paper evaluates a transaction-cost-aware return prediction framework on minute-level CSI 300 stock index futures data from 2017 to 2025, comprising 518,873 one-minute bars. Using cascaded feature selection (Granger causality, LASSO, VIF, block PCA) and a range of machine learning models within a two-layer stacking architecture, we find that Support Vector Regression (SVR) is the top-performing model, achieving an out-of-sample R² of 0.982, a mean absolute error of 0.1631, a directional accuracy of 96.2%, and an annualized Sharpe ratio of 10.0. These results indicate high predictive accuracy under controlled backtesting. Although the metrics show close in-sample and out-of-sample agreement, they may be influenced by strong autocorrelation in the high-frequency dataset and by the effectiveness of the feature engineering. Caution is also warranted when interpreting economic viability for live deployment, because model returns and risk-adjusted performance may be overstated without further real-world calibration.
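The four headline metrics above (R², mean absolute error, directional accuracy, annualized Sharpe ratio) can be illustrated with a minimal sketch. This is not the paper's evaluation code: the function name `evaluate`, the sign-following trading rule, the proportional cost parameter `cost_per_turn`, and the annualization factor `periods_per_year` are all assumptions introduced here for illustration, and the input series are toy numbers.

```python
import math
from statistics import mean, stdev


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def evaluate(y_true, y_pred, cost_per_turn=0.0, periods_per_year=252):
    """Compute R^2, MAE, directional accuracy, and a cost-adjusted Sharpe ratio.

    cost_per_turn (hypothetical) is a proportional cost charged per unit of
    position change; periods_per_year (hypothetical) is the number of bars
    assumed per year when annualizing.
    """
    n = len(y_true)
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    hit_rate = sum(sign(t) == sign(p) for t, p in zip(y_true, y_pred)) / n

    # Assumed sign-following rule: hold +1 or -1 according to the predicted
    # return, and pay cost_per_turn for every unit the position changes.
    net, prev_pos = [], 0
    for t, p in zip(y_true, y_pred):
        pos = sign(p)
        net.append(pos * t - cost_per_turn * abs(pos - prev_pos))
        prev_pos = pos

    sd = stdev(net)  # sample standard deviation of per-bar net returns
    sharpe = mean(net) / sd * math.sqrt(periods_per_year) if sd > 0 else float("nan")
    return {"r2": r2, "mae": mae, "directional_accuracy": hit_rate, "sharpe": sharpe}


if __name__ == "__main__":
    # Toy realized and predicted returns, not data from the paper.
    realized = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
    predicted = [0.008, -0.015, 0.01, 0.002, 0.018, -0.012]
    print(evaluate(realized, predicted))
    print(evaluate(realized, predicted, cost_per_turn=0.001))
```

Comparing the two calls shows how charging a cost on every position change lowers the Sharpe ratio while leaving R², MAE, and directional accuracy unchanged, which is why accuracy metrics alone may overstate the economic value of a forecast.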

Keywords

CSI 300 Futures; High-Frequency Prediction; Market Microstructure; Stacking Ensemble; Elastic Net; Granger Causality; Transaction Costs
