STEMM Institute Press
Science, Technology, Engineering, Management and Medicine
Identification and Authenticity Classification of Abnormal Signals Based on High-Frequency Trading Data
DOI: https://doi.org/10.62517/jse.202611202
Author(s)
Ziheng Zeng
Affiliation(s)
University of Queensland. Nanjing, Jiangsu, China
Abstract
High-frequency trading (HFT) has become a dominant force in financial markets, and extreme short-lived moves are common: prices, trading volumes, and trading intensities can spike, jump, or shift structurally within a very short period. These abnormal signals may be triggered by genuine information shocks (such as macroeconomic news or sudden changes in liquidity), or by market microstructure noise or suspected manipulation. To achieve interpretable anomaly detection and authenticity discrimination in the absence of ground-truth labels, this study proposes a three-stage pipeline: "interpretable features → weakly supervised validation → scoring of uncertain samples". Using 1-minute BTCUSDT data from Binance (June 2023 to September 2023) as an example, we obtain 19,402 abnormal candidate samples through rule-based triggering, construct basic features and trigger combinations on them, and extract sub-window statistical features on both sides of each abnormal center time point (such as log-return, |log-return|, trading volume, and the mean, variance, and ratio features of NTR in the pre- and post-windows). We then quantify and rank the systematic differences between pseudo-label groups and select the 8 most discriminative sub-window features. Using only likely_true and likely_fake as weak labels, we train an interpretable Logistic Regression model as a "scorer" and, through threshold scanning, derive decision points for different business preferences (high recall or high precision). Finally, we output continuous risk scores and rankings for the uncertain samples, providing high-value candidates for subsequent manual review or more complex models.
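The pipeline summarized above can be illustrated with a minimal sketch. It is not the study's implementation: the function names (`window_features`, `fit_logistic`, `predict_scores`, `scan_thresholds`), the window width, the label encoding (1 for likely_true, 0 for likely_fake), and the toy numbers are all assumptions made for illustration, and a plain gradient-descent logistic regression stands in for whatever solver the study used. The sketch computes pre- and post-window statistics around a center index, fits the scorer on weakly labeled samples, scans thresholds for precision and recall, and ranks uncertain samples by score.

```python
import math
from statistics import fmean, pvariance


def window_features(series, center, half_width):
    """Mean/variance on each side of `center`, plus the post/pre mean ratio."""
    if center - half_width < 0 or center + half_width >= len(series):
        raise ValueError("window does not fit inside the series")
    pre = series[center - half_width:center]
    post = series[center + 1:center + 1 + half_width]
    pre_mean, post_mean = fmean(pre), fmean(post)
    return {
        "pre_mean": pre_mean,
        "post_mean": post_mean,
        "pre_var": pvariance(pre),
        "post_var": pvariance(post),
        # Small constant guards against a zero pre-window mean
        # (assumes a non-negative series such as |log-return| or volume).
        "mean_ratio": post_mean / (pre_mean + 1e-12),
    }


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_scores(X, w, b):
    """Continuous score in (0, 1) for each feature vector."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def scan_thresholds(scores, labels, thresholds):
    """Return (threshold, precision, recall) rows; label 1 is the positive class."""
    rows = []
    for t in thresholds:
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        fn = sum(1 for s, l in zip(scores, labels) if s < t and l == 1)
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / (tp + fn) if tp + fn else float("nan")
        rows.append((t, precision, recall))
    return rows


# Toy, made-up data: one feature per sample (for example a post/pre ratio).
X_weak = [[2.0], [1.8], [2.2], [0.2], [0.1], [0.3]]
y_weak = [1, 1, 1, 0, 0, 0]  # assumed encoding: 1 = likely_true, 0 = likely_fake
w, b = fit_logistic(X_weak, y_weak)

weak_scores = predict_scores(X_weak, w, b)
for t, p, r in scan_thresholds(weak_scores, y_weak, [0.3, 0.5, 0.7]):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")

# Rank uncertain samples by score, highest first, for manual review.
X_uncertain = [[1.0], [2.5], [0.05]]
uncertain_scores = predict_scores(X_uncertain, w, b)
ranking = sorted(range(len(X_uncertain)), key=lambda i: uncertain_scores[i], reverse=True)
print("review order (indices):", ranking)
```

In practice the feature vectors would hold the selected sub-window features rather than a single toy value, and they would normally be standardized before fitting. A lower threshold favors recall and a higher one favors precision, which is the trade-off that threshold scanning exposes.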
Keywords
High-Frequency Trading; Anomaly Detection; Explainable Features; Weakly Supervised Learning; Logistic Regression; Threshold Selection