GDD-K-Means Text Clustering Algorithm Based on Grid Filtering Distance and Density of Outliers
DOI: https://doi.org/10.62517/jike.202404315
Author(s)
Yao Wang, Bin Wang*, Xiuwen Qi
Affiliation(s)
School of Mathematics and Data Science, Changji College, Changji, Xinjiang, China
*Corresponding Author.
Abstract
In the era of big data, fully mining and exploiting the value of data, in line with the requirements of big data strategy, plays a significant role in social development. Clustering algorithms partition unlabeled data sets through an unsupervised learning process, and the traditional K-Means algorithm remains the most widely used of them. After studying various improved versions of the traditional K-Means clustering algorithm, this paper addresses two of its weaknesses: unsatisfactory clustering results caused by outliers, and the dependence of the initial partition on the choice of initial center points. First, outliers are removed using grid filtering together with an LOF detection method that weighs distance and density. Then, the randomness of initial center selection is reduced by combining the "max-min principle" with a maximum-weight strategy, and the number of clusters is determined according to the BWP index. Experimental results show that, compared with currently popular clustering algorithms, the proposed GDD-K-Means clustering algorithm achieves better results on different data sets: accuracy, F-measure, and other evaluation indexes improve to a certain extent, and the computational time complexity is effectively reduced.
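To make the initial-center step mentioned in the abstract more concrete, the following is a minimal sketch of the plain max-min distance rule (farthest-first selection) followed by standard K-Means iterations. It is not the authors' GDD-K-Means implementation: the grid filtering, LOF outlier removal, maximum-weight strategy, and BWP index are not reproduced here. The function names `max_min_centers` and `kmeans`, and the choice of starting from the point nearest the overall mean, are assumptions made for illustration only.

```python
import math


def max_min_centers(points, k):
    """Pick k initial centers with the max-min distance rule (illustrative)."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Assumption: start from the point nearest the overall mean so that
    # the selection is deterministic rather than random.
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        # Score each point by its distance to the nearest chosen center,
        # then take the point with the largest such distance.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points, centers, iters=100):
    """Standard Lloyd iterations starting from the given centers."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center.
            idx = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Recompute each center as the mean of its cluster; an empty
        # cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    init = max_min_centers(pts, 2)
    final_centers, groups = kmeans(pts, init)
    print(init, final_centers, groups)
```

Because each new center is chosen as far as possible from those already selected, the starting centers tend to fall in different regions of the data, which is the motivation for replacing random initialization. Points are expected as tuples of numbers.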
Keywords
Data Mining; K-Means Algorithm; Grid Filtering Outlier; Number of Class Centers