Journal of Intelligence and Knowledge Engineering (ISSN: 2959-0620), Vol. 2 No. 3 (JIKE 2024)

GDD-K-Means Text Clustering Algorithm Based on Grid Filtering Distance and Density of Outliers

DOI: https://doi.org/10.62517/jike.202404315

Author(s)

Yao Wang, Bin Wang*, Xiuwen Qi

Affiliation(s)

School of Mathematics and Data Science, Changji College, Changji, Xinjiang, China
*Corresponding Author.

Abstract

In the era of big data, fully mining and exploiting the value of data, as required by big data strategy, plays a significant role in social development. Clustering algorithms partition unlabeled data sets through unsupervised learning, and the traditional K-Means algorithm remains the most widely used of them. Building on a study of existing improvements to traditional K-Means, this paper addresses two weaknesses: outliers that degrade the clustering result, and the dependence of the initial partition on the choice of initial center points. First, outliers are removed by grid filtering combined with LOF (Local Outlier Factor) detection that weighs both distance and density. Next, the randomness of initial center selection is reduced by combining the max-min distance principle with a maximum-weight strategy, and the number of clusters is determined with the BWP (between-within proportion) index. Experimental results show that, compared with currently popular clustering algorithms, the proposed GDD-K-Means algorithm achieves better results on different data sets: accuracy, F-measure, and other evaluation indices improve to a certain extent, and the computational time complexity is effectively reduced.
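
The pipeline summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names are hypothetical, and so are several choices the abstract leaves open: the grid cell size and minimum cell count that mark a point as an outlier candidate, the LOF neighbourhood size k and score threshold, the point weight (assumed here to be the inverse mean distance to the k nearest neighbours), the way the maximum-weight and max-min rules are combined (the heaviest point seeds the first center and the max-min rule picks the rest), and scoring singleton clusters as 0 in the BWP index. LOF follows its standard definition, and BWP uses the commonly cited per-sample form (b - w) / (b + w).

```python
from collections import defaultdict
from math import dist, floor

def grid_candidates(points, cell=1.0, min_count=2):
    """Indices of points in sparse grid cells (assumed grid-filtering rule)."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(floor(x / cell) for x in p)].append(i)
    return {i for members in cells.values() if len(members) < min_count for i in members}

def lof_scores(points, k=3):
    """Standard Local Outlier Factor, brute force, ties at the k-th neighbour ignored."""
    n = len(points)
    nbrs, kdist = [], []
    for i in range(n):
        order = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        nbrs.append([j for _, j in order[:k]])
        kdist.append(order[k - 1][0])  # distance to the k-th nearest neighbour
    lrd = []
    for i in range(n):
        reach = [max(kdist[o], dist(points[i], points[o])) for o in nbrs[i]]
        lrd.append(k / (sum(reach) + 1e-12))  # local reachability density
    return [sum(lrd[o] for o in nbrs[i]) / (k * lrd[i]) for i in range(n)]

def remove_outliers(points, cell=1.0, min_count=2, k=3, threshold=1.5):
    """Drop a point only if it lies in a sparse cell AND its LOF exceeds threshold."""
    cand = grid_candidates(points, cell, min_count)
    lof = lof_scores(points, k)
    return [p for i, p in enumerate(points) if not (i in cand and lof[i] > threshold)]

def maxmin_init(points, n_clusters, k=3):
    """First center = max-weight point; the rest follow the max-min distance rule."""
    n = len(points)

    def weight(i):  # assumed weight: inverse mean distance to the k nearest neighbours
        d = sorted(dist(points[i], points[j]) for j in range(n) if j != i)[:k]
        return len(d) / (sum(d) + 1e-12)

    chosen = [max(range(n), key=weight)]
    while len(chosen) < n_clusters:
        # pick the point whose nearest already-chosen center is farthest away
        chosen.append(max(range(n), key=lambda i: min(dist(points[i], points[c]) for c in chosen)))
    return [points[i] for i in chosen]

def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations starting from the given centers."""
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c])) for p in points]
        new = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[c])
        if new == centers:
            break
        centers = new
    return labels, centers

def bwp(points, labels):
    """Mean BWP index over all samples; singleton clusters contribute 0 (assumed)."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        w = sum(own) / len(own)  # mean distance within the sample's own cluster
        b = min(  # smallest mean distance to any other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - w) / (b + w)
    return total / len(points)

def gdd_kmeans_sketch(points, k_max=5):
    """Remove outliers, then keep the K in 2..k_max with the highest mean BWP."""
    clean = remove_outliers(points)
    best = None
    for n in range(2, min(k_max, len(clean) - 1) + 1):
        labels, centers = kmeans(clean, maxmin_init(clean, n))
        score = bwp(clean, labels)
        if best is None or score > best[0]:
            best = (score, n, labels, centers)
    return clean, best

if __name__ == "__main__":
    data = [(0, 0), (0, 0.2), (0.2, 0), (0.2, 0.2),
            (5, 5), (5, 5.2), (5.2, 5), (5.2, 5.2),
            (0, 5), (0, 5.2), (0.2, 5), (0.2, 5.2),
            (10, -3)]  # the last point is an isolated outlier
    clean, (score, k, labels, _) = gdd_kmeans_sketch(data)
    print(len(data) - len(clean), "outlier(s) removed; chosen K =", k, "; mean BWP =", round(score, 3))
```

On this toy data the sketch removes the single isolated point and selects K = 3. Only points in sparse grid cells are scored by LOF, which mirrors the idea of using the grid as a cheap pre-filter; the brute-force distance computations keep the example short but are quadratic in the number of points.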

Keywords

Data Mining; K-Means Algorithm; Grid Filtering Outlier; Number of Class Centers
