
Speedy Algorithm for Clustering Imbalanced Data

M. H. Marghny, Ahmed I. Taloba, Rasha M. Abd El-Aziz

Abstract


The Fast Balanced K-means (FBK-means) clustering approach is one of the most important options to consider when solving the clustering problem for balanced data. In most numerical experiments, FBK-means is faster and more accurate than the K-means algorithm, the Genetic Algorithm, and the Bee algorithm. The FBK-means algorithm needs fewer distance calculations and less computational time while keeping the same clustering results. However, the FBK-means algorithm does not give good results with imbalanced data. To resolve this shortcoming, a more efficient clustering algorithm, namely Fast K-means (FK-means), is developed in this paper. This algorithm not only gives the best results, as the FBK-means approach does, but also needs less computational time in the case of imbalanced data.
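
The abstract measures cost in distance calculations and computational time. As a point of reference only, the following minimal sketch shows the baseline K-means (Lloyd's) procedure with a counter for distance computations, run on a small imbalanced toy dataset. It is not the FBK-means or FK-means algorithm, whose details are given in the full text; the function names `kmeans` and `make_imbalanced`, the toy data, and all parameter values are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Baseline Lloyd's K-means; returns (centroids, labels, distance_calls)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    distance_calls = 0

    for _ in range(max_iter):
        changed = False
        # Assignment step: every point is compared with every centroid,
        # which costs n * k distance calculations per iteration.
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            distance_calls += k
            nearest = dists.index(min(dists))
            if nearest != labels[i]:
                labels[i] = nearest
                changed = True
        if not changed:
            break
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, labels, distance_calls


def make_imbalanced(n_major=200, n_minor=10, seed=1):
    """Hypothetical toy data: one large and one small Gaussian blob."""
    rng = random.Random(seed)
    major = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_major)]
    minor = [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(n_minor)]
    return major + minor


if __name__ == "__main__":
    data = make_imbalanced()
    centroids, labels, calls = kmeans(data, k=2)
    print("cluster sizes:", [labels.count(j) for j in range(2)])
    print("distance calculations:", calls)
```

In this baseline the distance count grows as (number of points) x (number of clusters) x (number of iterations), which is the quantity that faster variants aim to reduce.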

Keywords


Clustering, K-means Algorithm, Bee Algorithm, Genetic Algorithm, FBK-means Algorithm, FK-means Algorithm.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.