Improved K-Means with Dimensionality Reduction Technique

Amit Thakkar; Nikita Bhatt; Amit Ganatra; Arpita Shah


Abstract

Clustering is the process of finding groups of objects such that the objects in a group are similar to one another and different from the objects in other groups. K-means is a well-known partitioning-based clustering technique that attempts to find a user-specified number of clusters, each represented by its centroid. The K-means algorithm often does not work well on high-dimensional data; hence, to improve its efficiency, we apply principal component analysis (PCA), a dimensionality reduction technique, to the data set and obtain a reduced data set of uncorrelated variables. A challenging task for any clustering method is determining the number of clusters beforehand. To find the number of clusters, we apply the expectation-maximization (EM) method, which suggests how many clusters the user should choose by determining a mixture of Gaussians that fits the given data set. Finally, the experimental results show that using PCA and EM improves the efficiency of K-means clustering.
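The pipeline described in the abstract can be illustrated with a minimal sketch, given here as an illustration only and not as the authors' implementation. It assumes NumPy, synthetic data, and hypothetical helper names (pca_reduce, farthest_first_init, kmeans). The number of clusters k is passed in by hand, where the paper would take it from the EM step, and the deterministic farthest-first initialization is an assumption made so that the example is reproducible.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def farthest_first_init(X, k):
    """Deterministic seeding (an assumption of this sketch, not from the paper)."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen centroid.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids)


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    centroids = farthest_first_init(X, k)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster became empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic data: three well-separated Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[1, 0] = 20.0
centers[2, 1] = 20.0
X = np.vstack([c + 0.5 * rng.standard_normal((50, 10)) for c in centers])

Z = pca_reduce(X, n_components=2)   # 10 features reduced to 2 components
labels, centroids = kmeans(Z, k=3)  # k chosen by hand in this sketch
print(Z.shape, np.bincount(labels))
```

In this sketch K-means runs on the two-dimensional projection Z rather than on the original ten features, which is the efficiency argument the abstract makes. A fuller version would replace the hand-picked k with a value suggested by fitting Gaussian mixtures with EM.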

Keywords

Cluster, EM, K-Means, PCA




This work is licensed under a Creative Commons Attribution 3.0 License.
