
An Efficient Initialization Technique for K-Means Clustering using Spectral Biclustering

P. Logeshwari

Abstract


Clustering partitions data into groups of related objects, and K-Means is among the most frequently used clustering techniques. Its efficiency can be increased considerably when the initial centroids are computed rather than chosen at random, because randomly generated starting points often cause K-Means to converge to a local optimum. Better results can be obtained by running K-Means more than once, but it is difficult to decide how many runs are needed to reach a good result. This paper proposes a new approach for computing the initial centroids for K-Means. The proposed method consists of two steps: Spectral Biclustering and Semi-Unsupervised Gene Selection. The Semi-Unsupervised Gene Selection method, which is based on the cosine measure, is used to compute the initial centroids for the K-Means algorithm. The approach is tested on a microarray gene database and performs better than the previous method. It takes similar or slightly more clustering time, but the clustering accuracy is very high, which makes it well suited to gene clustering.
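The idea of seeding K-Means with computed rather than random centroids can be illustrated with a minimal sketch. It is not the paper's implementation: the preliminary grouping `groups` is assumed to come from an earlier step such as spectral biclustering (not implemented here), and the rule of picking, for each group, the member most cosine-similar to the group mean is only one plausible reading of a cosine-based selection. The function names and the toy expression profiles are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def pick_initial_centroids(data, groups):
    """Assumed selection rule: for each preliminary group, take the member
    whose profile is most cosine-similar to the group mean."""
    centroids = []
    for label in sorted(set(groups)):
        members = [row for row, g in zip(data, groups) if g == label]
        center = mean(members)
        centroids.append(list(max(members, key=lambda r: cosine(r, center))))
    return centroids

def nearest(row, centroids):
    """Index of the centroid closest to row in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(row, centroids[j]))

def kmeans(data, centroids, max_iter=100):
    """Lloyd's K-Means started from the supplied centroids (no random init)."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(row, centroids) for row in data]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [row for row, l in zip(data, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return labels, centroids

# Toy expression profiles (rows = genes, columns = conditions); hypothetical data.
data = [
    [1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [0.9, 1.8, 3.2],
    [3.0, 2.0, 1.0], [3.2, 1.9, 0.8], [2.9, 2.2, 1.1],
]
groups = [0, 0, 0, 1, 1, 1]  # assumed output of a preliminary biclustering step

init = pick_initial_centroids(data, groups)
labels, final = kmeans(data, init)
```

Because the starting centroids are fixed by the data, repeated runs give the same clustering, which sidesteps the question of how many random restarts to perform.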

Keywords


Spectral Biclustering, Semi-Unsupervised Gene Selection, K-Means, Initial Centroids




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.