
Hierarchical Clustering With Multi-View Point Based Similarity Measure

S.U. Meena, P. Parthasarathi

Abstract


Clustering is a technique for finding groups of similar instances, called clusters, in data. It places data instances that are similar to each other in the same cluster and instances that are very different from each other in different clusters. Clustering is often described as unsupervised learning. In this paper, hierarchical clustering is used to find the cluster relationships between data objects in a data set. We introduce a novel multi-viewpoint based similarity measure and two related clustering methods. The main difference between existing measures and the novel one is that existing measures use only a single viewpoint, which serves as the base, whereas the multi-viewpoint based similarity measure uses many different viewpoints, which are objects assumed not to be in the same cluster as the two objects being measured. Based on this measure, two criterion functions are proposed for document clustering. We compared the resulting clustering algorithm against other similarity measures to verify the improvement offered by the novel method.
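The multi-viewpoint idea described in the abstract can be illustrated with a small sketch. The abstract does not state a formula, so the code below assumes the formulation given in the first reference (Nguyen, Chen, and Chan): the similarity of two documents is the average dot product of their difference vectors taken from each viewpoint, where the viewpoints are objects assumed to lie outside the cluster of the two documents. The function name `multi_viewpoint_similarity` and the toy vectors are hypothetical and serve only as illustration.

```python
import numpy as np


def multi_viewpoint_similarity(d_i, d_j, viewpoints):
    """Average of (d_i - d_h) . (d_j - d_h) over all viewpoints d_h.

    Assumption: `viewpoints` holds objects presumed NOT to share a
    cluster with d_i and d_j, one object per row.
    """
    viewpoints = np.atleast_2d(np.asarray(viewpoints, dtype=float))
    if viewpoints.shape[0] == 0:
        raise ValueError("at least one viewpoint is required")
    diff_i = np.asarray(d_i, dtype=float) - viewpoints  # d_i seen from each d_h
    diff_j = np.asarray(d_j, dtype=float) - viewpoints  # d_j seen from each d_h
    # Row-wise dot products, then the mean over viewpoints.
    return float(np.mean(np.sum(diff_i * diff_j, axis=1)))


if __name__ == "__main__":
    # Toy unit-length "documents" (hypothetical data).
    d1 = np.array([1.0, 0.0])
    d2 = np.array([0.0, 1.0])
    others = np.array([[0.6, 0.8], [0.8, 0.6]])  # assumed out-of-cluster objects

    # Single viewpoint at the origin: reduces to the plain dot product,
    # which equals cosine similarity for unit-length vectors.
    print(multi_viewpoint_similarity(d1, d2, np.zeros((1, 2))))
    # Several viewpoints: the multi-viewpoint case.
    print(multi_viewpoint_similarity(d1, d2, others))
```

With the origin as the only viewpoint, the measure reduces to the ordinary dot product, which matches the single-viewpoint case that the abstract contrasts against.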


Keywords


Data Mining, Text Mining, Similarity Measure, Multi-Viewpoint Similarity Measure, Clustering Methods.


References


D.T. Nguyen, L. Chen, and C.K. Chan, “Clustering with Multiviewpoint-Based Similarity Measure,” IEEE Trans. Knowledge and Data Eng., vol. 24, no. 6, June 2012.

A. Ahmad and L. Dey, “A Method to Compute Distance Between Two Categorical Values of Same Attribute in Unsupervised Learning for Categorical Data Set,” Pattern Recognition Letters, vol. 28, no. 1, pp. 110-118, 2007.

D. Ienco, R.G. Pensa, and R. Meo, “Context-Based Distance Learning for Categorical Data Clustering,” Proc. Eighth Int’l Symp. Intelligent Data Analysis (IDA), pp. 83-94, 2009.

H. Chim and X. Deng, “Efficient Phrase-Based Document Similarity for Clustering,” IEEE Trans. Knowledge and Data Eng., vol. 20, no. 9, pp. 1217-1229, Sept. 2008.

H. Zha, X. He, C. Ding, H. Simon, and M. Gu, “Spectral Relaxation for K-Means Clustering,” Proc. Neural Info. Processing Systems (NIPS), pp. 1057-1064, 2001.

I. Dhillon and D. Modha, “Concept Decompositions for Large Sparse Text Data Using Clustering,” Machine Learning, vol. 42, nos. 1/2, pp. 143-175, Jan. 2001.

I.S. Dhillon, “Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning,” Proc. Seventh ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), pp. 269-274, 2001.

J. Shi and J. Malik, “Normalized Cuts and Image Segmentation,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.

P. Willett, “Document Clustering Using an Inverted File Approach,” J. Information Science, vol. 2, pp. 223-231, 1990.

S. Zhong, “Efficient Online Spherical K-means Clustering,” Proc. IEEE Int’l Joint Conf. Neural Networks (IJCNN), pp. 3180-3185, 2005.

S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman, “Indexing by Latent Semantic Analysis,” J. Am. Soc. Information Science, vol. 41, no. 6, pp. 391-407, 1990.

S. Flesca, G. Manco, E. Masciari, L. Pontieri, and A. Pugliese, “Fast Detection of XML Structural Similarity,” IEEE Trans. Knowledge and Data Eng., vol. 17, no. 2, pp. 160-175, Feb. 2005.

Y. Gong and W. Xu, Machine Learning for Multimedia Content Analysis. Springer-Verlag, 2007.

Y. Zhao and G. Karypis, “Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering,” Machine Learning, vol. 55, no. 3, pp. 311-331, June 2004.



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.