
Clustering Data Based on Probability Distribution Similarity

J. Priyadharshini, S. Akila Devi, A. Askerunisa


Clustering based on distribution measurement is an essential task in data mining. Previous methods, which extend traditional partitioning-based clustering methods such as k-means and density-based clustering methods such as DBSCAN, rely on geometric distances between objects; the probability distributions of the objects have not been considered when measuring their similarity. In this paper, objects are systematically modeled as probability distributions over discrete domains, and the Kullback-Leibler (KL) divergence is used to measure the similarity between these distributions. The KL divergence is then integrated into partitioning-based and density-based clustering methods to cluster the objects. Finally, the execution time, the mean square error (MSE) and the number of detected noise points are calculated and compared for the partitioning-based and density-based clustering algorithms. Partitioning-based and density-based clustering using the KL divergence reduced the execution time to 68 s and the MSE to 0.001, and detected 22 noise points. The efficiency of clustering based on distribution measurement is better than that of clustering based on distance measurement.
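The abstract describes modeling each object as a discrete probability distribution and plugging the KL divergence, D(P || Q) = sum_i p_i log(p_i / q_i), into a partitioning method and a density-based method. The Python sketch below illustrates that idea under assumptions the abstract does not state: every object is a probability vector over one shared finite domain; probabilities in the second argument are floored at a small constant `EPS` to avoid division by zero; the partitioning step is a k-means-style loop with deterministic farthest-first initialisation; and the density-based step is DBSCAN using the symmetrised divergence D(P || Q) + D(Q || P). The names `kl_divergence`, `sym_kl`, `kl_kmeans` and `kl_dbscan`, the parameters `eps` and `min_pts`, and the toy data are hypothetical, and this is not the authors' implementation.

```python
import math

EPS = 1e-10  # assumed smoothing floor so that log(p / 0) never occurs
NOISE = -1   # label given to noise points by kl_dbscan


def kl_divergence(p, q):
    """D(P || Q) = sum_i p_i * log(p_i / q_i) over a shared finite domain."""
    # Terms with p_i == 0 contribute 0 by convention, so they are skipped.
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def sym_kl(p, q):
    """Symmetrised KL divergence; an assumed choice so DBSCAN neighbourhoods are symmetric."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def kl_kmeans(objects, k, max_iter=100):
    """Partitioning: assign each distribution P to the centre C minimising D(P || C).

    The arithmetic mean of a cluster's distributions minimises the summed D(P || C)
    over that cluster, so it serves as the centre update, as in ordinary k-means.
    """
    # Deterministic farthest-first initialisation (an assumption, not from the paper).
    centres = [list(objects[0])]
    while len(centres) < k:
        far = max(objects, key=lambda p: min(kl_divergence(p, c) for c in centres))
        centres.append(list(far))

    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: kl_divergence(p, centres[j]))
                      for p in objects]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(objects, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def kl_dbscan(objects, eps, min_pts):
    """Density-based: DBSCAN with sym_kl in place of Euclidean distance."""
    n = len(objects)
    # Neighbourhood of i includes i itself, since sym_kl(p, p) == 0.
    neighbours = [[j for j in range(n) if sym_kl(objects[i], objects[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:  # not a core point (may become border later)
            labels[i] = NOISE
            continue
        labels[i] = cluster
        stack = list(neighbours[i])
        while stack:
            j = stack.pop()
            if labels[j] == NOISE:  # reachable noise point becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # only core points expand the cluster
                stack.extend(neighbours[j])
        cluster += 1
    return labels


# Each object: a probability distribution over the same three discrete values.
data = [
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.75, 0.15, 0.1],
    [0.1, 0.1, 0.8], [0.1, 0.2, 0.7], [0.15, 0.1, 0.75],
    [0.34, 0.33, 0.33],
]
labels, _ = kl_kmeans(data, k=2)
print(labels)                                # [0, 0, 0, 1, 1, 1, 0]
print(kl_dbscan(data, eps=0.15, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

In this toy run the k-means-style loop must place the near-uniform seventh object in one of the two clusters, whereas the DBSCAN variant labels it as noise, which is the kind of behaviour the abstract's noise-point criterion measures. The symmetrised divergence is used for DBSCAN only because neighbourhood queries are normally expected to be symmetric; using the one-sided D(P || Q) there would be an equally plausible reading.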


Partitioning-Based Clustering Methods, Density-Based Clustering Methods, Distribution-Based Clustering, Kullback-Leibler Divergence




This work is licensed under a Creative Commons Attribution 3.0 License.