GB-NClust: A Pioneering Graph-Based Approach for Natural Clustering in Spatial Datasets
The exponential rate at which volume of data is increasing, it is actually impractical to avoid the need of analyzing data for decision-making purposes. Clustering is one of the very effective ways for the data analysis. It is a principal tool of data mining for extracting previously unknown patterns existing in spatial datasets. It is a process of gathering similar data objects into one group such that objects in different groups are dissimilar. Datasets having non-uniform data object distribution are not dealt properly by the available clustering algorithms in literature. In this paper, an innovative approach of generating clusters in spatial datasets (uniform or non-uniform) have been proposed which models dataset using Delaunay structure for capturing spatial proximities and the analysis of the Delaunay edges have been carried out based on a range which is computed automatically by the algorithm for determining whether the two vertices are attracted towards each other or there is a repulsion between them. The edges corresponding to the vertices that are repulsive to each other are removed from the Delaunay structure thus obtaining the desired clusters of arbitrary shapes without any human interaction. The edges are classified as very strong edges, weak edges and floating edges. The dense and sparse clusters present in the same dataset have also been identified effectively even in the presence of bridges between the two clusters. The experimental study is conducted on sample datasets, which shows encouraging results.
C. Eldershaw, M. Hegland, “Cluster Analysis using triangulation,” in CTAC97,World Scientific,Singapore,1997, pp. 201-208.
C.S. Chen, J. Choo, R. Jiamthapthaksin, O.U. Celepcikay., C. Giusti, C.F. Eick, “MOSAIC: A Proximity Graph Approach for Agglomerative Clustering”, In 9th Intl. Conf. on Data Warehousing and Knowledge Discovery, Germany, 2007, pp. 231-240.
D. Fisher, “Improving inference through conceptual clustering,” in proc. 1987 National Conf. Artificial Intelligence(AAAI’87), Seattle, pp. 461-465.
D. Liu., O. Sourina., “Free parameters clustering of spatial data with non-uniform density,” in proc. of Intl. Conf. on Cybernetics and Intelligent Systems, vol. I, 2004, pp. 387-392.
E. Vladimir, I. Lee, “Autoclust: Automatic clustering via boundary extraction for mining massive point- data sets”, in proc. 5th International Conf. On Geocomputation, UK, 2000, pp. 23-25.
G. Karypis, E. Han, V. Kumar, “CHAMELEON: A hierarchical Clustering Algorithm Using Dynamic Modeling,” IEEE Computer Special Issue on data analysis and mining , 1999, pp. 68-75.
G. Sheikholeslmi, S. Chatterjee, A. Zhang, “WAVECLUSTER: A multi-resolution clustering approach for very large spatial data bases”, in Proc. of Intl. Conf. Very Large Databases, New York, Aug. 1998, pp. 428-439.
J. Gennari, P. Langley, D. Fisher, “Models of incremental concept formation”, In J. Artificial Intelligence, vol. 40, issues 1-3, 1989, pp. 11-61.
J. Han , M. Kamber, Data Mining : Concepts And Techniques, Morgan Kaufmann Publishers, 2006, ch.7.
J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in proc. 5th Berkley Symp. Math. Statistical Prob., vol.1, 1967, pp.281-297.
K.H. Anders., “ A Hierarchical Graph-clustering Approach to find Group of objects,” Technical paper, in ICA Commission on Map Generalizations, 5th Workshop on Progress in Automated Map Generalization, 2003, pp. 5-10.
M. Ankerst, M. Breunig, H.P. Kriegel, J. Sander, “OPTICS: Ordering points to identify the clustering structure,” Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data, PA,June 1999, pp. 49-60.
M. Ester, H.P Kriegel., J. Sander, X. Xu ,“Clustering for mining in large spatial databases,” in special issue on Data Mining ,KI-Journal,vol. 1, 1998.
M. Ester, H.P. Kriegel, J. Sander, X..Xu, “ A density-based algorithm for discovering clusters in large spatial databases”, Proc. 1996 Intl. Conf. Knowledge Discovery and data mining, Portland, Aug. 1996, pp.226-231.
M. Ester, H.P. Kriegel, J. Sander, X. Xu, “Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and its Applications,” in Intl. J. of Data Mining and Knowledge Discovery, vol.2, 2004, pp. 169-194.
Sudipto, K.S. Guha, R. Rastogi., “CURE:An efficient clustering algorithm for large databases”, In Proc. ACM SIGMOD Intl. Conf. on Management of Data, USA, 1998, pp. 73-84.
T. Zhang, R. Ramakrishnan, M. Linvy, “BIRCH: An efficient data clustering method for very large databases”, in Proc. ACM SIGMOD Intl. Conf. On Management of Data, ACM Press, New York, 1996, pp.103 -114.
Voronoi Website: www.voronoi.com.
W. Wang, J. Yang, R. Muntz, “STING: A Statistical information grid approach to spatial data mining”, In Proc. Intl. Conf. Very large Data Bases, Greece, Aug.1997, pp.186-195.
Estivill-Castro, I.J. Lee “AUTOCLUST+: Automatic Clustering of Point-Data Sets in the Presence of Obstacles”, In Proc. of the Intl. Workshop on Temporal, Spatial and Spatial-Temporal Data Mining, Lyon, France, 2000, pp. 133-146.
A.K.H Tung., J. Hou, J. Han “Spatial Clustering in the Presence of Obstacles”, In Proc. of Intl. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, 2001, pp. 359-367.
L. Dongquan, V.N Gleb., S. Olga “Effective clustering and boundary detection algorithm based on Delaunay triangulation” , In Pattern Recognition Letters vol. 29, July 2008, NY, USA, ISSN:0167-8655, pp. 1261-1273.
L. Dongquan, V.N. Gleb, S. Olga “Automatic clustering and boundary detection algorithm based on adaptive influence function” In Pattern Recognition, vol.41, Sept. 2008, NY, USA, ISSN: 0031-3203, pp.2757-2776.
A.K. Sharma, A. Arya “A novel graph based clustering approach for spatial datasets”, In J. of Pure and Applied Mathematika Sciences, vol. LXIII, no. 1-2, March 2006, pp. 23-35.
A.K. Sharma, A. Arya, M. Sharma, “A graph based mechanism for clustering multidimensional spatial datasets”, In Proc. of Intl. Conf. On Data Management, IMT, Ghaziabad, 2008.
Abraham, S. Das, S. Roy “Swarm Intelligence Algorithms for Data Clustering”, In Proc. of Soft Computing for Knowledge Discovery and Data Mining, 2008, pp.279-313.
G. Folino, Spezzano, A. Forestiero “An adaptive flocking algorithm for performing approximate clustering”, In J. of Information Sciences, Vol 179, Issue 18, 2009, pp. 3059-3078.
L. Duan , L. Xu ,F. Guo,J. Lee, B. Yan “ A local density based spatial clustering algorithm with noise”, In J. of Information Systems, Vol. 32, 2007, pp 978-986.
J. Aldstadt, A. Getis “ Using AMOEBA to create a spatial weights matrix and identify spatial clusters”, In J. of Geographical Analysis, ISSN0016-7363, 2006, pp 327-343.
D.B. Neill, A.W. Moore “Anomalous spatial cluster detection” In Proc. of AD-KDD’05, Chicago, Illinois, 2005.
D. Agarwal, A. McGregor, J.M. Phillips, S. Venkatsubramanium, Z. Zhu “Spatial Scan Statistics: approximations and performance study”, In Proc. of Intl. Conf. on Knowledge Discovery and Data Mining, , PA, USA, 2006, pp. 24-33.
J. Kennedy, R.C. Eberhart “Particle Swarm Optimization”, IEEE Intl. Conf. On Neural Networks, vol. IV, Perth Australia, 1995, pp. 1942-1948.
M.M. Breunig, H.P. Kriegel , R.T. Ng., J. Sander “LOF: Identifying density-based local outliers”, In ACM SIGMOD Intl. Conf. Management of Data, 2000.
M. Rasmussen, G. Karypis “gCLUTO- An Interactive Clustering, Visualization and Analysis System”, University of Minnesota, Dept. of Computer Science and Engineering, CSE/UMN Tech. Report TR04-021, 2004.
L. Anselin, I. Syabri, Y. Kho “ GeoDA: An Introduction to Spatial Data Analysis. In Journal of Geographical Analysis 38, 2006, pp. 5-22.
- There are currently no refbacks.
This work is licensed under a Creative Commons Attribution 3.0 License.