
A Data Clustering Using Visual Assessment of Cluster Tendency Algorithm of Data Partitioning Methods

R. Tamilselvan, V. Hariharaprabu, R. Bhaskaran, Dr. C. Palanisamy

Abstract


This paper proposes an algorithm based on Visual Assessment of cluster Tendency (VAT), which uses a visual approach to find the number of clusters in data. The VAT method readily displays cluster tendency for small data sets as grayscale images, but it is too computationally costly for larger data sets. We first review the visual methods that have been widely studied and used in data cluster analysis. The basis of the method is to regard D as a subset of known values that is part of a larger, unknown N×N dissimilarity matrix, and then to impute the missing values from D. The VAT algorithm represents D as an N×N image I(D) in which the objects are reordered to reveal hidden cluster structure along the diagonal of the image. This paper addresses this limitation by proposing a VAT-based algorithm in which D is mapped to a new matrix D′ in a graph embedding space, and D′ is then reordered using the VAT algorithm. Two points are important: (i) because VAT is scalable by sVAT to data sets of arbitrary size, and because coVAT depends explicitly on VAT, this new approach is immediately scalable to, say, the sVAT model, which works even for very large (unloadable) data sets without alteration; and (ii) VAT, sVAT, and coVAT are autonomous, parameter-free models: no “hidden values” are needed to make them work. A sampling-based extended scheme is also proposed to enable visual cluster analysis for large data sets. Extensive experimental results on several synthetic and real-world data sets validate our VAT algorithms.
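The VAT reordering step mentioned in the abstract can be sketched in plain Python. This is a minimal illustration, assuming the original Prim-like VAT ordering (start at an endpoint of the largest dissimilarity, then repeatedly append the unordered object nearest to any already-ordered object); it does not implement the graph-embedding mapping, sVAT sampling, or coVAT described here. The function names `vat_order`, `vat_reorder`, and `to_grayscale` and the toy data are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def vat_order(d: Sequence[Sequence[float]]) -> List[int]:
    """Return a VAT ordering of the N x N dissimilarity matrix d (sketch)."""
    n = len(d)
    if n == 0:
        return []
    # Step 1: start from the row index of a maximal entry (first one found).
    start, _ = max(
        ((p, q) for p in range(n) for q in range(n)),
        key=lambda pq: d[pq[0]][pq[1]],
    )
    order = [start]
    remaining = [q for q in range(n) if q != start]
    # best[q] = smallest dissimilarity from q to any object already ordered.
    best = {q: d[start][q] for q in remaining}
    # Step 2: grow the ordering one nearest object at a time (Prim-like).
    while remaining:
        j = min(remaining, key=lambda q: (best[q], q))  # ties -> lowest index
        order.append(j)
        remaining.remove(j)
        for q in remaining:
            best[q] = min(best[q], d[j][q])
    return order


def vat_reorder(d: Sequence[Sequence[float]], order: List[int]) -> Matrix:
    """Step 3: permute rows and columns, D*[p][q] = D[order[p]][order[q]]."""
    return [[d[p][q] for q in order] for p in order]


def to_grayscale(dstar: Matrix) -> List[List[int]]:
    """Scale D* to 0..255 intensities; 0 (black) means identical objects."""
    top = max((max(row) for row in dstar if row), default=0)
    if top == 0:
        return [[0 for _ in row] for row in dstar]
    return [[round(255 * v / top) for v in row] for row in dstar]


if __name__ == "__main__":
    # Hypothetical 1-D data with two well-separated groups.
    xs = [0.0, 10.0, 0.5, 10.4, 0.2, 9.8]
    d = [[abs(a - b) for b in xs] for a in xs]
    order = vat_order(d)
    print(order)  # → [0, 4, 2, 5, 1, 3]
    for row in to_grayscale(vat_reorder(d, order)):
        print(" ".join(f"{v:3d}" for v in row))
```

In the printed image, the two dark (low-intensity) blocks along the diagonal correspond to the two groups, so counting them gives a visual estimate of the number of clusters. The nearest-neighbour bookkeeping in `best` keeps the ordering at O(N²) work, which is also why VAT on the full matrix becomes costly for large N.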


Keywords


Clustering, Cluster Analysis, Cluster Tendency, Hidden Values, VAT, sVAT, coVAT






This work is licensed under a Creative Commons Attribution 3.0 License.