High Dimensional Data Mining Using Clustering

A. Bharathi; Dr. A. M. Natarajan

Abstract

Clustering is one of the major tasks in data mining. Clustering algorithms are based on a criterion that maximizes inter-cluster distance and minimizes intra-cluster distance. In high-dimensional feature spaces, both the quality and the efficiency of these algorithms deteriorate considerably. A large number of dimensions confuses clustering algorithms: it becomes difficult to group similar data points because the distances between points become almost the same, a problem usually called the “curse of dimensionality”. One remedy is to find a subset of dimensions, removing irrelevant and redundant ones, and to perform clustering on that subset. Dimensionality reduction techniques such as Principal Component Analysis (PCA) are used for this kind of feature reduction. However, if different subsets of the points cluster well in different subspaces of the feature space, a global dimensionality reduction will fail. To overcome these problems, recent research has proposed computing subspace clusters. The existing subspace clustering algorithms share two common limitations. First, they usually have problems with subspace clusters of different dimensionality. Second, they often fail to discover clusters of different shapes and dimensionalities. The goal of this project is to develop new, efficient, and effective methods for high-dimensional clustering.
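
The claim that distances between points become almost the same in high dimensions can be illustrated with a small experiment. The following is a minimal sketch, not a method from this paper: the function name `relative_contrast`, the uniform random data in the unit hypercube, the sample size, and the Euclidean distance are all assumptions chosen for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for the Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Hypothetical helper for illustration only: the uniform data, the sample
    size, and the distance measure are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    # A small value means the nearest and farthest points are almost
    # equally far away, so distance carries little grouping information.
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Under these assumptions the relative contrast shrinks as the number of dimensions grows, which is one way to see why distance-based clustering criteria lose discriminating power in high-dimensional feature spaces.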

Keywords

Data mining, High Dimensional Clustering, Distance Measure




This work is licensed under a Creative Commons Attribution 3.0 License.
