
Improvisation of Clustering by Attribute Reduction Using Bayesian Theorem

S. Senthamarai Kannan, Dr. N. Ramaraj, Dr. S. Baskar

Abstract


Data reduction, which aims to reduce the dimensionality of large-scale data without losing useful information, is an important topic in knowledge discovery, data clustering, and classification. This paper introduces a novel concept of dependency-based attribute reduction using Bayes' theorem. Bayesian theory is of great interest in data reduction. Attribute reduction is a data mining approach for detecting and characterizing combinations of attributes, or independent variables, that interact to influence a dependent (class) variable. The attribute reduction proposed here rests on a method that converts two or more attributes into a single attribute and calculates the probabilities of their values in deciding the value of the class attribute. In this way, the dependent attributes are identified and removed from the original dataset. The end goal is to improve classification accuracy, so that prediction of the class variable is better than with the original data and its initial attribute set, while also reducing computation time.
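The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of one way the idea could be read, not the authors' method. It assumes categorical attributes, merges a pair of attributes into a single combined attribute, and uses Bayes' theorem to compute the posterior probability of each class given an attribute value. The function names (`posterior`, `expected_confidence`, `redundant_attributes`), the confidence score, and the tolerance `tol` are all hypothetical choices made for illustration.

```python
from collections import Counter

def posterior(values, labels):
    """Return {(value, cls): P(cls | value)} using Bayes' theorem."""
    n = len(labels)
    class_counts = Counter(labels)
    value_counts = Counter(values)
    joint_counts = Counter(zip(values, labels))
    post = {}
    for (v, c), k in joint_counts.items():
        likelihood = k / class_counts[c]   # P(value | cls)
        prior = class_counts[c] / n        # P(cls)
        evidence = value_counts[v] / n     # P(value)
        post[(v, c)] = likelihood * prior / evidence
    return post

def expected_confidence(values, labels):
    """Average over rows of the largest class posterior for the row's value.

    Assumed score: how decisively an attribute settles the class.
    """
    best = {}
    for (v, _c), p in posterior(values, labels).items():
        best[v] = max(best.get(v, 0.0), p)
    return sum(best[v] for v in values) / len(values)

def redundant_attributes(columns, labels, tol=1e-9):
    """Flag attribute b when merging it with a kept attribute a adds nothing.

    columns maps attribute name -> list of categorical values.
    """
    names = list(columns)
    removed = []
    for i, a in enumerate(names):
        if a in removed:
            continue
        base = expected_confidence(columns[a], labels)
        for b in names[i + 1:]:
            if b in removed:
                continue
            # Convert the two attributes into a single combined attribute.
            merged = list(zip(columns[a], columns[b]))
            if expected_confidence(merged, labels) <= base + tol:
                removed.append(b)
    return removed

# Toy data: b is a relabelling of a; the class is a XOR c.
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 1, 0, 1, 1, 0, 1, 0]
print(redundant_attributes(columns, labels))
```

In this toy data, `b` carries no information beyond `a`, while `a` and `c` are individually uninformative but decide the class when combined, which is the kind of attribute interaction the abstract describes. A real dataset would need smoothing for unseen values and a less strict threshold than an exact tie.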


Keywords


Attribute Reduction, Data Classification, Bayesian Theory, Clustering, Simple K-Means, Cobweb, EM.






This work is licensed under a Creative Commons Attribution 3.0 License.