
Classification using Generalization Based Decision Tree Induction along with Relevance Analysis Based on Relational Database

Amit Thakkar, Yogeshwar P Kosta, Amit Ganatra

Abstract


Classification, the task of determining unknown values of attributes of interest from the values of other attributes, is a major challenge in data mining. A commonly used method is the decision tree. The efficiency of decision tree algorithms has been well established for relatively small data sets. However, the method has problems with larger data sets and with data containing continuous numerical values, and when selecting the final determining attribute it tends to favor attributes that take many distinct values. Large training sets are common in data mining applications, so decision tree algorithms face scalability limitations. In addition, in most data mining applications users have little knowledge of which signature attribute should be selected for effective mining, and so depend heavily on the capability of the algorithm. In this paper we address two problems: selecting the right signature attribute and handling large data sets. We propose a new data classification method that integrates a sequence of preprocessing steps, namely data cleaning, attribute-oriented induction (which identifies the signature attribute), and relevance analysis, followed by decision tree induction. This stepwise approach lets us extract simple rules at multiple levels of abstraction and handles large data sets and continuous numerical values in a scalable way.
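The preprocessing idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hand-written concept hierarchy for a numeric attribute (`generalize_age`), uses information gain as a stand-in relevance measure, and runs on a small invented table. All function names, attribute names, and data values are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical concept hierarchy: replaces a continuous value with a
# higher-level concept, as attribute-oriented induction would.
def generalize_age(age):
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "senior"

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in class entropy obtained by partitioning rows on attr."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

def rank_attributes(rows, attrs, target):
    """Relevance analysis (assumed form): rank attributes by information gain."""
    gains = {a: information_gain(rows, a, target) for a in attrs}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

# Invented toy relation; not data from the paper.
rows = [
    {"age": 23, "city": "A", "buys": "no"},
    {"age": 27, "city": "B", "buys": "no"},
    {"age": 35, "city": "A", "buys": "yes"},
    {"age": 48, "city": "B", "buys": "yes"},
    {"age": 64, "city": "A", "buys": "yes"},
    {"age": 71, "city": "B", "buys": "no"},
]

# Step 1: generalize the continuous attribute to a few concept-level values.
generalized = [{**r, "age": generalize_age(r["age"])} for r in rows]

# Step 2: rank attributes by relevance before growing a decision tree.
ranking = rank_attributes(generalized, ["age", "city"], "buys")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

Generalizing first keeps the number of distinct values per attribute small, which limits the bias of gain-based measures toward many-valued attributes; low-ranked attributes could then be dropped before tree induction. Plain information gain was chosen here only for brevity, and other relevance measures could be substituted.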

Keywords


Data Mining, Classification, Data Cleaning, Decision Tree Induction, Relevance Analysis.






This work is licensed under a Creative Commons Attribution 3.0 License.