
A Review of Splitting Criteria for Decision Tree Induction

N. S. Sheth, A. R. Deshpande

Abstract


Decision tree techniques are used to build classification models in data mining. A decision tree is a sequential, hierarchical structure composed of decision nodes, each corresponding to an attribute, and a decision tree model rests on an attribute selection measure. This paper presents splitting criteria such as Information Gain, Gain Ratio, Gini Index, the Jaccard Coefficient, and Least Probable Intersections. In decision tree construction, the splitting criterion is a heuristic for selecting the best attribute to partition a node's dataset: the attribute with the best score is chosen as the splitting attribute for that node, where the score reflects either impurity reduction or purity gain. The paper gives a comparative study of attribute selection measures for top-down induction of decision trees.
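As a minimal illustrative sketch (not code from the paper under review), two of the criteria named above, Information Gain and the Gini Index, can be computed directly from class-label counts; the toy dataset and split below are invented for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, partitions):
    """Impurity reduction when the parent node is split into the given partitions."""
    n = len(parent_labels)
    weighted_child = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent_labels) - weighted_child

# Toy example: a candidate attribute splits the node into two pure children.
parent = ['yes', 'yes', 'no', 'no']
left, right = ['yes', 'yes'], ['no', 'no']
print(gini(parent))                                 # 0.5
print(information_gain(parent, [left, right]))      # 1.0 (a perfectly pure split)
```

In top-down induction, such a score would be evaluated for every candidate attribute at a node, and the attribute with the highest impurity reduction chosen as the splitting attribute.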




This work is licensed under a Creative Commons Attribution 3.0 License.