
A Survey on Classification Methods Based on Decision Tree Algorithms in Data Mining

B. Rosiline Jeetha, Dr. M. Punithavalli

Abstract


Data mining lies at the junction of traditional statistics and computer science. In contrast to statistics, data mining is more about searching for hypotheses in data that happen to be available than about verifying research hypotheses with data collected from designed experiments. Data mining is also characterized by problems with a large number of variables and/or samples, which makes scaling up algorithms important: developing algorithms with low computational complexity, using parallel computing, partitioning the data into subsets, or finding effective ways to use relational databases. The process- and utility-centered thinking of data mining and knowledge discovery is also evident in the reported commercial systems. Decision trees are considered one of the most popular approaches for representing classifiers, and researchers from disciplines such as statistics, machine learning, pattern recognition, and data mining have studied the problem of growing a decision tree from available data. The technology for building knowledge-based systems with decision tree algorithms has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes such systems as ID3, C4.5, and CART. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete.
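
The approach to synthesizing decision trees that the abstract refers to, as used in ID3 and C4.5, is greedy top-down induction: at each node the attribute that most reduces class impurity, measured by information gain, is chosen as the split. The following Python sketch illustrates that criterion on a toy dataset; the data, attribute layout, and function names are illustrative assumptions and are not drawn from the surveyed systems' implementations.

    # Minimal sketch of ID3-style attribute selection by information gain.
    # Dataset and helper names are illustrative, not from the surveyed systems.
    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        counts = Counter(labels)
        total = len(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(rows, labels, attribute_index):
        """Entropy reduction obtained by splitting on one attribute."""
        base = entropy(labels)
        partitions = {}
        for row, label in zip(rows, labels):
            partitions.setdefault(row[attribute_index], []).append(label)
        remainder = sum(len(part) / len(labels) * entropy(part)
                        for part in partitions.values())
        return base - remainder

    # Toy "play tennis"-style data: [outlook, windy] -> play?
    rows = [["sunny", "false"], ["sunny", "true"], ["overcast", "false"],
            ["rain", "false"], ["rain", "true"]]
    labels = ["no", "no", "yes", "yes", "no"]

    # ID3 greedily picks the attribute with the highest gain at each node.
    best = max(range(len(rows[0])), key=lambda i: information_gain(rows, labels, i))
    print("split on attribute", best)

On this toy data the outlook attribute yields the larger gain, so an ID3-style learner would place it at the root and recurse on each partition. C4.5 refines the same idea with the gain ratio, while CART typically splits on the Gini index and grows binary trees.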

Keywords


Decision Tree, ID3, C4.5 and CART


References


R. Kohavi and J. R. Quinlan. Decision-tree discovery. In Will Klosgen and Jan M. Zytkow, editors, Handbook of Data Mining and Knowledge Discovery, chapter 16.1.3, pages 267-276. Oxford University Press, 2002.

S. Grumbach and T. Milo. Towards tractable algebras for bags. Journal of Computer and System Sciences, 52(3):570-588, 1996.

T. R. Hancock, T. Jiang, M. Li, J. Tromp: Lower Bounds on Learning Decision Lists and Trees. Information and Computation 126(2): 114-122, 1996.

H. Zantema and H. L. Bodlaender, Finding Small Equivalent Decision Trees is Hard, International Journal of Foundations of Computer Science, 11(2):343-354, 2000.

G.E. Naumov. NP-completeness of problems of construction of optimal decision trees. Soviet Physics: Doklady, 36(4):270-271, 1991.

J. R. Quinlan, C4.5: Programs For Machine Learning. Morgan Kaufmann, Los Altos, 1993.

S. B. Gelfand, C. S. Ravishankar, and E. J. Delp. An iterative growing and pruning algorithm for classification tree design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(2):163-174, 1991.

J.R. Quinlan, Decision Trees and Multivalued Attributes, J. Richards, ed., Machine Intelligence, V. 11, Oxford, England, Oxford Univ. Press, pp. 305-318, 1988.

R. López de Mántaras. A distance-based attribute selection measure for decision tree induction. Machine Learning, 6:81-92, 1991.

U. M. Fayyad and K. B. Irani. The attribute selection problem in decision tree generation. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 104-110, Cambridge, MA, 1992. AAAI Press/MIT Press.

P. E. Utgoff and J. A. Clouse. A Kolmogorov-Smirnoff metric for decision tree induction. Technical Report 96-3, University of Massachusetts, Department of Computer Science, Amherst, MA, 1996.

P. C. Taylor and B. W. Silverman. Block diagrams and splitting criteria for classification trees. Statistics and Computing, 3(4):147-161, December 1993.

J. K. Martin. An exact probability metric for decision tree splitting and stopping. Machine Learning, 28(2-3):257-291, 1997.




This work is licensed under a Creative Commons Attribution 3.0 License.