Open Access Open Access  Restricted Access Subscription or Fee Access

A Study on Similarity Function for Tree Structured Data

K. Simila, R. Srividhya

Abstract


We have several distance or similarity functions for trees, but their performance is not always adequate in different applications. In the base paper the Extended Sub tree (EST) function, where a new sub tree mapping is proposed. This similarity function is to compare tree structured data by defining a new set of mapping rules where sub trees are mapped rather than nodes. To reduce the time complexity as well as computational complexity of the system, efficient pruning algorithm is proposed. In the proposed system the unnecessary computation is reduced in the tree structured data by using the lossless pruning strategy. This paper provides major advancement in efficiency. This pruning strategy is ignoring the node or sub tree which has greater value than the ignoring probability. By using this technique, we can reduce the extra computation complexity.


Keywords


Tree Distance, Tree Structured Data, Pruning Strategy, EST (Extended Sub Tree).

Full Text:

PDF

References


M.J. Zaki, “powerful Mining Frequent Trees n a Forest: Algorithms and Applications,” IEEE Transactions Knowledge and Data Engineering. vol. 17, no. 8, pp. 1021-1035, Aug. 2005.

J. Punin, M. Krishnamoorthy, and M. Zaki, “LOGML: Log Markup Language for Web Usage Mining,” Proceeding. Revised Papers from the Third International Workshop Mining Web Log Data across All Customers Touch Points (WEBKDD ’01), pp. 273-294, and 2002.

M.J. Zaki and C.C. Aggarwal, “XRules: An Effective Structural Classifier for XML Data,” Proceeding. Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 316-325, 2003.

W. Lian, D. W. -l. Cheung, N. Mamoulis, and S.-M. Yiu, “An Efficient and Scalable Algorithm for Clustering XML Documents by Structure,” IEEE Transactions on Knowledge and Data Engineering vol. 16, no. 1, pp. 82-96, January. 2004.

M. Kouylekov and B. Magnini, “Recognizing Textual Entailment with Tree Edit Distance Algorithms,” Proceeding First Challenge Workshop Recognizing Textual Entailment, pp. 17-20, 2005.

A. Mesbah and M.R. Prasad, “Automated Cross-Browser Compatibility Testing,” Proceeding. 33rd International Conference on Software Engineering. (ICSE), pp. 561-570, 201

A. Mesbah, A. van Deursen, and D. Roest, “Invariant-Based Automatic Testing of Modern Web Applications,” IEEE Transactions on Software Engineering., vol. 38, no. 1, pp. 35-53, Jan./Feb. 2012.

R. Connor, F. Simeoni, M. Iakovos, and R. Moss, “A Bounded Distance Metric for Comparing Tree Structure,” Information Systems, vol. 36, no. 4, pp. 748-764, 2011.

S. Flesca, G. Manco, E. Masciari, L. Pontieri, and A. Pugliese, “Fast Detection of XML Structural Similarity,” IEEE Transaction on Knowledge and Data Engineering., vol. 17, no. 2, pp. 160-175, Feb. 2005.

G. Valiente, “An Efficient Bottom-Up Distance between Trees,” Proceeding. Eighth International Symposium String Processing and Information Retrieval (SPIRE ‘01), pp. 212-219, 2001

K. Zhang, “Algorithms for the Constrained Editing Distance between Ordered Labeled Trees and Related Problems,” Pattern Recognition, vol. 28, no. 3, pp. 463-474, 1995.

T. Jiang, L. Wang, and K. Zhang, “Alignment of Trees—An Alternative to Tree Edit,” Theoretical Computer Science, vol. 143, no. 1, pp. 137-148, 1995.

W. Zuo, D. Zhang, and K. Wang, “On Kernel Difference-Weighted k-Nearest Neighbor Classification,” Pattern Analysis & Applications, vol. 11, no. 3, pp. 247-257, 2008.

E. Alpaydin, “Support Vector Machines,” Introduction to Machine Learning, second ed., pp. 218-225, MIT Press, 2004.

C.C. Aggarwal, N. Ta, J. Wang, J. Feng, and M. Zaki, “Xproj: A Framework for Projected Structural Clustering of XML Documents,” Proceeding. 13th ACM SIGKDD International Conference Knowledge on Discovery Data Mining, pp. 46-55, 2007.

F. Hadzic and M. Hecker, “Alternative Approach to Tree-Structured Web Log Representation and Mining,” Proc. IEEE/WIC/ ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) , vol. 1, pp. 235-242, 2011.

K. Zhang and D. Shasha, “Simple Fast Algorithms for the Editing Distance between Trees and Related Problems,” SIAM J. Computing, vol. 18, no. 6, pp. 1245-1262, 1989.


Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.