Open Access Open Access  Restricted Access Subscription or Fee Access

A Patricia-Trie Approach for Incremental Mining of Frequent Itemsets on Indexed Data Blocks

K. Naresh Babu, K.F. Bharati, P. Nagendra, T. Chalapathi Rao

Abstract


Patricia-Trie based I-Forest index is a novel index structure that supports efficient item set mining into a relational DBMS. The The Patricia-Trie based I-Forest index provides a complete and compact representation of transactional data. It is a general structure that efficiently supports different algorithmic approaches to item set extraction. Selective access of the physical index blocks significantly reduces the I/O costs and efficiently exploits DBMS buffer management strategies. This approach, albeit implemented into a relational DBMS, yields performance better than the state-of-the-art algorithms accessing data on a flat file and is characterized by a linear scalability also for large data sets.

Keywords


Datamining, Frequent Itemsets, Patricis-Trie, XML, I ndex fabric, Sparse Datasets

Full Text:

PDF

References


A. V. Aho, J. E. Hopcrofi and J. D. Ullman, data Structures and Algorithms," Addison-Wesley, Reading, Mass., 1983.

J. Aoe, *' Computer Algorithms-Key Search Strategies," IEEE Comput. Society Press, 1991

A. Blumer, J. Blumer, D. Haussler and R. Mcconnel, "Complete inverted files hr efficient text retrieval and analysis," J. ACM, vo1.34, no.;, G. H. Gonnet, R. A. Baeza-Yates and T. Snider,"New indices fbr text : Pat trees and Pat arrays," in Infirmation Retrieval -Data Structures & Algorithms-,pp.66-82, Prentice Hall, New Jersey, 1992.

W. D. Jonge, A. S. Tanenbaum and R. P. Reit, Two access methods using compact binary trees,"IEEE Trans. Software Eng., vol. 13, no.7, pp.799-809, July 1987.

D. R. Morrison, '' PATRICIA-practical algorithm to retrieve inbrmation coded in alpha-numeric," J. M. Shishibori, K. Ando, H. lriguchi and J. Aoe, "An order-preserving access method using an improved BDS-tree," Proc. Third Intemational Conkrence on NLPRS'95, pp.473-479, Dec. 1995.pp.578-595, 1987. ACM, VOI. 15, pp.5 14-534, 1968.

J. Liu, Y. Pan, K. Wang, and J. Han. Mining frequent item sets by opportunistic projection. In Proc. of the 8th ACM SIGKDD Intl. Conference on Knowledge Discovery and Data Mining, pages 229–238, July 2002.

H. Mannila, H. Toivonen, and I. Verkamo, “Efficient Algorithms for Discovering Association Rules,” Proc. ACM SIGKDD Int‟l Conf. Knowledge Discovery and Data Mining, pp. 181-192, July 1994.

H. Mannila, H. Toivonen, and I. Verkamo, “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 259-289, 1997

J.S. Park, M.S. Chen and P.S. Yu, “An Effective Hash Based Algorithmfor Mining Association Rules,” in Proc. 5th SIGMOD Intl. orkshop.Management of Data, California, 1995, pp. 175–186.

R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules. 20th Intl.Conf. Very Large Data Bases, Santiago, 1994, pp. 487–499.

A. Savasere, E. OmiecinskiandS.Navathe, “An Efficient Algorithm for Mining Association Rules in LargeDatabases,”in Proc. 21th Intl. Conf. Very Large Data Bases, Santiago, 1995, pp. 487–499.

S. Brin, R. Motwani, J. Ullmanand S. Tsur, “Dynamic itemset counting andimplicationrulesfor market basket data,” in Proc. 7th SIGMOD Intl. rkshop. Management of Data, Arizona, 1997, pp. 255–264.

K. Wang, L. Tang, J. Han and J. Liu, “Top Down FP-Growth for AssociationRule Mining,” in Proc. th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Miningy, Taipei, 2002, pp. 334–370.

J.Pei, J. Han, H. Lu, S. Nishio, S. Tang and D. Yang, “H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases,” in Proc. 1st IEEE Intl. Conf. Data Mining, California, 2001, pp. 441–448.

A. Pietracaprina and D. Zandolin, “MiningFrequent Itemsets using Patricia Tries,” in Proc. 1st FIMI Workshop. FrequentItemsetMining Implementations, Florida, 2003.

N.Pasquier, Y. Bastide, R. Taouil and L. Lakhal, “Discovering Frequent Closed Itemsets for Association Rules,” in Proc. 7th Intl. Conf. Database Theory, Jerusalem, 1999, pp. 398–416.

M.J. Zaki and C. Hsiao, “CHARM: An EfficientAlgorithmforClosed Itemset Mining,” in Proc. 2nd SIAM Intl. Con. Data Mining, Virginia, 2002, pp. 398–416.

J.Pei,J.Han, R. Mao, “CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets,” in Proc. 9th SIGMOD Intl. Workshop. Data Mining and Knowledge Discovery, Dallas, 2000, pp. 11–20.

J. Wang, J. Han and J. Pei, “CLOSET+: Searching for the Best Strategies for Mining FrequentClosedItemsets,”in Proc. 12th Intl. Conf. Knowledge Discovery and DataMining, Washington, 2003, pp. 236–245.

G. Grahne and J. Zhu,“Efficientlyusing prefix-trees in mining frequent itemsets,” in Proc. 1st FIMI Workshop. Frequent Itemset Mining Implementations, Florida, 2003.

M. Ben Hadj Hamida, “Patricia-Tree based algorithm to find frequent closeditemsets,”Master.dissertation, Dept. Comp. Sci., Faculty of Sciences of Tunis., Tunis, Tunisia, 2005.

Roy Goldman, Jennifer Widom. DataGuides: enabling query formulation and optimization in semistructured databases[C].Proceedings of the 23rd VLDB Conference, Athens, Greece,1997.

Paul F Dietz. Maintaining order in a linked list[C].Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, San Francisco, California, 1982.

Xu Haiyan,Wu Quanyuan,Wang Huaimin, etal. Containment based XML indexing [J]. Acta Electronica Sinica,2003(in Chinese).Lu J H , Wang G R ,Yu G. Optimizing path expression queries of XML data[J]. Journal of Software,2003.

Masami Shishibori,Kazuaki Ando, Makoto Okada and Jun-ichi Aoe. A Key Search Algorithm Using the Compact Patricia Trie. IEEE lntcrnational Conference on Intelligent Processing Systems , China,1997.

Yang Wenfeng , Chen Guangying , LiXing. PATRICIA-tree based Dictionary Mechanism for Chinese Word Segmentation. Journal of Chinese Information Processing, 2000, Vol.15 , No.3(in Chinese).

Alberto O. Mendelzon, Flavio Rizzolo, Alejandro Vaisman. Indexing Temporal XML Documents.Proceedings of the 30th VLDB Conference, Canada, 2004.ss


Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.