Open Access Open Access  Restricted Access Subscription or Fee Access

Efficient Sequential Pattern Mining Based on Pattern Growth

A.S. Ibrahim Sadiq, P.S.M. Imran Khan

Abstract


The goal of this paper is to make efficient sequential pattern mining by increasing the pattern growth. Traditional pattern-growth based approaches for sequential pattern mining allows unidirectional pattern-growth. Here only the suffix of the detected pattern is increased. But UDDAG allows bidirectional pattern-growth along both end of the detected patterns. It makes fewer levels of recursion and faster pattern growth in the detected patterns. Sequential pattern mining derive length-(k + 1) patterns based on the projected databases of length-k patterns recursively, used by traditional pattern growth-based approaches. At each level of recursion, they unidirectionally grow the length of detected patterns by one along the suffix of detected patterns, which needs k levels of recursion to find a length-k pattern. In this paper, a novel data structure, UpDown Directed Acyclic Graph (UDDAG), is invented for efficient sequential pattern mining. UDDAG allows bidirectional pattern growth along both ends of detected patterns. Thus, a length-k pattern can be detected in log2[k+ 1] levels of recursion at best, which results in fewer levels of recursion and faster pattern growth. When minSup is large such that the average pattern length is close to 1, UDDAG and PrefixSpan have similar performance because the problem degrades into frequent item counting problem. However, UDDAG scales up much better. UDDAG is also considerably faster than Spade and LapinSpam. UDDAG uses comparable memory to that of PrefixSpan and less memory than Spade and LapinSpam

Keywords


UpDown Directed Acyclic Graph (UDDAG),GSP,PSP,SPADE,Lapin Spam and MEMISP

Full Text:

PDF

References


Rakesh Agrawal, and Ramakrishnan Srikanth have published the paper,” mining sequential pattern” in the year 1995. They have proposed that Apriorisome, and Aprioriall algorithms to find the maximal sequences.

The paper “Fast algorithm for mining association rules”, is also published by the above two authors in the year 1994. Here, they have produced one new algorithm “AprioriHybrid” by combining Apriori and AprioriTid algorithms.

The authors M.Garofalakis, R.Ratogi, and K.Shim have published the paper “Sequential pattern mining with regular expression constraints (SPIRIT)” in the year 1999. They have proposed that Regular Expression (RE) constraints for mining sequential patterns that also satisfy the user-specified RE constraints.

The paper “Frequent pattern-projected sequential pattern mining (Freespan)” have been published by the authors J.Han, J.Pei,B.Mortazavi-Asl, Q.Chen,U.Dayal, and M.C.Hsu in the year 2000. In this paper, they have proposed that Apriori based GSP algorithm for mining the subsequence patterns and to project those patterns.

The authors J.Ayres, J.Gehrke,T.Yu, and J.Flannick have published the paper “sequential pattern mining using a bitmap presentation” in the year 2002.They have proposed novel depth-first search strategy to produce frequent itemsets.

The authors J.Chen and K.Xiao have published the paper, “A BISC approach towards efficient frequent itemset mining”. They have proposed “FIM” to discover frequent itemsets and “one-stage Bisc and two-stage Bisc” to minimize the cost and memory.

The paper “Efficient Sequential pattern mining based on pattern growth” have been published by the author Jinlin Chen in the year 2010. In this paper, they have proposed that UDDAG algorithm for mining the subsequence patterns and to project those patterns.

M.Y. Lin and S.Y. Lee, "Fast Discovery of Sequential Patterns through Memory Indexing and Database Partitioning," J. Information Science and Eng., vol. 21, pp. 109-128, 2005.

F. Masseglia, F. Cathala, and P. Poncelet, "The PSP Approach for Mining Sequential Patterns," Proc. European Symp. Principle of Data Mining and Knowledge Discovery, pp. 176-184, 1998.

J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.C. Hsu, "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth," Proc. 2001 Int'l Conf. Data Eng. (ICDE '01), pp. 215-224, 2001.

J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U.Dayal, and M.C. Hsu, "Mining Sequential Patterns by Pattern- Growth: The PrefixSpan Approach," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 11, pp. 1424-1440, Nov. 2004.

K. Wang, Y. Xu, and J.X. Yu, "Scalable Sequential Pattern Mining for Biological Sequences," Proc. 2004 ACM Int'l Conf. Information and Knowledge Management, pp. 178-187, 2004.

J. Wang, Y. Asanuma, E. Kodama, T. Takata, and J. Li, "Mining Sequential Patterns More Efficiently by Reducing the Cost of Scanning Sequence Databases," IPSJ Trans. Database, vol. 47, no. 12, pp. 3365-3379, 2006.

Z. Zhang and M. Kitsuregawa, "LAPIN-SPAM: An Improved Algorithm for Mining Sequential Pattern," Proc. Int'l Special Workshop Databases for Next Generation Researchers, pp. 8-11, Apr.2005.

Z. Zhang, Y. Wang, and M. Kitsuregawa, "Effective Sequential Pattern Mining Algorithms for Dense Database," Proc. Japanese Nat'l Data Eng. Workshop (DEWS '06), 2006.


Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.