A General Model for Sequential Pattern Mining with a Progressive Database

K. Venkatesh Sharma; K. Hanumantha Rao; A .Shiva Kumar

A General Model for Sequential Pattern Mining with a Progressive Database

K. Venkatesh Sharma, K. Hanumantha Rao, A .Shiva Kumar

Abstract

Although there have been many recent studies on the mining of sequential patterns in a static database and in a database with increasing data, these works, in general, do not fully explore the effect of deleting old data from the sequences in the database. When sequential patterns are generated, the newly arriving patterns may not be identified as frequent sequential patterns due to the existence of old data and sequences. Even worse, the obsolete sequential patterns that are not frequent recently may stay in the reported results. In practice, users are usually more interested in the recent data than the old ones. To capture the dynamic nature of data addition and deletion, we propose a general model of sequential pattern mining with a progressive database while the data in the database may be static, inserted, or deleted. In addition, we present a progressive algorithmPisa, which stands for Progressive mIning of Sequential pAtterns, to progressively discover sequential patterns in defined time period of interest (POI). The POI is a sliding window continuously advancing as the time goes by.Pisautilizes a progressive sequential tree to efficiently maintain the latest data sequences, discover the complete set of up-to-date sequential patterns, and delete obsolete data and patterns accordingly. The height of the sequential pattern tree proposed is bounded by the length of POI, thereby effectively limiting the memory space required byPisathat is significantly smaller than the memory needed by the alternative method, Direct Appending (DirApp). Note that the sequential pattern mining with a static database and with an incremental database are special cases of the progressive sequential pattern mining. By changing Start time and End time of the POI,Pisacan easily deal with a static database or an incremental database as well. Complexity of algorithms proposed is analyzed. The experimental results show thatPisanot only significantly outperforms the prior methods in execution time by orders of magnitude but also possesses graceful scalability.

Keywords

Progressive Sequential Pattern

Full Text:

PDF

References

R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int’l Conf. Very Large Data Bases (VLDB ’94), pp. 478-499, Sept. 1994.

R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. 11th Int’l Conf. Data Eng. (ICDE ’95), pp. 3-14, Feb. 1995.

S. Aseervatham, A. Osmani, and E. Viennet, “Bitspade: A Lattice-Based Sequential Pattern Mining Algorithm Using Bitmap Representation,” Proc. Sixth Int’l Conf. Data Mining (ICDM), 2006.

J. Ayres, J. Gehrke, T. Yiu, and J. Flannick, “Sequential Pattern Mining Using a Bitmap Representation,” Proc. Eighth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’02), pp. 429-435, July 2002.

A. Balachandran, G.M. Voelker, P. Bahl, and P.V. Rangan, “Characterizing User Behavior and Network Performance in a Public Wireless LAN,” Proc. ACM SIGMETRICS Int’l Conf. Measurement and Modeling of Computer Systems (SIGMETRICS ’02), June 2002.

H. Cao, N. Mamoulis, and D.W. Cheung, “Mining Frequent Spatio-Temporal Sequential Patterns,” Proc. Fifth Int’l Conf. Data Mining (ICDM ’05), pp. 82-89, Nov. 2005.

G. Chen, X. Wu, and X. Zhu, “Sequential Pattern Mining in Multiple Streams,” Proc. Fifth Int’l Conf. Data Mining (ICDM ’05), pp. 585-588, Nov. 2005.

M.-S. Chen, J. Han, and P.S. Yu, “Data Mining: An Overview from Database Perspective,” IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, pp. 866-883, Dec. 1996.

M.-S. Chen, J.-S. Park, and P.S. Yu, “Efficient Data Mining for Path Traversal Patterns,” IEEE Trans. Knowledge and Data Eng., vol. 10, no. 2, pp. 209-221, Mar./Apr. 1998.

H. Cheng, X. Yan, and J. Han, “INCSPAN: Incremental Mining of Sequential Patterns in Large Database,” Proc. 10th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’04), pp. 527-532, 2004.

S. Cong, J. Han, and D. Padua, “Parallel Mining of Closed Sequential Patterns,” Proc. 11th ACM SIGKDD Int’l Conf. Knowl¬ edge Discovery and Data Mining (KDD ’05), pp. 562-567, Aug. 2005.

U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurasamy, Advances in Knowledge Discovery and Data Mining. MIT Press, 1996.

M.N. Garofalakis, R. Rastogi, and K. Shim, “Spirit: Sequential Pattern Mining with Regular Expression Constraints,” Proc. 25th Int’l Conf. Very Large Data Bases (VLDB ’99), pp. 223-234, 1999.

J.-K. Guo, B.-J. Ruan, and Y.-Y. Zhu, “A Top-Down Algorithm for Web Log Sequential Pattern Mining,” Proc. Ninth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), 2005.

J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2000.

J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.C. Hsu, “Freespan: Frequent Pattern-Projected Sequential Pattern Mining,” Proc. Sixth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’00), pp. 355 359, 2000.

Y. Hirate and H. Yamana, “Sequential Pattern Mining with Time Interval,” Proc. 10th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD ’06), pp. 775-779, 2006.

C.-C. Ho, H.-F. Li, F.-F. Kuo, and S.-Y. Lee, “Incremental Mining of Sequential Patterns over a Stream Sliding Window,” Proc. IEEE Int’l Workshop Mining Evolving and Streaming Data (IWMESD ’06), Dec. 2006.

KDD CUP 2007, http://www.cs.uic.edu/~liub/Netflix KDD-Cup-2007.html, 2007.

C.-R. Lin, C.-H. Yun, and M.-S. Chen, “Utilizing Slice Scan and Selective Hash for Episode Mining,” Proc. Seventh ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD ’01), Aug. 2001.

M.-Y. Lin and S.-Y. Lee, “Incremental Update on Sequential Patterns in Large Databases by Implicit Merging and Efficient Counting,” Information System, vol. 29, no. 5, pp. 385-404, July 2004.

C. Luo and S.M. Chung, “Efficient Mining of Maximal Sequential Patterns Using Multiple Samples,” Proc. Fifth SIAM Int’l Conf. Data Mining (SDM), 2005.

H. Mannila, H. Toivonen, and A.I. Verkamo, “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowl-edge Discovery, vol. 1, no. 3, pp. 259-289, 1997.

A. Marascu and F. Masseglia, “Mining Sequential Patterns from Temporal Streaming Data,” Proc. First ECML/PKDD Workshop Mining Spatio-Temporal Data (MSTD ’05), Oct. 2005.

A. Marascu and F. Masseglia, “Mining Sequential Patterns from Data Streams: A Centroid Approach,” J. Intelligent Information Systems, vol. 27, no. 3, pp. 291-307, Nov. 2006.

F. Masseglia, P. Poncelet, and M. Teisseire, “Incremental Mining of Sequential Patterns in Large Databases,” Data and Knowledge Eng., vol. 46, pp. 97-121, July 2003.

S. Nguyen, X. Sun, and M. Orlowska, “Improvements of INCSPAN: Incremental Mining of Sequential Patterns in Large Database,” Proc. Ninth Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD), 2005.

J.-Z. Ouh, P. Wu, and M.-S. Chen, “Constrained Based Sequential Pattern Mining,” Proc. Int’l Workshop Web Technology, Dec. 2001.

A.B. Pandey, J. Srivastava, and S. Shekhar, “Web Proxy Server with Intelligent Prefetcher for Dynamic Pages Using Association Rules,” Technical Report 01-004, Univ. of Minnesota, Jan. 2001.

A.B. Pandey, R.R. Vatsavai, X. Ma, J. Srivastava, and S. Shekhar, “Data Mining for Intelligent Web Prefetching,” Proc. Workshop Mining Data Across Multiple Customer Touchpoints for CRM (MDCRM ’02), May 2002.

S. Parthasarathy, M.J. Zaki, M. Ogihara, and S. Dwarkadas, “Incremental and Interactive Sequence Mining,” Proc. Eighth ACM Int’l Conf. Information and Knowledge Management (CIKM ’99), pp. 251-258, 1999.

J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.C. Hsu, “Prefixspan: Mining Sequential Patterns Efficiently by Prefix Projected Pattern Growth,” Proc. 17th Int’l Conf. Data Eng. (ICDE ’01), pp. 215-224, 2001.

J. Pei, J. Han, and W. Wang, “Mining Sequential Patterns with Constraints in Large Databases,” Proc. 11th ACM Int’l Conf. Information and Knowledge Management (CIKM), 2002.

C. Romero, S. Ventura, J.A. Delgado, and P.D. Bra, “Personalized Links Recommendation Based on Data Mining. In Adaptive Educational Hypermedia Systems,” Proc. Second European Conf. Technology Enhanced Learning (EC-TEL ’07), Sept. 2007.

R. Srikant and R. Agrawal, “Mining Sequential Patterns: General-izations and Performance Improvements,” Proc. Fifth Int’l Conf. Extending Database Technology (EDBT ’96), Mar. 1996.

J. Wang and J. Han, “Bide: Efficient Mining of Frequent Closed Sequences,” Proc. 20th Int’l Conf. Data Eng. (ICDE ’04), pp. 79-91, 2004.

K. Wang, Y. Xu, and J.X. Yu, “Scalable Sequential Pattern Mining for Biological Sequences,” Proc. 13th ACM Int’l Conf. Information and Knowledge Management (CIKM ’04), pp. 178 187, Nov. 2004.

X. Yan, J. Han, and R. Afshar, “Clospan: Mining Closed Sequential Patterns in Large Datasets,” Proc. Third SIAM Int’l Conf. Data Mining (SDM ’03), pp. 166-177, May 2003.

Refbacks

There are currently no refbacks.

This work is licensed under a Creative Commons Attribution 3.0 License.

Username
Password
Remember me