ESWCA: An Efficient Algorithm for Mining Frequent Itemsets

K. Jothimani; Dr. Antony Selvadoss Thanamani

Abstract

Mining frequent itemsets over data streams is one of the most significant tasks in data mining. A mining method should support a flexible trade-off between processing time and mining accuracy. Our objective is to propose an effective algorithm that generates frequent itemsets in very little time by avoiding multiple scans. In this paper, we present ESWCA, an improved algorithm for mining frequent itemsets using the sliding window model. ESWCA operates on an online transactional data stream. In this approach, we handle continuous transaction slides in a segment-based manner, which improves runtime and memory consumption. In addition, by revising the fair-cutter in the algorithm, multiple scans of the entire dataset are avoided. Our experiments show that the algorithm not only consumes less memory but also runs efficiently.
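To make the segment-based sliding window idea concrete, the following is a minimal Python sketch. It is not the ESWCA algorithm: the abstract does not specify the fair-cutter or the internal data structures, so they are not modelled here. The class name `SegmentedWindowMiner` and the parameters `segment_size`, `window_segments`, `max_len`, and `min_support` are hypothetical names chosen for illustration. The sketch only shows how keeping per-segment counts lets the window slide without rescanning earlier transactions.

```python
from collections import Counter, deque
from itertools import combinations


class SegmentedWindowMiner:
    """Illustrative segment-based sliding window; NOT the ESWCA algorithm itself."""

    def __init__(self, segment_size, window_segments, max_len=2):
        self.segment_size = segment_size        # transactions per segment (assumed parameter)
        self.window_segments = window_segments  # segments kept in the window (assumed parameter)
        self.max_len = max_len                  # largest itemset size counted (assumed parameter)
        self.current = []                       # transactions of the segment being filled
        self.segments = deque()                 # (transaction count, itemset counts) per closed segment
        self.totals = Counter()                 # itemset counts summed over the window
        self.n_transactions = 0                 # transactions currently inside the window

    def add(self, transaction):
        """Buffer one transaction; close the segment once it is full."""
        self.current.append(frozenset(transaction))
        if len(self.current) == self.segment_size:
            self._close_segment()

    def _close_segment(self):
        # Count every itemset of size 1..max_len in the new segment, once.
        counts = Counter()
        for t in self.current:
            items = sorted(t)
            for k in range(1, self.max_len + 1):
                counts.update(combinations(items, k))
        self.segments.append((len(self.current), counts))
        self.totals.update(counts)
        self.n_transactions += len(self.current)
        self.current = []
        # Slide the window: subtract the oldest segment instead of rescanning.
        if len(self.segments) > self.window_segments:
            old_n, old_counts = self.segments.popleft()
            self.totals.subtract(old_counts)
            self.n_transactions -= old_n

    def frequent(self, min_support):
        """Return itemsets whose relative support in the window is >= min_support."""
        if self.n_transactions == 0:
            return {}
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.totals.items() if c > 0 and c >= threshold}
```

Brute-force enumeration of itemsets up to `max_len` is used only to keep the sketch short; a practical miner would prune candidates. Each transaction is processed once when its segment closes, and expiring a segment costs one subtraction of its stored counts.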

Keywords

Data Stream, Data-Stream Mining, Frequent Itemset, Sliding Window

This work is licensed under a Creative Commons Attribution 3.0 License.
