Running Time Analysis of Graph Theoretic Based Algorithm on Sparse and Dense Datasets

R.S. Thakur Thakur; Divakar Singh

Running Time Analysis of Graph Theoretic Based Algorithm on Sparse and Dense Datasets

R.S. Thakur Thakur, Divakar Singh

Abstract

The challenging issues of frequent pattern mining algorithms are multiple scans of databases and generation of huge number of candidates. The behavior of frequent pattern mining algorithms varies database to database. If database is dense, it generates more number of candidates and needs more number of scans. If database is sparse, it reduces candidate generation and number of scans at each subsequent pass. To reduce the number of database scans and candidates generation the Graph Theoretic based algorithm has been discussed recently. The prominent feature of this method is it requires only single scan of the database for finding frequent patterns. The running time behavior of Graph Theoretic based algorithm with sparse and dense databases is presented in this paper. This algorithm works efficiently with the sparse datasets but it is more CPU-bound for dense datasets. The performance analysis is done using synthetic datasets T10.I4.D100K, T40.I10.D100K and real datasets Mushrooms, Kosarak.

Keywords

CPU-Bound, Dense Datasets, Directed Graph, Frequent Patterns Mining, I/O-Bound, Sparse Datasets

Full Text:

PDF

References

R. Agrawal, T. Imielinski, and A. Swami, Mining association rules between sets of items in large databases. In Proc. SIGMOD, pages 207-216, May1993.

R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases. In Proc. 20th VLDB, pages 478-499, Sept. 1994.

R. Agrawal and J. Shafer, Parallel mining of association rules. IEEE Trans. on Knowledge and Data Engineering, pages 487-499, Jan. 1996.

S. Brin, R. Motwani, J. Ullman and S. Tsur, Dynamic itemset counting and implication rules for market basket data. In Proc. SIGMOD, pages 255-264, May 1997.

G. Grahne, L. Lakshmanan, and X. Wang, Efficient mining of constrained correlated sets. In Proc. 2000 Int. Conf. Data Engineering (ICDE’00), San Diego, CA, pages 512–521, Feb. 2000.

J. Han and Y. Fu, Discovery of multiple-level association rules from large databases. In Proc. 21st VLDB, Sept. pages: 420-431, 1995.

M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen and A. I. Verkamo, Finding interesting rules from large sets of discovered association rules. In Proc. 3rd Int. Conf. Information and Knowledge Management, Gaithersburg, Maryland, pages 401–408, Nov. 1994.

B. Lent, A. Swami and J. Widom, Clustering association rules. In Proc. 1997 Int. Conf. Data Engineering (ICDE’97), Birmingham, England, pages 220–231, April 1997.

H. Mannila and H. Toivonen, Levelwise search and borders of theories in knowledge discovery. Technical Report TR C-1997-8, Dept. of Computer Science, U. of Helsinki, pages 241-258, Jan. 1997.

H. Mannila, H. Toivonen and A. Verkamo, Improved methods for finding association rules. In Proc. AAAI Workshop on Knowledge Discovery, pages 181-192, July 1994.

R. Ng, L. V. S. Lakshmanan, J. Han and A. Pang, Exploratory mining and pruning optimizations of constrained association rules. In Proc. 1998 ACMSIGMOD Int. Conf. Management of Data (SIGMOD’98), Seattle, WA, pages 13–24, June 1998.

B. Ozden, S. Ramaswamy and A. Silberschatz, Cyclic Association Rules. In Proc. 14th International Conference on Data Engineering (ICDE), pages 412-421, Feb.1998.

J. Park, M. Chen and P. Yu, An effective hash-based algorithm for mining association rules. In Proc. ACM-SIGMOD, pages 175-186, May 1995.

A. Sarasere, E. Omiecinsky and S. Navathe, An efficient algorithm for mining association rules in large databases. In Proc. 21st VLDB, pages 432-444, Sept. 1995.

S. Sarawagi, S. Thomas and R. Agrawal, Integrating association rule mining with relational database systems: Alternatives and implications. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’98), Seattle, WA, pages 343–354, June 1998.

R. Srikant, Q. Vu and R. Agrawal, Mining association rules with item constraints. In Proc. 1997 Int. Conf. Knowledge Discovery and Data Mining (KDD’97), Newport Beach, CA, pages 67–73, Aug. 1997.

R. J. Bayardo, Efficiently mining long patterns from databases. Proc. SIGMOD conf., pages 85-93, June 1998.

D. Lin and Z. M. Kedem, Pincer-Search: A new algorithm for discovering the maximum frequent pattern. Proc. EBDT conf., pages 105-119, March 1998.

M. J. Zaki, S. Parthasarathy, M. Ogihara and W. Li, New algorithms for fast discovery of association rules. In Proc. 3rd International Conference on Knowledge Discovery and Data Mining (KDD), pages 283-296, Aug. 1997.

J. Han, J. Pei and Y. Yin, Mining frequent patterns without candidate generation. In SIGMOD’00, pages 1–12, 2000.

R. Agarwal, C. Aggarwal and V. V. V. Prasad, A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing Vol. 61(3):350-371, 2001.

J. Han, J. Wang, Y. Lu and P. Tzvetkov, Mining top-k frequent closed patterns without minimum support. In Proceedings of ICDM’02, pages 211–218, Dec. 2002.

J. Pei, J. Han, H. Lu, S. Nishio, S. Tang and D. Yang, H-Mine: Hyper-structure Mining of Frequent Patterns in Large Databases, Proc. IEEE Int’1 Conference on Data Mining, pages 441, 2001.

J. Pei, J. Han and R. Mao, CLOSET: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD’00 Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21–30, 2000.

J. Wang, J. Han, and J. Pei, Closet+: Searching for the best strategies for mining frequent closed itemsets. In Proceedings of ACM SIGKDD’03, Washington, DC, pages 236-245, 2003.

R.S. Thakur, R.C. Jain, K.R. Pardasani, Graph Theoretic Based Algorithm for Mining Frequent Patterns. In Proc. IEEE World Congress on Computational Intelligence, Hong Kong, pages 629-633, June 2008.

N. Deo, “Graph Theory with Applications to Engineering and Computer science”, PHI New Delhi 2006.

http://www.ics.uci.edu/ mlearn/MLRepository.html.

http://fimi.cs.helsinki.fi/data

R. Agrawal, A. Arning, T. Bollinger, M. Mehta, J. Shafer and R. Srikant, The Quest Data Mining System. In Proc. 2nd International Conference on Knowledge Discovery in Databases and Data Mining (KDD), Aug. 1996.

Refbacks

There are currently no refbacks.

This work is licensed under a Creative Commons Attribution 3.0 License.

Username
Password
Remember me