Towards Efficient Distributed Algorithm with Minimum Communication Overhead
Many organizations today are geographically distributed. Typically, each site stores its own day-to-day data locally, and that data is continually updated. Centralized data mining algorithms cannot be used in such organizations to discover useful patterns, because merging the datasets from different sites is often infeasible and incurs large network communication costs. Distributed data mining has therefore emerged as an active subdomain of data mining research. In distributed association rule mining, one of the major challenges is reducing communication overhead: data sites must exchange a large amount of information during the mining process, which can generate substantial overhead. This report proposes an association rule mining algorithm that minimizes the communication overhead among the participating data sites. Instead of transmitting all itemsets and their counts, the algorithm transmits a binary vector indicating the frequent (large) itemsets, using the Message Passing Interface (MPI). Another challenge is to reduce the number of database scans needed to generate the frequent itemsets. Hence, an algorithm termed "Efficient Distributed Dynamic Itemset Counting" is proposed. This algorithm reduces the time spent scanning the partitioned database, which improves overall performance.
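The binary-vector idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the sample transactions, the 0.5 support threshold, and the bitwise-OR merge rule are all assumptions made for illustration, and the MPI exchange is replaced by plain in-process lists so the example runs without an MPI installation. The sketch assumes every site agrees on the same ordered list of candidate itemsets, so that one bit per candidate is enough to say whether it is locally frequent.

```python
from itertools import combinations


def local_support_counts(transactions, candidates):
    """Count how many local transactions contain each candidate itemset."""
    return [sum(1 for t in transactions if c <= t) for c in candidates]


def frequency_vector(transactions, candidates, min_support):
    """Return one bit per candidate: 1 if it is locally frequent, else 0."""
    threshold = min_support * len(transactions)
    counts = local_support_counts(transactions, candidates)
    return [1 if n >= threshold else 0 for n in counts]


def merge_vectors(vectors):
    """Combine site vectors with a bitwise OR.

    Assumed merge rule (not taken from the source): a candidate survives
    if at least one site reports it as locally frequent.
    """
    return [int(any(bits)) for bits in zip(*vectors)]


# Hypothetical shared candidate list, in an order all sites agree on.
items = ["bread", "milk", "eggs"]
candidates = [frozenset(pair) for pair in combinations(items, 2)]

# Hypothetical local partitions held by two sites.
site_a = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk"}, {"bread", "milk"}]
site_b = [{"milk", "eggs"}, {"milk", "eggs"}, {"bread"}, {"bread", "milk", "eggs"}]

vector_a = frequency_vector(site_a, candidates, min_support=0.5)
vector_b = frequency_vector(site_b, candidates, min_support=0.5)
merged = merge_vectors([vector_a, vector_b])
```

Each site here sends three bits rather than three itemsets with their counts, which is the kind of saving the abstract describes. In an MPI setting the same vectors would be exchanged between processes instead of being passed as Python lists.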
This work is licensed under a Creative Commons Attribution 3.0 License.