
Towards Efficient Distributed Algorithm with Minimum Communication Overhead

Anil Pandya, Sahista Machchhar, Glory Shah

Abstract


Organizations today are often geographically distributed. Typically, each site stores its day-to-day data locally, and that data is continually updated. Centralized data mining algorithms cannot be used in such organizations to discover useful patterns, because merging the datasets from different sites is not feasible and incurs large network communication costs. Distributed data mining has therefore emerged as an active sub-domain of data mining research. In distributed association rule mining, one of the major challenges is reducing the communication overhead: data sites must exchange a large amount of information during the mining process, which can generate considerable overhead. This report proposes an association rule mining algorithm that minimizes the communication overhead among the participating data sites. Instead of transmitting all itemsets and their counts, the algorithm transmits a binary vector of frequent (large) itemsets using the Message Passing Interface (MPI). Another challenge is reducing the number of database scans needed to generate the frequent itemsets. Hence, an algorithm termed “Efficient Distributed Dynamic Itemset Counting” is proposed. This algorithm reduces the time spent scanning the partitioned database, which improves the performance of the algorithm.
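The binary-vector idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the toy transactions, and the 50% support threshold are assumptions made for illustration, and the actual MPI exchange is left out so that the example runs on its own. Each site marks which itemsets in a shared, ordered candidate list are locally frequent, and only those bits would be sent. The merge step relies on a standard property: an itemset that is frequent over the whole database must be frequent at one site or more, so the bitwise OR of the site vectors covers every globally frequent itemset.

```python
def local_counts(transactions, candidates):
    """Count how many local transactions contain each candidate itemset."""
    return [sum(1 for t in transactions if c <= t) for c in candidates]


def encode_bitvector(transactions, candidates, min_support):
    """Return one bit per candidate: 1 if it is locally frequent, else 0."""
    threshold = min_support * len(transactions)
    counts = local_counts(transactions, candidates)
    return [1 if n >= threshold else 0 for n in counts]


def merge_bitvectors(vectors):
    """Bitwise OR across sites: candidates frequent at one site or more."""
    return [int(any(bits)) for bits in zip(*vectors)]


# Hypothetical shared candidate list; every site must use the same order.
candidates = [frozenset(c) for c in
              (("a",), ("b",), ("c",), ("a", "b"), ("a", "c"), ("b", "c"))]

# Hypothetical local partitions held by two sites.
site_1 = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b"}]
site_2 = [{"c"}, {"a", "c"}, {"c"}, {"b", "c"}]

vec_1 = encode_bitvector(site_1, candidates, min_support=0.5)
vec_2 = encode_bitvector(site_2, candidates, min_support=0.5)
merged = merge_bitvectors([vec_1, vec_2])

print(vec_1)   # bits that site 1 would transmit
print(vec_2)   # bits that site 2 would transmit
print(merged)  # candidates that still need a global count
```

In this sketch each site sends six bits rather than six itemsets with their counts. Exact counts would then be needed only for the candidates whose merged bit is 1.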

 


Keywords


Association Rules, Distributed Environment, Minimum Communication Cost, Dynamic Itemset Counting, Frequent Pattern Growth, Support and Confidence.






Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.