
A New Algorithm for Parallel Association Rule Mining in Distributed Shared Memory System

Marghny H. Mohamed, Hosam E. Refaat

Abstract


Finding frequent itemsets is the most important problem in Association Rule Mining (ARM) because it is the most time-consuming step in ARM. When the database holds a huge number of transactions and items, it is important to investigate efficient distributed algorithms for mining association rules. An efficient distributed algorithm must be scalable, must allow the centralized database to be partitioned and distributed easily, and must minimize computation and communication. In this paper we present a new algorithm for finding frequent itemsets, called HVPFI. This algorithm is an extension of our previous work. We then analyze the algorithm and compare it with other published algorithms.
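
To make the frequent-itemset step concrete, here is a minimal sketch of generic support counting over a horizontally partitioned database: each partition counts its own itemsets, and only the counts are merged. It is not the HVPFI algorithm, whose details are not given in this abstract. The function names, the sample transactions, and the absolute support threshold are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-item subset of each transaction in one partition."""
    counts = Counter()
    for transaction in partition:
        # Sort the distinct items so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, k, min_support):
    """Merge per-partition counts and keep itemsets with count >= min_support."""
    total = Counter()
    for partition in partitions:
        # Only the counts leave a partition, never the raw transactions.
        total.update(local_counts(partition, k))
    return {itemset: n for itemset, n in total.items() if n >= min_support}


# Hypothetical data: two partitions of a small transaction database.
partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["butter", "milk"], ["bread", "milk"]],
]
result = frequent_itemsets(partitions, k=2, min_support=3)
```

In this sketch the partitions are processed in a loop for simplicity. In a distributed setting each call to `local_counts` could run on a separate node, with only the merged counters exchanged.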

Keywords


Parallel Systems, Distributed Shared Memory, Data Mining, Association Rule, Linda System, Tuple-Space, Jini, Javaspace.





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.