### Cube Computation in Distributed Environment using CM-Sketch Algorithm

#### Abstract

Nowadays, large amounts of data are generated in both structured and unstructured formats. The volume of stored data is very large, running to terabytes, petabytes, and sometimes zettabytes. Analyzing data at this scale requires improving on traditional RDBMS techniques. The data cube is a commonly used operation on large data sets: it stores large volumes of data so that they can be analyzed and hidden information can be found.
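To make the data cube operation concrete, the following is a minimal single-machine sketch, not the distributed method of this paper. It computes a SUM aggregate for every subset of the dimensions, which is what a full cube materializes. The function name `compute_cube`, the toy rows, and the column names are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def compute_cube(rows, dimensions, measure):
    """Aggregate `measure` (SUM) for every subset of `dimensions`.

    Returns a dict mapping each group-by (a tuple of dimension names)
    to a dict of {group key tuple: summed measure}.
    """
    cube = {}
    # A cube over n dimensions contains 2^n group-bys, one per subset.
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            groups = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                groups[key] += row[measure]
            cube[dims] = dict(groups)
    return cube


# Hypothetical toy fact table, for illustration only.
rows = [
    {"city": "Pune", "year": 2012, "sales": 10},
    {"city": "Pune", "year": 2013, "sales": 5},
    {"city": "Mumbai", "year": 2012, "sales": 7},
]
cube = compute_cube(rows, ["city", "year"], "sales")
```

This naive version rescans the input once per group-by, so its cost grows exponentially with the number of dimensions, which is one reason large cubes are computed in a distributed setting.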

This paper addresses a number of issues in constructing cubes over massive amounts of data. The CM-Sketch algorithm is used to partition data across nodes for cube construction. It also orders the dimensions so as to minimize the computation time of the cube. After partitioning, a cube generation algorithm constructs the cube on each node. Experimental results from an implementation of the algorithm show its effectiveness. The analysis of these results considers cube construction under varying data size, degree of parallelism, and number of dimension hierarchies.
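The abstract does not spell out how the sketch drives partitioning, so the following is only a minimal sketch of the generic Count-Min structure, together with a hypothetical check for heavy group keys of the kind a partitioner might want to treat specially. The class name, the default `width` and `depth`, the salted BLAKE2b hashing, and the `heavy_groups` helper are all assumptions made for illustration, not the authors' implementation.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter: depth hash rows of width counters."""

    def __init__(self, width=272, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row stands in for a family of hash functions.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum over rows
        # never underestimates the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


def heavy_groups(sketch, candidate_keys, threshold):
    """Hypothetical helper: keys whose estimated count exceeds threshold."""
    return {k for k in candidate_keys if sketch.estimate(k) > threshold}


# Hypothetical stream of group keys (values of one dimension).
stream = ["Pune"] * 6 + ["Mumbai"] * 2 + ["Nagpur"]
cms = CountMinSketch()
for key in stream:
    cms.add(key)
skewed = heavy_groups(cms, set(stream), threshold=4)
```

Because estimates can only err upward, every truly heavy key is reported, and a rare key may occasionally be reported as well. For partitioning, that means a skewed group is never missed, at the cost of sometimes treating a small group as large.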

This work is licensed under a Creative Commons Attribution 3.0 License.