Big Data Management and Processing in Cloud Computing Environments

T. Arunambika, P. S. Vijayalakshmi

Abstract


Big data is a new driver of global economic and societal change. The world’s data collection is reaching a tipping point for major technological changes that can bring new approaches to decision making and to managing our health, cities, finance and education. While data complexity is increasing in volume, variety, velocity and veracity, the real impact hinges on our ability to uncover the ‘value’ in the data through Big Data Analytics technologies. With the rapid growth of emerging applications such as social network analysis, semantic Web analysis and bioinformatics network analysis, the variety of data to be processed continues to grow quickly. Effective management and analysis of large-scale data poses an interesting but critical challenge. From the perspective of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the cloud computing platform, cloud architecture, cloud database and data storage scheme. Following the MapReduce parallel processing framework, we then introduce MapReduce optimization strategies and applications reported in the literature.

Keywords


Big Data, Cloud Computing, MapReduce, Hadoop, Distributed Computing

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.