
Big Data Using Map Reduce and Hadoop Expertise

M. MohamedFaizulHuusian, R. Aktharunisa Begum

Abstract


Today, data is produced not only by people; machines also generate massive volumes of data, which now surpasses human-generated data [1]. This data is spread across different locations, in diverse formats, and in large volumes ranging from gigabytes to terabytes, petabytes, and exabytes. Across different domains, data is being generated at different speeds. Examples include stock exchange data, tweets on Twitter, status updates, likes, and shares on Facebook, sensor data, images from medical devices, surveillance videos, and satellite data. "Big Data" refers to a collection of massive volumes of heterogeneous data that is generated, often at high speed, from different sources. Traditional data management and analysis systems lack the tools to analyze such data, so there is a need for an innovative set of tools and frameworks to capture, process, and manage it within a tolerable elapsed time. As a result, the concept of Big Data is rapidly gaining popularity. Big Data demands cost-effective, fault-tolerant, scalable, flexible, and innovative forms of information processing for decision making [2]. This paper focuses on the features, architectures, and functionalities of Big Data, Hadoop, MapReduce, and HDFS.


Keywords


Big Data, Hadoop, Zettabyte, HDFS, MapReduce, Apache.


References


Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding, “Data Mining with Big Data”, IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, January 2014.

Shital Suryawanshi, Prof. V. S. Wadne, “Big Data Mining using Map Reduce: A Survey Paper”, IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 16, Issue 6, Ver. VII (Nov – Dec. 2014), pp. 37-40, www.iosrjournals.org

Singh and Reddy, “A Survey on Platforms for Big Data Analytics”, Journal of Big Data, 2014, 1:8.

J. Dean, S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Communications of the ACM, 2008.

Suman Arora, Dr. Madhu Goel, “Survey Paper on Scheduling in Hadoop”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 5, May 2014.

http://hadoop.apache.org/

Ms. Vibhavari Chavan, Prof. Rajesh N. Phursule, “Survey Paper on Big Data”, International Journal of Computer Science and Information Technologies, Vol. 5 (6), 2014, 7932-7939.

https://www.dezyre.com

http://wikibon.org/blog/taming-big-data

Changqing Ji, Yu Li, Wenming Qiu, Uchechukwu Awada, Keqiu Li, “Big Data Processing in Cloud Computing Environments”, 2012 International Symposium on Pervasive Systems, Algorithms and Networks.

Kyuseok Shim, “MapReduce Algorithms for Big Data Analysis”.

Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”.

Apache Giraph Project, http://giraph.apache.org/

Guoping Wang and Chee-Yong Chan, “Multi-Query Optimization in MapReduce Framework”.

Vinayak Borkar, Michael J. Carey, Chen Li, “Inside ‘Big Data Management’: Ogres, Onions, or Parfaits?”, EDBT/ICDT 2012 Joint Conference, Berlin, Germany, 2012, ACM, pp. 3-14.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.