
Locality Aware Scheduling Using Prefetching Technique in Hadoop

Utsav Prajapati, Shyam Deshmukh

Abstract


Hadoop is a rapidly growing ecosystem of components that implements Google's MapReduce algorithms in a scalable fashion on commodity hardware. Hadoop enables users to store, process, and analyze large volumes of data in ways that were not previously possible with less scalable solutions or standard SQL-based approaches. MapReduce offers a convenient programming model for big data processing. Data locality is a primary concern in MapReduce because it improves performance and reduces network traffic. Many algorithms have been proposed to improve performance based on data locality, but they have shortcomings, and much work remains to be done in this area. “Moving computation is cheaper than moving data.” Following this Hadoop principle, data locality is an effective performance metric for efficient computation. The proposed system presents a new approach to achieving data locality in the map phase. A task is assigned to the requesting node if that node holds the task's input data locally. If the requesting node does not hold the data locally, the data is prefetched to it from the nearest node that does. The progress of the requesting node is used to decide when to start prefetching. This approach is expected to improve performance through faster computation and to reduce network traffic.
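The scheduling rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Node`, `MapTask`, `distance`, `nearest_holder`, and `schedule` are hypothetical, the distance values 0/2/4 are an assumed rack-aware metric, and the progress threshold of 0.8 is an assumed trigger for prefetching, since the abstract does not say how node progress is used.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    rack: str
    blocks: set = field(default_factory=set)  # ids of blocks stored on this node
    progress: float = 1.0                     # progress of the running task (1.0 = free)

@dataclass
class MapTask:
    task_id: str
    block: str                                # id of the input block

# Assumed trigger: start prefetching once the node is this far through its task.
PREFETCH_THRESHOLD = 0.8

def distance(a, b):
    """Assumed rack-aware distance: same node 0, same rack 2, other rack 4."""
    if a.name == b.name:
        return 0
    return 2 if a.rack == b.rack else 4

def nearest_holder(block, target, nodes):
    """Return the closest other node that stores `block`, or None."""
    holders = [n for n in nodes if block in n.blocks and n.name != target.name]
    return min(holders, key=lambda n: distance(n, target), default=None)

def schedule(requester, pending, nodes, threshold=PREFETCH_THRESHOLD):
    """Return (task, action, source_node) for a node requesting work."""
    # Step 1: prefer a task whose input block is already local.
    for task in pending:
        if task.block in requester.blocks:
            pending.remove(task)
            return task, "run-local", None
    # Step 2: no local task, so consider prefetching for the first pending task.
    if not pending:
        return None, "idle", None
    task = pending[0]
    source = nearest_holder(task.block, requester, nodes)
    if source is None:
        return None, "idle", None
    if requester.progress < threshold:
        return None, "wait", None             # too early to start prefetching
    requester.blocks.add(task.block)          # simulate copying the block over
    pending.remove(task)
    return task, "prefetch", source
```

In this sketch, a node that holds the input block runs the task locally. Otherwise the block is copied from the closest holder, but only once the node's progress reaches the threshold, so the transfer can overlap with the tail of the current task.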

Keywords


Hadoop, MapReduce, Data Locality, Prefetching



