Fault Tolerant Dynamic Scheduling in Grid Environment

K. Saravanan; Ponsy R.K. Sathiya Bhama

Abstract

This paper presents a fault-tolerant dynamic scheduler for grid applications that uses job replication and combines it with the advantages of static scheduling, namely no overhead in the fault-free case and low overhead when a fault occurs. The dependency structure of a parallel program can be described by a directed acyclic graph (DAG). Scheduling such a graph consists of ordering the jobs and mapping them to processing units. Faults can occur at any point in time. A fault is detected when a processing unit does not send data at a prearranged time. Fault tolerance is achieved as follows: when a processing unit is detected to have stopped during the execution of a job, the state of that job is recovered from a checkpoint, the job is re-executed on another processing unit, and all jobs that were (potentially) to be executed on the failed unit are rescheduled to another processor. Co-allocation involves dispatching jobs to suitable resources. This is achieved by collecting information from the nodes of the grid, updating it whenever a change occurs, and assigning resources to jobs based on their specifications and performance.
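The steps named in the abstract (ordering a DAG, mapping jobs to processing units, detecting a silent unit, and rescheduling its jobs from a checkpoint) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the round-robin mapping, the timeout rule, and all names (`topological_order`, `schedule`, `detect_faults`, `recover`, `checkpoints`) are assumptions chosen only for illustration.

```python
from collections import deque


def topological_order(dag):
    """Order jobs so every job follows its predecessors (Kahn's algorithm).

    dag maps each job to the list of jobs that depend on it.
    """
    indegree = {job: 0 for job in dag}
    for successors in dag.values():
        for s in successors:
            indegree[s] += 1
    ready = deque(sorted(j for j, d in indegree.items() if d == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for s in dag[job]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(dag):
        raise ValueError("dependency graph contains a cycle")
    return order


def schedule(dag, units):
    """Map ordered jobs to processing units (round-robin, an assumption)."""
    order = topological_order(dag)
    return {job: units[i % len(units)] for i, job in enumerate(order)}


def detect_faults(last_report, now, timeout):
    """Treat a unit as failed if it sent no data within `timeout`."""
    return {u for u, t in last_report.items() if now - t > timeout}


def recover(mapping, finished, failed, units, checkpoints):
    """Move unfinished jobs off failed units; resume from checkpoints.

    Returns the new mapping and the checkpointed progress (0 if none)
    from which each moved job restarts.
    """
    alive = [u for u in units if u not in failed]
    if not alive:
        raise RuntimeError("no surviving processing unit")
    moved = [j for j, u in mapping.items()
             if u in failed and j not in finished]
    new_mapping = dict(mapping)
    for i, job in enumerate(moved):
        new_mapping[job] = alive[i % len(alive)]
    restart = {job: checkpoints.get(job, 0) for job in moved}
    return new_mapping, restart
```

In the fault-free case only `schedule` runs, so the sketch adds no work beyond the initial mapping; `recover` touches only the jobs that were assigned to a failed unit.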

Keywords

Grid Computing, Dynamic Scheduling, Checkpointing, Job Migration, Fault Tolerance

This work is licensed under a Creative Commons Attribution 3.0 License.
