Dynamic Allocation of Cloud Resources for Parallel Data Processing
Abstract
Cloud computing provides access to computers and their functionality over the Internet. Under this paradigm, computation is distributed across a large number of machines rather than running on a single local computer or remote server. Its defining characteristics are virtualization, distribution, and dynamic extensibility. Infrastructure as a Service (IaaS) clouds provide a computing infrastructure that uses system virtualization to consolidate multiple Virtual Machines (VMs) on one Physical Machine (PM), where VMs often represent components of Application Environments (AEs). Ad-hoc parallel data processing has emerged as one of the killer applications for IaaS clouds. Major cloud computing companies have started to integrate frameworks for parallel data processing, making it easy for customers to access these services and to deploy their programs. However, the processing frameworks currently in use were designed for static, homogeneous cluster setups and disregard the particular nature of a cloud. Consequently, the allocated compute resources may be inadequate for large parts of the submitted job, unnecessarily increasing processing time and cost. The objective of this paper is to explicitly exploit the dynamic resource allocation offered by today’s IaaS clouds for both task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines, which are automatically instantiated and terminated during job execution.
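The idea in the abstract's last sentence can be illustrated with a minimal sketch. It is not the framework described in the paper: `Task`, `CloudPool`, `run_job`, and the instance type labels "small" and "large" are hypothetical names chosen for illustration, and the pool only records start and stop events instead of calling a real IaaS API. The sketch assumes a job is split into stages, that each task names the VM type it needs, and that the pool is resized before every stage so that VMs no longer needed are terminated.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    vm_type: str  # hypothetical instance type label, e.g. "small" or "large"
    stage: int    # tasks in the same stage are assumed to run in parallel


@dataclass
class CloudPool:
    """Toy stand-in for an IaaS API; it only records start and stop events."""
    running: dict = field(default_factory=dict)  # vm_type -> running VM count
    log: list = field(default_factory=list)      # (action, vm_type, count)

    def scale_to(self, vm_type, wanted):
        """Start or stop VMs of one type until exactly `wanted` are running."""
        have = self.running.get(vm_type, 0)
        if wanted > have:
            self.log.append(("start", vm_type, wanted - have))
        elif wanted < have:
            self.log.append(("stop", vm_type, have - wanted))
        self.running[vm_type] = wanted


def run_job(tasks, pool):
    """Process the job stage by stage, resizing the pool to each stage's demand."""
    stages = sorted({t.stage for t in tasks})
    vm_types = sorted({t.vm_type for t in tasks})
    for stage in stages:
        # Count how many VMs of each type this stage needs (one VM per task).
        demand = {v: 0 for v in vm_types}
        for t in tasks:
            if t.stage == stage:
                demand[t.vm_type] += 1
        for v in vm_types:
            pool.scale_to(v, demand[v])
        # Task execution itself is omitted in this sketch.
    # Release everything once the last stage has finished.
    for v in vm_types:
        pool.scale_to(v, 0)
    return pool.log
```

For a job with two "small" tasks in stage 0 and one "large" task in stage 1, the recorded events show the small VMs being stopped as soon as the large one takes over, so no instance type is held for the whole job. A real scheduler would also have to weigh VM startup latency and billing granularity, which this sketch ignores.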
This work is licensed under a Creative Commons Attribution 3.0 License.