Author : D. Anusha
Date of Publication : 16th November 2017
Abstract: MapReduce, as implemented in the open source Hadoop framework, is used for processing and generating terabyte-scale distributed data on large clusters. The principal aim here is to reduce the completion time of large sets of MapReduce jobs. The current open source Hadoop allows only a static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the cluster lifetime. Such a static configuration may lead to a long completion length (makespan) as well as low system resource utilization. This paper proposes new schemes that use the slot ratio between map and reduce tasks as a tunable knob for minimizing the completion length (i.e., makespan) of a given set of jobs. By leveraging the workload information of recently completed jobs, the schemes dynamically allocate resources (slots) to map and reduce tasks. Several scheduling methodologies that aim to improve execution performance and meet completion-time goals are also discussed.
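The idea of tuning the map/reduce slot ratio from recent workload can be illustrated with a minimal sketch. It is not the paper's scheme: the names `JobStats`, `estimate_makespan`, and `choose_slot_split` are hypothetical, and the cost model (map phase followed by reduce phase, each perfectly parallel across its slots) is a simplifying assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class JobStats:
    """Workload observed for one recently completed job (hypothetical record)."""
    map_work: float     # total time spent in map tasks, in seconds
    reduce_work: float  # total time spent in reduce tasks, in seconds


def estimate_makespan(map_work: float, reduce_work: float,
                      map_slots: int, reduce_slots: int) -> float:
    # Assumed model: the map phase runs first, then the reduce phase,
    # and each phase divides its work evenly over its own slots.
    return map_work / map_slots + reduce_work / reduce_slots


def choose_slot_split(recent_jobs: List[JobStats],
                      total_slots: int) -> Tuple[int, int]:
    """Return (map_slots, reduce_slots) minimizing the estimated makespan."""
    if total_slots < 2:
        raise ValueError("need at least one map slot and one reduce slot")
    if not recent_jobs:
        # No history yet: fall back to an even static split.
        half = total_slots // 2
        return half, total_slots - half

    map_work = sum(job.map_work for job in recent_jobs)
    reduce_work = sum(job.reduce_work for job in recent_jobs)

    # Try every split that leaves at least one slot of each kind.
    best_map_slots = min(
        range(1, total_slots),
        key=lambda m: estimate_makespan(map_work, reduce_work,
                                        m, total_slots - m),
    )
    return best_map_slots, total_slots - best_map_slots
```

With a map-heavy history such as `choose_slot_split([JobStats(900, 100)], 8)`, this sketch returns `(6, 2)`, whereas a static even split would keep four slots of each kind regardless of the workload.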
Reference :
- S. R. Hejazi and S. Saghafian, "Flowshop-scheduling problems with makespan criterion: A review," Int. J. Production Res., vol. 43, no. 14, pp. 2895–2929, 2005.
- S. Agarwal, S. Kandula, N. Bruno, M.-C. Wu, I. Stoica, and J. Zhou, "Re-optimizing data-parallel computing," in Proc. 9th USENIX Conf. Netw. Syst. Design Implementation, 2012, p. 21.
- P. Agrawal, D. Kifer, and C. Olston, "Scheduling shared scans of large data files," Proc. VLDB Endow., vol. 1, no. 1, pp. 958–969, Aug. 2008.
- W. Cirne and F. Berman, "When the herd is smart: Aggregate behavior in the selection of job request," IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 2, pp. 181–192, Feb. 2003.
- T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears, "Mapreduce online," in Proc. 7th USENIX Conf. Netw. Syst. Design Implementation, 2010, p. 21.
- J. Dean and S. Ghemawat, "Mapreduce: Simplified data processing on large clusters," in Proc. 6th Conf. Symp. Oper. Syst. Design Implementation, 2004, vol. 6, p. 10.
- J. Dittrich, J.-A. Quiané-Ruiz, A. Jindal, Y. Kargin, V. Setty, and J. Schad, "Hadoop++: Making a yellow elephant run like a cheetah (without it even noticing)," Proc. VLDB Endow., vol. 3, nos. 1–2, pp. 515–529, Sep. 2010.
- P.-F. Dutot, L. Eyraud, G. Mounie, and D. Trystram, "Bi-criteria algorithm for scheduling jobs on cluster platforms," in Proc. 16th Annu. ACM Symp. Parallelism Algorithms Archit., 2004, pp. 125–132.
- P.-F. Dutot, G. Mounie, and D. Trystram, "Scheduling parallel tasks: Approximation algorithms," in Handbook of Scheduling: Algorithms, Models, and Performance Analysis, J. T. Leung, Ed. Boca Raton, FL, USA: CRC Press, ch. 26, pp. 26-1–26-24.
- J. Gupta, A. Hariri, and C. Potts, "Scheduling a two-stage hybrid flow shop with parallel machines at the first stage," Ann. Oper. Res., vol. 69, pp. 171–191, 1997.