Author: Vattipally Latha
Date of Publication: 16th May 2018
Abstract: Most existing work improves MapReduce performance by optimizing its data transmission. In addition to data partitioning, many efforts have been made on local aggregation, in-mapper combining, and in-network aggregation to reduce network traffic within MapReduce jobs. Network traffic can be reduced through a combination of partition and aggregation. In a typical system, a hash function is used to partition intermediate data among reduce tasks, but this default function is not efficient at controlling network traffic. A new intermediate data partition scheme is designed to reduce network traffic cost in MapReduce. The aggregator placement problem is also considered, where each aggregator can reduce merged traffic from multiple map tasks. In this paper, we study the joint optimization of intermediate data partition and aggregation in MapReduce to reduce network traffic cost for big data applications. We propose a three-layer model for this problem and formulate it as a mixed-integer nonlinear problem, which is then transformed into a linear form that can be solved with mathematical tools.
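As a rough illustration of the two mechanisms named in the abstract, the following Python sketch partitions intermediate key-value pairs with a hash function and counts how many records would cross the network with and without local aggregation. It is a toy model under stated assumptions, not the scheme proposed in the paper: the names `hash_partition` and `shuffle_records` are hypothetical, the CRC32-based hash merely stands in for a default partitioner, and traffic is measured as a simple record count that ignores network topology and per-key data size.

```python
import zlib
from collections import Counter, defaultdict


def hash_partition(key: str, num_reducers: int) -> int:
    """Assumed default-style partitioner: stable hash of the key modulo reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_records(map_outputs, num_reducers, aggregate):
    """Count the records each reduce task would receive from all map tasks.

    map_outputs: one list of (key, value) pairs per map task.
    aggregate:   if True, values sharing a key are summed inside each map
                 task before sending (local aggregation).
    """
    sent = Counter()
    for pairs in map_outputs:
        if aggregate:
            merged = defaultdict(int)
            for key, value in pairs:
                merged[key] += value
            pairs = list(merged.items())
        for key, _ in pairs:
            sent[hash_partition(key, num_reducers)] += 1
    return sent


# Hypothetical word-count style output of two map tasks.
map_outputs = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("c", 1)],
]

raw = shuffle_records(map_outputs, num_reducers=2, aggregate=False)
combined = shuffle_records(map_outputs, num_reducers=2, aggregate=True)

# Total records shuffled without and with local aggregation.
print(sum(raw.values()), sum(combined.values()))  # prints "7 4"
```

In this toy input, aggregation cuts the shuffled records from 7 to 4, yet the hash partitioner still assigns keys to reduce tasks without regard to where the data resides. That gap is what a traffic-aware partition scheme, as described in the abstract, would aim to close.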