Author: M. Bairavi
Date of Publication: 13th February 2018
Abstract: With the rapidly increasing amount of data produced worldwide, networked and multi-user storage systems are becoming very popular. However, concerns over data security still prevent many users from migrating data to remote storage. The conventional solution is to encrypt the data before it leaves the owner’s premises. While sound from a security perspective, this approach prevents the storage provider from effectively applying storage efficiency functions, such as compression and deduplication, which would allow optimal use of resources and consequently lower the service cost. Client-side data deduplication, in particular, ensures that multiple uploads of the same content consume the network bandwidth and storage space of only a single upload. Deduplication is actively used by a number of cloud backup providers as well as by various cloud services. Unfortunately, encrypted data is pseudorandom and thus cannot be deduplicated; as a consequence, current schemes have to entirely sacrifice either security or storage efficiency. In this paper, we present schemes based on data chunk similarity that permit a more fine-grained trade-off between the two. The intuition is that outsourced data may require different levels of protection depending on how popular it is, that is, on how many users share the same content. We analyze various deduplication schemes and provide experimental results showing that the proposed secure data chunk similarity approach gives improved results in real-time cloud environments.
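The conflict between encryption and deduplication described in the abstract can be illustrated with a minimal sketch. It is not the scheme proposed in the paper. The names `DedupStore` and `toy_encrypt` are hypothetical, the store is an in-memory dictionary rather than a cloud service, and `toy_encrypt` is an insecure stand-in for a real cipher, used only so that the example runs with the Python standard library.

```python
import hashlib
import secrets


class DedupStore:
    """Hypothetical content-addressed store: one stored copy per distinct blob."""

    def __init__(self):
        self.blobs = {}  # fingerprint -> stored bytes

    def has(self, fingerprint: str) -> bool:
        return fingerprint in self.blobs

    def upload(self, data: bytes) -> bool:
        """Client-side deduplication: send the bytes only if the fingerprint is new.

        Returns True if the data had to be transferred and stored.
        """
        fingerprint = hashlib.sha256(data).hexdigest()
        if self.has(fingerprint):
            return False  # duplicate: no bandwidth or storage consumed
        self.blobs[fingerprint] = data
        return True


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Assumption: illustrative stand-in for a real cipher (NOT secure).

    XORs the data with a keystream derived from SHA-256 in counter mode.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ s for b, s in zip(data, stream))


content = b"the same popular file uploaded by two users"

# Plaintext uploads: the second upload is recognized as a duplicate.
plain_store = DedupStore()
plain_transfers = [plain_store.upload(content) for _ in range(2)]

# Each user encrypts with an independent random key before uploading,
# so the two ciphertexts differ and cannot be deduplicated.
encrypted_store = DedupStore()
encrypted_transfers = [
    encrypted_store.upload(toy_encrypt(secrets.token_bytes(32), content))
    for _ in range(2)
]

plain_copies = len(plain_store.blobs)          # 1
encrypted_copies = len(encrypted_store.blobs)  # 2
```

In this sketch the plaintext store keeps one copy for two uploads, while the store that receives independently encrypted uploads keeps two. This is the trade-off between security and storage efficiency that the abstract refers to.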