Author : Juturu Chandana
Date of Publication : 13th September 2017
Abstract: Data mining is the analysis step of the “Knowledge Discovery in Databases” process [1]. Mining the item-sets that occur frequently in transactions is hard, so parallel mining algorithms were developed to identify them. These algorithms aim to balance the load by keeping the data in equal partitions across the group of nodes that perform the computation. Redundant transactions, however, cause a significant performance problem in parallel frequent item-set mining, and a data partitioning technique was developed to address it. FiDoop-DP is a data partitioning method that divides the data based on the item-sets in the transactions made by clients or customers. To learn more about frequent item-sets, i.e., products that are regularly sold together, an algorithm is used to reduce the running time on extremely large data. This algorithm is the Equivalence Class Clustering and bottom-up Lattice Traversal algorithm (ECLAT). When ECLAT is combined with MapReduce, it produces better solutions in a small amount of time. ECLAT is also combined with the Locality-Sensitive Hashing (LSH) technique to improve performance on the items held locally on the data nodes. Combining these two techniques increases the performance of FiDoop, as measured by the time taken to mine frequent item-sets. The main goal of this paper is to mine the item-sets that are most prominently used or sold in the market, so that the sales of those products can be increased.
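To make the core idea behind ECLAT concrete, the following is a minimal single-machine sketch in Python. It is not the paper's implementation: it leaves out the MapReduce and LSH stages entirely, and the names `eclat`, `baskets`, and `min_support` are illustrative assumptions. It only shows the vertical layout (each item mapped to the ids of the transactions that contain it) and the support counting by intersecting those id sets.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def eclat(transactions: Iterable[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent item-set together with its support count."""
    # Vertical layout: map each item to the ids of the transactions containing it.
    tid_lists: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: Tuple[str, ...], candidates: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the items that come later in the ordering; the size
            # of the intersection is the support of the longer item-set.
            longer: List[Tuple[str, Set[int]]] = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    longer.append((other, shared))
            if longer:
                extend(itemset, longer)

    # Start from the frequent single items, in a fixed (alphabetical) order.
    singles = sorted(
        ((item, tids) for item, tids in tid_lists.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), singles)
    return frequent


# Hypothetical market-basket data, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
result = eclat(baskets, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Each top-level prefix (here `bread`, `butter`, `milk`) is explored independently of the others, which is the property that makes this style of search a natural candidate for distribution across nodes; how the paper assigns that work is not shown here.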
Reference :
-
- M. J. Zaki, “Parallel and distributed association mining: A survey,” Concurrency, IEEE, vol. 7, no. 4, pp. 14–25, 1999.
- Y. Xun, J. Zhang, X. Qin, and X. Zhao, “FiDoop-DP: Data Partitioning in Frequent Item-set Mining on Hadoop Clusters,” IEEE Transactions on Parallel and Distributed Systems, pp. 7–14, 2016.
- W. Lu, Y. Shen, S. Chen, and B. C. Ooi, “Efficient processing of k nearest neighbor joins using MapReduce,” Proceedings of the VLDB Endowment, vol. 5, no. 10, pp. 1016–1027, 2012.
- I. Pramudiono and M. Kitsuregawa, “Parallel FP-growth on PC cluster,” in Advances in Knowledge Discovery and Data Mining. Springer, 2003, pp. 467–473.
- A. Stupar, S. Michel, and R. Schenkel, “RankReduce – processing k-nearest neighbor queries on top of MapReduce,” in Proceedings of the 8th Workshop on Large-Scale Distributed Systems for Information Retrieval. Citeseer, 2010, pp. 13–18.
- B. Bahmani, A. Goel, and R. Shinde, “Efficient distributed locality sensitive hashing,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management. ACM, 2012, pp. 2174–2178.
- T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu, “An efficient k-means clustering algorithm: Analysis and implementation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 7, pp. 881–892, 2002.
- A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Letters, vol. 31, no. 8, pp. 651–666, 20
- http://www.phillipefourier.com Accessed on June 22, 2017.