Open Access Journal

ISSN : 2394-2320 (Online)

International Journal of Engineering Research in Computer Science and Engineering (IJERCSE)

Monthly Journal for Computer Science and Engineering

Improve Performance of Crawler Using K-means Clustering

Authors: Swati G. Bhoi, Prof. Ujwala M. Patil

Date of Publication: 22nd August 2017

Abstract: The Internet has become part of everyday life because information on almost any topic is readily available online. Because the volume of information is so large and the deep web changes constantly, retrieving relevant information efficiently is a challenging problem. Crawlers play an important role in addressing it. We propose a smart crawler that efficiently extracts relevant information from the web. The smart crawler works in two phases: site locating and in-site exploring. We developed it using K-means clustering, and we describe the K-means technique here. Clustering groups similar data items into sets known as clusters. K-means, the most widely used clustering method, partitions the data items into K clusters and gives good results with high efficiency. We also compare the existing system with the smart crawler; the results show that the smart crawler using K-means achieves an efficient harvest rate for deep websites in less time.
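The K-means step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmeans`, the two-dimensional feature vectors in `pages`, the Euclidean distance, and the seeded random initialization are all assumptions made for illustration, since the abstract does not say how pages are represented or how the crawler uses the clusters.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Partition `points` (tuples of floats) into k clusters (Lloyd's algorithm).

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]

        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return labels, centroids


# Hypothetical feature vectors for six pages (illustrative values only).
pages = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
         (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(pages, k=2)
print(labels)
print(centroids)
```

In this toy input the first three vectors lie close together and the last three lie close together, so the two groups end up in different clusters. Which group receives index 0 depends on the initialization.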

