Author : Arati Anilrao Wardekar
Date of Publication : 8th March 2018
Abstract: Heavy use of the internet has spread a large amount of diverse data across it, and users need both to access particular data and to search for the most relevant data. Retrieving the data a user needs is challenging for a search engine and takes considerable time. To reduce the time spent searching for the most relevant data, we propose the “Smartcrawler”. In the proposed approach, results are taken from different web search engines to obtain relevant pages. Links are taken from the web, and two-stage crawling is performed on those URLs: site locating, followed by in-site exploring, so that the most relevant sites are obtained with the help of page ranking and reverse searching techniques. The system can work in both online and offline modes. This survey presents the fundamental challenges and studies existing models and solutions. It also highlights directions for future work.
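The two-stage idea in the abstract (site locating, then in-site exploring) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the in-memory `TOY_WEB`, the term-count `relevance` score, and the function names `home_of`, `locate_sites` and `explore_site` are all hypothetical, no network access is performed, and reverse searching is not modelled.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://seed.example/": ("directory of sites",
                             ["http://books.example/", "http://news.example/"]),
    "http://books.example/": ("book search home",
                              ["http://books.example/search",
                               "http://books.example/about"]),
    "http://books.example/search": ("search book titles with this book form", []),
    "http://books.example/about": ("about us", []),
    "http://news.example/": ("daily news headlines",
                             ["http://news.example/sports"]),
    "http://news.example/sports": ("sports scores", []),
}


def home_of(url):
    """Return the home page URL of the site that hosts `url`."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/"


def relevance(text, topic_terms):
    """Count topic-term occurrences; an assumed stand-in for a real page ranker."""
    words = text.lower().split()
    return sum(words.count(term) for term in topic_terms)


def locate_sites(seeds, topic_terms, web):
    """Stage 1: discover site home pages reachable from the seeds, best first."""
    scores = {}
    queue = deque(home_of(seed) for seed in seeds)
    while queue:
        home = queue.popleft()
        if home in scores or home not in web:
            continue
        text, links = web[home]
        scores[home] = relevance(text, topic_terms)
        # Only links that leave the current site lead to new candidate sites.
        queue.extend(home_of(link) for link in links if home_of(link) != home)
    return sorted(scores, key=lambda h: scores[h], reverse=True)


def explore_site(home, topic_terms, web, max_pages=10):
    """Stage 2: breadth-first crawl inside one site, returning pages best first."""
    seen, scored = set(), []
    queue = deque([home])
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        if url in seen or url not in web:
            continue
        seen.add(url)
        text, links = web[url]
        scored.append((relevance(text, topic_terms), url))
        # Stay inside the site: follow only links hosted on the same site.
        queue.extend(link for link in links if home_of(link) == home)
    return [url for _, url in sorted(scored, key=lambda p: p[0], reverse=True)]


if __name__ == "__main__":
    topic = ["book", "search"]
    sites = locate_sites(["http://seed.example/"], topic, TOY_WEB)
    print("ranked sites:", sites)
    print("pages in best site:", explore_site(sites[0], topic, TOY_WEB))
```

Splitting the crawl this way lets the first stage stay cheap, since it scores only home pages, while the page budget (`max_pages`) is spent inside the sites that ranked highest.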