International Journal of Engineering Research in Computer Science and Engineering (IJERCSE)

Extracting Top K-List from Web Pages

Authors: Shweta Chandge ¹, Prof. Ajay Chhajed ²

Date of Publication: 7th January 2017

Abstract: Finding relevant, desired information quickly is critical today. When surfing the internet for data, only a very small proportion of what is found can be interpreted or understood, and extracting it takes a lot of time. In this paper we address this problem by extracting information from top-k web pages, each of which lists the top k instances of a subject, for example "top 5 football teams in the world". Compared with other structured information such as web tables, top-k lists contain high-quality information. Such lists can enrich an open-domain knowledge base, which in turn can support search or fact-answering applications. The proposed system extracts the top-k list using a title classifier, a parser, a candidate picker, a ranker, and a content processor.
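To make the first stages of the pipeline concrete, the following is a minimal sketch in Python, not the system described in the paper. It assumes the title classifier can be approximated by a regular expression that recognizes "top k" titles with k written in digits, and that the candidate picker keeps only lists with exactly k items. Both simplifications, and the names `parse_top_k_title` and `pick_candidates`, are illustrative assumptions.

```python
import re
from typing import List, Optional, Tuple

# Assumption: a top-k title contains "top", then a number in digits, then the subject.
# A real title classifier would likely be a trained model, not a regex.
_TOP_K = re.compile(r"\btop\s+(\d+)\s+(.+)", re.IGNORECASE)


def parse_top_k_title(title: str) -> Optional[Tuple[int, str]]:
    """Return (k, subject) if the title looks like a top-k title, else None."""
    match = _TOP_K.search(title)
    if match is None:
        return None
    k = int(match.group(1))
    if k == 0:
        return None  # "top 0" is not a meaningful list size
    return k, match.group(2).strip()


def pick_candidates(lists: List[List[str]], k: int) -> List[List[str]]:
    """Hypothetical candidate picker: keep only lists with exactly k items."""
    return [items for items in lists if len(items) == k]


if __name__ == "__main__":
    parsed = parse_top_k_title("Top 5 football teams in the world")
    print(parsed)
    if parsed is not None:
        k, _subject = parsed
        page_lists = [
            ["Home", "News", "Contact"],  # e.g. a navigation menu
            ["Team A", "Team B", "Team C", "Team D", "Team E"],
        ]
        print(pick_candidates(page_lists, k))
```

In this sketch the number extracted from the title drives the later filtering step: a page whose title yields k = 5 keeps only the five-item list and discards the three-item navigation menu. Ranking and content processing are left out.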
