Date of Publication: 7th January 2017
Abstract: Finding relevant information in a short span of time has become critical. When searching the internet for data, only a small proportion of what is found can be interpreted or understood, and extracting it takes considerable time. In this paper we address this problem by extracting information from top-k pages, which present the top k instances of a subject, for example "top 5 football teams in the world". Compared with other structured information on the web, such as web tables, top-k lists contain high-quality information, and they can enrich open-domain knowledge bases that support search and fact-answering applications. The proposed system extracts top-k lists using a title classifier, a parser, a candidate picker, a ranker and a content processor.
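The five components named in the abstract can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation. The regex title pattern, the `ListParser` class, the "exactly k items" candidate rule, the shortest-average-length ranking heuristic and every function name (`classify_title`, `pick_candidates`, `rank_candidates`, `process_content`, `extract_top_k`) are hypothetical simplifications chosen only to show where each stage sits in the pipeline.

```python
import re
from html.parser import HTMLParser

# Hypothetical title pattern (assumption): "top <k> <concept>".
TITLE_PATTERN = re.compile(r"\btop\s+(\d+)\s+(.+)", re.IGNORECASE)


def classify_title(title):
    """Title classifier (sketch): return (k, concept) for a top-k title, else None."""
    match = TITLE_PATTERN.search(title)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


class ListParser(HTMLParser):
    """Parser (sketch): collect <li> texts grouped by their enclosing <ul>/<ol>.
    Nested lists are handled only crudely; this is an illustrative assumption."""

    def __init__(self):
        super().__init__()
        self.lists = []       # one list of item strings per closed <ul>/<ol>
        self._stack = []      # item lists of currently open containers
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._stack[-1].append("")
            self._in_item = True

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._stack:
            self.lists.append(self._stack.pop())
        elif tag == "li":
            self._in_item = False

    def handle_data(self, data):
        # Append text to the current item of the innermost open list.
        if self._in_item and self._stack and self._stack[-1]:
            self._stack[-1][-1] += data


def pick_candidates(lists, k):
    """Candidate picker (sketch): keep lists with exactly k non-empty items."""
    candidates = []
    for items in lists:
        cleaned = [item.strip() for item in items if item.strip()]
        if len(cleaned) == k:
            candidates.append(cleaned)
    return candidates


def rank_candidates(candidates):
    """Ranker (assumed heuristic): prefer the candidate with the shortest
    average item length, guessing that top-k items are short names."""
    if not candidates:
        return None
    return min(candidates, key=lambda items: sum(len(i) for i in items) / len(items))


def process_content(items):
    """Content processor (sketch): strip leading rank numbers like '1.' or '2)'."""
    return [re.sub(r"^\s*\d+[.)]\s*", "", item) for item in items]


def extract_top_k(title, html):
    """Run the hypothetical pipeline; return None if no top-k list is found."""
    parsed_title = classify_title(title)
    if parsed_title is None:
        return None  # title does not look like a top-k title
    k, concept = parsed_title
    parser = ListParser()
    parser.feed(html)
    best = rank_candidates(pick_candidates(parser.lists, k))
    if best is None:
        return None
    return {"concept": concept, "k": k, "items": process_content(best)}


# Usage on a toy page: the 2-item navigation menu is rejected because k = 5.
page = """
<html><body>
<ul><li>Home</li><li>About</li></ul>
<ol>
  <li>1. Team A</li><li>2. Team B</li><li>3. Team C</li>
  <li>4. Team D</li><li>5. Team E</li>
</ol>
</body></html>
"""
result = extract_top_k("Top 5 football teams in the world", page)
print(result["items"])  # ['Team A', 'Team B', 'Team C', 'Team D', 'Team E']
```

A practical system would need a trained title classifier and a much richer ranking function than these placeholders; the sketch only shows how the title fixes k, which then filters the page's lists before ranking and cleaning.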