Author: Snehal Ingole
Date of Publication: 7th April 2016
Abstract: The problem of diversifying keyword search was first studied in the IR community. Most existing approaches perform diversification as a post-processing or re-ranking step of document retrieval, based on an analysis of the result set and/or the query logs. In IR, keyword search diversification is designed at the topic or document level. Because keyword queries are ambiguous, they are difficult to answer effectively, especially when they are short and vague. To address this challenging problem, in this paper we propose an approach that automatically diversifies XML keyword search based on the different contexts in which the query appears in the XML data. Given a short and vague keyword query and the XML data to be searched, we first derive keyword search candidates for the query using a simple feature selection model. We then design an effective XML keyword search diversification model to measure the quality of each candidate, and propose two efficient algorithms that incrementally compute the top-k qualified query candidates as the diversified search intentions. Two selection criteria are targeted: the k selected query candidates must be the most relevant to the given query while covering the maximal number of distinct results. Finally, a comprehensive evaluation on real and synthetic data sets demonstrates the effectiveness of our proposed diversification model and the efficiency of our algorithms.
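The two selection criteria above (relevance to the query and coverage of distinct results) can be illustrated with a simple greedy selection. This is a minimal sketch under our own assumptions, not the paper's algorithms: each candidate is assumed to arrive with a precomputed relevance score and the set of result IDs it retrieves, and the scoring rule (relevance multiplied by the number of not-yet-covered results) and the function name `greedy_diversify` are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_diversify(
    candidates: Dict[str, Tuple[float, Set[str]]], k: int
) -> List[str]:
    """Greedily pick up to k query candidates (illustrative sketch only).

    `candidates` maps a candidate query to (relevance, result_ids).
    Hypothetical scoring: relevance * number of results not yet covered,
    so a candidate that adds no new results scores zero.
    """
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(candidates)  # copy so the caller's dict is untouched
    while remaining and len(selected) < k:
        best: Optional[str] = None
        best_score = 0.0
        for query, (relevance, results) in remaining.items():
            # Reward relevance, but only for results that are still new.
            score = relevance * len(results - covered)
            if score > best_score:
                best, best_score = query, score
        if best is None:  # no remaining candidate covers a new result
            break
        selected.append(best)
        covered |= remaining.pop(best)[1]  # mark its results as covered
    return selected
```

For example, with hypothetical candidates "database conference" (relevance 0.9, results r1, r2, r3), "database journal" (0.8, results r1, r2) and "database course" (0.5, results r4, r5), `greedy_diversify(candidates, 2)` selects "database conference" first and then "database course", because "database journal" would add no uncovered results. Greedy selection is a common heuristic for coverage-style objectives; the paper's own incremental top-k algorithms and quality model may behave differently.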