Author : K. Mallika
Date of Publication : 29th March 2018
Abstract: Nonparametric relational topic models provide a successful way to discover the hidden topics in a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, benefit from this revealed knowledge. Current methods based on Markov Random Fields (MRFs) do not make inference efficient enough, so we aim to make the sampling algorithm scalable to large networks by using new network constraint methods instead of MRFs. Specifically, each document is assigned a Gamma process. Although this provides a solution, it brings additional challenges when mathematically modeling the network structure of a typical document network, i.e., two spatially closer documents tend to have more similar topics. We also require that topics be shared across documents through the Gamma process. To resolve these challenges, we use a sub-sampling strategy that assigns each document a different Gamma process drawn from the global Gamma process, and the sub-sampling probabilities of documents are assigned with a sampling technique, rather than a Markov Random Field constraint, that inherits the document network structure. Through the posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results show the capabilities of learning the hidden topics and, more importantly, the number of topics.
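The sub-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's model or its inference algorithm. It assumes a finite truncation of the global Gamma process to `num_atoms` atoms, and it draws the per-document keep probabilities uniformly at random, whereas the abstract ties them to the document network. The function name, parameters, and default values are all hypothetical.

```python
import numpy as np

def subsample_gamma_process(num_docs, num_atoms=50, alpha=5.0, seed=0):
    """Thin a truncated global Gamma process into one measure per document."""
    rng = np.random.default_rng(seed)

    # Finite approximation (an assumption of this sketch): each of the
    # num_atoms global topics gets an independent Gamma(alpha / num_atoms, 1) weight.
    global_weights = rng.gamma(shape=alpha / num_atoms, scale=1.0, size=num_atoms)

    # Hypothetical per-document keep probabilities. In the abstract these
    # would be set by a technique that reflects the network structure;
    # here they are random placeholders.
    keep_prob = rng.uniform(0.2, 0.9, size=(num_docs, 1))

    # Each document keeps or drops every global atom independently.
    keep = rng.random((num_docs, num_atoms)) < keep_prob

    # A kept atom retains its global weight, so topics are shared across
    # documents; a dropped atom contributes zero weight to that document.
    doc_weights = keep * global_weights
    return global_weights, keep, doc_weights

if __name__ == "__main__":
    g, keep, w = subsample_gamma_process(num_docs=4)
    print("topics kept per document:", keep.sum(axis=1))
```

Because every document draws from the same global atoms, two documents with similar keep probabilities tend to retain overlapping topic sets, which is the sharing property the abstract requires.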