Author: Dr. P. Kalyani
Date of Publication: 7th February 2017
Abstract: Clustering is the process of assembling or aggregating data items into groups. Sentence clustering is mainly used in applications such as document classification and categorization, automatic summary generation, and document organization. It plays a vital role in text processing and is used in text mining activities. Cluster size may vary from one cluster to another. Traditional clustering algorithms have problems when clustering the input dataset, such as cluster instability, complexity, and sensitivity. To overcome these drawbacks, this paper proposes a hierarchical hybrid frequent pattern mining algorithm and a Hierarchical Fuzzy Relational Eigenvector Centrality based Clustering Algorithm (HFRECCA) for clustering sentences. Text documents have a hierarchical structure, and many of their terms relate to more than one theme; hence HFRECCA is a useful algorithm for natural language documents. Frequent pattern mining is an influential approach to mining frequent itemsets for Boolean association rules. It uses a "bottom-up" approach, in which frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.
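The bottom-up candidate generation described in the abstract matches the classic Apriori scheme. The following is a minimal Python sketch of that generic scheme only, not of the paper's hierarchical hybrid algorithm or of HFRECCA. The function name `frequent_itemsets`, the `min_support` parameter (an absolute count), and the toy sentences are assumptions made for illustration; each sentence is treated as a set of terms.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in
    at least `min_support` transactions (Apriori-style, bottom-up)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: extend frequent (k-1)-itemsets by one item,
        # keeping a candidate only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Test the group of candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1

    return result


# Hypothetical toy input: four sentences reduced to sets of terms.
sentences = [
    {"cluster", "sentence", "fuzzy"},
    {"cluster", "sentence"},
    {"cluster", "pattern"},
    {"sentence", "fuzzy"},
]
found = frequent_itemsets(sentences, min_support=2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a support threshold of 2, the pair `{"cluster", "fuzzy"}` is counted but falls below the threshold, so the three-term candidate built on it is pruned before it is ever tested against the data. This pruning is what keeps the bottom-up search tractable.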