Open Access Journal

ISSN : 2394-2320 (Online)

International Journal of Engineering Research in Computer Science and Engineering (IJERCSE)

Monthly Journal for Computer Science and Engineering

Data Mining Method using Clustering Mechanisms and Feature assortment for efficient content categorization

Authors : E. Ramesh, Dr. B Tarakeswara Rao

Date of Publication : 9th September 2017

Abstract: Data mining is the process of discovering information from data. With the wide availability of data, there are many applications of and requirements for data mining. Data can be of any type, such as text, images, videos and more. This work focuses on the classification of text data using a semi-supervised clustering algorithm. The main problem in any data mining task is the handling of huge data. High dimensionality does not imply more information; rather, it may introduce inconsistencies and noise. To make the data consistent, effective pre-processing techniques are applied. In addition, a feature selection technique selects useful features and removes irrelevant ones, thereby making the data meaningful for mining. A semi-supervised clustering algorithm, TESC, was used in this study. It was then modified by adding a feature selection method, document frequency thresholding, which addresses the high dimensionality problem. The proposed system thus outperforms the existing method. Experiments were conducted on Reuters-21578 and showed better performance with reduced time complexity.
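
The feature selection step named in the abstract is commonly known as document frequency thresholding: a term is kept only if it occurs in at least a minimum number of documents. The following is a minimal Python sketch of that general idea, not the authors' implementation. The function names, the `min_df` parameter, and the toy documents are assumptions made for illustration; the abstract does not state the threshold used or how it was integrated with TESC.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        # set() ensures a term is counted once per document,
        # no matter how often it repeats inside that document.
        df.update(set(tokens))
    return df


def df_threshold(docs, min_df=2):
    """Keep only terms whose document frequency is at least min_df.

    min_df is a hypothetical parameter name chosen for this sketch.
    Returns the filtered documents and the retained vocabulary.
    """
    df = document_frequency(docs)
    vocab = {term for term, count in df.items() if count >= min_df}
    filtered = [[t for t in tokens if t in vocab] for tokens in docs]
    return filtered, vocab


# Toy, already-tokenized documents (illustrative only, not Reuters-21578).
docs = [
    ["oil", "price", "rise", "oil"],
    ["oil", "export", "fall"],
    ["grain", "price", "fall"],
]

filtered, vocab = df_threshold(docs, min_df=2)
print(sorted(vocab))
print(filtered)
```

Terms that appear in only one document are dropped, so the vocabulary, and with it the dimensionality of any vector representation built on it, shrinks before clustering begins.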
