Open Access Journal

ISSN : 2394-2320 (Online)

International Journal of Engineering Research in Computer Science and Engineering (IJERCSE)

Monthly Journal for Computer Science and Engineering

A Review of Document Classification Techniques

Authors: Harsimran Pal Kaur (1), Kamaldeep Kaur (2)

Date of Publication: 7th June 2016

Abstract: Nowadays, a vast amount of data is generated daily on the web in the form of e-journals, e-newspapers, web pages, etc. Most of this data is in English, so many techniques have been developed for managing English documents. However, with the growth of regional languages, a substantial amount of data is now also available in regional languages. Such a large volume of data is difficult to manage, so automatic document classification has attracted much more attention over the last few decades. Automatic classification is the task of assigning predefined categories to unlabelled documents, that is, giving each document the class to which it actually belongs. It frees organizations from handling large numbers of documents manually and improves the retrieval process. We conclude that a number of classification techniques are available, but the same technique gives different accuracy for data of different lengths.
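The abstract does not commit to one technique at this point. As an illustration of what "assigning predefined categories to unlabelled documents" means in practice, here is a minimal sketch of multinomial Naive Bayes, one classic text classifier, assuming simple whitespace tokenization and add-one (Laplace) smoothing. The class name `NaiveBayesTextClassifier`, the `tokenize` helper, and the toy training documents are hypothetical choices made for this example. They are not the authors' implementation and not any library's API.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)        # training documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.total_docs = len(docs)
        return self

    def log_score(self, doc, label):
        # log P(class) + sum of log P(word | class) over the document's tokens.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / self.total_docs)
        for word in tokenize(doc):
            if word in self.vocab:  # words never seen in training are skipped
                score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, doc):
        # Assign the predefined category with the highest posterior score.
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Toy, hypothetical training data for illustration only.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "parliament passed the new budget bill",
    "the minister announced an election date",
]
train_labels = ["sports", "sports", "politics", "politics"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("a goal in the final match"))  # sports
print(clf.predict("the election budget"))        # politics
```

Smoothing keeps a word that appears in only one class from driving the other class's probability to zero. With a training set this small, the estimates are very coarse, which hints at why the same technique can reach different accuracy depending on how much text it is trained and tested on.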
