Date of Publication: 30th September 2020
Abstract: Sentiment analysis [6] has long been an important problem in NLP and machine learning, used to gauge public sentiment towards products, brands, or services. Previous approaches to sentiment analysis have included unsupervised learning, Naive Bayes classifiers, and SVMs. In this paper, we focus on sentiment analysis of movie text reviews. Sentiment analysis can be challenging because natural language is complex: a single word can carry either a positive or a negative connotation depending on its context. We use the Large Movie Review Dataset [4] released by the Stanford AI Lab, a binary sentiment classification dataset of 50,000 IMDB movie reviews with a 50:50 train:test split. The objective is to classify a movie as good or bad based on its text review. We approach this problem by computing the TF-IDF vectors of the corpus and training a deep learning model on top of them. This model achieved an accuracy of 90.7%, a significant improvement over the current approaches. Future extensions of this approach could use more powerful deep learning models such as LSTMs or GRUs, which can capture even more contextual information.
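The TF-IDF feature-extraction step mentioned in the abstract can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes the smoothed TF-IDF variant that scikit-learn's TfidfVectorizer uses by default (raw term counts, idf(t) = ln((1 + n) / (1 + df(t))) + 1, L2-normalised rows) together with a simple regex tokenizer. The function names `tokenize`, `fit_idf` and `tfidf_vector` and the three toy reviews are hypothetical. Vectors like these are the kind of input a deep learning classifier would then be trained on.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of 2+ word characters
    # (the same pattern scikit-learn's TfidfVectorizer uses by default).
    return re.findall(r"\b\w\w+\b", text.lower())


def fit_idf(docs):
    """Learn a sorted vocabulary and smoothed IDF weights from a corpus.

    idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of
    documents and df(t) is the number of documents containing term t.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Map one document to an L2-normalised TF-IDF vector over `vocab`."""
    counts = Counter(tokenize(doc))  # raw term frequencies
    vec = [counts[t] * idf[t] for t in vocab]  # out-of-vocabulary terms are ignored
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Hypothetical toy reviews standing in for the IMDB corpus.
reviews = [
    "A brilliant, moving film with great acting",
    "Boring plot and terrible acting",
    "Great film, great soundtrack",
]
vocab, idf = fit_idf(reviews)
vectors = [tfidf_vector(r, vocab, idf) for r in reviews]
```

Because "acting" appears in two of the three reviews, it receives a lower IDF weight than "brilliant", which appears in only one; this is how TF-IDF down-weights terms that occur across many documents and carry less discriminative information.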
References:
- Peter D. Turney. “Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.” Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 417–424.
- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. “Thumbs up? Sentiment Classification using Machine Learning Techniques.” Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002), Association for Computational Linguistics, July 2002, pp. 79–86. doi:10.3115/1118693.1118704
- Minqing Hu and Bing Liu. “Mining and Summarizing Customer Reviews.” Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), 2004. https://www.cs.uic.edu/~liub/publications/kdd04-revSummary.pdf
- Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. “Learning Word Vectors for Sentiment Analysis.” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011), 2011. http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
- Shahzad Qaiser and Ramsha Ali. “Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents.” International Journal of Computer Applications, 181(1), pp. 25–29, July 2018.
- Mika V. Mäntylä, Daniel Graziotin, and Miikka Kuutila. “The Evolution of Sentiment Analysis: A Review of Research Topics, Venues, and Top Cited Papers.” Computer Science Review, Vol. 27, pp. 16–32, February 2018. ISSN 1574-0137
- S. Vijayarani and R. Janani. “Text Mining: Open Source Tokenization Tools – An Analysis.” Advanced Computational Intelligence: An International Journal (ACII), Vol. 3, No. 1, January 2016.
- Anjali Jivani. “A Comparative Study of Stemming Algorithms.” International Journal of Computer Technology and Applications, Vol. 2, pp. 1930–1938, 2011.
- NLTK documentation: https://www.nltk.org/
- Scikit-learn documentation: https://scikit-learn.org/stable/
- Keras documentation: https://keras.io/
- David M. Skapura. Building Neural Networks. Addison-Wesley Publishing Company, Menlo Park, CA, 1996.