Author: Payal Chhabra
Date of Publication: 15th October 2019
Abstract: To handle the classification of big data, data filtering and cleansing are preferred as preprocessing steps. These steps mostly remove noisy, erroneous, and conflicting data, which would otherwise lead to misclassification. In this paper, we analyze misclassified data and identify how much of the data needs to be corrected to obtain useful information. To demonstrate this idea, we use the AirTrafficDataset from Statistical Computing Statistical Graphics to analyze misclassified content in the dataset. Two supervised classifiers are used: Support Vector Machine (SVM) and decision tree. The results show that, of these two classifiers, SVM classifies 85% of the data correctly, leaving only 15% misclassified.
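The 85%/15% split reported in the abstract is accuracy and its complement, the misclassification rate, measured on labelled data. The Python sketch below is a minimal illustration, not the paper's method. It assumes the true and predicted labels are already available as lists (for example, from an SVM or a decision tree trained elsewhere). It computes both rates and returns the indices of the misclassified records, i.e. the rows one would inspect or correct. It also includes `drop_conflicting`, a hypothetical cleansing rule that removes records whose identical feature values carry different labels. This rule stands in for the filtering step and is an assumption. The function names and the toy "delayed"/"on_time" labels are hypothetical and are not taken from the AirTrafficDataset.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def drop_conflicting(features: Sequence[tuple], labels: Sequence[Hashable]):
    """Hypothetical cleansing rule (not the paper's): drop every record whose
    feature tuple also occurs elsewhere with a different label."""
    labels_seen = defaultdict(set)
    for x, y in zip(features, labels):
        labels_seen[x].add(y)
    kept = [(x, y) for x, y in zip(features, labels) if len(labels_seen[x]) == 1]
    return [x for x, _ in kept], [y for _, y in kept]


def misclassification_report(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]):
    """Return (accuracy, misclassification rate, indices of misclassified rows)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
    n = len(y_true)
    return (n - len(wrong)) / n, len(wrong) / n, wrong


# Toy labels (hypothetical): 20 flights, 3 of which the classifier gets wrong.
y_true = ["delayed"] * 10 + ["on_time"] * 10
y_pred = list(y_true)
for i in (2, 11, 17):
    y_pred[i] = "on_time" if y_true[i] == "delayed" else "delayed"

accuracy, error_rate, wrong = misclassification_report(y_true, y_pred)
print(f"accuracy={accuracy:.0%}, misclassified={error_rate:.0%}, rows={wrong}")
# prints: accuracy=85%, misclassified=15%, rows=[2, 11, 17]
```

The sketch returns the misclassified indices rather than only a rate. That makes it possible to examine those specific records and estimate how much of the data would need correcting.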
References:
- Villars, Richard L., Carl W. Olofson, and Matthew Eastwood. "Big data: What it is and why you should care." White paper, IDC (2011).
- Bello-Orgaz, Gema, Jason J. Jung, and David Camacho. "Social big data: Recent achievements and new challenges." Information Fusion 28 (2016): 45-59.
- IBM. "Big Data and Analytics." URL http://www01.ibm.com/software/data/bigdata/what-isbig-data.html (2015).
- "The Data Explosion in 2014 Minute by Minute." Infographic (2015). URL http://aci.info/2014/07/12/the-dataexplosion-in-2014-minute-by-minute-infographic
- Tole, Alexandru Adrian. "Big data challenges." Database Syst J 4, no. 3 (2013): 31-40.
- Herzig, Kim, Sascha Just, and Andreas Zeller. "It's not a bug, it's a feature: how misclassification impacts bug prediction." In Proceedings of the 2013 International Conference on Software Engineering, pp. 392-401. IEEE Press, 2013.
- Kochhar, Pavneet Singh, Tien-Duy B. Le, and David Lo. "It's not a bug, it's a feature: does misclassification affect bug localization?" In Proceedings of the 11th Working Conference on Mining Software Repositories, pp. 296-299. ACM, 2014.
- Labrinidis, Alexandros, and Hosagrahar V. Jagadish. "Challenges and opportunities with big data." Proceedings of the VLDB Endowment 5, no. 12 (2012): 2032-2033.
- Wu, Xindong, Xingquan Zhu, Gong-Qing Wu, and Wei Ding. "Data mining with big data." IEEE Transactions on Knowledge and Data Engineering 26, no. 1 (2014): 97-107.
- Fayyad, Usama M. "Data mining and knowledge discovery: Making sense out of data." IEEE Expert: Intelligent Systems and Their Applications 11, no. 5 (1996): 20-25.
- Caudill, Steven B., and Franklin G. Mixon. "Analysing misleading discrete responses: A logit model based on misclassified data." Oxford Bulletin of Economics and Statistics 67, no. 1 (2005): 105-113.
- Brodley, Carla E., and Mark A. Friedl. "Identifying mislabeled training data." Journal of Artificial Intelligence Research 11 (1999): 131-167.
- Miranda, André LB, Luís Paulo F. Garcia, André CPLF Carvalho, and Ana C. Lorena. "Use of classification algorithms in noise detection and elimination." In International Conference on Hybrid Artificial Intelligence Systems, pp. 417-424. Springer Berlin Heidelberg, 2009.
- Van den Hout, Ardo, and Peter GM Van der Heijden. "The analysis of multivariate misclassified data with special attention to randomized response data." Sociological Methods & Research 32, no. 3 (2004): 384-410.
- Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20, no. 3 (1995): 273-297.
- Okun, O., and G. Valentini, eds. Supervised and Unsupervised Ensemble Methods and Their Applications. Studies in Computational Intelligence, vol. 126. Springer, Heidelberg, 2008.
- Rokach, Lior, and Oded Maimon. "Top-down induction of decision tree classifiers: A survey." IEEE Transactions on Systems, Man, and Cybernetics, Part C 1, no. 11 (November 2002).
- Kotsiantis, Sotiris B., I. Zaharakis, and P. Pintelas. "Supervised machine learning: A review of classification techniques." (2007): 3-24.
- Statistical Computing Statistical Graphics. URL http://statcomputing.org/dataexpo/2009/the-data.html