Open Access Journal

ISSN : 2394-2320 (Online)

International Journal of Engineering Research in Computer Science and Engineering (IJERCSE)

Monthly Journal for Computer Science and Engineering

Data Analysis and Machine Learning using PySpark

Authors: Kavitha C, Athithya J, Dhinesh Kumar K, Mohamad Imran A

Date of Publication: 22nd March 2018

Abstract: Data analysis and machine learning have the potential to become an integral part of every industry. By collecting data on a specific topic or issue from different sources, an extensive analysis can be carried out to build models and identify patterns that predict future outcomes with a reasonable degree of accuracy. This paper focuses on using the predefined methods available in the PySpark API to conduct data analysis and to build an efficient model for predicting future outcomes.

