Date of Publication: 12th October 2017
Abstract: Data becomes big data when its volume, variety, and velocity exceed the abilities of our system architectures and algorithms. This paper discusses three major sources of big data (machine-generated data, people-generated data, and organization-generated data) and the 6 V’s of big data: volume, velocity, variety, valence, veracity, and value. We also discuss the different varieties of data, namely structured, semi-structured, and unstructured data, with examples such as sensor data, images, PDF, CSV, JSON, RDBMS databases, and tabular data. Approximately 5% of the available data is in structured form; the rest is either unstructured or semi-structured. Big data faces many challenges due to the volume, variety, and other complexities of the data. Hadoop is a platform that provides solutions for storing, processing, and analyzing big data. The main objective of this paper is to describe how Hadoop can solve the different challenges of big data by using HDFS (Hadoop Distributed File System), MapReduce, and Hadoop ecosystem components such as Hive, Sqoop, HBase, Pig, Spark, Flume, and Kafka.