Open Access Journal

ISSN: 2394-2320 (Online)

International Journal of Engineering Research in Computer Science and Engineering (IJERCSE)

Monthly Journal for Computer Science and Engineering

Information Extraction Using by Big Data

Authors: Anuradha D. Biradar, Suraj Shivaji Bhoite

Date of Publication: 20th March 2018

Abstract: The promise of data-driven decision-making is now being recognized broadly, and there is growing enthusiasm for the notion of "Big Data." While the promise of Big Data is real (for example, it is estimated that Google alone contributed 54 billion dollars to the US economy in 2009), there is currently a wide gap between its potential and its realization. Heterogeneity, scale, timeliness, complexity, and privacy problems with Big Data impede progress at all phases of the pipeline that can create value from data. The problems start right away during data acquisition, when the data tsunami requires us to make decisions, currently in an ad hoc manner, about what data to keep and what to discard, and how to store what we keep reliably with the right metadata. Much data today is not natively in structured format; for example, tweets and blogs are weakly structured pieces of text, while images and video are structured for storage and display, but not for semantic content and search. Transforming such content into a structured format for later analysis is a major challenge. The value of data explodes when it can be linked with other data, so data integration is a major creator of value. Since most data is directly generated in digital format today, we have both the opportunity and the challenge to influence how data is created so as to facilitate later linkage, and to automatically link previously created data. Data analysis, organization, retrieval, and modeling are other foundational challenges. Data analysis is a clear bottleneck in many applications, due both to the lack of scalability of the underlying algorithms and to the complexity of the data that needs to be analyzed. Finally, presenting the results so that non-technical domain experts can interpret them is crucial to extracting actionable knowledge.
Over the last 35 years, data management principles such as physical and logical independence, declarative querying, and cost-based optimization have led to a multi-billion dollar industry. More importantly, these technical advances have enabled the first round of business intelligence applications and laid the foundation for managing and analyzing Big Data today. The many novel challenges and opportunities associated with Big Data necessitate rethinking many aspects of these data management platforms, while retaining other desirable aspects. We believe that appropriate investment in Big Data will lead to a new wave of fundamental technological advances that will be embodied in the next generations of Big Data management and analysis platforms, products, and systems. We believe that these research problems are not only timely, but also have the potential to create huge economic value in the US economy for years to come. However, they are also hard, requiring us to rethink data analysis systems in fundamental ways. A major investment in Big Data, properly directed, can not only result in major scientific advances, but also lay the foundation for the next generation of advances in science, medicine, and business.

