Author: Prashant Chaturvedi
Date of Publication: 7th April 2016
Abstract: DNA sequencing is the process of determining and identifying every single DNA base and element in the genome of an individual. There are six billion of those in every normal cell in every person. When DNA sequencing is applied in the Cancer project, we first determine what those six billion DNA bases are in the person's normal cells, then take a sample of that person's tumor and determine what the DNA bases are in the tumor. In this paper, we discuss impediments to, and future work on, Hadoop in bioinformatics. We study MapReduce from an algorithmic point of view and demonstrate the suitability of our approach by tracing and analyzing efficient MapReduce algorithms for sorting and for simulating parallel algorithms, with the help of the pigeonhole principle. The sheer volume of DNA data poses a major computational challenge for statistical analysis. General statistical analysis can be carried out in the R language. After surveying various approaches that use GPUs and MapReduce, we adopted what we judged to be the best solution: using R with MapReduce. An R package was created to offload a set of critical R functions to the GPU card. It lets users run R code with GPU acceleration, which enables much faster computation on large data sets.
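As a rough illustration of the MapReduce style of computation the abstract refers to, the following Python sketch counts DNA bases in a normal and a tumor sequence using explicit map, shuffle, and reduce steps. It is not the method or the R package described in the paper: the function names (`map_chunk`, `shuffle`, `reduce_counts`, `count_bases`), the chunk size, and the two short sequences are all hypothetical, and everything runs in a single process rather than on Hadoop or a GPU.

```python
from collections import defaultdict
from itertools import chain


def map_chunk(chunk):
    """Map step: emit a (base, 1) pair for every valid base in one chunk."""
    return [(base, 1) for base in chunk.upper() if base in "ACGT"]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key (the base letter)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_bases(sequence, chunk_size=4):
    """Split the sequence into chunks, map each chunk, then shuffle and reduce."""
    chunks = [sequence[i:i + chunk_size]
              for i in range(0, len(sequence), chunk_size)]
    mapped = chain.from_iterable(map_chunk(c) for c in chunks)
    return reduce_counts(shuffle(mapped))


# Hypothetical toy sequences, differing at a single position.
normal = "ACGTACGTAC"
tumor = "ACGTTCGTAC"

normal_counts = count_bases(normal)
tumor_counts = count_bases(tumor)

# Bases whose counts differ between the normal and tumor samples.
changed = {b: (normal_counts.get(b, 0), tumor_counts.get(b, 0))
           for b in "ACGT"
           if normal_counts.get(b, 0) != tumor_counts.get(b, 0)}
print(changed)
```

In a real deployment each chunk would be mapped on a separate worker and the shuffle would move data between machines; the single-process version only shows how the work is divided.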