
A Statistical Method for Analyzing Low Quality Scores in DNA Sequencing Reads

Sangharsh Saini

Abstract


The exponential growth of new DNA sequencing technologies is transforming the biological sciences by allowing investigators to sequence large amounts of raw DNA that previously required major genome-sequencing efforts. Next-generation sequencing produces much higher output at significantly lower cost because millions of reactions run in parallel in much smaller reaction volumes [1]. These new techniques generate unprecedented amounts of data, but the data contain sequencing errors. A better understanding of error profiles is essential for sequence analysis and for drawing reliable conclusions from the data [19]. Unterminated bases in sequencing cycles have been reported to be the major source of errors. In this paper, we analyze the quality scores of sequencing reads obtained from a real human sample. We compute quality scores, detect low-quality clusters in the DNA sequencing reads, and present the results graphically. We also infer the factors that lead to the large number of low-quality clusters in the sample. This statistical analysis allows us to study and compare the errors introduced by different next-generation sequencers. The ability to analyze the error profiles of sequencing reads can significantly improve the accuracy of sequence analysis.
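The abstract does not state how quality scores are encoded or what threshold defines a low-quality cluster. As an illustration only, the Python sketch below shows one common way to decode Phred quality scores from FASTQ reads (a Phred score Q relates to the base-call error probability P by Q = -10 log10 P), compute the mean quality at each sequencing cycle, and flag reads whose mean quality falls below a cutoff. The Phred+33 encoding, the four-line FASTQ layout, the Q20 cutoff, and all function names are assumptions for this example, not the author's method.

```python
"""Sketch: decode FASTQ quality strings and flag low-quality reads (clusters).

Assumptions, not taken from the paper: Phred+33 encoding (Sanger /
Illumina 1.8+), four-line FASTQ records, and a hypothetical Q20 cutoff.
"""
import io
from typing import Iterator, List, Tuple

PHRED_OFFSET = 33        # assumed Phred+33 ASCII offset
LOW_QUALITY_CUTOFF = 20  # hypothetical cutoff: Q20 means a 1% error probability


def read_fastq(handle) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def phred_scores(qual: str) -> List[int]:
    """Convert an ASCII-encoded quality string to integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P) to recover the base-call error probability P."""
    return 10 ** (-q / 10)


def per_cycle_mean(all_scores: List[List[int]]) -> List[float]:
    """Mean Phred score at each sequencing cycle (read position) over all reads."""
    if not all_scores:
        return []
    length = max(len(s) for s in all_scores)
    means = []
    for i in range(length):
        column = [s[i] for s in all_scores if i < len(s)]
        means.append(sum(column) / len(column))
    return means


def summarize(handle) -> Tuple[List[float], List[str]]:
    """Return the per-cycle mean quality and the IDs of low-quality reads."""
    scores: List[List[int]] = []
    low_quality_ids: List[str] = []
    for read_id, _seq, qual in read_fastq(handle):
        q = phred_scores(qual)
        if not q:
            continue  # skip records with an empty quality string
        scores.append(q)
        if sum(q) / len(q) < LOW_QUALITY_CUTOFF:
            low_quality_ids.append(read_id)
    return per_cycle_mean(scores), low_quality_ids


if __name__ == "__main__":
    # Toy input: under Phred+33, 'I' encodes Q40 and '#' encodes Q2.
    toy = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nI###\n")
    cycle_means, low = summarize(toy)
    print(cycle_means)  # [40.0, 21.0, 21.0, 21.0]
    print(low)          # ['read2']
```

A per-cycle profile whose mean quality drops in later cycles would be consistent with cycle-dependent errors such as the unterminated bases mentioned in the abstract. The mean-quality cutoff is a deliberately simple rule and is not necessarily the statistical criterion used in this paper.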


Keywords


Next Generation Sequencing, DNA Bases, Sequencing Errors, Quality Scores, Base Caller, Sequencing Reads.


References


[1] Barba M, Czosnek H, Hadidi A. Historical Perspective, Development and Applications of Next-Generation Sequencing in Plant Virology. Viruses. 2014;6(1):106-136. doi:10.3390/v6010106.

[2] https://www.coursera.org/course/ads1

[3] Besaratinia A, Li H, Yoon J-I, Zheng A, Gao H, Tommasi S. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens. Nucleic Acids Research. 2012;40(15):e116. doi:10.1093/nar/gks610.

[4] https://class.coursera.org/comparinggenomes-001

[5] https://www.coursera.org/course/algobioprogramming

[6] Knief C. Analysis of plant microbe interactions in the era of next generation sequencing technologies. Frontiers in Plant Science. 2014;5:216. doi:10.3389/fpls.2014.00216.

[7] Anderson MW, Schrijver I. Next Generation DNA Sequencing and the Future of Genomic Medicine. Genes. 2010;1(1):38-69. doi:10.3390/genes1010038.

[8] Menelaou A, Marchini J. Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold. Bioinformatics. 2013;29(1):84-91. doi:10.1093/bioinformatics/bts632.

[9] http://rosalind.info/problems/list-view/?location=bioinformatics-textbook-track

[10] Huang W, Li L, Myers JR, Marth GT. ART: a next-generation sequencing read simulator. Bioinformatics. 2012;28(4):593-594. doi:10.1093/bioinformatics/btr708.

[11] https://stepic.org/course/Bioinformatics-Algorithms-2/syllabus

[12] Li H. BFC: correcting Illumina sequencing errors. Bioinformatics. 2015;31(17):2885-2887. doi:10.1093/bioinformatics/btv290.

[13] Edgar RC, Flyvbjerg H. Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics. Published online July 2, 2015. doi:10.1093/bioinformatics/btv401.

[14] Heo Y, Wu X-L, Chen D, Ma J, Hwu W-M. BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads. Bioinformatics. 2014;30(10):1354-1362. doi:10.1093/bioinformatics/btu030.

[15] http://theory.bio.uu.nl/BPA/2015/

[16] Salmela L. Correction of sequencing errors in a mixed set of reads. Bioinformatics. 2010;26(10):1284-1290. doi:10.1093/bioinformatics/btq151.

[17] Hamada M, Wijaya E, Frith MC, Asai K. Probabilistic alignments with quality scores: an application to short-read mapping toward accurate SNP/indel detection. Bioinformatics. 2011;27(22):3085-3092. doi:10.1093/bioinformatics/btr537.

[18] Wan R, Anh VN, Asai K. Transformations for the compression of FASTQ quality scores of next-generation sequencing data. Bioinformatics. 2012;28(5):628-635. doi:10.1093/bioinformatics/btr689.

[19] Schirmer M, D'Amore L, Hall N, Quince C. Error profiles for Next Generation sequencing technologies. EMBnet.journal. 2013;19(A):81.




Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.