Low-quality data can have tremendous downstream implications if it is not dealt with early. A common technique to mitigate the effects of low-quality data is to 'mask' it. To mask the data means to...


Low-quality data can have tremendous downstream implications if it is not dealt with early. A common technique to mitigate the effects of low-quality data is to 'mask' it. To mask the data means to change the affected base calls to a predefined letter that is known to represent data to be ignored.
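As a small illustration of masking, here is a minimal sketch that replaces base calls with 'N' wherever the matching quality score falls below a cutoff. The function name `mask_low_quality`, the example read, and the scores are made up for illustration. Treating "low quality" as strictly below the cutoff is an assumption, since the question does not say whether the cutoff value itself is kept.

```python
def mask_low_quality(bases, qualities, cutoff, mask_char="N"):
    """Return bases with every call whose quality is below cutoff replaced."""
    if len(bases) != len(qualities):
        raise ValueError("sequence and quality lengths differ")
    return "".join(
        mask_char if q < cutoff else base
        for base, q in zip(bases, qualities)
    )


# Made-up read and scores, for illustration only.
read = "ACGTACGT"
scores = [38, 40, 12, 35, 9, 30, 31, 2]
print(mask_low_quality(read, scores, cutoff=20))  # prints ACNTNCGN
```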


The FASTQ file format stores both the DNA base calls and the quality information for each sequence in the file. The sequence and its quality string have the same length, so with some simple programming the quality value for each base of a sequence can be found. Write a Python program that does the following:



  1. Outputs to the screen the sequence reads in FASTA format with low quality bases changed to an 'N'

  2. The quality cutoff to define an 'N' is entered by the user

  3. The FASTQ file to mask is entered by the user

HINT:

    from Bio import SeqIO
    sIO = SeqIO.parse('ecoli.fastq', 'fastq')
    for record in sIO:
        qualities = record.letter_annotations['phred_quality']
        print(qualities)
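To show where the per-base quality values come from, here is a minimal sketch that decodes a raw FASTQ quality line into integer scores, one per base call. It assumes the common Sanger encoding (ASCII offset 33); the helper name `phred33_scores` and the record are hypothetical. With Biopython, `record.letter_annotations['phred_quality']` from the hint already provides these integers, so this decoding is only needed when reading the file by hand.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string into integer Phred scores (offset 33)."""
    return [ord(ch) - 33 for ch in quality_line]


# Hypothetical four-line FASTQ record, for illustration only.
record = ["@read1", "GATTACA", "+", "II5I#II"]
sequence, quality_line = record[1], record[3]

scores = phred33_scores(quality_line)
assert len(scores) == len(sequence)  # one score per base call
for base, score in zip(sequence, scores):
    print(base, score)
```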

Karthi answered on Oct 25 2021
# Print the header and sequence of a downloaded FASTA file.
seq = input('Enter FASTA seq. file name:')
meta = ''
sequence = ''
with open(seq, 'r') as fh:
    for line in fh:
        # Strip the trailing newline (the original '/n' never matched).
        line = line.rstrip('\n')
        if line.startswith('>'):
            # Header line; only the last header seen is kept.
            meta = line
        else:
            # Sequence line; join wrapped lines into one string.
            sequence = sequence + line
print(meta)
print(sequence)
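# Any file format that Bio.SeqIO supports can be read the same way.
from Bio import SeqIO
# Unpack the first three records by name; requires at least three records in the file.
human, mouse, rat, *other_organisms = tuple(SeqIO.parse("example.fasta", "fasta"))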
import Bio
import sys
import os
import random
import unittest
from Bio import Entrez
from Bio import ExPASy
from Bio import SeqIO
from Bio import SwissProt
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# Note: Bio.Alphabet was removed in Biopython 1.78, so this import only works on older releases.
from Bio.Alphabet import generic_rna, generic_dna, generic_protein
from...
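
For reference, the three requirements in the question (FASTA output, user-supplied cutoff, user-supplied FASTQ file) could be put together roughly as follows. This is a sketch under stated assumptions, not the answerer's solution: it assumes simple four-line FASTQ records with Phred+33 qualities, masks bases strictly below the cutoff, and uses only the standard library instead of `SeqIO.parse` from the hint. The names `fastq_records`, `masked_fasta`, and `main` are hypothetical. Because Phred+33 characters sort in score order, the quality characters are compared directly against `chr(cutoff + 33)`.

```python
import io


def fastq_records(handle):
    """Yield (name, sequence, quality) from a simple four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality


def masked_fasta(handle, cutoff):
    """Yield FASTA text for each read, with low-quality bases set to 'N'."""
    threshold = chr(cutoff + 33)  # assumes Phred+33 encoding
    for name, sequence, quality in fastq_records(handle):
        masked = "".join(
            "N" if q < threshold else base
            for base, q in zip(sequence, quality)
        )
        yield ">{}\n{}".format(name, masked)


def main():
    """Ask the user for the file and the cutoff, then print masked FASTA."""
    filename = input("Enter FASTQ file name: ")
    cutoff = int(input("Enter quality cutoff: "))
    with open(filename) as handle:
        for entry in masked_fasta(handle, cutoff):
            print(entry)


# Demonstration on made-up in-memory data; call main() for interactive use.
demo = io.StringIO("@read1\nGATTACA\n+\nII5I#II\n@read2\nACGT\n+\n#III\n")
for entry in masked_fasta(demo, cutoff=20):
    print(entry)
```

Calling `main()` prompts for the file name and the cutoff, as the question asks. The in-memory demo is there only so the sketch can be run without a FASTQ file on disk.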