
The ReadScoring file is for A7 and the remaining files are for A8. Note that the A7 solutions are important for A8.


Assignment no. 7
Sequence Analysis and Phylogenetics 365.062 (2UE), WS 2017/18

Exercise 13 (16 points)

Implement the Smith-Waterman algorithm with affine gap penalty as a Python function. Design the function in a general way, i.e. the sequences, the gap penalties, and the scoring matrix should be supplied as arguments. The function should not print any output, but return the final score and the best alignment. The main program should then print the score and the alignment to the standard output.

The usage of your program should be the following: python ex13.py , for example: python ex13.py AQYR ARFF BLOSUM100.txt 7 3.

Test your program with the following three pairs of amino acid sequences:

a) 1: AYILFFEGWVCRMNQIVLRMFYALSTQELDSH
   2: CRMNQIVLPMEVAL
b) 1: AYILFFEGWVCRMMFYALTTQELDSHHAES
   2: TLFGEGWVCRMNQI
c) 1: AYTLFGEGWWCRMNQIVLPMEVALTIQELDSH
   2: QILKMCYFDW

each with at least two different choices of BLOSUMn and PAMn matrices as well as two different choices for the gap penalties (in total at least 8 tests per sequence pair). Interpret the resulting local alignments. You can obtain BLOSUMn and PAMn matrices from ftp://ftp.ncbi.nih.gov/blast/matrices/.

Hints:
• For reading scoring matrices, you can use the function ReadScoringMatrix.py (just copy the function into your program or import it).
• The function does not need to find all best-scoring alignments, one is sufficient.

Exercise 14 (4 additional points)

If your program returns all best-scoring alignments, you get 4 additional points.

Exercise 15 (8 points)

1. Search an appropriate public database for protein sequences of phenylalanine hydroxylase of at least six different species. Indicate the sources from which you took the sequence data and document from which organisms they are.
2. Use your Python function from Exercise 13 to compute the best local alignments and the resulting scores of all pairs of these sequences. Choose an appropriate scoring matrix and decide about a gap opening and extension penalty. Important: Document and justify your choices. Elaborate on the following question: Do the scores reflect similarities of the species?

Hand in a well documented exercise that contains the sequences, sources, alignments, scores and parameters (scoring matrix, penalties). One major criterion for the grading of this exercise is reproducibility!

Note: If your program from Exercise 13 does not work, use the following web-service for calculation of all pairwise local alignments and scores of the sequences: http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi by activating the option Compare sequences. The sequences have to be submitted in FASTA format.

Hint: In Chapter 2 Bioinformatics Resources of the lecture notes you can find links to many appropriate databases, e.g. http://www.uniprot.org.

Submission: electronically via Moodle: https://moodle.jku.at/
Deadline: January 10th, 2018, 10:00 am.

Assignment no. 8
Sequence Analysis and Phylogenetics 365.062 (2UE), WS 2017/18

Exercise 16 (12 points)

In 1990, Michael Crichton published the book Jurassic Park about the resurrection of dinosaurs using the blood from the stomachs of insects which had been encased in amber. At one point in the book, Dr. Henry Wu is asked to explain some of the DNA techniques used in reconstructing the extinct dinosaur genomes. Dr. Wu describes the use of restriction enzymes and how the fragmented pieces of dino DNA can be spliced together with these enzymes. He also alludes to the fact that they don't have the entire genome but that they "fill in the gaps" with modern day frog DNA.
At one point during his discussion he points to a computer screen and remarks "Here you see the actual structure of a small fragment of dinosaur DNA." In 1992 Dr. Mark Boguski at NCBI entered this sequence into a text editor and searched all of the known DNA sequences at the time. Dr. Boguski wrote up his findings and submitted a manuscript to the journal BioTechniques, as a tongue-in-cheek joke. His manuscript was accepted and published. (Boguski, M.S. A Molecular Biologist Visits Jurassic Park. (1992) BioTechniques 12(5):668-669).

You will reproduce this experiment using BLAST (http://www.ncbi.nlm.nih.gov/blast/). Submit the "dinosaur DNA" sequence you can find in the file dino1.fasta to a Nucleotide-nucleotide BLAST (blastn) search. How many of the top ten matches are artificial sequences? Identify any actual organisms in the top ten.

Mark Boguski's published article was brought to Crichton's attention. In his second book, "The Lost World", Mr. Crichton used Dr. Boguski as a consultant. Dr. Boguski constructed an interesting sequence from existing species and also embedded a message in the protein translation of the DNA sequence which he submitted for use in the book. Once again, invoke Nucleotide-nucleotide BLAST (blastn) with the second "dinosaur DNA" sequence you can find in the file dino2.fasta. Identify all organisms of the top ten matches. Are any of these organisms related to dinosaurs?

Now use Translated query vs. protein database BLAST (blastx) with the same sequence and the Swiss-Prot database. Look at the amino acid sequence of the query sequence aligned to the best hit. What is the hidden message Dr. Boguski included in this sequence?

Hand in a well documented exercise that contains the sequences, sources, output alignments and scores and the parameters used for the algorithms. One major criterion for the grading of this exercise is reproducibility.

Hint: Use the blastn and not the megablast option. "PREDICTED" sequences count as hits.
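The blastx step above compares translations of the query against a protein database. BLAST performs that translation itself, so the following is only a local illustration of what "protein translation of the DNA sequence" means: a minimal sketch that translates a DNA string in all six reading frames. It assumes the standard genetic code; the function names (`reverse_complement`, `translate`, `six_frames`) are hypothetical helpers, not part of BLAST or of the course material.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2).

    Stop codons become '*'; codons with unknown letters become 'X'.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return the three forward and three reverse-strand translations."""
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames
```

Printing `six_frames(sequence)` for a sequence read from a FASTA file gives one protein string per frame; the frame reported by blastx for the best hit indicates which of them to look at.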
Submission: electronically via Moodle: https://moodle.jku.at/
Deadline: January 17th, 2018, 10:00 am.

>DinoDNA "Dinosaur DNA" from Crichton JURASSIC PARK p. 103 nt 1-1200
GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC
GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG
TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC
TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG
CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA
AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG
ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT
CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT
GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG
CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA
CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG
CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA
CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA
GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG
CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG
ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA
ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC
GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG
CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG
CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT
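As a possible starting point for Exercise 13 of Assignment no. 7, here is a minimal sketch of Smith-Waterman with affine gap penalties in the three-matrix (Gotoh-style) formulation. Several things are assumptions rather than part of the assignment: the scoring matrix is taken to be a dict keyed by residue pairs (the return format of ReadScoringMatrix.py is not shown here), a gap of length k is charged gap_open + (k - 1) * gap_extend, both penalties are positive integers, and `simple_matrix` is a hypothetical stand-in for a real BLOSUM or PAM matrix. Only one best-scoring alignment is returned.

```python
def simple_matrix(alphabet, match, mismatch):
    """Hypothetical toy scoring matrix: dict mapping (a, b) to a score."""
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def smith_waterman_affine(seq1, seq2, score, gap_open, gap_extend):
    """Return (best_score, aligned_seq1, aligned_seq2) for one local alignment.

    Assumed gap cost: gap_open for the first gap position and gap_extend
    for every further position of the same gap.
    """
    n, m = len(seq1), len(seq2)
    neg = float("-inf")
    # H: best score of an alignment ending at (i, j), floored at 0.
    # E: best score ending with a gap in seq1 (seq2[j-1] opposite '-').
    # F: best score ending with a gap in seq2 (seq1[i-1] opposite '-').
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            diag = H[i - 1][j - 1] + score[(seq1[i - 1], seq2[j - 1])]
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the best cell, remembering which matrix we are in.
    out1, out2 = [], []
    i, j = best_pos
    state = "H"
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break  # start of the local alignment
            if H[i][j] == H[i - 1][j - 1] + score[(seq1[i - 1], seq2[j - 1])]:
                out1.append(seq1[i - 1])
                out2.append(seq2[j - 1])
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            out1.append("-")
            out2.append(seq2[j - 1])
            if E[i][j] == H[i][j - 1] - gap_open:
                state = "H"  # this position opened the gap
            j -= 1
        else:
            out1.append(seq1[i - 1])
            out2.append("-")
            if F[i][j] == H[i - 1][j] - gap_open:
                state = "H"
            i -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))
```

With the toy matrix `simple_matrix("ACG", 5, -4)` and penalties 3 and 1, aligning AAACCC against AAAGCCC yields six matches and one opened gap. A main program in the sense of the exercise would read the sequences, matrix file and penalties from the command line, call the function, and print the returned score and alignment.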
Jan 14, 2020