Biofrontiers: Intro to Python (BIOL 282/382) Exercises: 1. Write a script that lets you play Blackjack against the computer. In real world casinos, card dealers have a set of instructions to ensure...

Pick 3 questions from #4 to #10.


Biofrontiers: Intro to Python (BIOL 282/382) Exercises:

1. Write a script that lets you play Blackjack against the computer. In real-world casinos, card dealers follow a set of instructions that ensures the casino has an advantage. You can consult the rules of Blackjack and the behaviour of the dealer on Wikipedia: https://en.wikipedia.org/wiki/Blackjack#Rules

2. Let’s imagine we have a University course whose evaluation consists of:
   a) 10 exercises. Each exercise is worth 1 point, and the evaluation can only be positive or negative. Internally, each exercise has a value of 0 to 10, with a threshold for passing/failing (probably a score equal to or greater than 6.0). The sum of the scores on all exercises represents 60% of the total score.
   b) An exam, with a score from 0 to 10, that represents 40% of the total score. The score in the test might have decimals.
   c) The teacher's own bias towards the students, which might or might not round up the final score. For example, if the final score would be 6.89, the teacher might round it up to 7.00. This bias also affects the exercises, but not the result of the exam.

   Each student has 4 attributes associated with them: Effort, Luck, Charisma and a fourth parameter of your choosing. Your simulation must consider the following:
   a) Effort affects the probability of succeeding in each of the exercises and lowers the minimum score in the test.
   b) Luck can be positive or negative (good or bad luck). For each of the exercises and for the test, check whether luck is involved. If good luck is involved, simulate the score twice and take the best result. If bad luck is involved, take the worst result.
   c) Charisma only affects the probability of the teacher rounding the score up.
   d) You decide what the fourth parameter is and what effect it has. You might decide that the fourth value represents the ability of the student to cheat, which gives an advantage in the test and the exercises, but carries a probability of automatically failing the whole course. Or you might represent how stressed the student is, which might translate into a probability of the student dropping some exercises or a lower upper cap on the test score.

   Write a script that lets you enter a name and a value for each of these parameters and simulates the scores. Return the results of the exercises and the test as numeric values, while the final score must be returned as a standard American letter grade. I swear I will not use your scripts to calculate your scores.

3. Write a brief explanation of the parameters used in exercise 2. Describe the type of variable you used, the range of values they can take and how they work within the simulation. Investigate the relative role of each of the parameters as you see fit, using simulated data. Plot your findings.

4. Write a script that takes a Uniprot fasta file as input (A), and any number of Uniprot fasta files (B). The script must:
   a) Write each protein in A to its own fasta file in its own directory. The name of the file must be the protein ID + “.fasta”, and the name of the directory must be the protein ID.
   b) Each directory must contain a file named protein ID + “.job”. This file must include one line per fasta file in B, and each line must be a functional blast command that takes the protein ID + “.fasta” file as query, one of the fastas in B as database, and outputs the results in the same directory.
   c) Create directories that will hold up to 1000 of the previously described directories. The names of these directories must describe the number of files they contain. For example, for a fasta with 2732 proteins, the first directory might be called “1-1000”, the second “1001-2000” and the third “2001-2732”.

5. Write a script that takes a PDB identifier as input and makes a Ramachandran plot (https://en.wikipedia.org/wiki/Ramachandran_plot) for the protein. If the protein has more than one chain, represent all chains in the plot. Do not represent chains that are not polypeptides. Put the original general Ramachandran plot as the background picture in your plot.

6. Write a script that takes a Uniprot or Uniref multifasta as input and generates a 3D scatter plot. Highlight proteins labeled as “DNA-binding”. Each dot represents:
   a) Molecular weight of the protein
   b) Isoelectric point
   c) Instability index

7. Write a script that takes two taxaIDs. The script must return the most recent common taxon shared by the two. If possible, return the Genus, Family and Order each of the taxaIDs belongs to. For example, if the two taxaIDs are 3702 (Arabidopsis thaliana) and 3712 (Brassica oleracea), the most recent common ancestor would be 3700 (Brassicaceae). For A. thaliana, the genus is Arabidopsis, the family is Brassicaceae and the order is Brassicales; and for B. oleracea the genus is Brassica, the family is Brassicaceae and the order is Brassicales. The script must work for any kind of organism. HINT: Nomenclatural codes assign suffixes that are nearly universal for the different ranks.

8. Write a script that takes a list of geographical coordinates of length equal to or greater than 3 and an additional coordinate. It will return True if the second input is a coordinate that lies within the geographical range defined by the coordinates in the list, and False otherwise. Take Earth’s curvature into consideration. HINT: Angles

9. Write a script that takes a Uniprot proteome and an amino acid given as its IUPAC one-letter symbol (e.g. Alanine would be A). It calculates the percentage of the sequence that the amino acid represents for each of the proteins in the fasta and returns two lists of protein IDs, corresponding to the 20th length percentile and the 80th percentile. Calculate how many amino acids there are in the top 20% longest proteins and whether that number is higher than the number of amino acids in the remaining 80%.

10. Write a script that takes a Uniprot proteome and a file containing a subset of Uniprot IDs from the same proteome, one per line. Plot the fraction of cysteines, as well as acidic, basic and hydrophobic amino acids, in the whole proteome and in the subset in a way that makes it easy to visualize differences between the two sets. Plot everything in the same figure.
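The core comparison in exercise 7 can be sketched as follows. This is only an illustration, assuming each lineage is already available as a list ordered from the broadest rank to the species; the function name `most_recent_common_taxon` is hypothetical, and the lineages are hard-coded and truncated to the ranks named in the exercise rather than fetched from a taxonomy database.

```python
def most_recent_common_taxon(lineage_a, lineage_b):
    """Return the deepest taxon shared by two root-first lineages, or None."""
    shared = None
    for taxon_a, taxon_b in zip(lineage_a, lineage_b):
        if taxon_a != taxon_b:
            break  # the lineages diverge here
        shared = taxon_a
    return shared


# Illustrative lineages (order, family, genus, species) taken from the
# example in exercise 7; a real script would look these up by taxaID.
arabidopsis = ["Brassicales", "Brassicaceae", "Arabidopsis", "Arabidopsis thaliana"]
brassica = ["Brassicales", "Brassicaceae", "Brassica", "Brassica oleracea"]

print(most_recent_common_taxon(arabidopsis, brassica))  # Brassicaceae
```

Walking both lineages from the root and stopping at the first mismatch keeps the logic independent of rank names, which matters because not every organism has every rank assigned.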
Answered Same Day, Feb 16, 2021


Sandeep Kumar answered on Feb 17 2021
bio/exercise 4/exercise4.py
# -*- coding: utf-8 -*-
"""
author: [email protected]
Place the Uniprot fasta file in the same
folder as the script.
"""
from Bio import SeqIO
from Bio import Blast
import os
import sys


def main(my_fasta, datafiles=None):
    """Please provide the name of a
    Uniprot fasta file as input (A) and any number of Uniprot
    fasta files (B) as a list.
    """
    basedir = os.getcwd()  # useful for later
    # check that the file exists
    if not os.path.isfile(my_fasta):
        print("File not found")
        sys.exit()

    ############ PART C: output folders #########
    folderslist = []

    # count the records in the input fasta
    count = 0
    with open(my_fasta, mode='r') as handle:
        for fasta in SeqIO.parse(handle, 'fasta'):
            count += 1

    fullfolders = count // 1000
    for i in range(fullfolders):
        # yes, this gives you leading zeros, but it's comprehensible
        os.mkdir(f"{i}001 - {i+1}000")
        folderslist.append(f"{i}001 - {i+1}000")
    # last folder
    if count % 1000 != 0:
        # pad the remainder to three digits so the name stays e.g. "2050"
        zerocount = str(0) * (3 - len(str(count % 1000)))
        os.mkdir(f"{fullfolders}001 - {fullfolders}{zerocount}{count%1000}")
        folderslist.append(f"{fullfolders}001 - {fullfolders}{zerocount}{count%1000}")
if...
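The folder names requested in part (c) of exercise 4 can also be computed arithmetically, which avoids the leading zeros and the manual padding. The following is a minimal sketch, not the answerer's method; the helper name `range_labels` and its `size` parameter are hypothetical.

```python
def range_labels(total, size=1000):
    """Return labels such as '1-1000' covering items 1..total in chunks of `size`."""
    labels = []
    for start in range(1, total + 1, size):
        end = min(start + size - 1, total)  # the last chunk may be shorter
        labels.append(f"{start}-{end}")
    return labels


print(range_labels(2732))  # ['1-1000', '1001-2000', '2001-2732']
```

Each label could then be passed to `os.makedirs(label, exist_ok=True)` so that rerunning the script does not fail on directories that already exist.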