Please find details for Assignment 4 (Total of 2 data files + PDF with details about the assignment itself). The first file (PDF) details the question/s. The upstream regions file is bzipped so you will...

Please find details for Assignment 4 (a total of 2 data files + a PDF with details about the assignment itself). The first file (PDF) details the question/s. The upstream regions file is bzipped, so you will have to unzip it before use, while the counts matrix is uploaded as a text file. These files are all in space/tab-delimited text format. As always, I expect the scripts to be submitted so that, when they are run with these files in the current directory under their given names, your script/program produces the desired output.
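As a minimal sketch of the unzipping step, Python's standard `bz2` module can read the compressed file directly, so a separate decompression pass is optional. The helper name `read_fasta` and the `.bz2` suffix check are assumptions for illustration; the actual file names are not given in the post.

```python
import bz2

def read_fasta(path):
    """Parse a FASTA file (plain text or .bz2) into a {header: sequence} dict."""
    # Assumption: compressed input is recognised by its .bz2 suffix.
    opener = bz2.open if path.endswith(".bz2") else open
    records = {}
    header = None
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                header = line[1:]          # header text without the leading '>'
                records[header] = []
            elif header is not None:
                records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {h: "".join(parts) for h, parts in records.items()}
```

How the gene id is embedded in each header line depends on the data file, so any further parsing of the header is left out here.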


Assignment 4

• In motif finding, a weight matrix (also referred to as a Position Weight Matrix or Position Specific Weight Matrix) is defined as the log-odds matrix whose elements are

W(b, j) = log [ F'(b, j) / F(b, o) ]

where b is the base and j is the position index within the motif. F'(b, j) is the frequency with which each base occurs at a given position; it can easily be calculated from the counts matrix after adjusting for zero values (see below). F(b, o) is the background frequency with which a particular base is known to occur and can be assumed to be 0.25 for all bases at all positions in the motif.

A transcription factor, argR, is known to bind to a motif that can be represented by the following counts matrix, built from a total of 27 binding sites documented in the literature (the counts matrix is attached as a text file, which your program should use).

a |  8 12 21  9  4  2 21 21  3 10  8  5  7 25  4  2  2 25
c |  7  4  1  6  2  3  3  3  1  0  2  0  7  0  3  3 24  0
g |  3  2  1  8  2 21  2  2  0  1  0  1  0  1  0 15  0  2
t |  9  9  4  4 19  1  1  1 23 16 17 21 13  1 20  7  1  0

Now write a script/program to compute the frequency matrix F(b, j) from the above counts matrix. Because the log-odds matrix is based on the frequency matrix, a zero count would require taking the logarithm of 0. To avoid this, a revised F'(b, j) can be computed by adding 1 to every base count in the counts matrix, thereby artificially increasing the number of sites to 31 (put another way, a pseudocount of +1 is added to each of the real counts for each base at each position, which raises the total count at each position in the matrix to 31). On this basis, compute F'(b, j) as well in the same script/program.
• Now use the weight matrix to scan the attached set of upstream regulatory regions of genes and identify the binding sites, keeping only those with the highest similarity to the PSM; i.e., your program should predict and show only the top 30 scoring gene ids corresponding to these sequences. The upstream regulatory regions, defined as 400 bases upstream of and 50 bases after the translational start site, are provided in FASTA nucleotide format along with the gene id to which each sequence corresponds.
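The two parts of the assignment can be sketched end to end as follows. This is an illustration under stated assumptions, not the submitted answer: the counts are embedded as text rather than read from the attached file, the logarithm is taken as the natural log (the assignment does not name a base), only the forward strand is scanned, each gene is ranked by its single best-scoring window, and the demo sequences are made up. All function names are hypothetical.

```python
import heapq
import math

# Counts matrix copied from the assignment text (rows: a, c, g, t; 18 positions).
COUNTS_TEXT = """\
a | 8 12 21 9 4 2 21 21 3 10 8 5 7 25 4 2 2 25
c | 7 4 1 6 2 3 3 3 1 0 2 0 7 0 3 3 24 0
g | 3 2 1 8 2 21 2 2 0 1 0 1 0 1 0 15 0 2
t | 9 9 4 4 19 1 1 1 23 16 17 21 13 1 20 7 1 0
"""

def parse_counts(text):
    """Parse 'base | n n n ...' lines into {base: [counts]}."""
    counts = {}
    for line in text.splitlines():
        if "|" in line:
            base, values = line.split("|", 1)
            counts[base.strip().lower()] = [int(v) for v in values.split()]
    return counts

def frequencies(counts, pseudocount=0):
    """Column-wise frequencies; the pseudocount is added to every cell first."""
    width = len(counts["a"])
    freqs = {base: [] for base in counts}
    for j in range(width):
        total = sum(counts[base][j] + pseudocount for base in counts)
        for base in counts:
            freqs[base].append((counts[base][j] + pseudocount) / total)
    return freqs

def log_odds(freqs, background=0.25):
    """W(b, j) = log(F'(b, j) / background); natural log is an assumption."""
    return {base: [math.log(f / background) for f in row]
            for base, row in freqs.items()}

def best_site(seq, W):
    """Return (score, start, window) of the best forward-strand window, or None."""
    width = len(W["a"])
    seq = seq.lower()
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in W for base in window):
            continue  # skip windows with ambiguous characters such as 'n'
        score = sum(W[base][j] for j, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, start, window)
    return best

def top_genes(sequences, W, n=30):
    """Rank genes by their best-scoring window and keep the top n."""
    hits = []
    for gene_id, seq in sequences.items():
        site = best_site(seq, W)
        if site is not None:
            hits.append((gene_id, *site))
    return heapq.nlargest(n, hits, key=lambda hit: hit[1])

counts = parse_counts(COUNTS_TEXT)
F = frequencies(counts)                       # plain frequencies, 27 sites per column
F_prime = frequencies(counts, pseudocount=1)  # pseudocount-adjusted, 31 per column
W = log_odds(F_prime)

# Toy sequences, made up for illustration (not taken from the assignment data).
consensus = "".join(max(W, key=lambda b: W[b][j]) for j in range(len(W["a"])))
demo = {"geneX": "ccc" + consensus + "ggg", "geneY": "c" * 30}
for gene_id, score, start, window in top_genes(demo, W, n=30):
    print(gene_id, round(score, 3), start, window)
```

With the real data, `demo` would be replaced by the parsed upstream regions keyed by gene id. Whether the reverse strand should also be scanned is not stated in the assignment and would need to be confirmed.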
Answered Same Day, Mar 30, 2021

Yogesh answered on Apr 03 2021
import math as m
from collections import Counter

def freq_mat_gen(a_cMat, c_cMat, g_cMat, t_cMat, a_fMat, c_fMat, g_fMat, t_fMat):
    # Number of sites = sum of the four base counts at the first position
    # (every column of the counts matrix has the same total).
    div_fac = a_cMat[0] + c_cMat[0] + g_cMat[0] + t_cMat[0]

    # Fill each output list in place with frequencies rounded to 3 decimals.
    for counts, freqs in ((a_cMat, a_fMat), (c_cMat, c_fMat),
                          (g_cMat, g_fMat), (t_cMat, t_fMat)):
        for i in counts:
            freqs.append(round(i / div_fac, 3))

    matrix = [a_fMat, c_fMat, g_fMat, t_fMat]

    # Print the matrix, one tab-separated row per base (a, c, g, t).
    for row in matrix:
        for value in row:
            print(value, end="\t")
        print('\n')
    print('')

def Freq_Mat_1():
    global a_fMat1, c_fMat1, g_fMat1, t_fMat1
    a_fMat1 = []
    c_fMat1 = []
    g_fMat1 = []
    t_fMat1 = []
    
    print("\tFreq. Matrix - F(b, j)")
    freq_mat_gen(a_cMat1, c_cMat1, g_cMat1, t_cMat1, a_fMat1, c_fMat1, g_fMat1, t_fMat1)
    
def pseudocount(a_cMat, c_cMat, g_cMat, t_cMat):
    # Add a pseudocount of +1 to every count; results are stored in module-level lists.
    global a_cMat2, c_cMat2, g_cMat2, t_cMat2
    a_cMat2 = [i + 1 for i in a_cMat]
    c_cMat2 = [i + 1 for i in c_cMat]
    g_cMat2 = [i + 1 for i in g_cMat]
    t_cMat2 = [i + 1 for i in t_cMat]

def Freq_Mat_2():
    global a_fMat2, c_fMat2, g_fMat2, t_fMat2
    a_fMat2 = []
    c_fMat2 = []
    g_fMat2 = []
    t_fMat2 =...