Submit your solution codes into a notebook file with “.ipynb” extension. Write discussions and explanations including outputs and figures into a separate file and submit as a PDF file. 1. What are the...

I have attached the data set, the data description, and the instructions.


Submit your solution codes into a notebook file with “.ipynb” extension. Write discussions and explanations including outputs and figures into a separate file and submit as a PDF file.

1. What are the differences between hyperparameter and parameter of a machine learning (ML) model. Explain your answer using at least two machine learning models that you have learned in this unit.
2. Prove that Elastic net can be used as either LASSO or Ridge regulariser.

Background
The recently started human and other genome projects are likely to change the situation of molecular biology. Comprehensive analyses of whole genomic sequences will enable us to understand the general mechanisms of how protein and nucleic acid functions are encoded in the sequence data.

Dataset filename: yeast2vs4.csv
Dataset description: There are 8 features and one target in the dataset. All the features are in a numerical format, and the target is in text format. For further information about the attributes, please read “Data Set Information.pdf”.

3. Analyse the importance of the features for predicting presence or absence of protein using two different approaches. Explain the similarity/difference between outcomes.
4. Create three supervised machine learning (ML) models except any ensemble approach for predicting presence or absence of protein.
a. Report performance score using a suitable metric. Is it possible that the presented result is an overfitted one? Justify.
b. Justify different design decisions for each ML model used to answer this question.
c. Have you optimised any hyper-parameters for each ML model? What are they? Why have you done that? Explain.
d. Finally, make a recommendation based on the reported results and justify it.
5. Build three ensemble models for predicting presence or absence of protein.
a. When do you want to use ensemble models over other ML models?
b. What are the similarities or differences between these models?
c. Is there any preferable scenario for using any specific model among the set of ensemble models?
d. Write a report comparing performances of models built in question 5 and 6. Report the best method based on model complexity and performance.
e. Is it possible to build ensemble model using ML classifiers other than decision tree? If yes, then explain with an example.

Data Set Information
Attribute Information:
1. mcg: McGeoch's method for signal sequence recognition.
2. gvh: von Heijne's method for signal sequence recognition.
3. alm: Score of the ALOM membrane spanning region prediction program.
4. mit: Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins.
5. erl: Presence of "HDEL" substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute.
6. pox: Peroxisomal targeting signal in the C-terminus.
7. vac: Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins.
8. nuc: Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins.
9. class: Presence or absence of protein {positive, negative}.

Further information: Following are the articles that have used this dataset:
1. "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa, PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.
2. "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa, Genomics 14:897-911, 1992.
Answered 4 days after Sep 19, 2022

Answer To: Submit your solution codes into a notebook file with “.ipynb” extension. Write discussions and...

Mukesh answered on Sep 21 2022
1. What are the differences between hyperparameter and parameter of a machine learning (ML) model. Explain your answer using at least two machine learning models that you have learned in this unit.
A machine learning model involves two kinds of quantities:
Model Parameters: These are the values that the learning algorithm estimates from the training data set, such as the coefficients and intercept of a logistic regression. These are the fitted parameters.
Hyperparameters: These are settings chosen before training that are not learned from the data. They must be tuned, for example by cross-validation, in order to obtain a model with good performance.
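Logistic Regression Classifier
LogisticRegression(C=1000.0, random_state=0)
Here, C is the inverse of the regularization strength (smaller values mean stronger regularization), and random_state is the seed of the pseudo-random number generator used when shuffling the data. Both are hyperparameters set before training; the model parameters are the feature coefficients and the intercept learned during fitting.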
Support Vector Machine Classifier
SVC(kernel='linear', C=1.0, random_state=0)
Here, kernel specifies the kernel type used by the algorithm, for example kernel='linear' for linear classification or kernel='rbf' for non-linear classification. C is the penalty parameter of the error term, and random_state is the seed of the pseudo-random number generator used when shuffling the data for probability estimates. All three are hyperparameters; the model parameters are the support vectors, their coefficients, and the intercept found during training.
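The distinction can be illustrated with a minimal sketch in plain Python. This is not the scikit-learn implementation: the function fit_logistic, the toy data, and the learning rate and epoch count are assumptions made up for this illustration. C is fixed before training (a hyperparameter), while w and b are learned from the data (model parameters).

```python
import math

def fit_logistic(xs, ys, C=1.0, lr=0.1, epochs=2000):
    """Fit a one-feature logistic regression by gradient descent.

    C, lr and epochs are hyperparameters: chosen before training.
    w and b are model parameters: estimated from the data.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability
            grad_w += (p - y) * x
            grad_b += (p - y)
        # L2 penalty scaled by 1/C: a smaller C means stronger regularisation
        grad_w = grad_w / n + w / (C * n)
        grad_b = grad_b / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: one feature, two classes
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# Same data, different hyperparameter values, different fitted parameters
w_strong, b_strong = fit_logistic(xs, ys, C=0.1)     # strong regularisation
w_weak, b_weak = fit_logistic(xs, ys, C=1000.0)      # weak regularisation
print("C=0.1    -> w =", w_strong)
print("C=1000.0 -> w =", w_weak)
```

Changing C does not change the data, yet it changes the fitted weight, which is why hyperparameters are tuned separately from the fitting step.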
2. Prove that Elastic net can be used as either LASSO or Ridge regulariser.
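A sketch of the argument, assuming one common parameterisation of the elastic net penalty (the symbols \lambda for the overall penalty strength and \alpha for the mixing weight are notation chosen here, not taken from the assignment):

```latex
% Elastic net objective, with \lambda \ge 0 and \alpha \in [0, 1]
\hat{\beta}_{\text{EN}} = \arg\min_{\beta} \;
  \lVert y - X\beta \rVert_2^2
  + \lambda \left[ \alpha \lVert \beta \rVert_1
  + (1 - \alpha) \lVert \beta \rVert_2^2 \right]

% Case \alpha = 1: the squared L2 term vanishes, leaving the LASSO objective
\hat{\beta}_{\text{EN}} \big|_{\alpha = 1} = \arg\min_{\beta} \;
  \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% Case \alpha = 0: the L1 term vanishes, leaving the Ridge objective
\hat{\beta}_{\text{EN}} \big|_{\alpha = 0} = \arg\min_{\beta} \;
  \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
```

Under this parameterisation, LASSO and Ridge are the two boundary cases of the elastic net, and intermediate values of \alpha blend the two penalties. In scikit-learn's ElasticNet the l1_ratio argument plays the role of the mixing weight.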
Background
The recently started human and other genome projects are likely to change the situation of molecular biology.
Comprehensive analyses of whole genomic sequences will enable us to understand the general mechanisms
of how protein and nucleic acid functions are encoded in the sequence data.
Dataset filename: yeast2vs4.csv
Dataset description: There are 8 features and one target in the dataset. All the features are in a numerical
format, and the target is in text format. For further information about the attributes, please read “Data Set
Information.pdf”.
3. Analyse the importance of the features for predicting presence or absence of protein...