Applied Machine Learning Online exam Thursday 7/January/2021 at 09:30 Irish Time 2020 Sample Exam...

Question

Applied Machine Learning Online exam, Thursday 7/January/2021 at 09:30 Irish Time

2020 Sample Exam Paper Open Book.pdf

Answer any 3 questions from 4, all questions carry equal marks.

Question 1

(A) The following table presents the Pearson coefficients from a data-set.

(i) Evaluate the table for potential multi-collinear attributes. Explain the reasoning behind the choices you have made.

(ii) Evaluate the table for attribute selection. Explain the reasoning behind the potential attributes that you have selected, based on the Pearson coefficients.

(6 Marks)

(B) The histogram presented represents 398 cars surveyed for their fuel efficiency (miles per gallon, MPG).

(i) Evaluate the histogram for potential outliers or deemed missing data. Explain the reasoning behind the choices you have made (provide the steps and calculations you used to support your decisions).

(ii) Explain how you would deal with the evaluation findings from part (i).

(8 Marks)

Mean 23.3
Standard Deviation 7.05
Minimum 0
Maximum 43.6
Number of Instances 398

Question 1 contd

(C) The pre-examination of the class distribution is an important exercise before developing classification models.

(i) Discuss this statement, explaining why this is an appropriate pre-examination technique, and discuss the implications of not conducting this technique.

(ii) Provide examples of problem situations where this technique would be useful when examining the model's performance.

(6 Marks)

Question 2

(A) The terms type I error and type II error are often discussed when model performance is presented.

(i) Explain how you would evaluate a type I error and a type II error.

(ii) Given that the model is trying to identify patients with a life-threatening disease, discuss this problem situation concerning both types of errors, also explaining which you think is the more important error in this case and why.

(6 Marks)

(B) The following table contains the performance results for classification models A and B (Accuracy, Sensitivity and Specificity), where both models are trying to identify sports injuries before they happen.

Model A   Model B

Assumption: Model B is the most suitable to predict sports injuries before they happen.

(i) Explain why someone would make this incorrect assumption, using the values presented in the table above to aid your answer.

(ii) Explain your reason why Model A is the most suitable model for predicting sports injuries before they happen, using the values presented in the table above to aid your answer.

(6 Marks)

(C) Ten-fold Cross Validation Machine Learning model validation techniques (the best technique to use).

(i) Explain what is the most important

(ii) Explain an alternative to Ten-fold Cross Validation. Compare and contrast the two techniques (10-fold Cross Validation and the alternative technique), giving examples of problem situations where each technique may be more suitable.

(8 Marks)

Question 3

(A) The k-value in the KNN classification algorithm can be selected using the elbow method.

(i) Explain why you would initially decide a k-value to be even or odd.

(ii) Explain how you would evaluate the most appropriate k-value for a KNN algorithm using the above figure and the elbow method.

(6 Marks)

(B) The naïve Bayes Machine Learning Algorithm is often a high-performing classification algorithm.

(i) Explain why the Bayesian-based algorithm includes "naïve" in the title and how this may affect the model's performance.

(ii) Compare and contrast the naïve Bayes algorithm with two other Machine Learning Algorithms.

(8 Marks)

(C) Semi-supervised learning is an approach that is sometimes required, combining both supervised learning and unsupervised learning.

(i) Describe a problem situation where semi-supervised learning is required.

(ii) Explain why this approach is needed for the answer in part (i), describing why supervised learning or unsupervised learning alone would be unable to address the problem situation mentioned.

(6 Marks)

Question 4

(A) The hyperparameters batch size and epochs are fundamental in the development of an Artificial Neural Network (ANN).

(i) Explain how you would evaluate and select a suitable batch size and epochs.

(6 Marks)

(B) Bootstrap is a statistical estimation technique where a statistical quantity like a mean is estimated from multiple random samples of your data (with replacement).

(i) Discuss this statement, explaining in your own words how the Bootstrap pre-processing technique works.

(ii) Provide an example of a problem situation where this technique should be considered, explaining why you think Bootstrap is suitable for this problem situation.

(6 Marks)

(C) Statistical testing is often used to compare the performance of two or more machine learning models.

(i) Compare and contrast any two methods of statistical testing for comparing two or more Machine Learning models.

When reporting a statistical test result, many people do not present the entire picture, calling into question the findings.

(ii) Discuss what are the most important parts of a statistical test to report, so that there is no ambiguity in the findings, explaining your reason for each part selected.

(8 Marks)

SampleExam 2019 Sample Exam Paper (Traditional in person).pdf

Answer any 3 questions from 4, all questions carry equal marks.

Question 1

(A) The following table presents the Pearson coefficients from a data-set. What is this table often used for in data pre-processing? Identify one pair of attributes from the table and explain their values and what they mean in reference to data pre-processing.

(6 Marks)

(B) The histogram presented represents 398 cars surveyed for their fuel efficiency (miles per gallon, MPG). Describe this histogram. What values, if any, would you identify as concerns (consider for marking)? Finally, what steps (and the rationale for your choice) would you take for each identified concern?

(14 Marks)

Mean 23.3
Standard Deviation 8.05
Minimum 0
Maximum 46.6
Number of Instances 398

Question 1 contd

(C) Explain why a pre-examination of the class distribution is an important exercise prior to running classification models. Give an example of a concern that may arise if this exercise is not conducted.

(4 Marks)

Question 2

(A) Explain what the terms type I error and type II error mean. Given that the model is trying to identify patients with a life-threatening disease, which is the most important measure to identify correctly and why?

(4 Marks)

(B) a and b, calculate: appropriate and why, considering that both models are trying to identify sports injuries before they happen a b

Swapnil · Accepted Answer

QUESTION 1
A)
1)
Outliers are data points that differ markedly from the other data points, so they are basically unusual values in the dataset. If a value lies more than a certain number of standard deviations away from the mean, that data point is identified as an outlier. The specified number of standard deviations is called the threshold. This method can fail to detect outliers because the outliers themselves increase the standard deviation. The smaller the range or standard deviation, the lower the variability of the data. The range is useful, but the standard deviation is considered the more reliable and useful measure for statistical analyses.
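The threshold rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `zscore_outliers`, the default threshold of 3 and the sample values are assumptions, not part of the exam paper. The second call shows the weakness mentioned above, where a single extreme value inflates the standard deviation enough to hide itself.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values identical, nothing can be flagged
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical readings: twenty values of 10 plus one extreme value.
large_sample = [10.0] * 20 + [100.0]
flagged = zscore_outliers(large_sample)

# Same extreme value in a much smaller sample: it inflates the standard
# deviation so much that its own z-score stays below the threshold.
small_sample = [10.0] * 5 + [100.0]
missed = zscore_outliers(small_sample)

print(flagged, missed)
```

In the small sample the value 100 is clearly unusual, yet it is not flagged, which is why a median-based or IQR-based check is often used alongside this rule.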
2) 
Standard deviation measures the spread of a data distribution. The more spread out a data distribution is, the greater its standard deviation. Standard deviation cannot be negative. A standard deviation close to 0 indicates that the data points tend to be close to the mean. The further the data points are from the mean, the greater the standard deviation. For an approximately normal data set, the values within one standard deviation of the mean account for about 68% of the set, those within two standard deviations account for about 95%, and those within three standard deviations account for about 99.7%.
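The 68%, 95% and 99.7% figures follow from the normal distribution itself and can be checked with the standard library. The helper name `normal_coverage` is an assumption made for this sketch.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Percentage of values within 1, 2 and 3 standard deviations.
coverage = {k: normal_coverage(k) * 100 for k in (1, 2, 3)}
for k, pct in coverage.items():
    print(f"within {k} SD: {pct:.1f}%")
```

These percentages only hold when the data is approximately normal, so a skewed histogram needs a different rule.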
B)
1) 
An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations. The outlier is identified as the largest value in the data set, 4257, and appears as the circle to the right of the box plot.
2) Outliers should be investigated carefully. They often contain valuable information about the process under investigation or about the data gathering and recording process. Before considering the possible elimination of these points from the data, one should try to understand why they appeared and whether it is likely that similar values will continue to appear. So finally we will calculate the outliers of our dataset.
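As a worked illustration, the check can be applied to the summary statistics printed with the histogram in the 2020 paper (mean 23.3, standard deviation 7.05, minimum 0, maximum 43.6). The three-standard-deviation cut-off is an assumed rule of thumb, and the variable names are made up for this sketch.

```python
# Summary statistics quoted under the histogram in the 2020 paper.
MEAN, SD = 23.3, 7.05
MINIMUM, MAXIMUM = 0.0, 43.6
THRESHOLD = 3.0  # assumed cut-off in standard deviations

def z_score(x, mean=MEAN, sd=SD):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

z_min = z_score(MINIMUM)          # about -3.30
z_max = z_score(MAXIMUM)          # about  2.88
lower = MEAN - THRESHOLD * SD     # about  2.15 MPG
upper = MEAN + THRESHOLD * SD     # about 44.45 MPG

print(f"z(min) = {z_min:.2f}, z(max) = {z_max:.2f}")
print(f"expected range: {lower:.2f} to {upper:.2f} MPG")
```

Under these assumptions the minimum of 0 MPG falls below the lower limit, and since a car cannot realistically achieve 0 MPG it is more plausibly a placeholder for a missing value than a genuine reading. The maximum of 43.6 stays inside the limits.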
C)
1) The statement is true because multicollinearity is a situation where two or more predictors are highly linearly related. It is appropriate to use the Pearson correlation coefficient when the two variables of interest are scored using interval or ratio measures, while the associations of ordinal or nominal variables should be compared using alternative methods.
2) The following points support the above:
· Redundancy: two predictors might be providing the same information about the response variable thereby leading to unreliable coefficients of the predictors (especially for linear models). 
· The estimate of a predictor on the response variable will tend to be less precise and less reliable. 
· An important predictor can become unimportant as that feature has a collinear relationship with other predictors. 
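The redundancy described in these points can be spotted by computing the Pearson coefficient for every pair of attributes and flagging the pairs above a cut-off. The sketch below is only an illustration: the column names, the data values and the 0.8 cut-off are all assumptions, since the table from the exam paper is not available here.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def collinear_pairs(columns, cutoff=0.8):
    """List attribute pairs whose absolute correlation reaches the cutoff."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= cutoff:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

# Hypothetical attributes: the first two rise together, the third does not.
data = {
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0],
    "displacement": [2.1, 3.9, 6.2, 8.1, 9.8],
    "year": [3.0, 1.0, 4.0, 1.0, 5.0],
}
pairs = collinear_pairs(data)
print(pairs)
```

For a flagged pair, one of the two attributes would normally be dropped, keeping the one with the stronger correlation to the target.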
QUESTION 2
A)
1)
Accuracy is a statistical term that means what we would expect it to mean: the proportion of predictions that are correct. Sensitivity and specificity are harder to understand. For every case in the above table the outcome is one of only two values, positive or negative. Basically, the model should predict the outcome better than random guessing.
2)
You must use the confusion matrix to do the overall calculations. Once you have trained model B, you predict the values that fill the confusion matrix, because it helps you judge predictions such as how many clicks a title will receive in the Netflix system. Accuracy gives you the proportion of correct outputs in that matrix, which can then be used for the next title. A perfectly accurate model would put every transaction into the true positive or true negative boxes of the matrix. Accuracy is therefore a simple proportion: the true positives plus the true negatives, divided by all predictions.
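The measures taken from a confusion matrix can be written out as a short sketch. The counts below are hypothetical and the function name is an assumption. A type I error is a false positive and a type II error is a false negative, so their rates are the complements of specificity and sensitivity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common measures from the four cells of a confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),    # true positive rate
        "specificity": tn / (tn + fp),    # true negative rate
        "type_i_rate": fp / (fp + tn),    # false positive rate = 1 - specificity
        "type_ii_rate": fn / (fn + tp),   # false negative rate = 1 - sensitivity
    }

# Hypothetical, heavily imbalanced counts: only 10 of 1000 cases are positive.
metrics = classification_metrics(tp=5, fp=10, tn=980, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.985 while the sensitivity is only 0.5, which shows how a model can look strong on accuracy and still miss half of the positive cases.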
B)
1)
A gold standard is an accepted standard that people can look to as an accurate and reliable reference. In medicine, for example, researchers often refer to blood assay as a gold standard for checking patients’ medication adherence. As with many such standards, however, because it is expensive and time-consuming, researchers search for quicker and less expensive, but still consistent ways of achieving comparable results. They gauge the value of their methods by comparing them to those achieved using the so-called gold standard.
2)
You train your model on the training set and then validate it on the cross-validation set. Once your model reaches its highest accuracy on the cross-validation set, you evaluate it on the test set to get the "real" accuracy. Cross-validation is the most rigorous way of choosing hyperparameters, but it is time-consuming. One alternative method is simple holdout validation, which reduces the time complexity. Alternatively,
