I have 3 sample exams that need to be solved with full explanations. The assignment is for a data science course, and most of it is math. In some questions you need to circle the correct option; for those, please also explain why each wrong option is wrong and where it could be used, depending on what the question asks. The sample exams are below. It is not too much work.


These are sample tests. My professor told me the exam is open book, so I can solve these and take them with me to the exam. Some of the questions are multiple choice: circling the right answer is not enough, you need to explain it too. Questions of this kind will be on my exam, so I need full explanations and correct answers. Most of the work is math.


Final Exam – CS 301 (Time 3 hours)
Fall 2020
Total points: 50
Name………………………………………………………………………………

1. Multiple Choice/ Single Answer Questions. For some questions, there may be more than one correct answer. You have to select all that apply. (1+1+2+1+1+2+2=10)

a. Which one is subjectively most interesting?
   i. An association rule that has reasonably high support but low confidence.
   ii. A rule that has low support and low confidence.
   iii. A rule that has low support and high confidence.

b. Suppose the confidence of the rules A → B and B → C are larger than some threshold, minconf. Is it possible that A → C has a confidence less than minconf?
   i. Yes
   ii. No

c. The training error on an unpruned decision tree is 80% while the error rate on each bootstrap sample is close to 60%. What is the error estimate of 0.632 bootstrap?

d. Choose the incorrect one - A continuous attribute could be discretized using:
   i. Binning
   ii. Clustering
   iii. Rule discovery

e. Which one has the highest entropy?
   i. A fair coin
   ii. An unfair coin
   iii. A fair dice

f. Based on your answer to question (e), write down the entropy values to corroborate your answer.

g. The following data shows the relationship between age and high blood pressure (high and normal) for a set of 10 patients.

   Age   Blood pressure
   75    H
   84    H
   39    N
   44    N
   35    H
   42    H
   53    N
   62    N
   67    H

   Discretize the attribute age considering at least 2 instances of the majority class (just write the answer; no need to show your work).

2. The following table consists of training data from an employee database. The data have been generalized. For example, “31 . . . 35” for age represents the age range of 31 to 35. For a given row entry, the count column represents the number of data tuples having the values for department, status, age, and salary.

a. Given a data tuple having the values “systems”, “26 . . . 30”, and “46–50K” for the attributes department, age, and salary, respectively, what would a Decision Tree (information gain) classification of the status for the tuple be? (6)

b. What would be the results if Naïve Bayesian classification is used? (4)

3. a. Suppose that we would like to select between two prediction models, M1 and M2. We have performed 5-fold cross validation on each model, where the same data partitioning in round i is used for both M1 and M2. The error rates obtained for M1 are 30.5, 32.2, 20.7, 20.6, 31.0. The error rates for M2 are 22.4, 14.5, 22.4, 19.6, 20.7. Comment on whether one model is significantly better than the other considering a significance level of 10%. (5)

b. Compute the Precision and the Recall of the model for this 3-class problem. (5)

   Actual/predicted   Class-1   Class-2   Class-3
   Class-1              200        5         7
   Class-2               30      200        10
   Class-3               40       20       300

4. a. Compute the ROC Curve (5)

b. If there is a cost associated with wrong classification (that you intend to minimize), what would be your best threshold value from the probability table to minimize the cost? (5)

   Actual/predicted   P    N
   P                  0    10
   N                  2    0

5. In a survey of 1000 people, the following preference is observed.

                  Buys computer   Doesn’t buy computer
   Student             300               100
   Not student         200               400

a. Is buying computer correlated with the student status? Is this relationship significant with 90% significance level? (3)

b. With a support of 30% and confidence of 30%, write two significant association rules from this dataset. (4)

c. Find out the maximum itemsets from this dataset considering the support threshold of 30%. (3)


Sample Final Exam
Time 2 hours 30 minutes
Total points: 45
Name --------------------------------------------------------

1. Multiple choice or single answer question. Show your work. (2+1+2+2+2+1=10)

a. Which one has the highest entropy (select all that apply)?
   i. A fair coin
   ii. An unfair coin
   iii. A 6-sided fair dice
   iv. A 4-sided fair dice

b. The accuracy on an unpruned decision tree is 40% on the test data, while that on the training data is close to 60%. Write the accuracy expression of this using 0.632 bootstrap.

c. The following data demonstrates the relationship between Math Score and Age of the test writers. Write down the model equation.

d. The dataset contains the following 10 training points (each point contains their coordinates and a class label).

   X1 (1, 1): Male
   X2 (2, 2): Male
   X3 (2, 2.5): Female
   X4 (3, 7): Female
   X5 (9, 9): Male
   X6 (8, 9): Male
   X7 (3, 3): Male
   X8 (9, 9): Male
   X9 (9, 10): Female
   X10 (9, 5): Female

   Classify a test point X(8,7) using k (=3) nearest neighbor based classification. Use Manhattan distance.

e. A transaction database contains three transactions as follows: {…, …, <a1, …, a50>}. Min support = 2. Write down the closed itemsets and the corresponding support count. Write down the maximum itemsets.

f. Select all those that are true.
   i. Supervised discretization could be obtained by applying information gain based criteria
   ii. Supervised discretization could be obtained by simply finding the pure intervals where the class labels are the same
   iii. Discretization could be obtained by applying Elbow method

2. Given the dataset below and a support threshold of 3 and confidence of 100%, generate all the association rules that describe weather conditions with play outcome. Find out the closed and the max patterns. (6+2+2=10)

3. Given to us are the following 6 objects. Run AGNES over it and compute the dendrogram (use single link as inter cluster distance). Show each step and the computed dissimilarity matrix. (10)

   Object   Age   Test-1   Test-2   Standing    Gender   AP courses taken
   A        20    P        P        Junior      M        Chem, Math
   B        19    P        P        Sophomore   F        CS, Math
   C        19    F        P        Freshman    F        English
   D        18    F        F        Freshman    F
   E        25    F        F        Senior      M        Math, Physics
   F        24    F        F        Senior      M        Math
   G        21    F        P        Junior      M        CS, Chem
   H        21    P        F        Sophomore   M        Physics
   I        20    F        P        Junior      F        English

4. Given below is historical data that determines the play decision based on weather parameters.

a. Classify X (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong) using Naïve Bayes classifier. (5)

b. Compute the Naïve Bayes scores (up to two decimal points rounded) produced by D1, D2, D3, D4, D5 considering their respective actual class to be their predicted class. (5)

c. Draw the ROC curve. If the cost of false positive (positive class is Play=Yes) is 9, and false negative is 1, find the best threshold computed for the records considered in b. (5)

   Score (y)   age (x)
   78.93       22
   58.20       23
   67.47       21
   37.47       39
   45.65       33
   32.92       42
   29.97       73


1. A sample data table contains attributes of mixed type. Compute the dendrogram after applying AGNES considering linkage distance as maximum distance. Show all the steps. (10)

   Object id   Test-1 (nominal)   Test-2 (asymmetric binary)   Test-3 (symmetric binary)
   1           Code A             Positive                     Positive
   2           Code B             Negative                     Negative
   3           Code C             Positive                     Positive
   4           Code A             Negative                     Negative
   5           Code B             Positive                     Positive

2. The following table summarizes supermarket transaction data, where hot dogs refers to the transactions containing hot dogs and hamburgers refers to the transactions containing hamburgers.

                  hot dogs   hot dogs’
   hamburgers       2000        500
   hamburgers’      1000       1500

   i. Suppose that the association rule "hotdogs => hamburgers" is mined. Given a minimum support threshold of 25% and a minimum confidence threshold of 50%, is this association rule strong?
   ii. Based on the given data, is the purchase of hotdogs independent of the purchase of hamburgers? If not, what kind of correlation relationship exists between the two? Is this result significant with 90% significance level? Hint: use a correlation formula, such as the Pearson Correlation Coefficient.

3. Functionally describe the false positive rate (FPR) using specificity.

4. The data below presents the T4 value of 115 suspected hypothyroidism patients. Draw the ROC curve.

5. The training error on an unpruned decision tree is 100% while the error rate on each bootstrap sample is close to 50%. What is the error estimate of 0.632 bootstrap?

6. You use K-means to cluster the data, but for all values of K, 1 ≤ K ≤ 100, the K-means algorithm returns only one non-empty cluster. Is that possible?

7. A database has 5 transactions as listed below. With 60% support (coverage) and 60% confidence (accuracy),
   a. Compute all frequent patterns using Apriori algorithm
   b. Compute all max patterns
   c. Compute all closed patterns

8. Given the training dataset below, classify the fruit that is Long, Sweet and Yellow.

9. Question on Entropy/ Information Gain/ Gain ratio
Answered 4 days after Dec 10, 2022


Banasree answered on Dec 14 2022
35 Votes
Final Exam – CS 301
(Time 3 hours)
Fall 2020
Total points -50
Name………………………………………………………………………………
1. Multiple Choice/ Single Answer Questions. No need to show your work. For some questions, there may be more than one correct answer. You have to select all that apply. (1+1+2+1+1+2+2=10)
a. Which one is subjectively most interesting?
i. An association rule that has reasonably high support but low confidence.
ii. A rule that has low support and low confidence.
iii. A rule that has low support and high confidence.
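Ans. iii) A rule with low support and high confidence is rare but reliable, so it tends to be unexpected and therefore interesting. Options i) and ii) have low confidence, which makes the rule unreliable whatever its support.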
b. Suppose the confidences of the rules A → B and B → C are both larger than some threshold, minconf. Is it possible that A → C has a confidence less than minconf?
i. Yes
ii. No
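Ans. i. Yes. Confidence is not transitive: conf(A → C) depends on how often C appears in the transactions that contain A, and the other two rules place no constraint on that.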
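To see why the answer to (b) is "Yes", a small counterexample helps. The sketch below uses a made-up transaction list and a made-up threshold (`minconf = 0.7`); neither comes from the exam, and the helper `confidence` is only an illustration of the usual definition conf(X → Y) = count(X and Y) / count(X).

```python
def confidence(transactions, antecedent, consequent):
    """conf(X -> Y) = (# transactions containing X and Y) / (# containing X)."""
    x, y = set(antecedent), set(consequent)
    with_x = [t for t in transactions if x <= t]
    with_both = [t for t in with_x if y <= t]
    return len(with_both) / len(with_x)

# Hypothetical data: A always comes with B, B usually comes with C,
# but A and C never occur together.
transactions = [{"A", "B"}] * 2 + [{"B", "C"}] * 6
minconf = 0.7  # assumed threshold, for illustration only

conf_ab = confidence(transactions, {"A"}, {"B"})  # 2 / 2
conf_bc = confidence(transactions, {"B"}, {"C"})  # 6 / 8
conf_ac = confidence(transactions, {"A"}, {"C"})  # 0 / 2

print(conf_ab, conf_bc, conf_ac)
print(conf_ab >= minconf and conf_bc >= minconf and conf_ac < minconf)
```

Both A → B and B → C clear the threshold here, while A → C has confidence 0, so confidence does not carry over along a chain of rules.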
c. The training error on an unpruned decision tree is 80% while the error rate on each bootstrap sample is close to 60%. What is the error estimate of 0.632 bootstrap?
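Ans. ≈ 0.674. The 0.632 bootstrap weights the bootstrap-sample error by 0.632 and the training error by 0.368: 0.632 × 0.60 + 0.368 × 0.80 = 0.3792 + 0.2944 = 0.6736.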
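The arithmetic behind (c) can be checked with a few lines of Python. This is a minimal sketch that assumes the 60% "error rate on each bootstrap sample" plays the role of the test-side error (weight 0.632) and the 80% training error takes the remaining weight 0.368; the function name `bootstrap_632` is made up for this example.

```python
def bootstrap_632(test_error, train_error):
    """0.632 bootstrap estimate: 0.632 * test-side error + 0.368 * training error."""
    return 0.632 * test_error + 0.368 * train_error

# Values from question (c): 60% on the bootstrap samples, 80% on the training set.
estimate = bootstrap_632(test_error=0.60, train_error=0.80)
print(round(estimate, 4))
```

The same function covers the other bootstrap questions in the sample exams once the two error rates are swapped in.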
d. Choose the incorrect one: a continuous attribute could be discretized using
i. Binning
ii. Clustering
iii. Rule discovery
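Ans. iii) Binning and clustering both split a continuous range into intervals, so they are valid discretization methods. Rule discovery is not a discretization method.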
e. Which one has the highest entropy?
i. A fair coin
ii. An unfair coin
iii. A fair...
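The entropy comparison in (e), and the values asked for in (f), can be worked out numerically. The sketch below assumes option iii is the six-sided fair dice from the question statement and picks an arbitrary bias of 0.9/0.1 for the unfair coin, since the exam does not give one; the helper `entropy` is written for this example.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy([0.5, 0.5])
unfair_coin = entropy([0.9, 0.1])   # assumed bias, for illustration only
fair_die = entropy([1 / 6] * 6)     # assumed six-sided

print(f"fair coin:   {fair_coin:.3f} bits")
print(f"unfair coin: {unfair_coin:.3f} bits")
print(f"fair die:    {fair_die:.3f} bits")
```

A uniform distribution over n outcomes has entropy log2(n), so the fair die (log2 6 ≈ 2.585 bits) beats the fair coin (1 bit), and any unfair coin falls below 1 bit.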