
Data Analysis, Modeling and Algorithms
Assignment 5 Classification



PROG8430 – Data Analysis, Modeling and Algorithms
Assignment 5 Classification
DUE BEFORE 10 pm, DECEMBER 9, 2022

1. Submission Guidelines

All assignments must be submitted via the eConestoga course website, into the assignment folder, before the due date. You may make multiple submissions, but only the most recent submission will be graded.

SUBMISSION

In the Assignment 5 folder, submit:

1. Your *.Rmd file. This file must have all output already run and must include your comments and answers to the questions. If you do not include your output, I will not run your code to generate it. I may, however, run the code to verify the results.
2. The *.pdf file that is produced from your *.Rmd file using the ‘Knit’ function.

DO NOT PUT THE DOCUMENTS INTO A ZIP FILE!

PLEASE NOTE: The marks on the assignment are generally awarded 50% for the actual R code and calculations and 50% for interpretation and demonstration that you understand what you have done.

EXAMPLES: The example output provided is simply to demonstrate what a typical submission might look like. You can use it as a basis, but your submission must be in your own words. Submissions that simply “cut and paste” my example commentary will be marked 0.

All variables in your code must abide by the naming convention [variable_name]_[initials]. For example, my variable for State would be State_DM. Follow the example in the intro video from week 1.

THIS IS AN INDIVIDUAL ASSIGNMENT. UNAUTHORIZED COLLABORATION IS AN ACADEMIC OFFENSE. Please see the Conestoga College Academic Integrity Policy for details.

Remember that the discussion forums on eConestoga are a great place to ask questions, as are class time and my office hours.

2. Grading

This assignment will be marked out of 25 and is worth 12.5% of your total grade in the course.
Assignments submitted after 10 pm will be reduced by 20%.
Assignments received after 8:00 am the morning after the due date will receive a mark of 0%.
Assignments which do not follow the submission instructions may have marks deducted.

3. Data

Each student will be using one dataset: PROG8430-Assign05-22F.txt
Appendix One contains a data dictionary for the study file.

4. Background

A major mail-order company tracks the time (in days) it takes for customers to receive their orders (each row in the dataset represents one order). The company has a goal of ensuring deliveries are made in seven days or less.

Your task is to use logistic regression to determine the factors that predict the probability of a delivery being ‘on time’ (i.e. in ten days or less). You will then use two other classification techniques and compare all three of them.

All of the tasks have been demonstrated in class. A careful review of your notes from the lectures should give you everything you need to complete these tasks.

5. Assignment Tasks

Nbr  Description  [Marks]

1  Preliminary Data Preparation  [3]
   1. Rename all variables with your initials appended (just as was done in previous assignments). Remember that any variables you subsequently create need to have your initials appended.
   2. As demonstrated in class and conducted in previous assignments, make quick exploratory graphs of all variables. Remember to adjust categorical variables to factor variables (e.g. all indicator variables). (NOTE – In this assignment, all of the data I have provided is well-mannered and free of outliers, so this should be a quick and simple exercise.)
   3. Create a new variable in the dataset called OT_[Initials] which will have a value of 1 if Del ≤ 10 and 0 otherwise. If you have forgotten how to do this, the code to accomplish it is included in the appendix.

2  Exploratory Analysis
   1. Correlations: Create numeric correlations (as demonstrated) and comment on what you see. Are there collinear variables?  [1]
   2. Identify the most significant predictor of an on-time delivery and provide statistical evidence (in addition to the correlation coefficient) suggesting that it is associated with an on-time delivery. (Think of the contingency tables and bar plots we did in class.)  [2]

3  Model Development
   As demonstrated in class, create two logistic regression models.
   1. A full model using all of the variables.  [1]
   2. An additional model using backward selection.  [1]
   For each model, interpret and comment on the main measures we discussed in class:  [3]
      (1) AIC
      (2) Deviance
      (3) Residual symmetry
      (4) z-values
      (5) Parameter coefficients
   Based on your preceding analysis, recommend which model should be selected and explain why.  [1]

PART B

In this section, all three classifiers should be built using OT_[Initials] as the dependent variable and the remaining variables as the independent variables.

1  Logistic Regression – Backward
   1. As above, use the step option in the glm function to fit the model (using backward selection).  [1]
   2. Summarize the results in a Confusion Matrix.  [1]
   3. As demonstrated in class, calculate the time (in seconds) it took to fit the model and include this in your summary.  [1]

2  Naïve-Bayes Classification
   1. Use all the variables in the dataset to fit a Naïve-Bayesian classification model.  [1]
   2. Summarize the results in a Confusion Matrix.  [1]
   3. As demonstrated in class, calculate the time (in seconds) it took to fit the model and include this in your summary.  [1]

3  Linear Discriminant Analysis
   1. Use all the variables in the dataset to fit an LDA classification model.  [1]
   2. Summarize the results in a Confusion Matrix.  [1]
   3. As demonstrated in class, calculate the time (in seconds) it took to fit the model and include this in your summary.  [1]

4  Compare All Three Classifiers  [3]
   For all questions below, please provide evidence.
   1. Which classifier is most accurate?
   2. Which classifier is most suitable when processing speed is most important?
   3. Which classifier minimizes false positives?
   4. Which classifier is best overall?

5  Professionalism and Clarity  [1]

APPENDIX ONE: DATA DICTIONARY

Variable  Description
Del       Time for delivery (in days, rounded to nearest 10th)
Vin       Vintage of product (i.e. how long it has been in the warehouse)
Pkg       How many packages of product have been ordered
Cst       How many orders the customer has made in the past
Mil       Distance the order needs to be delivered (in km)
Dom       Indicator for if the product is manufactured in Canada (C) or elsewhere (I)
Haz       Indicator for if the product is designated as Hazardous (H) or not (N)
Car       Indicator for which Carrier delivered the item (Fed Post, or M-Press Delivery)

Example Code for outcome

# 1 if the delivery took 10 days or less (Del is rounded to the nearest 10th), 0 otherwise
OT_DM <- as.factor(ifelse(del < 10.1, 1, 0))

"del","vin","pkg","cst","mil","dom","haz","car"
9.5,6,6,13,1447,"c","h","m-press delivery"
11.9,18,7,7,1874,"i","n","fed post"
14.6,7,7,8,1865,"i","n","fed post"
17.5,11,5,16,3111,"i","h","m-press delivery"
10.7,12,4,10,1319,"c","h","fed post"
10.5,12,3,5,1415,"c","n","m-press delivery"
10.7,21,1,10,1599,"c","h","m-press delivery"
11.9,12,4,12,2361,"c","n","m-press delivery"
8.9,13,6,8,1394,"i","n","fed post"
7.4,16,5,10,1121,"i","h","m-press delivery"
10.8,11,8,11,1119,"c","h","fed post"
10.6,4,1,10,1889,"c","n","m-press delivery"
9.9,11,7,15,1429,"c","n","fed post"
7.9,17,7,9,1257,"c","n","m-press delivery"
10.7,10,2,13,1611,"c","n","fed post"
12.2,10,6,10,1839,"c","n","m-press delivery"
7,10,4,11,1065,"c","h","m-press delivery"
6.9,12,3,12,1592,"i","n","m-press delivery"
12,14,5,9,1567,"c","n","fed post"
8.7,11,4,10,1619,"c","h","m-press delivery"
11.8,13,5,7,2242,"c","n","fed post"
9.7,7,1,8,1161,"c","n","fed post"
8.9,12,5,7,1526,"c","n","fed post"
11,17,5,9,2081,"i","n","m-press delivery"
11.9,10,5,14,1735,"i","h","fed post"
9.7,18,3,11,1700,"i","n","fed post"
10.1,12,3,5,1388,"c","h","m-press delivery"
11.2,19,5,6,1389,"c","n","fed post"
6.7,11,3,6,1266,"i","h","m-press delivery"
11,13,7,14,1505,"i","n","m-press delivery"
11.5,11,5,10,1457,"c","n","fed post"
12.6,18,3,13,1927,"c","h","fed post"
14.9,13,4,9,1843,"i","h","fed post"
15.2,17,2,11,1943,"c","h","fed post"
12.8,13,4,8,1598,"c","h","fed post"
12.1,9,3,7,1881,"c","n","m-press delivery"
12.5,14,5,14,1963,"c","n","fed post"
6.2,14,4,19,1090,"c","n","m-press delivery"
9.7,12,9,10,1547,"c","h","fed post"
9.7,14,3,7,1762,"c","h","m-press delivery"
12,10,6,7,2326,"c","n","m-press delivery"
10.1,8,6,6,1194,"i","n","fed post"
10,13,5,7,1746,"c","n","fed post"
17.2,13,3,11,2131,"c","n","fed post"
11.2,20,1,12,1662,"c","n","m-press delivery"
12.2,9,2,15,1973,"i","n","fed post"
11.8,16,3,6,1972,"c","h","m-press delivery"
7.3,13,3,8,1322,"c","n","m-press delivery"
11.5,7,5,10,1597,"c","n","fed post"
8.3,16,3,7,1061,"c","n","fed post"
12.6,17,5,5,2143,"c","n","m-press delivery"
7.4,13,2,11,1199,"c","h","m-press delivery"
7.2,12,3,3,1371,"c","n","m-press delivery"
12,11,2,13,1820,"c","n","m-press delivery"
7.2,12,1,11,1072,"i","n","m-press delivery"
16.8,12,3,8,2440,"c","h","fed post"
6.8,12,4,5,903,"c","n","fed post"
15.2,21,1,10,2291,"c","n","fed post"
8.9,13,3,11,1435,"c","n","m-press delivery"
12.6,13,6,8,2075,"c","h","m-press delivery"
8.5,22,1,6,756,"c","n","fed post"
5,14,3,9,415,"i","n","fed post"
14,10,5,7,1975,"i","h","fed post"
9.8,13,6,13,1798,"i","n","m-press delivery"
7.9,14,2,12,1764,"c","n","m-press delivery"
14.4,17,6,3,2151,"c","h","fed post"
9.8,7,4,10,1554,"i","n","m-press delivery"
14.4,14,2,13,2692,"c","n","m-press delivery"
15.2,18,6,8,1941,"c","n","fed post"
18.7,15,7,6,2415,"c","n","fed post"
8.2,12,5,8,1059,"c","n","fed post"
6.5,11,7,6,1247,"c","n","fed post"
10.2,11,5,12,1496,"c","h","m-press delivery"
9.5,17,3,5,1791,"i","n","m-press delivery"
13.1,18,4,13,2513,"c","n","m-press delivery"
8.2,10,4,8,1243,"c","h","m-press delivery"
10.1,10,6,10,1023,"i","n","fed post"
9.6,14,5,8,2149,"c","h","m-press delivery"
9.6,8,3,6,1589,"c","n","m-press delivery"
12,11,2,9,1743,"c","n","fed post"
8.4,10,2,18,981,"c","h","fed post"
9.4,19,7,8,1562,"i","n","m-press delivery"
10.5,8,3,9,1540,"c","n","fed post"
13.3,17,5,9,1736,"c","n","fed post"
9.5,13,3,12,1474,"i","n","m-press delivery"
15.5,13,6,8,2354,"c","n","fed post"
16.4,17,2,8,2570,"c","n","m-press delivery"
8.7,13,6,9,1598,"c","n","m-press delivery"
9.4,11,2,10,1350,"c","n","fed post"
15.7,16,5,6,1974,"i","n","fed post"
13.9,14,7,15,2014,"c","n","fed post"
14.4,13,1,10,2357,"c","n","m-press delivery"
11.6,3,4,6,1881,"i","n","m-press delivery"
12.6,14,3,12,2105,"c","h","fed post"
12.9,10,6,6,1899,"c","n","m-press delivery"
11.6,18,3,6,1781,"i","n","fed post"
17.1,21,6,12,1884,"i","h","m-press delivery"
18.4,10,3,11,2591,"c","n","fed post"
10.8,17,4,7,1964,"c","n","m-press delivery"
16.2,11,4,16,2745,"i","n","fed post"
13.7,11,3,5,2086,"c","h","fed post"
11.1,11,4,8,2041,"c","n","m-press delivery"
12.5,11,3,12,1747,"i","n","fed post"
13.9,19,2,7,2097,"i","n","fed post"
9.7,21,2,9,1558,"c","n","fed post"
13.8,10,3,9,2211,"c","n","fed post"
7.3,16,3,8,1405,"c","n","m-press delivery"
7.3,11,3,13,1538,"c","n","m-press delivery"
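The preliminary data preparation steps (appending initials, converting indicators to factors, creating the on-time flag) might be sketched in R roughly as follows. This is only an illustration under stated assumptions: `XX` stands in for your own initials, `orders_XX` and `csv_text_XX` are hypothetical names, the ten rows are copied from the start of the data listing, and the lowercase column names follow the listing as printed rather than the data dictionary.

```r
# First ten rows of the data listing, pasted inline so the sketch runs on its own.
csv_text_XX <- '"del","vin","pkg","cst","mil","dom","haz","car"
9.5,6,6,13,1447,"c","h","m-press delivery"
11.9,18,7,7,1874,"i","n","fed post"
14.6,7,7,8,1865,"i","n","fed post"
17.5,11,5,16,3111,"i","h","m-press delivery"
10.7,12,4,10,1319,"c","h","fed post"
10.5,12,3,5,1415,"c","n","m-press delivery"
10.7,21,1,10,1599,"c","h","m-press delivery"
11.9,12,4,12,2361,"c","n","m-press delivery"
8.9,13,6,8,1394,"i","n","fed post"
7.4,16,5,10,1121,"i","h","m-press delivery"'

orders_XX <- read.csv(text = csv_text_XX, stringsAsFactors = FALSE)

# Append the (placeholder) initials to every column name.
names(orders_XX) <- paste0(names(orders_XX), "_XX")

# Convert the indicator columns to factors.
indicator_cols_XX <- c("dom_XX", "haz_XX", "car_XX")
orders_XX[indicator_cols_XX] <- lapply(orders_XX[indicator_cols_XX], as.factor)

# On-time flag: 1 if the delivery took 10 days or less, 0 otherwise.
orders_XX$OT_XX <- as.factor(ifelse(orders_XX$del_XX <= 10, 1, 0))

str(orders_XX)
table(orders_XX$OT_XX)
```

Because Del is rounded to the nearest tenth, the test `del_XX <= 10` used here selects the same rows as the `< 10.1` test in the appendix code.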
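For the Part B pattern of fitting a classifier, timing the fit, and summarizing it in a confusion matrix, a minimal R sketch for the backward logistic regression could look like the one below. It is not the method shown in class, only one plausible arrangement. Assumptions: `XX` is a placeholder for your initials, the object names are hypothetical, only the first thirty rows of the data listing are used, predictions are made on the same rows used for fitting, and 0.5 is taken as the classification cutoff. With so few rows, `glm` may warn that fitted probabilities are numerically 0 or 1.

```r
# First thirty rows of the data listing, pasted inline so the sketch runs on its own.
csv_text_XX <- '"del","vin","pkg","cst","mil","dom","haz","car"
9.5,6,6,13,1447,"c","h","m-press delivery"
11.9,18,7,7,1874,"i","n","fed post"
14.6,7,7,8,1865,"i","n","fed post"
17.5,11,5,16,3111,"i","h","m-press delivery"
10.7,12,4,10,1319,"c","h","fed post"
10.5,12,3,5,1415,"c","n","m-press delivery"
10.7,21,1,10,1599,"c","h","m-press delivery"
11.9,12,4,12,2361,"c","n","m-press delivery"
8.9,13,6,8,1394,"i","n","fed post"
7.4,16,5,10,1121,"i","h","m-press delivery"
10.8,11,8,11,1119,"c","h","fed post"
10.6,4,1,10,1889,"c","n","m-press delivery"
9.9,11,7,15,1429,"c","n","fed post"
7.9,17,7,9,1257,"c","n","m-press delivery"
10.7,10,2,13,1611,"c","n","fed post"
12.2,10,6,10,1839,"c","n","m-press delivery"
7,10,4,11,1065,"c","h","m-press delivery"
6.9,12,3,12,1592,"i","n","m-press delivery"
12,14,5,9,1567,"c","n","fed post"
8.7,11,4,10,1619,"c","h","m-press delivery"
11.8,13,5,7,2242,"c","n","fed post"
9.7,7,1,8,1161,"c","n","fed post"
8.9,12,5,7,1526,"c","n","fed post"
11,17,5,9,2081,"i","n","m-press delivery"
11.9,10,5,14,1735,"i","h","fed post"
9.7,18,3,11,1700,"i","n","fed post"
10.1,12,3,5,1388,"c","h","m-press delivery"
11.2,19,5,6,1389,"c","n","fed post"
6.7,11,3,6,1266,"i","h","m-press delivery"
11,13,7,14,1505,"i","n","m-press delivery"'

orders_XX <- read.csv(text = csv_text_XX, stringsAsFactors = TRUE)

# On-time flag, then drop del so the outcome is not predicted from itself.
orders_XX$OT_XX <- as.factor(ifelse(orders_XX$del <= 10, 1, 0))
train_XX <- orders_XX[, names(orders_XX) != "del"]

# Time the full fit plus backward selection, in seconds.
start_XX <- Sys.time()
full_XX <- glm(OT_XX ~ ., data = train_XX, family = binomial)
back_XX <- step(full_XX, direction = "backward", trace = 0)
end_XX <- Sys.time()
elapsed_XX <- as.numeric(difftime(end_XX, start_XX, units = "secs"))

# Classify at the assumed 0.5 cutoff; fixed levels keep the table 2 x 2.
prob_XX <- predict(back_XX, type = "response")
pred_XX <- factor(ifelse(prob_XX > 0.5, 1, 0), levels = c(0, 1))
conf_XX <- table(Actual = train_XX$OT_XX, Predicted = pred_XX)
accuracy_XX <- sum(diag(conf_XX)) / sum(conf_XX)

print(conf_XX)
cat("Accuracy:", round(accuracy_XX, 3), " Fit time (s):", round(elapsed_XX, 3), "\n")
```

The `Sys.time()` bracket and the `table()` call do not depend on the model type, so the same pattern can be reused around other fitted classifiers when comparing accuracy, false positives, and speed.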
Dec 02, 2022