This assignment (Assignment 3) covers material from Descriptive Statistics, Resampling Methods, Linear Regression, Model Prediction, Principal Component Analysis, Discriminant Analysis and Cluster Analysis, and is worth 50 marks. Your solutions should be properly presented, and it is important that you double-check your spelling and grammar and thoroughly proofread your assignment before submitting. R code should not be presented in the assignment itself but rather handed in as a separate R script for review (TextEdit document). Essential R output should be tidied into figures, tables or summaries as appropriate. The assignment must be submitted as a PDF. I've also attached an example of how the assignment should be answered and structured (Assignment 3 Solutions) and an example of what the R text (Assignment3Horse.R.txt) should look like.





MAS223 Applied Statistics
Semester 2, 2021
Assignment 3
Due: Friday, 15 October 2021, 5PM

This assignment covers material from Topics 1-7 (Descriptive Statistics, Resampling Methods, Linear Regression, Model Prediction, Principal Component Analysis, Discriminant Analysis, Cluster Analysis) and is worth 50 marks.

Your solutions should be properly presented, and it is important that you double-check your spelling and grammar and thoroughly proofread your assignment before submitting. R code should not be presented in the assignment itself but rather handed in as a separate R script for review. Essential R output should be tidied into figures, tables or summaries as appropriate. The assignment must be submitted as a PDF.

Work up to 24 hours late will incur a 20% penalty. Work submitted more than 24 hours late will be marked for formative assessment purposes only, unless you have emailed me ahead of submission and been given an extension.

Questions

1. (20 marks) In late 2013 the ASX200 stock prices were recorded for 53 days. Consider the question: What are some of the features of these stocks? In considering this question you are not required to consider all 200 stocks; if you want to, you can subset them on statistical grounds (say, by random selection) or on the basis of prior knowledge. (Extensive research or background reading for this purpose is not encouraged.)

   Carry out a principal component analysis using all the suitable variables. Do not include the stock name in the analysis.

   (a) Please describe your subset here, if relevant, for my reference when marking your work. (0 marks)

   (b) Should this data be scaled prior to running a PCA or not? (2 marks)

   (c) Consider the eigenvalues for the principal component analysis. Provide evidence to support your answer.
       • How many principal components would you select if using the "elbow" method?
       • How many principal components would you select if attempting to account for 70% of total variation?
       (10 marks)

   (d) Produce a biplot of the first two principal components. What variable groupings load onto these components similarly? (5 marks)

   (e) What are the percentage contributions (loadings) of the first five variables of PC1? (3 marks)

2. (16 marks) In this question, we will examine a Milk Production dataset. A researcher has collected some data on daily milk production and classified it as low, medium or high. We are interested in whether baby birth weight, number of feeds (on average, per day) and mother concern (measured from 0 to 100) can aid in classifying maternal production.

   (a) Discuss the assumptions of linear discriminant analysis as they relate to this data set. (3 marks)

   (b) Using linear discriminant analysis, determine the hit rate when considering the variables baby birth weight, number of feeds and mother concern in trying to predict the outcome. (3 marks)

   (c) Using the group means, describe the three outcomes and how they typically differ. (3 marks)

   (d) How does this change if we say that the costs of misdiagnosing the high or low production mothers are 5 times that of medium production?
       i. What are the new priors? (2 marks)
       ii. What is the new hit rate? (3 marks)

   (e) Is linear discriminant analysis effective in this context? Provide at least one visualisation to support your answer. (2 marks)

   Note: Be sure to remove the randomness from the linear discriminant analysis by setting the seed in your R code.

3. (6 marks) In your own words, compare supervised and unsupervised learning with reference to k-nearest-neighbours classification and k-means clustering. In addition to your regular readings, section 2.1.4 of your textbook is recommended reading.

4. (4 marks) Report presentation marks
   These marks are allocated based on:
   • structure, clarity, and tidiness of presented solutions/answers,
   • readability, and correctness in spelling and grammar.

5. (4 marks) Coding marks
   When submitted, this script file should have a name given by Assignment 3 SURNAME.R, where SURNAME is replaced by your surname. Your R script will be marked based on:
   • Whether the script is submitted as an R script.
   • Readability of code: This includes the use of informative commenting to make it clear what blocks of code are meant to do, descriptive variable names, and appropriate use of spacing to separate blocks of code meant to perform different functions.
   • Accuracy of code: This includes the correct specification of functions to produce the results reported in your assignment and whether I am able to run your entire script file without producing any errors. It is important that you verify that your code runs error-free from start to finish before submitting.
   • Efficiency: This includes writing a script that uses minimal lines of code, is easily adapted to new datasets or slight modifications to the existing dataset, and runs quickly.

MotherID,ProductionCategory,BabyBirthweight,NumberFeeds,MotherConcern
1001,Low,3610,4,83
1002,Low,3400,10,71
1003,Low,3950,2,86
1004,Low,3500,4,91
1005,Low,3160,7,87
1006,Low,3630,6,89
1007,Low,2985,9,80
1008,Low,3120,5,79
1009,Low,2480,6,77
1010,Low,2900,7,79
1011,Low,3370,8,76
1012,Low,3060,9,66
1013,Low,2390,10,83
1014,Low,2803,12,75
1015,Low,3955,7,71
1016,Low,3600,6,73
1017,Low,3060,10,70
1018,Low,3120,8,68
1019,Low,3470,10,71
1020,Low,4060,8,87
1021,Low,3650,11,93
1022,Low,2360,9,89
1023,Low,3235,10,86
1024,Low,,8,
1025,Low,2935,6,80
1026,Low,3270,7,79
1027,Medium,3320,4,61
1028,Medium,3330,8,51
1029,Medium,3240,8,63
1030,Medium,3060,7,57
1031,Medium,3170,12,40
1032,Medium,3500,10,55
1033,Medium,4050,10,57
1034,Medium,2910,10,48
1035,Medium,3765,8,65
1036,Medium,,10,50
1037,Medium,3190,8,55
1038,Medium,3250,9,40
1039,Medium,2940,13,57
1040,Medium,3270,7,57
1041,Medium,3210,8,52
1042,High,3440,12,62
1043,High,3240,10,56
1044,High,3190,12,49
1045,Medium,4080,12,49
1046,Medium,3770,12,53
1047,Medium,3365,11,39
1048,High,3120,8,75
1049,Medium,2820,9,63
1050,Medium,3420,9,60
1051,Medium,3500,11,55
1052,Medium,2895,12,64
1053,Medium,3450,11,70
1054,Medium,3485,11,52
1055,Medium,3435,7,58
1056,Medium,4500,12,67
1057,Medium,4190,12,45
1058,Medium,3000,12,41
1059,Medium,2690,11,43
1060,Medium,3600,10,48
1061,Medium,3230,8,49
1062,Medium,3830,10,40
1063,Medium,3908,12,62
1064,Medium,3035,11,66
1065,Medium,3960,13,53
1066,Medium,3140,12,63
1067,Medium,3520,10,54
1068,Medium,3038,8,51
1069,Medium,2700,9,54
1070,Medium,3950,9,42
1071,Medium,3040,7,59
1072,Medium,3870,11,51
1073,High,4186,10,67
1074,Medium,4040,8,50
1075,Medium,2510,14,53
1076,Medium,3380,9,66
1077,Medium,3235,14,59
1078,Medium,2863,12,54
1079,Medium,4200,12,49
1080,Medium,3220,7,71
1081,Medium,3800,8,45
1082,Medium,3280,6,63
1083,Medium,3701,12,38
1084,High,4070,10,51
1085,Medium,3070,10,42
1086,Medium,2880,9,61
1087,High,3080,13,65
1088,Medium,3410,9,57
1089,Medium,2540,9,52
1090,Medium,3670,12,57
1091,Medium,3090,7,61
1092,Medium,3705,11,63
1093,High,3695,12,53
1094,Medium,3565,8,46
1095,Medium,3144,9,69
1096,Medium,3688,8,55
1097,Medium,3515,8,67
1098,Medium,2950,9,41
1099,High,4480,11,59
1100,Medium,2950,11,54
1101,Medium,3830,10,63
1102,Medium,3420,11,58
1103,Medium,3760,11,44
1104,Medium,3048,10,51
1105,Medium,3450,7,72
1106,Medium,3360,10,59
1107,Medium,3810,12,64
1108,Medium,2890,8,56
1109,Medium,3960,13,61
1110,Medium,3830,10,60
1111,Medium,2940,11,65
1112,Medium,2750,10,62
1113,Medium,4080,11,53
1114,Medium,4395,11,56
1115,Medium,3245,9,46
1116,Medium,3705,8,51
1117,Medium,3505,12,62
1118,Medium,3145,11,47
1119,Medium,3250,8,59
1120,Medium,3040,12,48
1121,Medium,4069,10,66
1122,Medium,2941,12,42
1123,Medium,4120,10,45
1124,Medium,3090,7,38
1125,Medium,3322,7,60
1126,Medium,3090,12,54
1127,Medium,2890,12,55
1128,Medium,3950,5,62
1129,Medium,3100,10,51
1130,Medium,3245,10,43
1131,Medium,3425,7,75
1132,Medium,2890,13,41
1133,Medium,3760,8,51
1134,Medium,3910,10,40
1135,Medium,3190,11,64
1136,High,3560,10,69
1137,Medium,4000,7,35
1138,Medium,4020,10,52
1139,Medium,3700,11,40
1140,Medium,3380,12,66
1141,Medium,3390,12,45
1142,Medium,3965,13,48
1143,Medium,2900,11,53
1144,Medium,4260,8,58
1145,Medium,3600,9,52
1146,Medium,3410,14,47
1147,Medium,3208,11,59
1148,Medium,3510,12,60
1149,High,3834,8,65
1150,High,3570,8,61
1151,Medium,2960,8,65
1152,Medium,3400,9,60
1153,Low,2430,5,81
1154,Medium,2470,5,63
1155,Medium,2430,9,60
1156,Low,2395,6,86
1157,Medium,2240,12,50
1158,Low,2420,5,92
1159,Low,2379,4,86
1160,Medium,2460,11,54

Name,TWENTY-FIRST CENTURY FOX CDI.'B',AGL ENERGY,ALS,AMP,AUS.AND NZ.BANKING GP.,APA GROUP,ASX,AWE,ABACUS PROPERTY GROUP,ACRUX,ADELAIDE BRIGHTON,ALACER GOLD CDI.,ALUMINA,AMCOR,ANSELL,AQUILA RESOURCES,ARDENT LEISURE GROUP,ARISTOCRAT LEISURE,ARRIUM,ASCIANO,ATLAS IRON,AURIZON HOLDINGS,AURORA OIL & GAS,AUSDRILL,AUSTRALAND PR.GP.,AUTOMOTIVE HOLDINGS GP.,AVEO GROUP,BC IRON,BHP BILLITON,BWP TRUST,BANK OF QLND.,BEACH ENERGY,BEADELL RESOURCES,BENDIGO & ADELAIDE BANK,BLUESCOPE STEEL,BORAL,BRADKEN,BRAMBLES,BREVILLE GROUP,BURU ENERGY,CFS RETAIL PR.TST.GROUP,CSL,CSR,CABCHARGE AUSTRALIA,CALTEX AUSTRALIA,CARDNO,CARSALES.COM,CHALLENGER,CHARTER HALL GROUP,CHARTER HALL RETAIL REIT,COCA-COLA AMATIL,COCHLEAR,COMMONWEALTH BK.OF AUS.,COMPUTERSHARE,CROMWELL PROPERTY GROUP,CROWN RESORTS,DAVID JONES,DECMIL GROUP,DEXUS PROPERTY GROUP,DOMINO'S PIZZA ENTS.,DOWNER EDI,DRILLSEARCH ENERGY,DUET GROUP,DULUXGROUP,ECHO ENTERTAINMENT GROUP,ENERGY WORLD,ENVESTRA,EVOLUTION MINING,FAIRFAX MEDIA,FEDERATION CENTRES,FLETCHER BUILDING (ASX),FLEXIGROUP,FLIGHT CENTRE TRAVEL GP.,FORTESCUE METALS GP.,G8 EDUCATION,GPT GROUP,GUD HOLDINGS,GWA GROUP,GOODMAN FIELDER,GOODMAN GROUP,GRAINCORP,HARVEY NORMAN HOLDINGS,HENDERSON GROUP CDI.,HORIZON OIL,ILUKA RESOURCES,INCITEC PIVOT,INDEPENDENCE GROUP,INSURANCE AUS.GROUP,INVESTA OFFICE FUND,INVOCARE,IOOF HOLDINGS,IRESS,JAMES HARDIE INDS.CDI.,JB HI-FI,KAROON GAS AUSTRALIA,KATHMANDU HOLDINGS (ASX),LEIGHTON HOLDINGS,LEND LEASE GROUP,LYNAS,M2 GROUP,MACQUARIE ATLAS ROADS,MACQUARIE GROUP,MAGELLAN FINANCIAL GP.,MCMILLAN SHAKESPEARE,MEDUSA MINING,MERMAID MARINE AUS.,MESOBLAST,METCASH,MINERAL RESOURCES,MIRVAC GROUP,MONADELPHOUS GROUP,MOUNT GIBSON IRON,MYER HOLDINGS,NRW HOLDINGS,NATIONAL AUS.BANK,NAVITAS,NEWCREST MINING,NEWS CL.B VTG.CS.CDI. DEFERRED,NINE ENTERTAINMENT,NORTHERN STAR,NUFARM,OZ MINERALS,OIL SEARCH,ORICA,ORIGIN ENERGY (EX BORAL),ORORA,PACIFIC BRANDS,PALADIN ENERGY,PANAUST,PERPETUAL,PLATINUM ASSET MAN.,PREMIER INVESTMENT,PRIMARY HEALTH CARE,QBE INSURANCE GROUP,QANTAS AIRWAYS,QUBE HOLDINGS,REA GROUP,RAMSAY HEALTH CARE,RECALL HOLDINGS,REGIS RESOURCES,RESMED CDI.,RESOLUTE MINING
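
Question 1 asks how many principal components are needed to account for 70% of total variation in the ASX200 data. That rule amounts to finding the smallest number of leading eigenvalues whose sum reaches 70% of the sum of all eigenvalues. The following is a minimal sketch of that arithmetic only. It is written in Python for illustration, whereas the assignment itself requires an R script. The function name `components_for_threshold` and the eigenvalues are hypothetical placeholders, not results from the stock data.

```python
from itertools import accumulate


def components_for_threshold(eigenvalues, threshold=0.70):
    """Return the smallest k such that the k largest eigenvalues
    account for at least `threshold` of the total variance."""
    # Principal components are ordered by decreasing eigenvalue.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    # Walk the running sum until the cumulative proportion reaches the threshold.
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues for illustration only (not from the ASX200 data).
example_eigenvalues = [4.0, 2.0, 1.5, 1.0, 0.8, 0.7]
print(components_for_threshold(example_eigenvalues))
```

With these placeholder values the cumulative proportions are 0.40, 0.60 and 0.75, so three components are the first to pass the 70% mark. The "elbow" method is a visual judgement on the scree plot and is not covered by this sketch.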

Oct 14, 2021