COMP 5070 Exam SP5 2018 COMP 5070 Statistical Programming for Data Science Take-Home Exam DUE: by 11:55 PM (CST), Friday 23rd November • The take-home exam is worth 30% of your overall grade....

All the instructions are mentioned in the attached file.



COMP 5070 Exam SP5 2018
COMP 5070 Statistical Programming for Data Science
Take-Home Exam
DUE: by 11:55 PM (CST), Friday 23rd November

• The take-home exam is worth 30% of your overall grade. The exam is out of 100 marks.
• The exam is to be submitted online as a compressed file (e.g. .zip, .tar.gz, .gz). This compressed file should include ALL code needed to run your program and any other files you created yourself. You do NOT need to include any data files provided to you, as it will be assumed I too have them :)
• To obtain the maximum available marks you should aim to:
  1. Code all requested components (30%).
  2. Use a clear style of code presentation (10%). Code clarity is an important part of your submission. Thus you should choose meaningful variable names and adopt the use of comments; you don't need to comment every single line, as this will affect readability, however you should aim to comment at least each section of code.
  3. Have the code run successfully (5%).
  4. Output the information in a presentable manner and present your written analysis of the output (55%).
• Plagiarism is a specific form of academic misconduct. Although the University encourages discussing work with others and the Social Forum will support this, ultimately this submission is to represent your individual work. If plagiarism is found, all parties will be penalised. You should retain copies of all assignment computer files used during development. These files must remain unchanged after submission, for the purpose of checking if required.
• For the purpose of this exam, a “paragraph” is considered to consist of approximately 6-8 lines. You are welcome to exceed this amount :)
• This exam appears longer than it actually is: explanations are given to help you understand the requested analyses and I have also provided hints.
• You do not need to write specialised code as you did for the assignments. You should be able to find nearly all the code you need from the R files provided throughout the course, via case studies and other examples. If you copy/paste code from the R code I have provided, this should give you nearly 100% of the code needed for this exam, with a few alterations on your behalf (e.g. file names, variable names etc).
Question 1 (60 Marks)
It’s All in the Taste: Experts vs Amateurs

Who is better at discerning the tastes of supermarket chocolate? Do you really need training to know if you like it? Or does it all just taste really good? The Experts battle it out against a group of dedicated chocolate-eating Amateurs! I would really like to have that job :)

The data for this question are the responses to the sensometric qualities of chocolate that can be purchased in supermarkets. Two groups were asked to rate the qualities of the chocolates: the first group contained a panel of sensometric experts with responses recorded over 9 different tasting sessions. The accompanying data is in chocolate_experts.csv.

The second group contained a panel of volunteers chosen to represent ‘regular shoppers’ who underwent a three-hour sensometric training session before rating the qualities of the chocolate over 2 different tasting sessions. The accompanying data is in chocolate_amateurs.csv.

The responses were recorded over a continuous scale from 0 to 10 with 0 indicating the absence of the sensometric quality and 10 indicating fully present. It is of interest to determine if experts perceive supermarket chocolate differently to non-experts (the amateurs) using 14 sensometric variables (Chocolate Aroma through to Granular Texture in the data files).

For this question you need to randomly obtain two session ids for the expert responses only by making a call to sample as shown below. The two numbers that are returned are your session ids that you need to extract for your analysis.

sample(9,2)

For the expert data you will only need to analyse the responses corresponding to the two randomly selected session ids. Amateur data needs to be used in full.

You are asked to compare the responses between the two groups as requested in each part below. A partially written R script is available as part of the exam package. You must use this script for your analysis and follow the instructions therein. Any lines marked with

####!!!EXAMTIP!!!

require you to change that line of code to suit your purposes. Further details are provided in the code comments around that line.
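As a minimal sketch of the session selection described above, the draw and the subsetting might look like the following in R. The seed value, the toy data frame and the column name Session are assumptions made for illustration; the real column name in chocolate_experts.csv may differ.

```r
# Fix the seed so the same two session ids are drawn on every run
# (assumption: the exam does not ask for a seed; 5070 is arbitrary).
set.seed(5070)
session_ids <- sample(9, 2)   # two distinct ids drawn from 1..9

# Toy stand-in for chocolate_experts.csv; "Session" is a hypothetical column name.
experts <- data.frame(
  Session   = rep(1:9, each = 3),
  Sweetness = round(runif(27, min = 0, max = 10), 1)
)

# Keep only the rows belonging to the two sampled sessions.
choc_e_sess <- experts[experts$Session %in% session_ids, ]

print(session_ids)
print(table(choc_e_sess$Session))
```

Recording the seed (or simply the two ids that were drawn) in the write-up makes it possible to reproduce the same subset later.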
For the purposes of this exam a paragraph is 8-12 lines of text. Specifically, your analysis should include:

i) Initial Data Discussion: Write a short explanation (approximately 1 paragraph) of the analysis to be performed and an explanation of the data. Include your session IDs for the expert responses, and any data manipulation performed prior to analysis should you do so.

ii) Exploratory Factor Analysis: conduct two separate exploratory factor analyses: the first for your selected id sessions for the expert responses, the other for the full set of amateur responses. You may present the analyses side-by-side or in sequence; however you believe is best. For each Exploratory Factor Analysis you need to include the following:
  • If appropriate, Cronbach Alpha output and a short discussion (2-3 lines) of whether the data is trustworthy and why.
  • Correlation output of your choosing (graphical and/or numerical) with an accompanying discussion (3-4 lines). If numerical, round the correlations to 2 digits.
  • A single paragraph explaining the outcome of the determinant test, Bartlett’s test of sphericity and the KMO statistic for both data sets. Do not include R output.
  • Your decision regarding the number of factors to estimate (scree plot may be shown, do not show the R console output).
  • The FINAL factor solution. You do not need to discuss results of any of the other solutions, however you should justify your final factor solution, including loadings, and name the factors in each analysis. You should also include up to two sentences indicating whether the test of residuals was passed and whether the factors are correlated.
  • All factors should be named and an explanation as to how you come up with these names should be included.
  • Based on the factor analysis results and your chosen factor names, discuss the factors that have emerged from the study. What types of differences (if any) exist between the expert and amateur sensometric ratings?

iii) Conclusions: write 2 paragraphs of conclusions based on your analysis.

Hints:
  • To make the correlation matrix more readable, use the round() command in R, e.g.
    round(cor(df), 2)
    will compute the correlation matrix of the data in the matrix df, to two decimal places. You can use this tip for any other matrices too.
  • The best solution may or may not be the rotated solution, based on your randomly selected sessions. Choose your solution based on the principles of a good Exploratory Factor Analysis (EFA).
  • If items are not loading onto a factor, one reason could be that you have not extracted enough factors from the data. Reconsider your analysis if necessary, however this may not solve the problem. Use the principles of EFA to make your final decision.
  • While no split loadings are desirable in EFA, a small number may be unavoidable. Again you should ultimately choose your final solution based on the principles of what constitutes a good Exploratory Factor Analysis.
  • If the correlations between factors suggest an oblique rotation is required, simply note this in your discussion. Do not re-run the analysis.

Question 2 (40 Marks)
Are We There Yet? Clustering Cities Around the World

The data for this question are distances between cities in different regions of the world. You will need to use the data set individually assigned to you. The file cities.xlsx on the Assignments page indicates the continent assigned to each student. Each data set contains a distance matrix and can be found on the assignments page, in a file of the form RegionCitiesClustering.dat. For example, for the European data the file will be called EuropeanCitiesClustering.dat. For this question, you are asked to conduct clustering analysis using both hierarchical and partitional clustering techniques.

For the purposes of this exam a paragraph is 8-12 lines of text. Specifically, your analysis should include:

i) Initial Data Discussion: Write a short explanation (approximately 1 paragraph) of the analysis to be performed and an explanation of the data including any data manipulation performed prior to clustering.
ii) Hierarchical clustering: conduct hierarchical clustering on the data, choosing an appropriate AGNES-based method based on either single, complete, average-linkage or Ward’s method. Ensure you justify your choice in your write-up and include the resulting dendrogram, as well as a discussion of the outcomes of hierarchical clustering on your data.

iii) Partitional clustering: conduct a partitional clustering of your data using K-means. Ensure you explain and include any relevant R output (including graphics) supporting your choice of k, the number of clusters.

iv) Discussion: (1-2 paragraphs) of your results.

v) Validation: as a form of cluster validation, consider the following:
  If there are obvious outliers or distances that should be removed, identify these in your write-up and re-run your chosen Partitional Clustering algorithm, adjusting k if necessary. Include justification of your choice of the new value for k.
  If there are no obvious outliers/distances that should be removed, then explain this conclusion with justification. In this case re-run your chosen Partitional Clustering algorithm for a different value of k to that used in Step 3 above. Include justification of your choice for the new value for k.

vi) Conclusions: write 2 paragraphs of conclusions based on your analysis including a statement regarding which clustering solution is the better one and why.

Hint:
  • For hierarchical clustering, ensure you define the height of the dendrogram according to the size of the values in the output.
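The two clustering steps requested above can be sketched in base R as follows. This is only an illustration under stated assumptions: the four-city distance matrix is made up, hclust() (base R's agglomerative routine) stands in for the AGNES-based method, and because kmeans() expects coordinates rather than distances, the distances are first converted to coordinates with classical multidimensional scaling (cmdscale()). That conversion is one possible choice, not something the exam prescribes.

```r
# Made-up symmetric distance matrix for four hypothetical cities.
cities <- c("A", "B", "C", "D")
D <- matrix(c( 0,  1, 10, 11,
               1,  0,  9, 10,
              10,  9,  0,  1,
              11, 10,  1,  0),
            nrow = 4, byrow = TRUE, dimnames = list(cities, cities))

# Agglomerative hierarchical clustering with average linkage,
# then cut the tree into two groups.
hc <- hclust(as.dist(D), method = "average")
hc_groups <- cutree(hc, k = 2)
# plot(hc) would draw the dendrogram.

# K-means needs coordinates: recover a one-dimensional layout with classical MDS.
coords <- cmdscale(as.dist(D), k = 1)
set.seed(1)                       # k-means starts from random centres
km <- kmeans(coords, centers = 2, nstart = 10)

print(hc_groups)
print(km$cluster)
```

On this toy matrix both methods separate cities A and B from cities C and D; on real data the two solutions need not agree, which is what the discussion and validation parts are meant to examine.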
Answered Same Day | Nov 10, 2020 | COMP 5070 | University Of South Australia


Aakarsh answered on Nov 20 2020
Ques1/Chocolate.pdf
Sensometric qualities of Chocolates
Two groups were asked to rate the qualities of the chocolates.
The first group was a panel of sensometric experts, with responses recorded over 9 different tasting sessions.
The second group was a panel of volunteers chosen to represent ‘regular shoppers’, who underwent a three-hour sensometric training session before rating the qualities of the chocolate over 2 different tasting sessions.
Responses were recorded on a continuous scale from 0 to 10, with 0 indicating the absence of the sensometric quality and 10 indicating that it is fully present.
The aim is to determine whether experts perceive supermarket chocolate differently from non-experts (the amateurs) using 14 sensometric variables.
Initial Data Discussion
The following sensometric variables of chocolate quality were rated by the group of experts and the group of amateurs on a scale of 0 to 10.
## [1] "Chocolate.Aroma" "Milk.Aroma" "Sweetness"
## [4] "Acidity" "Bitterness" "Chocolate.Flavour"
## [7] "Milk.Flavour" "Caramel.Flavour" "Vanilla.Flavour"
## [10] "Astringency" "Crispy.Texture" "Melting.Texture"
## [13] "Sticky.Texture" "Granular.Texture"
Let’s carry out an Exploratory Factor Analysis for each group separately and see how the results relate to each other. We will try to identify the most useful sensometric variables and compare the results for experts and amateurs. We will check whether the data is trustworthy and how the variables are correlated using various statistical methods and tests, and visualise them using plots for better analysis.
Finally, we will draw conclusions based on the analysis performed.
Exploratory Factor Analysis
For experts data
Cronbach Alpha output
Cronbach’s alpha is a measure of the reliability (internal consistency) of the measurement instrument; it examines whether all the items measure the same underlying construct.
##
## Reliability analysis
## Call: alpha(x = choc_e_sess)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.46 0.48 0.73 0.061 0.91 0.046 3.7 0.78 0.054
##
## lower alpha upper 95% confidence boundaries
## 0.37 0.46 0.55
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## Chocolate.Aroma 0.41 0.44 0.69 0.056 0.78 0.051
## Milk.Aroma 0.50 0.51 0.73 0.074 1.04 0.042
## Sweetness 0.49 0.49 0.73 0.068 0.96 0.043
## Acidity 0.42 0.45 0.72 0.059 0.81 0.051
## Bitterness 0.47 0.49 0.72 0.068 0.95 0.046
## Chocolate.Flavour 0.44 0.46 0.71 0.062 0.87 0.049
## Milk.Flavour 0.49 0.48 0.70 0.066 0.91 0.043
## Caramel.Flavour 0.44 0.43 0.70 0.056 0.77 0.048
## Vanilla.Flavour 0.43 0.42 0.71 0.054 0.74 0.049
## Astringency 0.36 0.41 0.69 0.050 0.68 0.056
## Crispy.Texture 0.44 0.47 0.72 0.063 0.88 0.048
## Melting.Texture 0.46 0.47 0.73 0.064 0.89 0.046
## Sticky.Texture 0.41 0.42 0.72 0.053 0.73 0.050
## Granular.Texture 0.45 0.48 0.74 0.065 0.91 0.048
## var.r med.r
## Chocolate.Aroma 0.102 0.0388
## Milk.Aroma 0.097 0.0604
## Sweetness 0.110 0.0604
## Acidity 0.118 0.0604
## Bitterness 0.098 0.0604
## Chocolate.Flavour 0.098 0.0604
## Milk.Flavour 0.091 0.0604
## Caramel.Flavour 0.106 0.0604
## Vanilla.Flavour 0.116 0.0604
## Astringency 0.114 0.0309
## Crispy.Texture 0.108 0.0322
## Melting.Texture 0.117 0.0604
## Sticky.Texture 0.123 -0.0033
## Granular.Texture 0.117 0.0322
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Chocolate.Aroma 318 0.47 0.44 0.43 0.2980 6.1 2.2
## Milk.Aroma 318 0.11 0.16 0.12 -0.0857 2.1 2.1
## Sweetness 318 0.21 0.25 0.16 -0.0033 4.3 2.3
## Acidity 318 0.45 0.40 0.33 0.2594 3.1 2.3
## Bitterness 318 0.34 0.26 0.22 0.0958 4.2 2.7
## Chocolate.Flavour 318 0.38 0.34 0.32 0.2012 6.2 2.1
## Milk.Flavour 318 0.22 0.29 0.29 0.0110 1.9 2.3
## Caramel.Flavour 318 0.37 0.45 0.43 0.2067 1.6 1.8
## Vanilla.Flavour 318 0.39 0.48 0.42 0.2648 1.3 1.4
## Astringency 318 0.60 0.53 0.51 0.4162 3.6 2.6
## Crispy.Texture 318 0.37 0.33 0.26 0.1788 5.9 2.2
## Melting.Texture 318 0.29 0.31 0.22 0.0940 4.8 2.2
## Sticky.Texture 318 0.46 0.48 0.38 0.2764 3.7 2.2
## Granular.Texture 318 0.33 0.30 0.18 0.1393 2.9 2.1
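The raw alpha is 0.46 (95% confidence boundaries 0.37 to 0.55), which is below the commonly used threshold of 0.7 and indicates weak internal consistency. Dropping any single variable changes alpha very little (the highest value, 0.50, is obtained by dropping Milk.Aroma), so all variables are kept. The data are therefore not very reliable as a single scale.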
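For reference, the alpha figures in the output can be related to two simple formulas. The sketch below computes raw alpha from item variances on a made-up three-item data set (the function name cronbach_alpha and the toy ratings are assumptions for illustration, not part of the psych package), and then checks that the reported std.alpha follows from the reported average_r of 0.061 across the 14 variables.

```r
# Raw Cronbach's alpha:
# k / (k - 1) * (1 - sum of item variances / variance of the total score).
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Made-up ratings from five respondents on three items.
toy <- data.frame(x1 = c(1, 2, 3, 4, 5),
                  x2 = c(2, 3, 4, 5, 6),
                  x3 = c(1, 3, 2, 5, 4))
toy_alpha <- cronbach_alpha(toy)

# Standardised alpha from the average inter-item correlation:
# k * r_bar / (1 + (k - 1) * r_bar), with k = 14 and average_r = 0.061 from the output.
k <- 14
r_bar <- 0.061
std_alpha <- k * r_bar / (1 + (k - 1) * r_bar)

print(round(toy_alpha, 3))
print(round(std_alpha, 2))
```

With these inputs std_alpha rounds to 0.48, which matches the std.alpha column of the output and shows that the low alpha is driven by the very small average inter-item correlation.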
Correlation Matrix
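The correlations are shown as a plot in which colour intensity represents the strength of the correlation; the numerical matrix, rounded to 2 digits, is given below.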
## Chocolate.Aroma Milk.Aroma Sweetness Acidity Bitterness
## Chocolate.Aroma 1.00 -0.56 -0.26 0.28 0.48
## Milk.Aroma -0.56 1.00 0.30 -0.05 -0.41
## Sweetness -0.26 0.30 1.00 -0.22 -0.51
## Acidity 0.28 -0.05 -0.22 1.00 0.42
## Bitterness 0.48 -0.41 -0.51 0.42 1.00
## Chocolate.Flavour 0.72 -0.49 -0.43 0.24 0.61
## Milk.Flavour -0.42 0.77 0.42 -0.13 -0.50
## Caramel.Flavour -0.20 0.48 0.30 -0.03 -0.31
## Vanilla.Flavour -0.01 0.28 0.21 -0.07 -0.21
## Astringency 0.34 -0.21 -0.15 0.49 0.59
## Crispy.Texture 0.60 -0.47 -0.06 0.11 0.33
## Melting.Texture -0.11 0.28 0.38 -0.24 -0.19
## Sticky.Texture 0.05 0.13 0.31 0.01 -0.21
## Granular.Texture 0.27 -0.24 -0.07 0.21 0.19
## Chocolate.Flavour Milk.Flavour Caramel.Flavour
## Chocolate.Aroma 0.72 -0.42 -0.20
## Milk.Aroma -0.49 0.77 0.48
## Sweetness -0.43 0.42 0.30
## Acidity 0.24 -0.13 -0.03
## Bitterness 0.61 -0.50 -0.31
## Chocolate.Flavour 1.00 -0.47 -0.24
## Milk.Flavour -0.47 1.00 0.70
## Caramel.Flavour -0.24 0.70 1.00
## Vanilla.Flavour -0.06 0.45 0.61
## Astringency 0.33 -0.26 -0.09
## Crispy.Texture 0.48 -0.46 -0.32
## Melting.Texture -0.30 0.38 0.27
## Sticky.Texture -0.03 0.27 0.25
## Granular.Texture 0.35 -0.29 -0.19
## Vanilla.Flavour Astringency Crispy.Texture
## Chocolate.Aroma -0.01 0.34 0.60
## Milk.Aroma 0.28 -0.21 -0.47
## Sweetness 0.21 -0.15 -0.06
## Acidity -0.07 0.49 0.11
## Bitterness -0.21 0.59 0.33
## Chocolate.Flavour -0.06 0.33 0.48
## Milk.Flavour 0.45 -0.26 -0.46
## Caramel.Flavour 0.61 -0.09 -0.32
## Vanilla.Flavour 1.00 0.01 -0.16
## Astringency 0.01 1.00 0.26
## Crispy.Texture -0.16 0.26 1.00
## Melting.Texture 0.20 0.00 0.00
## Sticky.Texture 0.26 0.08 0.07
## Granular.Texture -0.13 0.30 0.26
## Melting.Texture Sticky.Texture Granular.Texture
## Chocolate.Aroma -0.11 0.05 0.27
## Milk.Aroma 0.28 0.13 -0.24
## Sweetness 0.38 0.31 -0.07
## Acidity -0.24 0.01 0.21
## Bitterness -0.19 -0.21 0.19
## Chocolate.Flavour -0.30 -0.03 0.35
## Milk.Flavour 0.38 0.27 -0.29
## Caramel.Flavour 0.27 0.25 -0.19
## Vanilla.Flavour 0.20 0.26 -0.13
## Astringency 0.00 0.08 0.30
## Crispy.Texture 0.00 0.07 0.26
## Melting.Texture 1.00 0.14 -0.24
## Sticky.Texture 0.14 1.00 0.08
## Granular.Texture -0.24 0.08 1.00
Chocolate.Aroma is positively correlated with Chocolate.Flavour (0.72), Crispy.Texture (0.60) and Bitterness (0.48), and negatively correlated with Milk.Aroma (-0.56) and Milk.Flavour (-0.42).
Milk.Aroma is positively correlated with Milk.Flavour (0.77), Caramel.Flavour (0.48), Sweetness (0.30) and Vanilla.Flavour (0.28).
Sticky.Texture and Granular.Texture are only very weakly positively correlated (0.08).
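One way to pick out the strongest relationships without reading the whole matrix is to filter it by absolute value. The sketch below does this on a four-variable excerpt of the matrix above; the 0.5 cut-off and the object names are assumptions chosen for illustration.

```r
# Four-variable excerpt of the expert correlation matrix reported above.
vars <- c("Chocolate.Aroma", "Milk.Aroma", "Chocolate.Flavour", "Milk.Flavour")
R <- matrix(c( 1.00, -0.56,  0.72, -0.42,
              -0.56,  1.00, -0.49,  0.77,
               0.72, -0.49,  1.00, -0.47,
              -0.42,  0.77, -0.47,  1.00),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Look at each pair once (upper triangle) and keep those with |r| >= 0.5.
idx <- which(upper.tri(R) & abs(R) >= 0.5, arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(R)[idx[, "row"]],
  var2 = colnames(R)[idx[, "col"]],
  r    = R[idx]
)

# Strongest relationship first, regardless of sign.
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
print(strong_pairs)
```

Applied to the full 14-variable matrix, the same filter gives a short list of pairs to discuss in the 3-4 lines the exam asks for.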
Determinant, Bartlett and KMO Test
## [1] 0.001128851
## $chisq
## [1] 634.5429
##
## $p.value
## [1] 2.186798e-82
##
## $df
## [1] 91
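The Bartlett figures above can be checked by hand from the determinant. The sketch below applies the usual chi-square approximation, chisq = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, to the reported determinant. The sample size n = 100 is an assumption: it is the value that reproduces the reported statistic, and as far as I can tell it is also the default that psych's cortest.bartlett() falls back to when it is given a correlation matrix without n, so the statistic may need recomputing with the actual number of responses.

```r
det_R <- 0.001128851   # determinant of the correlation matrix, as reported above
p <- 14                # number of sensometric variables
n <- 100               # assumed sample size (see the note above)

# Bartlett's test of sphericity: chi-square approximation.
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_R)
df <- p * (p - 1) / 2
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

print(round(chisq, 2))
print(df)
print(p_value)
```

The determinant itself is well above the 0.00001 rule of thumb often used to flag multicollinearity, and the tiny p-value rejects the hypothesis that the correlation matrix is an identity matrix.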
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r =...