Master of Business Analytics assignment involving coding in R

Western Sydney University
The Nature of Data (MATH 7016)
Assignment 2022 Q2
Due 11:59pm Sunday 22nd of May 2022

Introduction

This assignment consists of four questions, each of equal value, giving a total contribution of 40% to this subject. The beginning of each question provides a breakdown of marks for each part in that question. For example, a breakdown of (1 + 3 + 6 = 10) implies a question consisting of three parts, where the first, second and third parts are worth 1, 3 and 6 marks respectively.

Important

Only R and the packages (that is, R libraries) described in the lectures and tutorials for this subject can be used for generating answers for this assignment! In addition, RMarkdown must not be used (footnote 1). Penalties may apply for non-conformance.

Answer structure

In doing this assignment, you should not need to use the maximum word limit declared in the Learning Guide. Note that marks are not awarded for using many words, but for using an economy of words and only stating what is relevant to the question being answered. Consider the old adage: “less can be more”. Therefore, show you know what is relevant and have mastered the ability to get to the point using few, simple words and clear sentences. Seek to also apply this philosophy to the code you write.

Your answers to this assignment are to be provided in a single R script file. All material in your script file should be logically organised, so that related material can be easily and quickly located. Clearly identify yourself in this file, as a minimum: full name and student ID, as comments at the beginning of the file.

R script file

Textual answers should be included as comments in your script file; refer to Listing 1 for examples of including comments.
The comments in your script file should be:
• Brief and to the point
• Stating a high-level perspective
• Stating what is not immediately obvious, but worth mentioning

Footnote 1: In short, only use R and R packages described in the lectures and tutorials for this subject. You are also not allowed to use RMarkdown for this assignment. All these requirements will also apply for the exam.

# In an R script file, comments are prefixed with the hash symbol
# A line with just a comment on it
# Generate a distribution of mean values from a sequence of digits
d <- replicate(1000, {
  s <- sample(0:9, replace = TRUE)  # generate a sequence of 10 numeric digits
  mean(s)  # a comment to end a line with R code on it
})
hist(d, main = 'Distribution of means')  # show distribution of means
# Of course you would use smarter comments than those used here
# Only state what is not immediately obvious

Listing 1: Some R code with comments

Be brief and to the point with respect to comments. The approach described in this section should be the same as used for the exam. Make judicious use of comments a priority in doing this assignment. After all, comments are meant to communicate important and useful details. Make sure you also communicate well through wise choices in variable and function names. Also make wise decisions regarding the layout of everything inside your R script file. Note that if things go wrong, good organisation and comments can help you, since they can show whether appropriate logic was intended.

Plagiarism

This is an individual effort assignment, therefore the answers you provide must be your own. You may learn from others, but the understanding claimed by your assignment must be yours. If you include any material in this assignment that is not your own, you must acknowledge that fact and declare the source of that material. Be warned, your answers will be checked for plagiarism and if caught, significant penalties may apply.
Submission

Once you have completed the assignment, you must upload your R script file via Turnitin; if you wish, you can also e-mail your R script file directly to me (footnote 2). This may be wise if you are having trouble with Turnitin or vUWS and are at risk of submitting late. Once you have e-mailed, seek to successfully submit via Turnitin.

Be aware that you may need to rename your R script file by adding the extension “.txt”, otherwise you may not be successful in submitting via Turnitin. On a Windows machine you can easily add a “.txt” extension via File Explorer. Select the “View” tab and tick “File name extensions” (refer to Figure 1). Then select the file to be renamed, press F2 to enter edit mode and add “.txt” to the very end of the file name; do not remove the “.R” portion of the file name. Hopefully a similar process is available on other platforms. Determine the method you will use and test it prior to submission.

You must submit your assignment no later than the due date declared on the first page of this assignment, otherwise late submission penalties will apply, as described in the section titled “Late submission penalties”. Prior to the due date, you may replace a previously submitted version, but only the last submitted version will be marked!

Footnote 2: [email protected]

[Figure 1: How to add “.txt” to the file extension on Windows]

Late submission penalties

Late submission penalties exist. The contribution value of the assignment will reduce by 10% per day, for each day after the submission date; that is, four marks per day. For example, if your assignment is four days late, the maximum possible mark you can score for the assignment is 24 out of 40.

Question 1 (1 + 2 + 2 + 3 + 2 = 10)

Are the distributions the same?

Table 1 contains the distribution of students across two faculties and levels of degrees. The table contains the number of students successfully completing a degree in the specified faculty.
Does a statistically significant difference exist between the faculties?

              Bachelor   Masters   Doctorate
Engineering        360       141          14
Science            616       309          32

Table 1: Distribution of degrees awarded

(i) Using R code, load the above data within an appropriate data structure(s). Include labels in your data structure(s). Include brief descriptions regarding the key parts of your code as comments within your R code.
(ii) Using R, produce what you believe is the most useful visualisation that shows how the distributions of faculty and degrees vary. Make the visualisation worthy of inclusion in a report. Briefly describe any key observations seen in the visualisation and make a prediction whether a statistically significant difference exists.
(iii) Perform a hypothesis test in order to determine whether a statistically significant difference exists between the distributions. You are free to use the simplest method available, as used in lectures or tutorials. But make sure you include the following:
• The null and alternative hypotheses used
• Any assumptions, or important details / parameters used
• Declare the result of the hypothesis test
• What the hypothesis test result means with respect to the distributions
(iv) Repeat (iii) in its entirety, except for the following: develop your own code from scratch, hence make use of the R replicate() function and other basic functions as used in lectures or tutorials.
(v) Finally, compare and contrast the results from (iii) and (iv) and briefly comment on what you found.

Question 2 (2 + 2 + 4 + 2 = 10)

Is there a statistically significant difference?

The provided dataset, contained in the file called “question2 dataset.csv”, reports on the performance of two new drugs. The dataset contains two columns labeled Val and Grp. Val is a measure of the drug's performance; assume performance / effectiveness is proportional to the magnitude of Val. Consider Grp to simply state the particular drug trialled.
(i) Produce a single but useful visualisation of the dataset and briefly interpret what you see. Make the visualisation worthy of a report. Briefly state why you chose that visualisation.
(ii) Perform a hypothesis test using the most appropriate statistical method. Note that you are free to use a single line of code, as used in lectures or tutorials. Make sure you clearly include the following:
• Given what you saw in (i), state how you will compare the drugs and why
• The null and alternative hypotheses used
• Any assumptions, or important details / parameters used
• Declare the results and interpretation of the results of the hypothesis test
(iii) Repeat (ii) in its entirety, developing R code from scratch, but using a permutation test approach, as used in lectures or tutorials.
(iv) Briefly compare and contrast the results of (ii) and (iii).

Question 3 (2 + 3 + 2 + 3 = 10)

Predicting demand

A particular help desk always has seventeen operators on duty. On average only fourteen operators are simultaneously busy helping customers.

(i) What is the probability that all operators will be simultaneously busy? Briefly explain the approach used.
(ii) What is the probability that one or more callers will have to wait for an operator to become available? Briefly explain the approach used.
(iii) Draw a nicely presented plot showing operator demand for zero to twenty-five operators. Make this plot appropriate for a report to management.

Estimating probability

Imagine an exam that consists of seven multiple choice questions and each question has five possible answers, but only one is correct. Also imagine that answers to the questions are to be randomly selected.

(iv) Write R code to determine the theoretical probability of getting five or more questions correct by luck. Make sure you briefly include code comment(s) stating the logic of how you calculated the answer.
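For part (iv) of Question 3, one possible reading is that the number of correct answers follows a binomial distribution with 7 independent trials and success probability 1/5. The following is a minimal sketch under that assumption, not a model answer; the variable names are illustrative only.

```r
# Sketch: probability of 5 or more correct guesses out of 7 questions,
# assuming each guess is independent with P(correct) = 1/5 (binomial model).
n_questions <- 7
p_correct <- 1 / 5

# P(X >= 5) is the upper tail beyond 4, i.e. 1 - P(X <= 4)
p_five_or_more <- pbinom(4, size = n_questions, prob = p_correct,
                         lower.tail = FALSE)

# Cross-check by summing the individual probabilities P(X = 5), P(X = 6), P(X = 7)
p_check <- sum(dbinom(5:7, size = n_questions, prob = p_correct))

p_five_or_more
```

Both routes give the same value because the upper tail of a discrete distribution is the sum of its point probabilities.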
Question 4 (6 + 2 + 2 = 10)

Interval estimation

Using a 90% confidence interval approach, what is the fewest and largest number of heads expected when a biased coin (75% probability of heads) is tossed 50 times?

(i) Using a simulation method as shown in the lectures and tutorials of this subject, generate a distribution suitable for determining a confidence interval; use code you developed from scratch (footnote 3). Briefly explain the key steps involved as simple comments within your source code.
(ii) For the distribution obtained above in (i):
• Determine the mean number of heads
• Determine the confidence interval
(iii) Produce an appropriate plot showing the distribution obtained in (i) and include the details obtained in (ii). Make the plot worthy of a report.

Footnote 3: From scratch means making use of R replicate() and associated functions.
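The simulation asked for in Question 4 could be approached along the following lines. This is only a sketch under stated assumptions: the seed, the number of simulated experiments (10000) and all variable names are arbitrary choices, and the interval is taken as the 5th and 95th percentiles of the simulated counts, which may differ from the method shown in the lectures.

```r
# Sketch: simulate the number of heads in 50 tosses of a biased coin.
set.seed(2022)     # arbitrary seed, for reproducibility only
n_tosses <- 50
p_heads  <- 0.75
n_sims   <- 10000  # assumed number of simulated experiments

# Each replicate tosses the coin 50 times (1 = head, 0 = tail) and counts heads
heads <- replicate(n_sims, {
  tosses <- sample(c(1, 0), size = n_tosses, replace = TRUE,
                   prob = c(p_heads, 1 - p_heads))
  sum(tosses)
})

# Mean number of heads and the central 90% interval of the simulated counts
mean_heads <- mean(heads)
ci_90 <- quantile(heads, probs = c(0.05, 0.95))

# Histogram with one bar per count, marking the mean and interval limits
hist(heads, breaks = seq(min(heads) - 0.5, max(heads) + 0.5, by = 1),
     main = "Simulated number of heads in 50 tosses",
     xlab = "Number of heads")
abline(v = mean_heads, lwd = 2)
abline(v = ci_90, lty = 2)
```

The simulated mean should sit close to the theoretical value of 50 × 0.75 = 37.5; the interval limits vary slightly from run to run unless the seed is fixed.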

assignment2022q2-reqkhk0q.pdf question2dataset-1-negrotej.csv

Answered 2 days after May 19, 2022

Answer to: Master of Business Analytics assignment involving coding in R

Mohd answered on May 21 2022

5/19/2022
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(tidyr)
              Bachelor   Masters   Doctorate
Engineering        360       141          14
Science            616       309          32
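# Build the 2 x 3 table of counts; values are supplied column by column (byrow = FALSE)
mytab <- matrix(c(360, 616, 141, 309, 14, 32), ncol = 3, byrow = FALSE)
# Label the degree levels (columns) and the faculties (rows)
colnames(mytab) <- c('Bachelor', 'Masters', 'Doctorate')
rownames(mytab) <- c('Engineering', 'Science')
# Convert the matrix to a contingency table
mytab <- as.table(mytab)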
Null and alternative hypotheses used
Null hypothesis: there is no association between faculty and degree level.
Alternative hypothesis: there is an association between faculty and degree level.
From the chi-squared test output, X-squared = 4.6062 on 2 degrees of freedom, with p-value = 0.09995 > 0.05. We therefore fail to reject the null hypothesis: at the 5% significance level there is no evidence of an association between faculty and degree level. If the significance level were raised to 10%, the result would only just be significant (p-value < 0.1).
chisq.test(mytab)
##
## Pearson's Chi-squared test
##
## data: mytab
## X-squared = 4.6062, df = 2, p-value = 0.09995
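The statistic reported above can be reproduced from first principles, which is also one way to approach the "from scratch" part of Question 1. The sketch below recomputes the chi-squared statistic from expected counts, then estimates a p-value by simulation with replicate(). The use of r2dtable() (a base R function that draws random tables with fixed row and column totals), the seed and the 5000 replicates are assumptions of this sketch, not something taken from the original answer or the lectures.

```r
# Observed counts from Table 1 (rows: faculty, columns: degree level)
observed <- matrix(c(360, 616, 141, 309, 14, 32), ncol = 3,
                   dimnames = list(c("Engineering", "Science"),
                                   c("Bachelor", "Masters", "Doctorate")))

# Expected counts if faculty and degree level are independent:
# (row total x column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic and its theoretical p-value
chi_sq   <- sum((observed - expected)^2 / expected)
df       <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_theory <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Simulated p-value: draw random tables with the same margins (i.e. under H0)
# and count how often their statistic is at least as large as the observed one
set.seed(2022)  # arbitrary seed, for reproducibility only
sim_stats <- replicate(5000, {
  tab <- r2dtable(1, rowSums(observed), colSums(observed))[[1]]
  sum((tab - expected)^2 / expected)
})
p_sim <- mean(sim_stats >= chi_sq)

round(c(chi_sq = chi_sq, p_theory = p_theory, p_sim = p_sim), 4)
```

The hand-computed statistic and theoretical p-value agree with the chisq.test() output (4.6062 and 0.09995); the simulated p-value should land close to the theoretical one, with some run-to-run variation.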
library(readr)
question2dataset <- read_csv("New folder (3)/question2dataset.csv")
View(question2dataset)
ggplot(data = question2dataset) +
  geom_boxplot(mapping = aes(x = Grp, y = Val), outlier.colour = "red",
               outlier.shape = 2,
               outlier.size = 3) +
  geom_point(mapping = aes(x = Grp, y = Val))
hist(question2dataset$Val,main="Histogram of values")
t.test(Val~Grp, data=question2dataset)
##
## Welch Two Sample t-test
##
## data: Val by Grp
## t = -1.9016, df = 47.286, p-value = 0.06333
## alternative hypothesis: true difference in means between group a and group b is not equal to 0
## 95 percent confidence interval:
## -4.2206415 0.1185016
## sample estimates:
## mean in group a mean in group b
## 100.3086 102.3596
mean(question2dataset$Val[question2dataset$Grp == "a"])
## [1] 100.3086
mean(question2dataset$Val[question2dataset$Grp == "b"])
## [1] 102.3596
test.stat1 <- abs(mean(question2dataset$Val[question2dataset$Grp == "a"]) -
                  mean(question2dataset$Val[question2dataset$Grp == "b"]))
test.stat1
## [1] 2.05107
median(question2dataset$Val[question2dataset$Grp == "a"])
## [1] 100.2253
median(question2dataset$Val[question2dataset$Grp == "b"])
## [1] 102.1962
test.stat2 <- abs(median(question2dataset$Val[question2dataset$Grp == "a"]) -
                  median(question2dataset$Val[question2dataset$Grp == "b"]))
test.stat2
## [1] 1.970877
Permutation Test
set.seed(1979)
n <- length(question2dataset$Grp)
P <- 100000
variable <- question2dataset$Val
PermSamples <- matrix(0,...
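The answer is cut off at this point, so the remainder of its permutation test is not shown. As a general illustration only, a permutation test for a difference in group means can be sketched as follows. The data here are simulated stand-ins (not the assignment dataset), and the function name mean_diff, the seed and the 10000 permutations are hypothetical choices that are not taken from the truncated code above.

```r
# Sketch of a permutation test for a difference in group means.
set.seed(1)  # arbitrary seed, for reproducibility only

# Simulated stand-in data: NOT the assignment dataset
val <- c(rnorm(25, mean = 100, sd = 4), rnorm(25, mean = 102, sd = 4))
grp <- rep(c("a", "b"), each = 25)

# Test statistic: absolute difference between the two group means
mean_diff <- function(values, labels) {
  abs(mean(values[labels == "a"]) - mean(values[labels == "b"]))
}

observed <- mean_diff(val, grp)

# Under H0 the group labels are exchangeable, so shuffle the labels many
# times and recompute the statistic for each shuffle
perm_stats <- replicate(10000, mean_diff(val, sample(grp)))

# Two-sided p-value: share of shuffles at least as extreme as the observed value
p_value <- mean(perm_stats >= observed)
p_value
```

Using the absolute difference makes the test two-sided, which matches the two-sided alternative of the Welch t-test used earlier in the answer.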
