SIT743 Bayesian Learning and Graphical Models: Assignment-1



Total Marks = 120, Weighting: 25%
Due date: 26 April 2020 by 11.30 PM

INSTRUCTIONS:
• For this assignment, you need to submit the following THREE files.
1. A written document (a single PDF only) covering all of the items described in the questions. All answers to the questions must be written in this document, i.e., not in the other files (code files) that you will be submitting. All the relevant results (outputs, figures) obtained by executing your R code must be included in this document. For questions that involve mathematical formulas, you may write the answers by hand, scan them to PDF and combine them with your answer document. Submit the combined answer document as a single PDF.
2. A separate ".R" or ".txt" file containing the R script you implemented to produce the results. Name the file "name-StudentID-Ass1-Code.R" (where 'name' is replaced with your name (you can use your surname or first name) and 'StudentID' with your student ID).
3. A data file named "name-StudentID-LzMyData.txt" (where 'name' and 'StudentID' are replaced in the same way).
• All the documents and files should be submitted (uploaded) via the SIT743 CloudDeakin Assignment Dropbox by the due date and time.
• Zip files are NOT accepted. All three files should be uploaded separately to CloudDeakin.
• E-mail or manual submissions are NOT allowed. Photos of the document are NOT allowed.
• Questions Q2 and Q3 do not require any R programming.

Some of the questions in this assignment require you to use the "Lizard Island" dataset. This dataset is given as a CSV file named "LZIsData.csv". You can download it from the Assignment folder in CloudDeakin. The dataset is described below.

Lizard Island dataset: This dataset gives weather measurements collected at Lizard Island, an island in the Great Barrier Reef (North Queensland, Australia) [http://weather.aims.gov.au/#/station/1166]. The data are measurements sampled every 10 minutes over a one-month period between May 2019 and June 2019. There are 4 variables, listed in the same order as the columns in LZIsData.csv:
• Air Temperature: air temperature in degrees Celsius.
• Humidity: humidity in percent.
• Wind Speed: maximum wind speed in kilometres per hour.
• Air Pressure: air pressure in hectopascals.

Q1) [19 Marks]
• Download the data file "LZIsData.csv" and save it to your R working directory.
• Assign the data to a matrix, e.g. using
    the.data <- as.matrix(read.csv("LZIsData.csv", header = FALSE, sep = ","))
• Generate a sample of 1500 data points using the following:
    my.data <- the.data[sample(1:4464, 1500), c(1:4)]
• Save "my.data" to a text file titled "name-StudentID-LzMyData.txt" using the following R code (Note: you 'must' upload this data text file and the R code along with your submission. If not, zero marks will be given for this whole question.)
    write.table(my.data, "name-StudentID-LzMyData.txt")

Use the sampled data ("my.data") to answer the following questions.
1.1) Draw histograms for the 'Air Temperature' and 'Air Pressure' values, and comment on them. [2 marks]
1.2) Draw a parallel box plot using the two variables 'Air Temperature' and 'Wind Speed'. Find the five-number summaries of these two variables. Use both the five-number summaries and the box plots to compare and comment on them. [5 marks]
1.3) Which summary statistics would you choose to summarize the center and spread of the 'Humidity' data? Why (support your answer with proper plot/s)? Find those summary statistics for the 'Humidity' data. [4 marks]
1.4) Draw a scatterplot of 'Air Temperature' (as x) and 'Humidity' (as y) for the first 1000 data vectors selected from "my.data" (name the axes). Fit a linear regression model to these two variables and plot the regression line on the same scatter plot. Write down the linear regression equation. Compute the correlation coefficient and the coefficient of determination. Explain what these results reveal. [8 marks]

Q2) [21 Marks]
2.1) The table shows the results of a survey about favourite sports in different states over some period in 2020.

Sports \ State      New South Wales (N)   Victoria (V)   Queensland (Q)
Footy (F)           1000                  2000           1300
Basketball (B)      1500                  500            500
Cricket (C)         1400                  1000           800

Suppose we select a person at random.
a) What is the probability that the person is from Victoria (V)? [1 mark]
b) What is the probability that the person likes cricket (C) and is from New South Wales (N)? [1 mark]
c) What is the probability that the person likes footy (F) given that he/she is from Queensland (Q)? [2 marks]
d) What is the probability that the person, who likes basketball (B), is from Victoria (V)? [2 marks]
e) What is the probability that the person is from Victoria (V) or likes cricket (C)? [2 marks]
f) Find the marginal distribution of sports. [3 marks]
g) Are sports and state mutually exclusive? Explain. [2 marks]
h) Are sports and state independent? Explain. [3 marks]

2.2) The weather in Victoria can be summarised as follows: if it rains one day, there is a 75% chance it will rain the following day; if it is sunny one day, there is a 30% chance it will be sunny the following day. Assuming that the prior probability that it rained yesterday is 0.6, what is the probability that it was sunny yesterday given that it is rainy today? [5 marks]

Q3) [5 Marks]
3.1) State two differences between the frequentist and the Bayesian ways of estimating a parameter. [2 marks]
3.2) Why are conjugate priors useful in Bayesian statistics? [1 mark]
3.3) Give two examples of conjugate pairs (i.e., two pairs of distributions that can be used for the prior and the likelihood). [2 marks]

Q4) Frequentist and Bayesian estimations [31 Marks]
An artificial intelligence solutions provider, BigSecAI Ltd., houses several computing servers to perform computationally intensive processing, such as deep learning, on sensitive (secure) data for customers, including government agencies. In order to provide reliable service, BigSecAI wants to improve the monitoring and maintenance of its computer servers. As part of this planning, BigSecAI wants to model the lifetime pattern of its servers. BigSecAI assumes that the length of time $t_i$ (in years) that a computer server $i$ lasts follows an exponential distribution with an unknown parameter $\theta$, as shown below. Here, the quantity $1/\theta$ represents how long, on average, a server lasts.
$$t_i \sim \mathrm{Exp}(\theta), \qquad \mathrm{Exp}(\theta) = p(t_i \mid \theta) = \theta \exp(-\theta t_i)$$
Assume that $n$ servers are used, and that their lifetimes are independent and identically distributed (iid).

4.1) BigSecAI first decided to use a frequentist approach to arrive at an estimate for $\theta$. Answer the following questions.
a) Show that the joint distribution of the lifetimes of $n$ servers can be given by the equation below (show the steps clearly). [3 marks]
$$p(\mathbf{t} \mid \theta) = \theta^{n} \exp(-\theta S), \quad \text{where } S = \sum_{i=1}^{n} t_i$$
b) Find a simplified expression for the log-likelihood function $L(\theta) = \ln\big(p(\mathbf{t} \mid \theta)\big)$. [3 marks]
c) Show that the maximum likelihood estimate ($\hat{\theta}$) of the parameter $\theta$ is given by [4 marks]
$$\hat{\theta} = \frac{1}{\bar{t}}, \quad \text{where } \bar{t} = \frac{1}{n} \sum_{i=1}^{n} t_i$$
d) Suppose that the lifetimes of six of their servers are {2, 7, 6, 10, 8, 3}. What is the maximum likelihood estimate $\hat{\theta}$ (MLE) of the parameter $\theta$ given this data? [2 marks]
e) Hence, on average, how long would 7 servers last if they are used one after another? [2 marks]
f) What is the probability that a server lasts between six and twelve years? Hint: use the cumulative distribution function (CDF) of the exponential distribution, which is given by $F(t) = 1 - e^{-\theta t}$. [4 marks]

4.2) BigSecAI has now consulted an overseas computer hardware vendor, HardwareExpert, which has more experience working with large servers, and obtained some prior information about the lifetime of servers of similar capacity and processing capabilities. HardwareExpert mentioned that their $\theta$ value follows a pattern that can be described using a form of gamma distribution, Gamma(a, b), where $a$ and $b$ are the hyper-parameters of the gamma distribution, with $a = 0.1$ and $b = 0.1$:
$$\mathrm{Gamma}(a, b) = c\, b^{a} \theta^{(a-1)} e^{-b\theta}, \quad \text{where } c \text{ is a constant.}$$
a) BigSecAI has decided to use this prior information from HardwareExpert for its estimation. If it uses the gamma prior Gamma(a, b), obtain an expression for the posterior distribution (show all the steps). Show that the posterior distribution is also a gamma distribution, Gamma(a′, b′), with different hyper-parameters $a'$ and $b'$. Express $a'$ and $b'$ in terms of $a$, $b$, $n$ and $S$. [5 marks]
b) Use the values of the hyper-parameters $a$ and $b$ suggested by HardwareExpert, and the server lifetimes observed from 6 servers, {2, 7, 6, 10, 8, 3}, to find the values of $a'$ and $b'$. What is the posterior mean...
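The numeric parts of 4.1 d) to f) can be sketched in a few lines of Python. This is only a sketch: it takes the MLE formula $\hat{\theta} = 1/\bar{t}$ from 4.1 c) and the CDF from the hint as given. It uses the fact that 7 servers run back to back last $7 \times E[t] = 7/\theta$ years on average. The variable names and the helper `exp_cdf` are illustrative, not part of the assignment.

```python
import math

# Observed server lifetimes in years (from 4.1 d)
lifetimes = [2, 7, 6, 10, 8, 3]

# MLE of the exponential parameter: theta_hat = 1 / (sample mean lifetime)
mean_life = sum(lifetimes) / len(lifetimes)
theta_hat = 1 / mean_life

# Expected total lifetime of 7 servers used one after another: 7 * (1 / theta)
expected_seven = 7 / theta_hat


def exp_cdf(t, theta):
    """Exponential CDF F(t) = 1 - exp(-theta * t) (hypothetical helper)."""
    return 1 - math.exp(-theta * t)


# P(6 <= t <= 12) = F(12) - F(6) = exp(-6 * theta) - exp(-12 * theta)
p_6_to_12 = exp_cdf(12, theta_hat) - exp_cdf(6, theta_hat)

print(round(theta_hat, 4), round(expected_seven, 2), round(p_6_to_12, 4))  # 0.1667 42.0 0.2325
```

With $\hat{\theta} = 1/6$, the interval probability reduces to $e^{-1} - e^{-2}$.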
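Multiplying the exponential likelihood $\theta^{n} e^{-\theta S}$ (with $S = \sum_i t_i$) by the gamma prior kernel $\theta^{a-1} e^{-b\theta}$ gives a kernel of the form $\theta^{a+n-1} e^{-(b+S)\theta}$. This is the standard gamma-exponential conjugate result $a' = a + n$, $b' = b + S$. The sketch below applies that update, assuming the rate parameterisation implied by the prior formula ($b$ multiplies $\theta$ in the exponent). Since the question text is cut off, it also assumes that the quantity of interest is the posterior mean of $\theta$ itself, which for Gamma(a′, b′) is $a'/b'$. The function name `gamma_exponential_update` is hypothetical.

```python
# Observed server lifetimes in years (from 4.2 b)
lifetimes = [2, 7, 6, 10, 8, 3]
a, b = 0.1, 0.1  # HardwareExpert's Gamma(a, b) prior hyper-parameters


def gamma_exponential_update(a, b, data):
    """Conjugate update: a Gamma(a, b) prior (rate b) combined with an
    exponential likelihood gives a Gamma(a + n, b + sum(data)) posterior."""
    return a + len(data), b + sum(data)


a_post, b_post = gamma_exponential_update(a, b, lifetimes)

# Mean of a Gamma(a', b') distribution in the rate parameterisation is a' / b'
posterior_mean = a_post / b_post

print(a_post, b_post, round(posterior_mean, 4))  # 6.1 36.1 0.169
```

With only six observations, the data dominate the weak Gamma(0.1, 0.1) prior, so the posterior mean (about 0.169) lies close to the MLE of 1/6.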
Answered same day (Apr 24, 2021). SIT743, Deakin University

Answer to: SIT743 Bayesian Learning and Graphical Models, Assignment-1

Pushpendra answered on Apr 28 2021
Ans (2.1 a): $P(V) = \frac{2000 + 500 + 1000}{3900 + 3500 + 2600} = \frac{3500}{10000} = 0.35$

Ans (2.1 b): $P(C \cap N) = \frac{1400}{10000} = 0.14$

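The two answers above can be checked with a few lines of Python. This is a minimal sketch: the nested dictionary `counts` and the helpers `p_state` and `p_joint` are illustrative choices, not part of the assignment. Exact `Fraction` arithmetic is used only to avoid floating-point noise.

```python
from fractions import Fraction

# Survey counts from the 2.1 table: counts[sport][state]
counts = {
    "F": {"N": 1000, "V": 2000, "Q": 1300},  # footy
    "B": {"N": 1500, "V": 500, "Q": 500},    # basketball
    "C": {"N": 1400, "V": 1000, "Q": 800},   # cricket
}
total = sum(sum(row.values()) for row in counts.values())  # grand total of respondents


def p_state(state):
    """Marginal probability of a state: column sum / grand total."""
    return Fraction(sum(row[state] for row in counts.values()), total)


def p_joint(sport, state):
    """Joint probability of one cell: cell count / grand total."""
    return Fraction(counts[sport][state], total)


print(float(p_state("V")))       # a) P(V)       -> 0.35
print(float(p_joint("C", "N")))  # b) P(C and N) -> 0.14
```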
Ans (2.1 c): $P(F \mid Q) = \frac{P(F \cap Q)}{P(Q)} = \frac{1300}{1300 + 500 + 800} = \frac{1300}{2600} = 0.5$

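Ans (2.1 d): We already know that the person likes basketball, so this is a conditional probability, obtained by dividing by the basketball row total:
$P(V \mid B) = \frac{P(V \cap B)}{P(B)} = \frac{500}{1500 + 500 + 500} = \frac{500}{2500} = 0.2$
(The joint probability $P(B \cap V) = \frac{500}{10000} = 0.05$ would instead be the chance that a randomly selected person both likes basketball and is from Victoria.)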

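A similar sketch covers the two conditional probabilities in c) and d), using the same hypothetical `counts` dictionary. The only difference between them is whether the cell count is divided by a column (state) total or by a row (sport) total. The helper names are illustrative.

```python
from fractions import Fraction

# Survey counts from the 2.1 table: counts[sport][state]
counts = {
    "F": {"N": 1000, "V": 2000, "Q": 1300},  # footy
    "B": {"N": 1500, "V": 500, "Q": 500},    # basketball
    "C": {"N": 1400, "V": 1000, "Q": 800},   # cricket
}


def p_sport_given_state(sport, state):
    """P(sport | state): cell count divided by the column (state) total."""
    col_total = sum(row[state] for row in counts.values())
    return Fraction(counts[sport][state], col_total)


def p_state_given_sport(state, sport):
    """P(state | sport): cell count divided by the row (sport) total."""
    return Fraction(counts[sport][state], sum(counts[sport].values()))


print(p_sport_given_state("F", "Q"))  # c) P(F | Q) -> 1/2
print(p_state_given_sport("V", "B"))  # d) P(V | B) -> 1/5
```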
Ans (2.1 e): By the addition rule,
$P(V \cup C) = P(V) + P(C) - P(V \cap C) = \frac{2000 + 500 + 1000}{10000} + \frac{1400 + 1000 + 800}{10000} - \frac{1000}{10000} = 0.35 + 0.32 - 0.10 = 0.57$

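The addition rule can be cross-checked by directly counting every cell that is either in the Victoria column or in the cricket row. The sketch below does both, again with an illustrative `counts` dictionary. Subtracting $P(V \cap C)$ removes the Victoria/cricket cell, which would otherwise be counted twice.

```python
from fractions import Fraction

# Survey counts from the 2.1 table: counts[sport][state]
counts = {
    "F": {"N": 1000, "V": 2000, "Q": 1300},  # footy
    "B": {"N": 1500, "V": 500, "Q": 500},    # basketball
    "C": {"N": 1400, "V": 1000, "Q": 800},   # cricket
}
total = sum(sum(row.values()) for row in counts.values())

# Addition rule (inclusion-exclusion): P(V or C) = P(V) + P(C) - P(V and C)
p_v = Fraction(sum(row["V"] for row in counts.values()), total)
p_c = Fraction(sum(counts["C"].values()), total)
p_v_and_c = Fraction(counts["C"]["V"], total)
p_union = p_v + p_c - p_v_and_c

# Direct count of every cell that is in Victoria or is cricket
direct = Fraction(
    sum(n for sport, row in counts.items() for state, n in row.items()
        if state == "V" or sport == "C"),
    total,
)

print(float(p_union), p_union == direct)  # 0.57 True
```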
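Ans (2.1 f): The marginal distribution of sports is obtained by summing each row of the table over the three states:
$P(X = x) = \begin{cases} \frac{4300}{10000} = 0.43, & x = F \\ \frac{2500}{10000} = 0.25, & x = B \\ \frac{3200}{10000} = 0.32, & x = C \end{cases}$
(Summing the columns instead gives the marginal distribution of state: $P(N) = 0.39$, $P(V) = 0.35$, $P(Q) = 0.26$.)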
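Both marginal distributions come from summing the table along one axis. A short sketch, assuming the same illustrative `counts` layout, also confirms that each marginal sums to 1.

```python
from fractions import Fraction

# Survey counts from the 2.1 table: counts[sport][state]
counts = {
    "F": {"N": 1000, "V": 2000, "Q": 1300},  # footy
    "B": {"N": 1500, "V": 500, "Q": 500},    # basketball
    "C": {"N": 1400, "V": 1000, "Q": 800},   # cricket
}
total = sum(sum(row.values()) for row in counts.values())

# Marginal of sports: sum each row over the states
sport_marginal = {s: Fraction(sum(row.values()), total) for s, row in counts.items()}

# Marginal of states: sum each column over the sports
state_marginal = {st: Fraction(sum(row[st] for row in counts.values()), total)
                  for st in ("N", "V", "Q")}

print({s: float(p) for s, p in sport_marginal.items()})  # {'F': 0.43, 'B': 0.25, 'C': 0.32}
print(sum(sport_marginal.values()))                      # 1
```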
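Ans (2.1 g): Two events are mutually exclusive if they cannot occur at the same time, i.e. their joint probability is zero. Here, for example,
$P(F \cap N) = \frac{1000}{10000} = 0.1 \neq 0$
so a person can both like footy and be from New South Wales (indeed, every cell of the table is non-zero).
No, sports and state are not mutually exclusive.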
Ans (2.1 h): Two events are independent when the occurrence of one does not affect the probability of the other, i.e. $P(A \cap B) = P(A)\,P(B)$. Taking footy and New South Wales:
$P(F \cap N) = \frac{1000}{10000} = 0.1, \quad P(F) = \frac{4300}{10000} = 0.43, \quad P(N) = \frac{3900}{10000} = 0.39$
$P(F)\,P(N) = 0.43 \times 0.39 = 0.1677 \neq 0.1 = P(F \cap N)$
Therefore, sports and state are not independent.
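One cell where the joint probability differs from the product of the marginals is enough to rule out independence. For completeness, the sketch below checks all nine cells, assuming the same illustrative `counts` dictionary. Every cell turns out to violate $P(\text{sport} \cap \text{state}) = P(\text{sport})\,P(\text{state})$.

```python
from fractions import Fraction

# Survey counts from the 2.1 table: counts[sport][state]
counts = {
    "F": {"N": 1000, "V": 2000, "Q": 1300},  # footy
    "B": {"N": 1500, "V": 500, "Q": 500},    # basketball
    "C": {"N": 1400, "V": 1000, "Q": 800},   # cricket
}
total = sum(sum(row.values()) for row in counts.values())

sport_tot = {s: sum(row.values()) for s, row in counts.items()}
state_tot = {st: sum(row[st] for row in counts.values()) for st in ("N", "V", "Q")}

# Independence requires joint == product of marginals for EVERY cell
violations = {}
for sport, row in counts.items():
    for state, n in row.items():
        joint = Fraction(n, total)
        product = Fraction(sport_tot[sport], total) * Fraction(state_tot[state], total)
        if joint != product:
            violations[(sport, state)] = (float(joint), float(product))

print(violations[("F", "N")])          # (0.1, 0.1677)
print("independent:", not violations)  # independent: False
```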
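Ans (2.2): Let $R_y$ and $S_y$ denote rain and sun yesterday, and $R_t$ and $S_t$ rain and sun today (sunny is taken to mean "not rainy"). The given information is
$P(R_y) = 0.6, \quad P(S_y) = 1 - 0.6 = 0.4$
$P(R_t \mid R_y) = 0.75, \quad P(S_t \mid S_y) = 0.3 \;\Rightarrow\; P(R_t \mid S_y) = 1 - 0.3 = 0.7$
Using Bayes' theorem:
$P(S_y \mid R_t) = \frac{P(R_t \mid S_y)\,P(S_y)}{P(R_t \mid S_y)\,P(S_y) + P(R_t \mid R_y)\,P(R_y)} = \frac{0.7 \times 0.4}{0.7 \times 0.4 + 0.75 \times 0.6} = \frac{0.28}{0.73} \approx 0.3836$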

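The same Bayes' rule calculation can be written as a small two-state sketch. It assumes, as the answer above does, that each day is either "rain" or "sun". The dictionaries and the function `posterior_yesterday` are illustrative names. The denominator is $P(R_t)$, obtained by the law of total probability.

```python
# Prior on yesterday's weather (from 2.2); "sun" is taken to mean "not rain"
prior_yesterday = {"rain": 0.6, "sun": 0.4}

# P(today | yesterday): outer key is yesterday's weather
transition = {
    "rain": {"rain": 0.75, "sun": 0.25},
    "sun": {"rain": 0.70, "sun": 0.30},
}


def posterior_yesterday(today):
    """Bayes' rule: P(yesterday | today) for each possible yesterday."""
    unnormalised = {y: transition[y][today] * prior_yesterday[y] for y in prior_yesterday}
    evidence = sum(unnormalised.values())  # P(today) by total probability
    return {y: v / evidence for y, v in unnormalised.items()}


post = posterior_yesterday("rain")
print(round(post["sun"], 4))  # 0.3836
```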
Ans (3.1): In a Bayesian framework, we model probabilistically both the data and the parameters that govern the distribution of the data. In a frequentist framework, the data are modeled probabilistically, but the...