Stat and R Programming

Question

Penn State STAT 440 Final Exam

Assessment Guideline

Please read the following instructions carefully.

This assessment is take-home and open-book. You must complete this assessment independently, and you cannot seek help from anyone else including, but not limited to, the course staff, classmates, relatives, colleagues, teachers and internet forums. Prior to submissions, students can ask for clarifications about exam problems at the Canvas Discussion Page.

To receive credit, you must show your work and/or explain your reasoning clearly, legibly and concisely. Problems marked [A] require you to derive analytic mathematical expressions for the solution, whereas problems marked [C] require you to show computer code and the corresponding output.

There are 3 problems, and the total is 100 points. Problems are not ordered or weighted by difficulty.

Please follow our online submission guidelines in the syllabus. Please submit your work to Canvas by 8 AM EST on 2022-12-14. Late submissions are not graded.

Integrity Agreement

Please complete and add the following agreement at the beginning of your submitted solution. Submissions without the complete agreement are not graded.

I, [Your Printed Name and Penn State User ID], agree to complete this take-home, open-book assessment independently, and I agree not to seek help from anyone else including, but not limited to, the course staff, classmates, relatives, colleagues, teachers and internet forums. I agree not to share any copy of solutions with any person or organization. I agree not to distribute any copy of solutions in any public or private domain. I understand that if I am found to have violated any agreement listed above, I will be subject to disciplinary action including the possibility of failing STAT 440.

Problem 1 (40 points)

Let $X_1, \dots, X_n \mid \theta \overset{iid}{\sim} \text{Bernoulli}(\theta)$ and let $\theta \sim \text{Uniform}(0, 1)$. Let $S_n \triangleq \sum_{i=1}^{n} X_i$. Suppose we observe that $n = 12$ and $S_n = 4$.

[A] Find the maximum likelihood estimator (MLE) $\hat\theta_{MLE}$ by analytically maximizing the likelihood of $\theta$.
[A] Find the conditional probability distribution of $\theta \mid S_n$ and compute the conditional expectation $E(\theta \mid S_n)$.
[A] Let $Q$ be the quantile function for the conditional probability distribution of $\theta \mid S_n$. Find the probability distribution of $Q(\theta)$.
[C] Estimate $E(\theta \mid S_n)$ using importance sampling with the proposal distribution being $\text{Beta}(2, 2)$. Use 100,000 proposal samples and set.seed(440) in your simulation.
[A] & [C] Find the shortest possible 95% credible interval for $\theta$.
[A] Suppose you want to use Metropolis-Hastings to sample from $\theta \mid S_n$. Suppose you use a symmetric transition kernel, your current position is $\theta_0 = 1/2$, and your proposed position is $\theta^* = 3/5$. What is the probability of accepting this proposal?
[A] Suppose you want to use rejection sampling to sample from $\theta \mid S_n$, your proposal distribution is $\text{Unif}(0, 1)$, and you again propose $\theta^* = 3/5$. What is the probability of accepting this proposal?
[A] & [C] Find the maximum a posteriori probability estimator $\hat\theta_{MAP}$ by analytically maximizing the density of the conditional probability distribution of $\theta \mid S_n$. Derive a Newton's method algorithm to find $\hat\theta_{MAP}$. Then write your own R code to find $\hat\theta_{MAP}$ with a convergence tolerance of $\epsilon = 0.001$.

Problem 2 (20 points)

Consider a simple random sample $X_1, X_2, \dots, X_n$ from the following two-class mixture model:

$$f(x) = \pi \cdot N(x; \mu_1, \sigma^2) + (1 - \pi) \cdot N(x; \mu_2, \sigma^2),$$

where $N(x; \mu, \sigma^2)$ denotes the density value of the normal distribution with mean $\mu$ and variance $\sigma^2$ at $x$. We assume that $\sigma^2$ is known throughout this problem.

[A] We assume that both $\mu_1$ and $\mu_2$ are known and $\pi$ is unknown in this part. Mimic the arguments in our lecture notes and derive an EM algorithm to find the MLE of the unknown parameter $\pi$.
[A] We assume that both $\mu_1$ and $\mu_2$ are unknown and $\pi$ is known in this part. We further assume a normal prior $N(0, \sigma^2/\tau)$ on both $\mu_1$ and $\mu_2$, where $\tau > 0$ is also known. Derive a Gibbs sampling algorithm to find the posterior means of $\mu_1$ and $\mu_2$.

Problem 3 (40 points)

The brushtail possum is a marsupial that lives in Australia and New Guinea. Researchers (Lindenmayer et al., Australian Journal of Zoology, 1995, https://doi.org/10.1071/ZO9950449) captured 104 of these animals and took body measurements before releasing the animals back into the wild. In this problem we consider two of these measurements: the total length of each possum, from head to tail, and the length of each possum's head. You can download this dataset and see the data format here: https://www.openintro.org/data/index.php?data=possum.

Each possum $i$ provides two measurements $(X_i, Y_i)$, where $X_i$ is the total length (cm) of this possum and $Y_i$ is the head length (mm) of this possum. Now consider the following model:

$$Y_i = \beta_0 + X_i \beta_1 + \epsilon_i,$$

where $\epsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2)$ and $\sigma^2$ is an unknown scalar. Let $\beta = (\beta_0, \beta_1)^\top$, which is an unknown two-dimensional vector.

[C] Compute the sample correlation between the total length (cm) and the head length (mm) across the 104 possums. Create a scatter plot of total length (cm) and head length (mm), and then discuss whether this plot is consistent with the sample correlation.
[A] & [C] Derive the least squares estimator of $\beta$, which is denoted as $\hat\beta$. Based on the same mathematical operations as you use in your derivations, write your own code to compute $\hat\beta$ on this dataset.
[C] Use R's built-in function lm to compute $\hat\beta$ and the standard errors.
[C] Use QR decomposition to compute $\hat\beta$ and the standard errors.
[A] & [C] Use singular value decomposition to compute $\hat\beta$ and the standard errors.
[C] Use R's built-in function optim with method BFGS to compute $\hat\beta$.
[C] Estimate the bias of $\hat\beta$.
[C] Use non-parametric bootstrap (10,000 replications) to estimate $\beta_1$ and find the corresponding 95% confidence interval. Use set.seed(440) in your simulation.
[C] Use parametric bootstrap (10,000 replications) based on the multivariate normal distribution to estimate $\beta_1$ and find the corresponding 95% confidence interval. Use set.seed(440) in your simulation.
[C] Use permutation to test whether $\beta_1 = 0$ or not.
[C] Use leave-one-out and 3-fold cross-validation to compare the following two models:

Model 1: $Y_i = \beta_0 + X_i \beta_1 + \epsilon_i$ versus Model 2: $Y_i = \beta_0 + X_i \beta_1 + X_i^2 \beta_2 + \epsilon_i$.

Session information

Prithwijit · Accepted Answer

Applied Stochastic Process
Penn State University
1. Given that $X_1, X_2, \dots, X_n \overset{iid}{\sim} \text{Bernoulli}(\theta)$, we have to find the MLE of $\theta$.
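The joint distribution of $X_1, X_2, \dots, X_n$ is

$$L(\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}, \quad \text{where } x_i = 0, 1 \text{ and } 0 < \theta < 1,$$

or, equivalently, with $s = \sum_{i=1}^{n} x_i$,

$$L(\theta) = \theta^{s}(1-\theta)^{n-s}.$$

So, the log-likelihood is

$$l(\theta) = s \log\theta + (n-s)\log(1-\theta).$$

Differentiating with respect to $\theta$ and setting the derivative to zero,

$$l'(\theta) = \frac{s}{\theta} - \frac{n-s}{1-\theta} = 0 \quad\Longrightarrow\quad \theta = \frac{s}{n}.$$

Now,

$$l''(\theta) = -\frac{s}{\theta^2} - \frac{n-s}{(1-\theta)^2}.$$

Putting in the value $\theta = s/n$, we get

$$l''(s/n) = -\frac{n^2}{s} - \frac{n^2}{n-s} = -\frac{n^3}{s(n-s)} < 0,$$

so $\theta = s/n$ maximizes $l(\theta)$.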
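So, the MLE of $\theta$ is $\hat\theta_{MLE} = S_n/n = \bar{X}$. With $n = 12$ and $S_n = 4$, $\hat\theta_{MLE} = 4/12 = 1/3$.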
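The derivation can be checked numerically. The sketch below uses Python, although the exam itself asks for R. The helper names log_lik, score and hessian are illustrative and do not come from the course. For n = 12 and s = 4 it evaluates the log-likelihood on a fine grid and confirms three things: the maximizer sits at s/n = 1/3, l'(1/3) = 0, and l''(1/3) = -n^3/(s(n - s)) = -54 < 0.

```python
import math

# Observed data from Problem 1: n = 12 Bernoulli trials with S_n = 4 successes.
n, s = 12, 4

def log_lik(theta):
    """Log-likelihood l(theta) = s*log(theta) + (n - s)*log(1 - theta)."""
    return s * math.log(theta) + (n - s) * math.log(1 - theta)

def score(theta):
    """First derivative l'(theta) = s/theta - (n - s)/(1 - theta)."""
    return s / theta - (n - s) / (1 - theta)

def hessian(theta):
    """Second derivative l''(theta) = -s/theta^2 - (n - s)/(1 - theta)^2."""
    return -s / theta**2 - (n - s) / (1 - theta) ** 2

# Closed-form MLE from solving l'(theta) = 0.
theta_mle = s / n

# Brute-force check on the open interval (0, 1): the best grid point should sit next to s/n.
grid = [k / 10_000 for k in range(1, 10_000)]
theta_grid = max(grid, key=log_lik)

print(theta_mle, theta_grid, score(theta_mle), hessian(theta_mle))
```

The grid search is deliberately naive. It exists only to confirm the calculus, not to replace the closed form.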
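· Now the conditional distribution of $\theta \mid S_n = s$:

$$f(\theta \mid S_n = s) \propto P(S_n = s \mid \theta) \cdot 1 = \binom{n}{s}\theta^{s}(1-\theta)^{n-s} \propto \theta^{s}(1-\theta)^{n-s}, \quad 0 < \theta < 1,$$

where the factor 1 is the Uniform(0, 1) prior density. So, $\theta \mid S_n = s \sim \text{Beta}(s+1, n-s+1)$.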
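The conjugacy step can also be checked numerically. This Python sketch uses hypothetical helper names. It divides likelihood × prior by the Beta(s+1, n−s+1) density at several values of θ. If the posterior really is that Beta distribution, the ratio does not depend on θ. Its common value is the marginal probability P(S_n = s), which under the Uniform(0, 1) prior works out to 1/(n+1) = 1/13 for every s.

```python
import math

n = 12  # number of Bernoulli trials in Problem 1

def sampling_pmf(s, theta):
    """P(S_n = s | theta): S_n ~ Binomial(n, theta) as a sum of iid Bernoulli(theta)."""
    return math.comb(n, s) * theta**s * (1 - theta) ** (n - s)

def beta_pdf(x, a, b):
    """Beta(a, b) density written with the gamma function."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def ratio(s, theta):
    """Likelihood times the Uniform(0, 1) prior (density 1), divided by the Beta(s+1, n-s+1) density."""
    return sampling_pmf(s, theta) * 1.0 / beta_pdf(theta, s + 1, n - s + 1)

# A theta-free ratio means the posterior is exactly Beta(s+1, n-s+1);
# the constant itself is the marginal probability P(S_n = s).
for s in (0, 4, 12):
    print(s, [round(ratio(s, t), 10) for t in (0.1, 0.5, 0.9)])
```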
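It is given that $s = 4$ and $n = 12$. So, $\theta \mid S_n = 4 \sim \text{Beta}(5, 9)$.

Since a $\text{Beta}(a, b)$ distribution has mean $a/(a+b)$, the conditional expectation is $E(\theta \mid S_n = 4) = \frac{5}{5+9} = \frac{5}{14} \approx 0.357$.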
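The following minimal Python sketch recovers both numbers directly from the unnormalized kernel θ^4(1 − θ)^8. It uses midpoint-rule integration, a choice made for illustration rather than a method taken from the course. The normalizing constant should come out as B(5, 9) = 4!·8!/13! = 1/6435, and the posterior mean as 5/14.

```python
# theta | S_n = 4 ~ Beta(5, 9); unnormalized kernel theta^4 * (1 - theta)^8.
a, b = 5, 9

def kernel(theta):
    return theta ** (a - 1) * (1 - theta) ** (b - 1)

# Midpoint rule on (0, 1) with m equal subintervals.
m = 100_000
h = 1.0 / m
mids = [(k + 0.5) * h for k in range(m)]

norm = sum(kernel(t) for t in mids) * h              # approximates B(5, 9)
mean = sum(t * kernel(t) for t in mids) * h / norm   # approximates E(theta | S_n = 4)

print(norm, mean, a / (a + b))
```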
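· The quantile function is defined by $Q(p) = F^{-1}(p)$, where $F$ is the cumulative distribution function. So, for $\theta \mid S_n = 4$, $Q(p)$ is the value that solves

$$\int_0^{Q(p)} \frac{t^{4}(1-t)^{8}}{B(5, 9)}\, dt = p, \quad 0 < p < 1.$$

This equation has no elementary closed-form solution because it inverts the regularized incomplete beta function, so $Q(p)$ has to be computed numerically, e.g. with qbeta(p, 5, 9) in R. A closed form for $Q$ is not needed to answer the question, however. Marginally, $\theta \sim \text{Uniform}(0, 1)$ (the prior), so by the inverse-transform method $Q(\theta) \sim \text{Beta}(5, 9)$.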
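Q has no elementary closed form, but it is easy to evaluate numerically. Because θ ~ Uniform(0, 1) under the prior, plugging θ into Q gives a Beta(5, 9) draw; this is inverse-transform sampling. The Python sketch below uses hypothetical helper names. In place of a library CDF, it relies on the integer-parameter identity P(Beta(a, b) ≤ x) = P(Binomial(a + b − 1, x) ≥ a). It inverts the Beta(5, 9) CDF by bisection and applies Q to Uniform(0, 1) draws. The average of the transformed draws should be close to 5/14. Here random.seed(440) only mirrors the exam's seed choice and does not reproduce R's random stream.

```python
import math
import random

a, b = 5, 9  # theta | S_n = 4 ~ Beta(5, 9)

def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) for integer a, b via P(Beta(a, b) <= x) = P(Binomial(a+b-1, x) >= a)."""
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))

def quantile(p, tol=1e-9):
    """Q(p) = F^{-1}(p) by bisection; the CDF is increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Inverse transform: draw theta from its Uniform(0, 1) prior, then map it through Q.
random.seed(440)
draws = [quantile(random.random()) for _ in range(5_000)]
sample_mean = sum(draws) / len(draws)

print(quantile(0.5), sample_mean, a / (a + b))
```

Bisection was chosen for transparency. In R, qbeta and rbeta would do this work directly.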
