
I want this assignment to be completed by tomorrow.


STAT 8111 - Assignment 2
45830541 - Abdul Sattar

Question 1

a) Appropriate Statistical Model

    couple <- read.csv(file='couple.csv')
    df1 = subset(couple, select=c(upb,education,anxiety))
    library(GGally)

    ## Loading required package: ggplot2
    ## Registered S3 method overwritten by 'GGally':
    ##   method from
    ##   +.gg   ggplot2

    ggpairs(df1)

[Figure: ggpairs scatterplot matrix of upb, education and anxiety. Correlations shown in the upper panels: −0.099, 0.312**, 0.108.]

From the above correlation matrix, “anxiety” appears to be the most useful variable for explaining upb.

    library(MASS)
    model1 <- glm(upb ~ anxiety, data=couple)
    model2 <- glm.nb(upb ~ anxiety, data=couple)
    model3 <- glm(upb ~ anxiety, family=poisson, data=couple)
    model4 <- glm(upb ~ anxiety, family=poisson(link = 'identity'), data=couple)
    model5 <- glm(upb ~ anxiety, family=poisson(link = 'log'), data=couple)
    summary(model2)

    ##
    ## Call:
    ## glm.nb(formula = upb ~ anxiety, data = couple, init.theta = 0.5048401576,
    ##     link = log)
    ##
    ## Deviance Residuals:
    ##     Min       1Q   Median       3Q      Max
    ## -1.6320  -1.2150  -0.5189   0.2119   2.1604
    ##
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)
    ## (Intercept)   1.0249     0.1554   6.597 4.21e-11 ***
    ## anxiety       0.6110     0.1524   4.009 6.09e-05 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## (Dispersion parameter for Negative Binomial(0.5048) family taken to be 1)
    ##
    ##     Null deviance: 119.03  on 101  degrees of freedom
    ## Residual deviance: 104.69  on 100  degrees of freedom
    ## AIC: 455.46
    ##
    ## Number of Fisher Scoring iterations: 1
    ##
    ##
    ##               Theta:  0.5048
    ##           Std. Err.:  0.0970
    ##
    ##  2 x log-likelihood:  -449.4620

    AIC(model1,model2,model3,model4,model5)

    ##        df      AIC
    ## model1  3 642.7006
    ## model2  3 455.4624
    ## model3  2 791.2528
    ## model4  2 785.6547
    ## model5  2 791.2528

Our model selection criterion is AIC. The AIC score is lowest for Model 2, the negative binomial fit, so we proceed with Model 2. (model3 and model5 are the same model, since log is the default link for the Poisson family, which is why their AIC values are identical.)

i) Fitted Model Equation:

From the above model2 summary, which uses a log link, we have:

log(μ) = β0 + β1X, where μ is the expected value of Y

Intercept β0 = 1.0249
β1 = 0.6110

The fitted model can be represented as:

log(μupb) = 1.0249 + 0.6110Xanxiety, or equivalently μupb = exp(1.0249 + 0.6110Xanxiety)

ii) Interpreting the model parameters:

β0 = The intercept: the log of the expected value of the response Y (number of unwanted pursuit behaviour perpetrations) when anxiety is 0. On the count scale this is exp(1.0249) ≈ 2.79.

β1 = The change in the log of the mean of Y (or UPB) for every 1 unit increase in anxiety. On the count scale, every 1 unit increase in anxiety multiplies the expected UPB by exp(0.6110) ≈ 1.84. The relationship is positive and statistically significant, which agrees with the sign of the correlation coefficient.

iii) Diagnostic Checking of our final model

We start with the deviance residuals of model2:

    devres = residuals(model2, type="deviance")
    op = par(mfrow=c(1,2))
    hist(devres, freq=FALSE, breaks=12)
    lines(density(devres), col="red")
    lines(seq(-4,4, by=.1), dnorm(seq(-4,4, by=.1), mean(devres), sd(devres)), col="blue")
    legend("topright",legend=c("density estimate","normal curve"), lty=1,col=c("red","blue"),cex=.6)
    qqnorm(devres)
    qqline(devres)

[Figure: left, histogram of devres with density estimate and normal curve; right, normal Q-Q plot of devres.]

• There is some deviation from normality in the right tail. The deviance residuals look right-skewed.
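Because the summary output above reports link = log, the coefficients act on the log of the expected count. The following is a minimal sketch of how they translate to the count scale, using only the two estimates printed in the summary. The anxiety values in `anxiety_grid` are hypothetical and chosen purely for illustration.

```r
# Coefficients copied from the summary(model2) output (log link).
b0 <- 1.0249
b1 <- 0.6110

# Expected count at anxiety = 0, and the multiplicative effect of a
# one-unit increase in anxiety on the expected count.
baseline_mean <- exp(b0)
rate_ratio <- exp(b1)

# Fitted mean on the response scale for a few illustrative anxiety values
# (hypothetical grid, not taken from the data).
anxiety_grid <- c(-1, 0, 1, 2)
fitted_mean <- exp(b0 + b1 * anxiety_grid)

print(round(c(baseline_mean = baseline_mean, rate_ratio = rate_ratio), 3))
print(data.frame(anxiety = anxiety_grid, fitted_mean = round(fitted_mean, 3)))
```

With the data loaded, the same quantities would normally be obtained from the fitted object, for example with `exp(coef(model2))`.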
We can diagnose further by looking at quantile residuals.

Quantile Residuals

    library(statmod)
    q = qresid(model2)
    op = par(mfrow=c(1,2))
    hist(q, freq=FALSE, breaks=12)
    lines(density(q), col="red")
    lines(seq(-4,4, by=.1), dnorm(seq(-4,4, by=.1), mean(q), sd(q)), col="blue")
    legend("bottomright",legend=c("density estimate","normal curve"), lty=1,col=c("red","blue"),cex=.6)
    qqnorm(q)
    qqline(q)

[Figure: left, histogram of q with density estimate and normal curve; right, normal Q-Q plot of q.]

The quantile residual plots show that the density estimate of the quantile residuals is close to the normal curve.

    dev4 = residuals(model2, type="deviance")
    plot(dev4)

[Figure: index plot of dev4 (deviance residuals against observation index).]

From the above, we cannot see any relatively large residuals in the given dataset.

    library(boot)
    diag4 = glm.diag(model2)
    lev4 = diag4$h
    plot(lev4)
    text(1:length(lev4), lev4, 1:length(lev4), cex=0.6, pos=4)

[Figure: index plot of lev4 (leverage against observation index), points labelled 1 to 102.]

Observations 79, 98, 54, 90 and 74 have leverage statistics that are relatively larger than the rest. Considering that we have 102 data points, this should be acceptable, as only 5 of them show larger leverage.
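The visual leverage check above can be complemented with a numeric cut-off. The sketch below is run on simulated data, since couple.csv is not reproduced here, and it makes two assumptions that are not part of the assignment: it uses base R's `hatvalues()` in place of `boot::glm.diag()`, and it applies the common rule of thumb that flags leverage above 2p/n.

```r
library(MASS)  # provides glm.nb

set.seed(1)
n <- 102
# Simulated stand-in for the assignment data; values are illustrative only.
sim <- data.frame(anxiety = rnorm(n))
sim$upb <- rnbinom(n, size = 0.5, mu = exp(1 + 0.6 * sim$anxiety))

fit <- glm.nb(upb ~ anxiety, data = sim)

h <- hatvalues(fit)       # leverage values (diagonal of the hat matrix)
p <- length(coef(fit))    # number of estimated regression coefficients
cutoff <- 2 * p / n       # rule-of-thumb threshold (assumption, see lead-in)

# Observations whose leverage exceeds the threshold, largest first.
flagged <- which(h > cutoff)
print(cutoff)
print(sort(h[flagged], decreasing = TRUE))
```

Points above the cut-off are candidates for a closer look rather than automatic exclusions. Whether they matter depends on whether they also have large residuals.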
b) Analysis with the added data

    couple1 <- read.csv(file='couple1.csv')
    df2 = subset(couple1, select = c(upb,education,anxiety))
    library(GGally)
    ggpairs(df2)

[Figure: ggpairs scatterplot matrix of upb, education and anxiety. Correlations shown in the upper panels: −0.096, 0.312**, 0.123.]

    library(MASS)
    m1 <- glm(upb ~ anxiety, data=couple1)
    m2 <- glm.nb(upb ~ anxiety, data=couple1)
    m3 <- glm(upb ~ anxiety, family=poisson, data=couple1)
    m4 <- glm(upb ~ anxiety, family=poisson(link = 'identity'), data=couple1)
    m5 <- glm(upb ~ anxiety, family=poisson(link = 'log'), data=couple1)
    summary(m2)

    ##
    ## Call:
    ## glm.nb(formula = upb ~ anxiety, data = couple1, init.theta = 0.5210091641,
    ##     link = log)
    ##
    ## Deviance Residuals:
    ##     Min       1Q   Median       3Q      Max
    ## -1.6463  -1.2215  -0.4811   0.2137   2.2035
    ##
    ## Coefficients:
    ##             Estimate Std. Error z value Pr(>|z|)
    ## (Intercept)   1.0168     0.1519   6.694 2.17e-11 ***
    ## anxiety       0.6109     0.1492   4.093 4.25e-05 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ##
    ## (Dispersion parameter for Negative Binomial(0.521) family taken to be 1)
    ##
    ##     Null deviance: 122.14  on 103  degrees of freedom
    ## Residual deviance: 107.11  on 102  degrees of freedom
    ## AIC: 463.42
    ##
    ## Number of Fisher Scoring iterations: 1
    ##
    ##
    ##               Theta:  0.5210
    ##           Std. Err.:  0.0991
    ##
    ##  2 x log-likelihood:  -457.4240

    AIC(m1,m2,m3,m4,m5)

    ##    df      AIC
    ## m1  3 653.4009
    ## m2  3 463.4242
    ## m3  2 797.6029
    ## m4  2 792.2521
    ## m5  2 797.6029

• Fitted Model Equation:

From the above m2 summary, which uses a log link, we have:

log(μ) = β0 + β1X, where μ is the expected value of Y

Intercept β0 = 1.0168
β1 = 0.6109

The fitted model can be represented as:

log(μupb) = 1.0168 + 0.6109Xanxiety, or equivalently μupb = exp(1.0168 + 0.6109Xanxiety)

• Interpreting the model parameters:

β0 = The intercept: the log of the expected value of the response Y (number of unwanted pursuit behaviour perpetrations) when anxiety is 0.

β1 = The change in the log of the mean of Y (or UPB) for every 1 unit increase in anxiety. On the count scale, every 1 unit increase in anxiety multiplies the expected UPB by exp(0.6109) ≈ 1.84.

Even after adding the new data to the existing dataset, the results are consistent, and the negative binomial model again has the lowest AIC score of the five candidates.

    devres = residuals(m2, type="deviance")
    op = par(mfrow=c(1,2))
    hist(devres, freq=FALSE, breaks=12)
    lines(density(devres), col="red")
    lines(seq(-4,4, by=.1), dnorm(seq(-4,4, by=.1), mean(devres), sd(devres)), col="blue")
    legend("topright",legend=c("density estimate","normal curve"), lty=1,col=c("red","blue"),cex=.6)
    qqnorm(devres)
    qqline(devres)

[Figure: left, histogram of devres with density estimate and normal curve; right, normal Q-Q plot of devres.]

    library(statmod)
    q = qresid(m2)
    op = par(mfrow=c(1,2))
    hist(q, freq=FALSE, breaks=12)
    lines(density(q), col="red")
    lines(seq(-4,4, by=.1), dnorm(seq(-4,4, by=.1), mean(q), sd(q)), col="blue")
    legend("bottomright",legend=c("density estimate","normal curve"), lty=1,col=c("red","blue"),cex=.6)
    qqnorm(q)
    qqline(q)

[Figure: left, histogram of q with density estimate and normal curve; right, normal Q-Q plot of q.]

    dev41 = residuals(m2, type="deviance")
    plot(dev41)

[Figure: index plot of dev41 (deviance residuals against observation index).]

    library(boot)
    diag41 = glm.diag(m2)
    lev41 = diag41$h
    plot(lev41)
    text(1:length(lev41), lev41, 1:length(lev41), cex=0.6, pos=4)

[Figure: index plot of lev41 (leverage against observation index), points labelled 1 to 104.]

Comments: The results of the second model (with the added data points) are similar to and consistent with the first. The intercept, the slope of the fitted equation, the correlation coefficients, the residual plots, the quantile residual plots and the leverage statistics all remain similar for this model.

c) Investigating potential interaction between covariates

    plot(upb ~ anxiety + factor(education), data=couple1)

Commenting on the above results: Looking at the above box plots, we can see that the spread is almost the same for students with at least a bachelor's degree (1) and for the others (0). A generalized linear model was also fitted to cross-check the box plot results, and it supported our assumption: the interaction term (anxiety*factor(education)) was not significant. So we can say that there is no evidence of a significant interaction between the covariates. Also, from the Correlation scatter plot, the correlation coefficient of anxiety and
Oct 21, 2021