
Can you help me create a restricted and an unrestricted model from dependent and independent variables in RStudio?


InfluentialVarbsObs.html
Influential variables and observations
Anthon Eff

1 Try downloading this webpage

The functionality of this webpage is constrained in D2L, and you might find it easier to read and navigate if you download this html file to your own computer.

2 Resources to learn R

R tutorial on YouTube (21 videos, total time: 1 hour, 7 minutes)
Cheat sheets
R web search: use this to hunt for documentation for a specific package or function

3 Miscellaneous international data

Please download worldData.xlsx to your working directory. The xlsx file contains two sheets: one containing variable descriptions, labeled variables, and the other containing data, labeled data. Each observation is a country, and the variables are drawn from a variety of sources.

    xr<-data.frame(readxl::read_xlsx(path="worldData.xlsx",sheet="data"))
    rownames(xr)<-xr$iso3 # using as rowname the ISO 3-character code for each country
    xr$prcpXtemp<-scale(xr$prcp*xr$temp) # values will be high when the climate is warm and wet; low when dry and cold

4 A model of healthy life expectancy at birth

We can extract some variables to make a model explaining the variation across countries in healthy life expectancy at birth. This is the number of healthy years the average newborn is expected to live (the figure is from around 2012).

4.1 Unrestricted model

    mhh<-c("pctUrb","simlang","pctYlow40","NetPrimEd","sanTot","watTot","CALORIE97","prcpXtemp","GCGDP","X2009.Overall","Christian")
    dpv<-"HLEbirth"
    xur<-formula(paste(dpv,paste(mhh,collapse="+"),sep="~"))
    summary(q<-lm(xur,xr))
    ## 
    ## Call:
    ## lm(formula = xur, data = xr)
    ## 
    ## Residuals:
    ##      Min       1Q   Median       3Q      Max 
    ## -12.6870  -3.0476   0.1886   3.0076   9.8266 
    ## 
    ## Coefficients:
    ##                Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)   -4.707440   5.548952  -0.848  0.39850    
    ## pctUrb         0.132423   0.038673   3.424  0.00093 ***
    ## simlang        7.104192   2.851683   2.491  0.01456 *  
    ## pctYlow40      0.478732   0.147753   3.240  0.00168 ** 
    ## NetPrimEd      0.067901   0.058044   1.170  0.24516    
    ## sanTot         0.114436   0.037552   3.047  0.00303 ** 
    ## watTot         0.135329   0.059723   2.266  0.02585 *  
    ## CALORIE97      0.003235   0.001699   1.904  0.06006 .  
    ## prcpXtemp      1.704599   0.742838   2.295  0.02408 *  
    ## GCGDP         -0.317173   0.126362  -2.510  0.01386 *  
    ## X2009.Overall  0.213336   0.071512   2.983  0.00367 ** 
    ## Christian     -1.977590   1.563444  -1.265  0.20918    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 4.878 on 90 degrees of freedom
    ##   (89 observations deleted due to missingness)
    ## Multiple R-squared:  0.8523, Adjusted R-squared:  0.8343 
    ## F-statistic: 47.22 on 11 and 90 DF,  p-value: < 2.2e-16

The above is our unrestricted model, so we will create our descriptive statistics table now.

    u<-data.frame(psych::describe(xr[,c(dpv,mhh)])) # we will output our descriptive statistics now
    u<-u[,c("n","mean","sd","min","max")] # so here we select only the columns that we want
    write.csv(u,file="descrip.csv") # this writes the object u to a csv-format file called "descrip.csv"

Descriptive statistics

                    n        mean        sd         min         max
    HLEbirth      189     57.5197   11.1386     28.5607     74.9935
    pctUrb        190     55.2158   23.6784     11.0000    100.0000
    simlang       191      0.7874    0.2160      0.2551      1.0000
    pctYlow40     125     16.8480    4.3125      6.0000     25.0000
    NetPrimEd     183     85.0164   15.8001     22.0000    100.0000
    sanTot        155     67.1935   30.1783      5.0000    100.0000
    watTot        159     83.4591   18.3033     22.0000    100.0000
    CALORIE97     139  2,686.6475  513.3608  1,685.0000  3,699.0000
    prcpXtemp     179      0.0000    1.0000     -1.8454      2.8088
    GCGDP         140     15.1893    5.6270      4.6000     32.0000
    X2009.Overall 172     59.5581   10.5864     22.7000     87.1000
    Christian     191      0.5753    0.3738      0.0000      0.9945

4.2 F-test to identify irrelevant variables

Next, we will identify the coefficients with a p-value above 0.10, and use an F-test to confirm that those variables may be dropped from the model.

    j<-summary(q)$coefficients # get the table of regression coefficients, p-values, etc.
    drpt<-intersect(rownames(j)[which(j[,4]>.1)],mhh) # get the names of the variables with p-values above 0.10
    car::linearHypothesis(q,drpt) # perform the F-test (linearHypothesis comes from the car package)
    ## Linear hypothesis test
    ## 
    ## Hypothesis:
    ## NetPrimEd = 0
    ## Christian = 0
    ## 
    ## Model 1: restricted model
    ## Model 2: HLEbirth ~ pctUrb + simlang + pctYlow40 + NetPrimEd + sanTot + 
    ##     watTot + CALORIE97 + prcpXtemp + GCGDP + X2009.Overall + 
    ##     Christian
    ## 
    ##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
    ## 1     92 2208.3                           
    ## 2     90 2141.3  2    67.078 1.4097 0.2496

The p-value of the F-test is above 0.05; therefore, we cannot reject the null hypothesis that the true values of both coefficients are zero. So we drop the two variables (NetPrimEd and Christian), creating our restricted model as the final model.
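As a check on the F-test above, the statistic can be reproduced by hand from the two residual sums of squares in the output. The symbols below are labels introduced here for illustration, not the author's notation: \(RSS_R\) and \(RSS_U\) are the restricted and unrestricted residual sums of squares, \(m\) is the number of restrictions, and \(n-k\) is the residual degrees of freedom of the unrestricted model.

```latex
\[
F \;=\; \frac{(RSS_R - RSS_U)/m}{RSS_U/(n-k)}
  \;=\; \frac{67.078/2}{2141.3/90}
  \;\approx\; \frac{33.539}{23.792}
  \;\approx\; 1.41
\]
```

This agrees with the reported value of 1.4097 up to rounding. Under the null hypothesis the statistic follows an F distribution with 2 and 90 degrees of freedom, which is where the p-value of 0.2496 comes from.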
4.3 Restricted model

    mhh<-setdiff(mhh,drpt)
    fr<-formula(paste(dpv,paste(mhh,collapse="+"),sep="~"))
    summary(q<-lm(fr,xr))
    ## 
    ## Call:
    ## lm(formula = fr, data = xr)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -13.023  -2.780   0.112   2.842  11.399 
    ## 
    ## Coefficients:
    ##                Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)   -2.752275   5.145257  -0.535 0.593985    
    ## pctUrb         0.121508   0.037929   3.204 0.001860 ** 
    ## simlang        7.690603   2.691687   2.857 0.005274 ** 
    ## pctYlow40      0.539948   0.137020   3.941 0.000157 ***
    ## sanTot         0.128905   0.033981   3.793 0.000264 ***
    ## watTot         0.157277   0.056965   2.761 0.006944 ** 
    ## CALORIE97      0.003503   0.001682   2.083 0.040036 *  
    ## prcpXtemp      1.753336   0.717317   2.444 0.016399 *  
    ## GCGDP         -0.349862   0.120966  -2.892 0.004763 ** 
    ## X2009.Overall  0.193388   0.069378   2.787 0.006441 ** 
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 4.875 on 93 degrees of freedom
    ##   (88 observations deleted due to missingness)
    ## Multiple R-squared:  0.849,  Adjusted R-squared:  0.8344 
    ## F-statistic: 58.1 on 9 and 93 DF,  p-value: < 2.2e-16

Now that we have the final model, we can output our regression results table.

    j<-summary(q)$coefficients # get the table of regression coefficients, p-values, etc.
    j<-data.frame(j,vif=c(NA,car::vif(q))) # adding the VIF as the last column of the table (vif comes from the car package)
    write.csv(j,file="regres.csv") # this writes the object j to a csv-format file called "regres.csv"

Regression results for final model

                   Estimate  Std. Error  t value  Pr(>|t|)     VIF
    (Intercept)     -2.7523      5.1453  -0.5349    0.5940      NA
    pctUrb           0.1215      0.0379   3.2036    0.0019  2.8667
    simlang          7.6906      2.6917   2.8572    0.0053  1.4669
    pctYlow40        0.5399      0.1370   3.9407    0.0002  1.5231
    sanTot           0.1289      0.0340   3.7934    0.0003  4.3245
    watTot           0.1573      0.0570   2.7609    0.0069  3.8197
    CALORIE97        0.0035      0.0017   2.0826    0.0400  2.8525
    prcpXtemp        1.7533      0.7173   2.4443    0.0164  1.8877
    GCGDP           -0.3499      0.1210  -2.8922    0.0048  1.6359
    X2009.Overall    0.1934      0.0694   2.7875    0.0064  1.7612

5 Influential variables

You have used a t-statistic to judge whether an independent variable is significantly related to the dependent variable. Here we look at a related question: of all the significant independent variables, which exerts the greatest influence on the dependent variable? There are three main ways to answer this question.

The first is the use of standardized coefficients (often called beta coefficients). The standardized coefficients show by how many standard deviations the dependent variable will change, for a one standard deviation change in the independent variable. This is equivalent to the estimated coefficient times the standard deviation of the independent variable divided by the standard deviation of the dependent variable.

The second is the calculation of elasticities. An elasticity is the percentage change in a dependent variable caused by a one percent change in the independent variable (e.g., the percentage change in quantity demanded caused by a one percent change in price). In a model in which all variables are converted to their natural logs (such as the Cobb-Douglas production function), the coefficient estimates can be directly interpreted as elasticities. In a linear regression, one obtains the elasticity at the means by multiplying the estimated coefficient by the mean of the independent variable divided by the mean of the dependent variable.

The third way to determine the relative influences of the independent variables is to decompose the model \(R^2\) into the portion attributable to each independent variable. The \(R^2\) is the percent of the variation in the dependent variable that can be explained by the model. If the independent variables are perfectly orthogonal (that is, not correlated with each other), then each independent variable will explain a unique portion of the variation in the dependent variable.
But independent variables are almost never orthogonal: they share some variation, so that two or more independent variables will jointly account for a portion of the variation in the dependent variable. Decomposition of \(R^2\) into the portion accounted for by each independent variable is thus not straightforward. R contains several algorithms to perform this decomposition.

5.1 Standardized coefficients

We rescale our coefficients so they can be interpreted as the change in the dependent variable (measured in units of the standard deviation of the dependent variable) for a one standard deviation increase in the independent variable.
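The rescaling can be sketched in a few lines of R. This is a minimal illustration, not the author's code: it assumes the built-in mtcars data as a stand-in for worldData.xlsx, and the names fit, beta, elas, fit_z and beta_z are hypothetical. It computes the standardized coefficients as the coefficient times sd(x)/sd(y), computes elasticities at the means as the coefficient times mean(x)/mean(y), and cross-checks the standardized coefficients by refitting the model on z-scored variables.

```r
# Stand-in model on built-in data (assumption: mtcars replaces the document's data)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

b  <- coef(fit)[-1]                  # slope estimates, intercept dropped
mf <- model.frame(fit)               # exactly the rows used in estimation
y  <- mf[[1]]                        # dependent variable
X  <- mf[, names(b), drop = FALSE]   # independent variables

# Standardized (beta) coefficients: b * sd(x) / sd(y)
beta <- b * sapply(X, sd) / sd(y)

# Elasticities at the means: b * mean(x) / mean(y)
elas <- b * colMeans(X) / mean(y)

# Cross-check: a regression on z-scored variables should return the same betas
fit_z  <- lm(mpg ~ wt + hp + qsec, data = data.frame(scale(mf)))
beta_z <- coef(fit_z)[-1]

print(round(cbind(beta, beta_z, elas), 4))
```

Working from model.frame() rather than the raw data frame matters when observations are dropped for missingness, as they are in the document's models: the standard deviations and means should come from the same rows the regression used.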
Nov 08, 2021