1. Run the model on slide 10 of Lecture “Polynomial Predictors”. Make sure to save the regression object. Then estimate emmeans as per slide 11 for: a. age, but leave out the “at =”...

Complete both questions in R Markdown.


1. Run the model on slide 10 of Lecture “Polynomial Predictors”. Make sure to save the regression object. Then estimate emmeans as per slide 11 for:
   a. age, but leave out the “at =” list.
   b. age, but specify the mean value of age in the regression sample. You can get the mean as mean(model.matrix(modInc2)[, 'age']). First store this value in some variable, say “a”, and use this variable as “at = list(age = a)”.
   c. by female without specifying age.
   d. by female specifying age as mean in regression sample.

2. Run the fractional polynomial model from slide 46, but add female to the regression, “fp(age) + female”.
   - Generate emmeans by age from 20 to 80 in increments of 10 years.
   - Generate emmeans by female.
   - Generate emmeans by female and by age from 20 to 80 in increments of 10 years.
   - Generate an emmeans plot like in slide 48, but include a breakdown by female as well. To add female to the graph, add a term to the aes() statement so the aes() statement becomes “aes(x = age, y = emmean, colour = female)”. Otherwise the plot command looks the same.

ECON 4041H Research Methodology
Polynomial Predictors
Byron Lew, Department of Economics, Trent University, 2023
©Byron Lew, 2023. Authorization is given to students enrolled in the course to reproduce this material exclusively for their own personal use.
Byron Lew ECON 4041H – Research Methodology in Economics 1 / 53

Slide 2: Learning Objectives
- quadratic, cubic and potentially higher-order polynomial functions of independent variables
- data transformation to accommodate scale effect when using high-order polynomials
- use of margins to calculate partial derivatives
- awareness of how functional form forces certain constrained fit to data
- use of fractional polynomials

Slide 3: Implementing and interpreting higher-order terms
- Relationships between many economic phenomena are often non-linear
- Income in particular tends to be affected non-linearly
- Example: income by age

    library(ggplot2)
    path <- 'h:/classes/4041/data/'
    gss <- readRDS(paste0(path, 'gss.rds'))
    gss <- subset(gss, age <= 80)
    base <- ggplot(data=gss, aes(x=age, y=realrinc)) +
      theme_bw() +
      labs(x='age', y='real income ($1,000)')
    base + geom_jitter(size=0.5) +
      scale_y_continuous(labels=seq(0,500,100))

Slide 4: Income by age
[Figure: scatterplot; x-axis: age, y-axis: real income ($1,000)]

Slide 5: Test for functional form on income data
- Fit a lowess curve to the data to see what shape it reveals
- First graph is the lowess curve on top of the scatterplot

    base + geom_jitter(size=0.5) +
      geom_smooth(method='loess') +
      scale_y_continuous(labels=seq(0,500,100))

- Then to adjust for scale, second graph is only the lowess fitted curve

    base + geom_smooth(method='loess') +
      scale_y_continuous(breaks=seq(5000,30000,5000), labels=seq(5, 30, 5))
Slide 6: Income by age: lowess on scatter
[Figure]

Slide 7: Income by age: lowess function plot
[Figure]

Slide 8: Lowess curves
- Fit looks remarkably similar to a quadratic, and no particular form was given to the loess() function
- One problem with lowess fits is the computing power required: can be memory intensive and slow
- Solution: use random sample of dataset to reduce size and speed up lowess
- Example: generate random sample of 5000 observations

    gsssamp <- gss[sample(nrow(gss), 5000, replace=FALSE),]

- Use “gsssamp” in plot commands

Slide 9: Implementing and interpreting higher-order terms
- A common estimator is of the form

    $$y = \beta_0 + \beta_1 x + \beta_2 x^2$$

- If the sign on the second-order term coefficient $\beta_2$ is negative, then we have a quadratic with a maximum at some midpoint
- The slope of this function is defined as

    $$\frac{\partial y}{\partial x} = \beta_1 + 2\beta_2 x$$

- If we insert a value of $x$ in which we are interested, we get the value of the slope of the function
- We can even solve for the point that maximizes the function by setting the slope function = 0, yielding

    $$x = -\frac{1}{2}\,\frac{\beta_1}{\beta_2}$$

Slide 10: Estimate parameters for income by age

    modInc2 <- lm(realrinc ~ age + I(age^2) + female, data=gss)

    age              2,412.34***     (64.36)
    I(age^2)           -24.20***      (0.73)
    femalefemale   -12,419.24***    (283.70)
    Constant       -25,679.77***  (1,326.66)
    Observations   32,100
    R2             0.11
    Note: *p<0.1; **p<0.05; ***p<0.01

- Sign on quadratic coefficient is negative so the function rises to a maximum

Slide 11: Example: income by age
- Common to use quadratic for income, but note that the functional form does put some structure on the results
- For example, what if income rose, but then flattened out rather than declined? To be addressed later
- Graph our predicted income by age using emmeans

    library(emmeans)
    modInc2.em <- summary(emmeans(modInc2, ~age, at=list(age=seq(18,80,2))))

- This will generate mean predicted income for each value of age and save it in object “modInc2.em”, to be passed to ggplot()

    ggplot(modInc2.em, aes(x=age, y=emmean)) +
      geom_line() +
      geom_ribbon(aes(ymax=upper.CL, ymin=lower.CL), alpha=0.2) +
      labs(y='income ($1,000)', x='age') +
      theme_bw() +
      scale_y_continuous(breaks=seq(5000, 30000, 5000), labels=seq(5, 30, 5)) +
      scale_x_continuous(breaks=seq(20,80,5))

- Plot on next slide . . .

Slide 12: Income by age, quadratic estimate
[Figure: x-axis: age, y-axis: income ($1,000)]
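As a rough illustration of what the emmeans() call on slide 11 computes, the sketch below rebuilds the estimated marginal means by hand with predict(). It runs on simulated data (the data frame `sim` and every coefficient used to generate it are invented, not taken from the GSS file) and relies on the emmeans default of averaging predictions, with equal weights, over the levels of any factor not named in the specification.

```r
# Simulated stand-in for the GSS data: all numbers here are invented.
set.seed(4041)
n <- 2000
sim <- data.frame(
  age    = sample(18:80, n, replace = TRUE),
  female = factor(sample(c("male", "female"), n, replace = TRUE),
                  levels = c("male", "female"))
)
sim$realrinc <- -25000 + 2400 * sim$age - 24 * sim$age^2 -
  12000 * (sim$female == "female") + rnorm(n, sd = 20000)

# Same specification as the slide-10 model, fitted to the simulated data.
fit <- lm(realrinc ~ age + I(age^2) + female, data = sim)

# Reference grid: every requested age crossed with every level of female.
grid <- expand.grid(age = seq(20, 80, 10), female = levels(sim$female))
grid$pred <- predict(fit, newdata = grid)

# Average the predictions over female (equal weights) for each age.
by_hand <- aggregate(pred ~ age, data = grid, FUN = mean)
print(by_hand)

# With emmeans installed, the emmean column of
#   summary(emmeans(fit, ~age, at = list(age = seq(20, 80, 10))))
# should match by_hand$pred.
```

Averaging over female is why the plotted curve is neither the male nor the female income profile but a line halfway between them.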
Slide 13: Example: income by age
- Calculate maximum point:

    $$\text{age} = -\frac{1}{2}\,\frac{\beta_{age}}{\beta_{age^2}}$$

    variable.names(modInc2) # check coefficient names
    [1] "(Intercept)"  "age"          "I(age^2)"     "femalefemale"
    ageStar <- -0.5*coef(modInc2)['age']/coef(modInc2)['I(age^2)']
    ageStar
         age
    49.83769

- is the age at which income is maximized

Slide 14: Marginal effects
- We can use the R function margins() to find the value of age where the derivative of the estimated function is 0

    library(margins)
    modInc2.me <- summary(margins(modInc2, variables='age', at=list(age=seq(20,80,10))))

    age       AME     SE       z       p
     20   1444.26  35.81   40.33  0.0000
     30    960.22  22.25   43.15  0.0000
     40    476.18  12.00   39.68  0.0000
     50     -7.86  14.55   -0.54  0.5891
     60   -491.90  27.04  -18.19  0.0000
     70   -975.93  40.80  -23.92  0.0000
     80  -1459.97  54.65  -26.71  0.0000

- The function margins() takes the derivative of the model with respect to each variable listed (and all variables if option “variables=” not included)

Slide 15: Marginal effects plot
- Plotting the marginal values, with confidence interval band, makes it easier to see what is going on

    ggplot(modInc2.me, aes(x=age, y=AME)) +
      geom_line() +
      geom_ribbon(aes(ymax=upper, ymin=lower), alpha=0.2) +
      labs(y='marginal effect ($)', x='age') +
      geom_hline(yintercept=0, linewidth=0.2) +
      theme_bw()
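The AME table on slide 14 can be cross-checked by hand, because for a quadratic the marginal effect of age is just the slope formula from slide 9. The sketch below plugs in the rounded coefficients printed on slide 10, so it agrees with the margins() output only up to rounding; the names `slope_at`, `check` and `age_star` are made up for this illustration.

```r
# Rounded coefficients as printed in the slide-10 regression table.
b_age  <- 2412.34
b_age2 <- -24.20

# Derivative of b0 + b_age*age + b_age2*age^2 with respect to age.
slope_at <- function(age) b_age + 2 * b_age2 * age

# Slope at the same ages used in the margins() call.
ages  <- seq(20, 80, 10)
check <- data.frame(age = ages, slope = slope_at(ages))
print(check)

# Age at which the slope is zero, i.e. the income-maximising age.
age_star <- -0.5 * b_age / b_age2
print(age_star)
```

Each slope lands within about a dollar of the corresponding AME, and `age_star` comes out close to the 49.84 reported on slide 13. This suggests that margins() is evaluating this same derivative at each requested age.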
Slide 16: Marginal effect of age in quadratic income model
[Figure: x-axis: age, y-axis: marginal effect ($)]

Slide 17: Marginal effects function
- Judging by the graph we can see that the line crosses y = 0 around age 50
- Note also that the value of AME for age = 50 was not statistically significant, i.e. indistinguishable from 0
- We can find the value of the marginal effects function at ageStar

    summary(margins(modInc2, variables='age', at=list(age=ageStar)))

    factor      age     AME       SE       z       p     lower    upper
       age  49.8377  0.0000  14.4320  0.0000  1.0000  -28.2862  28.2862

Slide 18: Marginal effects function
Notice the following attributes
1. crosses “age” axis at value just under 50: point of max income
2. downward sloping: increases in income are decreasing with age
3. slope is the decrease in the rate of increase in income per year of age
4. slope is constant: determined by specifying quadratic function of age → income
- Recall, quadratic relationship chosen from lowess

Slide 19: Introduction to calculus: quadratic function and its first derivative

Slide 20: Use of cubic functions
- Useful when relationship displays two “turns”
- Demonstrate on fertility data

    gss <- readRDS(paste0(path, 'gss.rds'))
    gss <- gss[gss$age >= 45 & gss$age <= 55 &
               gss$yrborn >= 1920 & gss$yrborn <= 1960 &
               gss$female == 'female', ]

- Illustrate pattern by graphing lowess fit

    base <- ggplot(data=gss, aes(x=yrborn, y=children)) +
      theme_bw() +
      labs(x='year of birth', y='number of children born')
    base + geom_jitter(size=0.5) + geom_smooth(method='loess')

Slide 21: Number of children born by mother’s year of birth
[Figure: x-axis: year of birth (1920 to 1960), y-axis: number of children born (0 to 8)]

Slide 22: Number of children born by mother’s year of birth
[Figure: x-axis: year of birth (1920 to 1960), y-axis: number of children born (2.0 to 3.0)]

Slide 23: Plot of estimated marginal means from indicator function
- Another method of examining the relationship between two variables
- Run regression on independent variable yrborn as factor, then plot emmeans

    modyrbornf <- lm(children ~ factor(yrborn), data=gss)
    plt <- summary(emmeans(modyrbornf, ~yrborn))
    ggplot(plt, aes(x=yrborn, y=emmean)) +
      geom_line() +
      geom_errorbar(aes(ymax=upper.CL, ymin=lower.CL), size=0.15) +
      theme_bw() +
      labs(y='number of children born', x='mother\'s year of birth') +
      scale_x_continuous(breaks=seq(1920,1960,5))
Slide 24: emmeans from indep var as indicator
[Figure: x-axis: mother's year of birth (1920 to 1960), y-axis: number of children born (1.5 to 3.5)]

Slide 25: Cubic function?
- Lowess graph looked cubic
- Indicator variable plot has cubic-like appearance
- But not clear that pattern for year-of-birth after 1945 is
  - increasing (turn) or
  - flat
- Important to note because cubic structure imposes turn
- We will test statistically whether cubic fits data

Slide 26: Cubic function: model estimate
- Fit cubic function to estimate the relationship

    modyrb3 <- lm(children ~ yrborn + I(yrborn^2) + I(yrborn^3), data=gss)

    yrborn               2,364.3660***      (229.4207)
    I(yrborn^2)             -1.2180***        (0.1182)
    I(yrborn^3)              0.0002***       (0.00002)
    Constant        -1,529,809.0000***  (148,401.8000)
    Observations    5,049
    R2              0.0903
    Note: *p<0.1; **p<0.05; ***p<0.01

Slide 27: Cubic function: scale
- Notice the scale of the variable

    apply(model.matrix(modyrb3),2,min)
    (Intercept)      yrborn I(yrborn^2) I(yrborn^3)
              1        1920     3686400  7077888000
    apply(model.matrix(modyrb3),2,max)
    (Intercept)      yrborn I(yrborn^2) I(yrborn^3)
              1        1960     3841600  7529536000

  - yrborn: range is 1920 – 1960
  - yrborn^2: range is 3,686,400 – 3,841,600
  - yrborn^3: range is $7.078 \times 10^9$ – $7.530 \times 10^9$
- Solution: make number smaller by subtracting 1940
- Already provided in variable yrborn40

Slide 28: Cubic function: rescaled estimate

    modyrb3a <- lm(children ~ yrborn40 + I(yrborn40^2) + I(yrborn40^3), data=gss)

    yrborn40         -0.0909***   (0.0052)
    I(yrborn40^2)    -0.0008***   (0.0002)
    I(yrborn40^3)     0.0002***  (0.00002)
    Constant          2.7414***   (0.0372)
    Observations     5,049
    Note: *p<0.1; **p<0.05; ***p<0.01

- Notice the size of the coefficients on the higher order term

Slide 29: Cubic function: rescaled estimate
- Useful trick is to scale up parameters by scaling down the covariates

    gss$yrb40s <- gss$yrborn40/10

- Shrinking the covariates means increasing the size of the estimated coefficients, and vice versa

Slide 30: Cubic function: rescaled twice

    modyrb3b <- lm(children ~ yrb40s + I(yrb40s^2) + I(yrb40s^3), data=gss)

                     modyrb3a                modyrb3b
    yrborn40         -0.0909***  (0.0052)
    I(yrborn40^2)    -0.0008***  (0.0002)
    I(yrborn40^3)     0.0002*** (0.00002)
    yrb40s                                   -0.9087***  (0.0524)
    I(yrb40s^2)                              -0.0775***  (0.0212)
    I(yrb40s^3)                               0.2091***  (0.0203)
    Constant          2.7414***  (0.0372)     2.7414***  (0.0372)
    Observations     5,049                   5,049
    Note: *p<0.1; **p<0.05; ***p<0.01

Slide 31: Reading model with transformations
- Easier to read, and shows more digits
- Numbers are same, but scaled by transformation
  - coef on linear term is now 10× larger
  - coef on quadratic term is now 10^2× larger
  - coef on cubic term is now 10^3× larger
- Just remember that yrborn has been transformed, particularly if using margins command

Slide 32: Detour: viewing output
- When we want to compare several models, and to save output, we need a tool to combine results and save to file
- “stargazer” is a useful and easy-to-use output summary tool

    library(stargazer)
    stargazer(modyrb3, modyrb3a, modyrb3b, type='text', out='yearBorn.txt')

- This will save results to file ‘yearBorn.txt’, which can be opened in Word or another word processor
- Results also appear in R window
- In our example, because ...
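The scaling rule on slide 31 can be checked on made-up data. The sketch below simulates a cubic relationship (the data frame `d` and the coefficients used to generate it are invented, not the GSS fertility sample), fits the model once on a centred year variable and once on that variable divided by 10, and compares the two sets of coefficients.

```r
# Simulated data: a centred birth-year variable and an invented cubic outcome.
set.seed(1940)
n <- 5000
d <- data.frame(yrborn40 = sample(-20:20, n, replace = TRUE))
d$children <- 2.7 - 0.09 * d$yrborn40 - 0.0008 * d$yrborn40^2 +
  0.0002 * d$yrborn40^3 + rnorm(n, sd = 1.5)

# Shrink the covariate by a factor of 10, as on slide 29.
d$yrb40s <- d$yrborn40 / 10

mod_a <- lm(children ~ yrborn40 + I(yrborn40^2) + I(yrborn40^3), data = d)
mod_b <- lm(children ~ yrb40s + I(yrb40s^2) + I(yrb40s^3), data = d)

# Ratio of rescaled to original coefficients: intercept, linear, quadratic, cubic.
ratio <- unname(coef(mod_b) / coef(mod_a))
print(round(ratio, 6))

# The two models describe the same fitted curve.
print(all.equal(fitted(mod_a), fitted(mod_b)))
```

The ratios should come out as 1, 10, 100 and 1000 up to floating-point error, with identical fitted values. Rescaling changes only how the coefficients read, not the fit.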
Feb 21, 2023