To be completed in RStudio as an HTML document. Please write my name in the top-left corner: Author: Alain KATEBA.


---
title: "A2 Regression"
subtitle: "Econ 4210"
date: "30/01/2021"
author: "Your name here"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  messages = FALSE
)
# Set the graphical theme
ggplot2::theme_set(ggplot2::theme_light())
library(tidyverse)
library(AER)
library(kableExtra)
library(modelsummary)
library(haven)
```

# Instructions

The data load from online sources. As you work through the code chunks, remember to turn `eval = T` if it is set to `eval = F`. Upload `HTML` file to e-class when done. It's OK, even encouraged, to work in groups of up to four. Please put your group members' names at the top.

# Wine Prices

For this problem, load the data on the price of wine (Chateau Latour) sold at a 1986 London auction using the link and code provided. Wines sold were from vintage years 1882 to 1983. The data set includes information on rainfall and temperature in the region in France where Chateau Latour grapes are grown.

```{r}
raw <- read_sas('https://ocw.mit.edu/courses/economics/14-32-econometrics-spring-2007/assignments/wine.sas7bdat')
```

```{r}
raw <- raw %>%
  rename_with(tolower, everything())
```

## Data manipulation

Using the wine data, perform the following tasks:

1. Subset the data to include only observations from 1952 or later.
2. Construct a new variable called `aveRain` that is the average rainfall in August (`aurain`) and September (`sprain`).
3. Construct a new variable called `aveTemp` that is the average temperature in July (`jltemp`) and August (`autemp`).
4. Construct a new variable called `logPrice` that is the log price of wine (`price`).
5. Filter to keep all non-missing prices.

```{r, eval = F}
df <- raw %>%
  filter(year >= ..., !is.na(price)) %>%
  mutate(aveRain = (... + ...)/2,
         aveTemp = (... + ...)/2,
         ... = log(price))
```

## Summary statistics

Create a table of summary statistics using the package `modelsummary` and the function `datasummary()`. You can see examples on [this page](https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html#datasummary-1). The function `datasummary()` requires as inputs the data frame and a formula. The formula has the syntax `rows ~ columns`. Thus, if we want variables as rows and summary statistics as columns, we would write `variable names ~ summary statistics`. I have outlined the code below.

```{r, eval = F}
# Summary statistics
datasummary(price + aveRain + aveTemp + time ~ (mean + sd + min + max),
            data = ...)
```

## Predicting wine prices

The major determinant of wine quality is vintage (i.e., time passed since the wine was bottled). This suggests that wine prices should increase with age. Let $t$ be the year a bottle of wine is bottled and $T$ be the year that it is sold at auction. When sold, the wine is $\tau = T-t$ years old. Suppose the unknown price of a wine aged $J$ is an increasing concave function $f(J)$. The present value of wine in the year it is bottled is therefore

$$
v(\tau) = \frac{f(\tau)}{(1 + r)^{\tau}},
$$

where $r$ is the discount (interest) rate used by investors to evaluate future prospects. $f(0)$ is the price of the wine in the year it is bottled.

Like financial investors holding a treasury bond, vintners and wine investors can sell their wine now or hold it and sell it at a later date. One theory of asset pricing in financial markets is that market forces will set prices to equalize the present value of an asset sold on different dates. This implies that:

$$
\log[f(\tau)] \approx r\cdot \tau + \log[f(0)]
$$

where $\log[f(0)]$ is an unknown constant and $r$ is an unknown slope coefficient. The $\tau$ is the time since bottled, and is given in the data set by the variable `time`. The variable $\log[f(\tau)]$ is just the log price of wine, `logPrice`. This relationship is approximate. To turn this into an econometric model, we can rewrite:

$$
logPrice_i = \mu_0 + \mu_1 time_i + e_i
$$

Answer the following:

1. What does $i$ index?
2. What is the interpretation of $\mu_0$?
3. What is the interpretation of $\mu_1$?
4. What is the interpretation of $e_i$?

## Estimation

Use the data for this problem set and the econometric model to estimate the discount rate used by those who buy and sell Chateau Latour. Answer:

1. Test whether the discount rate equals 3 percent.
2. Plot the data and fitted sample regression line.
3. Briefly discuss the fit of the regression. What is the $R^2$?
4. For which vintage does the model have the lowest prediction error? The highest? [Hint: The easiest way to do this is to add the residuals to the data frame, use [`slice_max` / `slice_min`](https://dplyr.tidyverse.org/reference/slice.html)].

```{r, eval = F}
# Regression
fit <- lm(..., data = df)
# Display coefficients
coeftest(..., sandwich)
# R^2
summary(fit)$r.squared
# Plot
ggplot(df, aes(y = ..., x = ...)) +
  geom_point() +
  geom_smooth(method = 'lm', se = F)
# Test
linearHypothesis(fit, c(...), vcov. = ...)
```

## Adding other controls

A major determinant of wine quality besides age is the weather at the time the grapes used to make the wine were grown. Good wine grapes are made by hot and dry summers. Add the `aveRain` and `aveTemp` to the regression.

1. By how much does the fit improve (using the $R^2$ as a measure of fit)?
2. Interpret the coefficient on `aveTemp`.
3. How does the interpretation of $\mu_0$ change?
4. Test that both `aveTemp` and `aveRain` are **jointly** equal to zero using `linearHypothesis()`.

```{r, eval = F}
# New regression with controls
fit2 <- lm(...)
# Test
linearHypothesis(fit2, c(...), vcov. = ...)
```

# Regression Mechanics

In this section, we practice the mechanical properties of regression. Our baseline empirical equation will be:

$$
\log(wage)_i = \mu_0 + \mu_1 yearsch_i + e_i
$$

which relates the log wage of individual $i$ to their number of years of education plus an error term, $e_i$. Essentially, this equation asks us to predict log wages given education. We use the MORG data from last assignment, which is loaded below from an online source:

```{r}
## Load MORG data
morg <- read_rds(url("https://github.com/ben-sand/ben-sand.github.io/blob/master/files/morg.rds?raw=true"))
```

## Data manipulation

Tasks:

1. Filter the data so that it contains data from the years 1980 and 2018,
2. Create a variable called `lwage` that is equal to the log of wage,
3. Construct a measure of potential work experience called `pexp` by computing: `age-yearsch-6`. In most data sets, we don't observe "actual" years of work experience, but we can proxy it by `pexp`, which is the number of years since $i$ left school (and presumably joined the work force),
4. Convert `year` to a factor using `factor()`.

```{r, eval = F}
df <- morg %>%
  filter(...) %>%
  mutate(lwage = ...,
         pexp = ...,
         year = factor(...))
```

## A

Tasks:

1. Create a data frame called `df18` that contains only data from 2018.
2. Regress `lwage` on `yearsch` using only data on 2018.
3. Interpret the coefficient on `yearsch`.
4. What is the $R^2$ from the regression?

```{r, eval = F}
df18 <- df %>% ...(... == ...)
# Regression
baseline <- lm(lwage ~ yearsch, data = df18)
# Output coefficients and correct SE
coeftest(baseline, sandwich)
# Get R^2 from summary
summary(baseline)$...
```

## B

Tasks:

1. Add `pexp` and its square to the regression.
2. What is the marginal return to an additional year of potential experience for someone who has 10 years of experience?
3. Test that potential experience is related to log wages.

```{r, eval = F}
baseline <- lm(lwage ~ yearsch + ... + I(...^2), data = df18)
coeftest(baseline, sandwich)
```

## C

Including both `pexp` and its square in the regression above allows our predictions to be a non-linear function of potential experience. The implication is that the **marginal** impact of `pexp` on `lwage` depends on the level of `pexp` of the individual. That is, those with one year of potential experience benefit more from an additional year than those with 25 years.

It can be helpful to visualize these impacts. To do so, we create a data frame with the same right hand side variables as the regression we wish to visualize, but allow only one variable to vary. In particular, below I create a data frame called `fig.data` that contains `yearsch` and `pexp`. However, I set `yearsch = 12` (a constant) but `pexp` a vector from 1 to 45. Thus, the predictions are for different values of `pexp`, fixing the number of years of education at 12.

Task:

1. Complete the code below,
2. Change `yearsch` to 16 and re-run. What happens? Is this expected?

```{r, eval = F}
fig.data <- data.frame(yearsch = 12, pexp = 1:45)
fig.data <- fig.data %>%
  mutate(yhat = predict(baseline, newdata = fig.data))
ggplot(fig.data, aes(y = ..., x = ...)) +
  geom_line()
```

## D

The coefficient on `yearsch` changes from part **A** to part **B**. The reason is that we added regressors. Understanding why coefficients change when regressors are added is extremely important for understanding regression coefficients, especially later on in the course. The only reason why the coefficient on `yearsch` changes between the two regressions is that `yearsch` and `pexp` are related.

Tasks:

1. To see this, plot `yearsch` against `pexp` using `df18` and `geom_smooth()`.
2. What does your figure imply?

```{r, eval = F}
ggplot(df18, aes(y = ..., x = ...)) +
  geom_smooth()
```

## E

When we control for `pexp`, the hypothetical thought experiment is that we want to find two people with the same number of years of potential experience but whose years of education differ. Then we compare their wages. We don't **actually** do this in practice.
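The claim in part D, that the coefficient on `yearsch` moves only because `yearsch` and `pexp` are related, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the data are simulated rather than taken from MORG, every coefficient value is made up, and the squared experience term is left out to keep the algebra visible.

```r
set.seed(1)
n <- 1000

# Simulated data (hypothetical, NOT the MORG data): schooling and
# potential experience are negatively related by construction
pexp    <- runif(n, 0, 40)
yearsch <- 16 - 0.1 * pexp + rnorm(n)
lwage   <- 1 + 0.10 * yearsch + 0.02 * pexp + rnorm(n, sd = 0.3)

short <- lm(lwage ~ yearsch)         # schooling only, as in part A
long  <- lm(lwage ~ yearsch + pexp)  # adds the experience control
aux   <- lm(pexp ~ yearsch)          # how pexp moves with yearsch

b_short <- coef(short)[["yearsch"]]
b_long  <- coef(long)[["yearsch"]]
b_pexp  <- coef(long)[["pexp"]]
delta   <- coef(aux)[["yearsch"]]

# Omitted-variable identity: short = long + (coef on pexp) * delta
c(short = b_short, long = b_long, implied = b_long + b_pexp * delta)
```

The `short` and `implied` entries coincide, so the whole gap between the two schooling coefficients is the experience coefficient times the slope of `pexp` on `yearsch`. If the two regressors were unrelated, `delta` would be near zero and the coefficient would barely move.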
Answered 6 days after Feb 16, 2021


Mohd answered on Feb 18 2021
Author: Alain KATEBA
install.packages("ggplot2",repos = "https://cran.rstudio.com")
## Error in install.packages : Updating loaded packages
ggplot2::theme_set(ggplot2::theme_light())
install.packages("tidyverse",repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/sanchi.kalra/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'tidyverse' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\sanchi.kalra\AppData\Local\Temp\Rtmpu0Z4jP\downloaded_packages
install.packages("AER",repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/sanchi.kalra/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'AER' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\sanchi.kalra\AppData\Local\Temp\Rtmpu0Z4jP\downloaded_packages
install.packages("kableExtra",repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/sanchi.kalra/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
##
## There is a binary version available but the source version is later:
## binary source needs_compilation
## kableExtra 1.3.1 1.3.4 FALSE
## installing the source package 'kableExtra'
install.packages("modelsummary",repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/sanchi.kalra/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
##
## There is a binary version available but the source version is later:
## binary source needs_compilation
## modelsummary 0.6.5 0.6.6 FALSE
## installing the source package 'modelsummary'
install.packages("haven",repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/sanchi.kalra/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'haven' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\sanchi.kalra\AppData\Local\Temp\Rtmpu0Z4jP\downloaded_packages
library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.3
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(AER)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(modelsummary)
library(haven)
Instructions
The data load from online sources. As you work through the code chunks, remember to turn eval = T if it is set to eval = F. Upload HTML file to e-class when done. It's OK, even encouraged, to work in groups of up to four. Please put your group members' names at the top.
Wine Prices
For this problem, load the data on the price of wine (Chateau Latour) sold at a 1986 London auction using the link and code provided. Wines sold were from vintage years 1882 to 1983. The data set includes information on rainfall and temperature in the region in France where Chateau Latour grapes are grown.
raw <- read_sas('https://ocw.mit.edu/courses/economics/14-32-econometrics-spring-2007/assignments/wine.sas7bdat')
raw <- raw %>%
rename_with(tolower, everything())
summary(raw)
## year winrain jutemp jltemp autemp
## Min. :1882 Min. : 1.0 Min. : 2.000 Min. : 3.00 Min. : 4.0
## 1st Qu.:1918 1st Qu.: 1.0 1st Qu.: 2.000 1st Qu.: 3.00 1st Qu.: 4.0
## Median :1949 Median : 1.0 Median : 2.000 Median : 3.00 Median : 4.0
## Mean :1942 Mean :263.9 Mean : 9.448 Mean :10.96 Mean :11.3
## 3rd Qu.:1968 3rd Qu.:527.5 3rd Qu.:17.500 3rd Qu.:19.30 3rd Qu.:19.0
## Max. :1985 Max. :755.0 Max. :21.700 Max. :23.10 Max. :21.5
##
## sptemp jurain jlrain aurain
## Min. : 5.00 Min. : 3.00 Min. : 7.00 Min. : 8.00
## 1st Qu.: 5.00 1st Qu.: 6.00 1st Qu.: 7.00 1st Qu.: 8.00
## Median : 5.00 Median : 6.00 Median : 7.00 Median : 8.00
## Mean :10.83 Mean : 32.72 Mean : 29.34 Mean : 34.04
## 3rd Qu.:17.05 3rd Qu.: 51.50 3rd Qu.: 46.00 3rd Qu.: 50.50
## Max. :20.40 Max. :150.00 Max. :150.00 Max. :150.00
##
## sprain jusun jlsun ausun
## Min. : 4.00 Min. : 10.00 Min. : 11.00 Min. : 12.00
## 1st Qu.: 9.00 1st Qu.: 10.00 1st Qu.: 11.00 1st Qu.: 12.00
## Median : 9.00 Median : 10.00 Median : 11.00 Median : 12.00
## Mean : 42.49 Mean : 36.85 Mean : 40.53 Mean : 40.98
## 3rd Qu.: 69.50 3rd Qu.: 86.25 3rd Qu.: 89.00 3rd Qu.: 91.00
## Max. :211.00 Max. :139.00 Max. :132.00 Max. :131.00
## NA's :17 NA's :18 NA's :17
## spsun price rain temp time
## Min. : 13.0 Min. : 150 Min. : 8.50 Min. : 3.50 Min. : 1.00
## 1st Qu.: 13.0 1st Qu.: 368 1st Qu.: 8.50 1st Qu.: 3.50 1st Qu.: 18.50
## Median : 13.0 Median : 637 Median : 8.50 Median : 3.50 Median : 37.00
## Mean : 43.5 Mean :1253 Mean : 38.27 Mean :11.13 Mean : 43.56
## 3rd Qu.: 96.5 3rd Qu.:1796 3rd Qu.: 61.25 3rd Qu.:19.32 3rd Qu.: 68.50
## Max. :146.0 Max. :5760 Max. :146.00 Max. :21.70 Max. :104.00
## NA's :17 NA's :3
Data manipulation
Using the wine data, perform the following tasks:
    1. Subset the data to include only observations from 1952 or later.
    2. Construct a new variable called aveRain that is the average rainfall in August (aurain) and September (sprain).
    3. Construct a new variable called aveTemp that is the average temperature in July (jltemp) and August (autemp).
    4. Construct a new variable called logPrice that is the log price of wine (price).
    5. Filter to keep all non-missing prices.
df <- raw %>%
filter(year >= 1952,
!is.na(price)) %>%
mutate(aveRain = (aurain + sprain)/2,
aveTemp = (jltemp + autemp)/2,
logPrice = log(price))
Summary statistics
Create a table of summary statistics using the package modelsummary and the function datasummary(). You can see examples on this page. The function datasummary() requires as inputs the data frame and a formula. The formula has the syntax rows ~ columns. Thus, if we want variables as rows and summary statistics as columns, we would write variable names ~ summary statistics. I have outlined the code below.
install.packages("flextable",repos = "https://cran.rstudio.com")
## Installing package into 'C:/Users/sanchi.kalra/Documents/R/win-library/4.0'
## (as 'lib' is unspecified)
## package 'flextable' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\sanchi.kalra\AppData\Local\Temp\Rtmpu0Z4jP\downloaded_packages
library(flextable)
##
## Attaching package: 'flextable'
## The following objects are masked from 'package:kableExtra':
##
## as_image, footnote
## The following object is masked from 'package:purrr':
##
## compose
# Summary statistics
ds<-datasummary(price + aveRain + aveTemp + time ~ (mean + sd + min + max) , data = df)
ds
                 mean      sd     min      max
     price     446.52  342.19  150.00  1767.00
     aveRain    70.24   32.49   19.00   146.00
     aveTemp    19.44    1.00   17.40    21.70
     time       18.42    9.52    3.00    34.00
Predicting wine prices
The major determinant of wine quality is vintage (i.e., time passed since the wine was bottled.) This suggests that wine prices should increase with age. Let \(t\) be the year a bottle of wine is bottled and \(T\) be the year that it is sold at auction. When sold, the wine is \(\tau = T-t\) years old. Suppose the unknown price of a wine aged \(J\) is an increasing concave function \(f(J)\). The present value of wine in the year it is bottled is therefore
\[
v(\tau) = \frac{f(\tau)}{(1 + r)^{\tau}},
\]
where \(r\) is the discount (interest) rate used by investors to evaluate future prospects. \(f(0)\) is the price of the wine in the year it is bottled.
Like financial investors holding a treasury bond, vintners and wine investors can sell their wine now or hold it and sell it at a later date. One theory of asset pricing in financial markets is that market forces will set prices to equalize the present value of an asset sold on different dates. This implies that:
\[
\log[f(\tau)] \approx r\cdot \tau + \log[f(0)]
\]
where \(\log[f(0)]\) is an unknown constant and \(r\) is an unknown slope coefficient. The \(\tau\) is the time since bottled, and is given in the data set by the variable time. The variable \(\log[f(\tau)]\) is just the log price of wine, logPrice. This relationship is approximate. To turn this into an econometric model, we can rewrite:
\[
logPrice_i = \mu_0 + \mu_1 time_i + e_i
\]
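The approximation behind this regression can be spelled out in a few lines. This is a short derivation under two assumptions: that equal present values means \(v(\tau) = v(0) = f(0)\) for every \(\tau\), and that \(r\) is small enough for \(\log(1+r) \approx r\).

```latex
\begin{aligned}
\frac{f(\tau)}{(1+r)^{\tau}} &= f(0) \\
\log[f(\tau)] &= \tau \log(1+r) + \log[f(0)] \\
\log[f(\tau)] &\approx r \cdot \tau + \log[f(0)]
\end{aligned}
```

Read this way, the slope \(\mu_1\) estimates \(\log(1+r)\), which is close to \(r\) when the rate is a few percent, and the intercept \(\mu_0\) estimates \(\log[f(0)]\).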
Answer the following:
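    1. What does \(i\) index?
A. \(i\) indexes the observations (rows) in the data; here each observation is one vintage of Chateau Latour.
    2. What is the interpretation of \(\mu_0\)?
A. It is the intercept of the regression line: the expected log price of a wine with time = 0, which corresponds to \(\log[f(0)]\), the log price in the year of bottling.
    3. What is the interpretation of \(\mu_1\)?
A. It is the slope coefficient on time: the expected change in logPrice for each additional year of age. In the model above it corresponds to the discount rate \(r\).
    4. What is the interpretation of \(e_i\)?
A. It is the error term: the part of the log price of vintage \(i\) that is not explained by the wine's age.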
Estimation
    Use the data for this problem set and the econometric model to estimate the discount rate used by those who buy and sell Chateau Latour.
Answer:
    Test whether the discount rate equals 3 percent.
    Plot the data and fitted sample regression line.
    Briefly discuss the fit of the regression. What is the \(R^2\)?
A. \(R^2\) is the share of the variability in the response variable (logPrice) that the model explains. Here \(R^2 = 0.112\), so age alone explains about 11.2% of the variation in logPrice, which is a fairly weak fit.
    For which vintage does the model have the lowest prediction error? The highest? [Hint: The easiest way to do this is to add the residuals to the data frame, use slice_max / slice_min].
A. The vintages whose points lie closest to the fitted line (residuals nearest zero) have the lowest prediction error, and those farthest from the line have the highest.
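The hint about residuals and slice_min / slice_max can be sketched as follows. This is a minimal illustration on made-up toy data, not the wine data: the `toy` data frame, its values, and the `vintage` column are assumptions introduced only for the example.

```r
library(dplyr)

# Made-up toy data (NOT the wine data): five bottles sold in 1986
toy <- data.frame(time = 1:5,
                  logPrice = c(1.0, 2.1, 2.9, 4.0, 6.0)) %>%
  mutate(vintage = 1986 - time)

toy_fit <- lm(logPrice ~ time, data = toy)

# Attach the residuals, then rank rows by absolute residual
toy <- toy %>% mutate(res = resid(toy_fit))

best  <- toy %>% slice_min(abs(res), n = 1)  # smallest prediction error
worst <- toy %>% slice_max(abs(res), n = 1)  # largest prediction error

best$vintage
worst$vintage
```

Ranking on `abs(res)` rather than `res` matters: a large negative residual is as bad a prediction as a large positive one. Applying the same two calls to the wine data frame with the residuals from `fit` attached would name the actual vintages.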
# regression
fit <- lm(logPrice~time, data = df)
#fit1 <- lm(price~time, data = df)
# display coefficients
coeftest(fit, sandwich)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.4912867 0.1779380 30.861 < 2e-16 ***
## time 0.0219401 0.0092573 2.370 0.02465 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#coeftest(fit1, sandwich)
# R^2
summary(fit)$r.squared
## [1] 0.1120178
# Add the residuals to the data frame
df4 <- df %>%
  mutate(res = resid(fit))
# Plot the data and the fitted regression line
ggplot(df4, aes(y = logPrice, x = time)) +
  geom_point() +
  geom_smooth(method = 'lm', se = F)
## `geom_smooth()` using formula 'y ~ x'
# Test H0: the coefficient on time (the discount rate) equals 0.03
linearHypothesis(fit, "time = 0.03", vcov. = sandwich)
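The 3 percent hypothesis can also be checked by hand from the coefficient table above. A minimal sketch, assuming only the two numbers printed by coeftest(); the p-value uses a normal approximation because the residual degrees of freedom are not shown in the output, and the variable names are hypothetical.

```r
# Numbers copied from the coeftest() output above
b_time  <- 0.0219401  # estimated slope on time
se_time <- 0.0092573  # robust standard error

# t statistic for H0: slope = 0.03
t_stat <- (b_time - 0.03) / se_time

# Two-sided p-value from the normal approximation (an assumption)
p_approx <- 2 * pnorm(-abs(t_stat))

# Implied annual price growth in percent, exact rather than approximate
growth_pct <- 100 * (exp(b_time) - 1)

round(c(t_stat = t_stat, p_approx = p_approx, growth_pct = growth_pct), 3)
```

The t statistic is about -0.87, well inside the usual critical values near 2, so this check does not reject a 3 percent discount rate at the 5 percent level. The point estimate itself implies price growth of roughly 2.2 percent per year of age.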
Adding other controls
A major determinant of wine quality besides age is the weather at the time the grapes used to make the wine were grown. Good wine grapes are made by hot and dry summers. Add the aveRain and aveTemp to the regression.
    By how much does the fit improve (using the \(R^2\) as a measure of fit).
A. The \(R^2\) rises from 11.20% to 47.29%, an improvement of 36.09 percentage points.
    Interpret the coefficient on aveTemp?
A. It is negative, so it has a negative influence on the response variable: if aveRain increases, then logPrice decreases.
    How does the interpretation of \(\mu_0\) change?
Now it...