be completed in RStudio as an HTML document. Please write my name in the top-left corner: Author: Alain KATEBA.
---
title: "A2 Regression"
subtitle: "Econ 4210"
date: "30/01/2021"
author: "Your name here"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE
)
# Set the graphical theme
ggplot2::theme_set(ggplot2::theme_light())
library(tidyverse)
library(AER)
library(kableExtra)
library(modelsummary)
library(haven)
```

# Instructions

The data are loaded from online sources. As you work through the code chunks, remember to set `eval = T` wherever it is set to `eval = F`. Upload the `HTML` file to e-class when done. It's OK, even encouraged, to work in groups of up to four. Please put your group members' names at the top.

# Wine Prices

For this problem, load the data on the price of wine (Chateau Latour) sold at a 1986 London auction using the link and code provided. Wines sold were from vintage years 1882 to 1983. The data set includes information on rainfall and temperature in the region in France where Chateau Latour grapes are grown.

```{r}
raw <- read_sas('https://ocw.mit.edu/courses/economics/14-32-econometrics-spring-2007/assignments/wine.sas7bdat')
```

```{r}
# Convert all column names to lower case
raw <- raw %>%
  rename_with(tolower, everything())
```

## Data manipulation

Using the wine data, perform the following tasks:

1. Subset the data to include only observations from 1952 or later.
2. Construct a new variable called `aveRain` that is the average rainfall in August (`aurain`) and September (`sprain`).
3. Construct a new variable called `aveTemp` that is the average temperature in July (`jltemp`) and August (`autemp`).
4. Construct a new variable called `logPrice` that is the log price of wine (`price`).
5. Filter to keep all non-missing prices.

```{r, eval = F}
df <- raw %>%
  filter(year >= ..., !is.na(price)) %>%
  mutate(aveRain = (... + ...)/2,
         aveTemp = (... + ...)/2,
         ... = log(price))
```

## Summary statistics

Create a table of summary statistics using the package `modelsummary` and the function `datasummary()`. You can see examples on [this page](https://vincentarelbundock.github.io/modelsummary/articles/datasummary.html#datasummary-1).

The function `datasummary()` requires as inputs the data frame and a formula. The formula has the syntax `rows ~ columns`. Thus, if we want variables as rows and summary statistics as columns, we would write `variable names ~ summary statistics`. I have outlined the code below.

```{r, eval = F}
# Summary statistics
datasummary(price + aveRain + aveTemp + time ~ (mean + sd + min + max),
            data = ...)
```

## Predicting wine prices

The major determinant of wine quality is vintage (i.e., the time passed since the wine was bottled). This suggests that wine prices should increase with age. Let $t$ be the year a bottle of wine is bottled and $T$ be the year that it is sold at auction. When sold, the wine is $\tau = T-t$ years old. Suppose the unknown price of a wine aged $J$ is an increasing concave function $f(J)$. The present value of wine in the year it is bottled is therefore

$$
v(\tau) = \frac{f(\tau)}{(1 + r)^{\tau}},
$$

where $r$ is the discount (interest) rate used by investors to evaluate future prospects and $f(0)$ is the price of the wine in the year it is bottled. Like financial investors holding a treasury bond, vintners and wine investors can sell their wine now or hold it and sell it at a later date. One theory of asset pricing in financial markets is that market forces will set prices to equalize the present value of an asset sold on different dates. This implies that:

$$
\log[f(\tau)] \approx r\cdot \tau + \log[f(0)]
$$

where $\log[f(0)]$ is an unknown constant and $r$ is an unknown slope coefficient. Here $\tau$ is the time since bottling, given in the data set by the variable `time`. The variable $\log[f(\tau)]$ is just the log price of wine, `logPrice`. This relationship is approximate.
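
The step from equal present values to the log-linear relationship can be spelled out as follows. This is only a sketch: it assumes present values are equalized exactly, so that $v(\tau) = v(0) = f(0)$ for every $\tau$, and that $r$ is small.

$$
\frac{f(\tau)}{(1+r)^{\tau}} = f(0)
\quad\Longrightarrow\quad
f(\tau) = f(0)\,(1+r)^{\tau}
$$

Taking logs of both sides gives

$$
\log[f(\tau)] = \tau \log(1+r) + \log[f(0)] \approx r\cdot\tau + \log[f(0)],
$$

because $\log(1+r) \approx r$ when $r$ is small. Under these assumptions the slope is exactly $\log(1+r)$, so a slope of $b$ corresponds to a rate of $e^{b} - 1$. For example, $b = 0.03$ gives $e^{0.03} - 1 \approx 0.0305$, which is close to $b$ itself.
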
To turn this into an econometric model, we can rewrite:

$$
logPrice_i = \mu_0 + \mu_1 time_i + e_i
$$

Answer the following:

1. What does $i$ index?
2. What is the interpretation of $\mu_0$?
3. What is the interpretation of $\mu_1$?
4. What is the interpretation of $e_i$?

## Estimation

Use the data for this problem set and the econometric model to estimate the discount rate used by those who buy and sell Chateau Latour. Answer:

1. Test whether the discount rate equals 3 percent.
2. Plot the data and fitted sample regression line.
3. Briefly discuss the fit of the regression. What is the $R^2$?
4. For which vintage does the model have the lowest prediction error? The highest? [Hint: The easiest way to do this is to add the residuals to the data frame, then use [`slice_max` / `slice_min`](https://dplyr.tidyverse.org/reference/slice.html).]

```{r, eval = F}
# regression
fit <- lm(..., data = df)
# display coefficients
coeftest(..., sandwich)
# R^2
summary(fit)$r.squared
# plot
ggplot(df, aes(y = ..., x = ...)) +
  geom_point() +
  geom_smooth(method = 'lm', se = F)
# test
linearHypothesis(fit, c(...), vcov. = ...)
```

## Adding other controls

A major determinant of wine quality besides age is the weather at the time the grapes used to make the wine were grown. Good wine grapes are made by hot and dry summers. Add `aveRain` and `aveTemp` to the regression.

1. By how much does the fit improve (using the $R^2$ as a measure of fit)?
2. Interpret the coefficient on `aveTemp`.
3. How does the interpretation of $\mu_0$ change?
4. Test that the coefficients on `aveTemp` and `aveRain` are **jointly** equal to zero using `linearHypothesis()`.

```{r, eval = F}
# new regression with controls
fit2 <- lm(...)
# test
linearHypothesis(fit2, c(...), vcov. = ...)
```

# Regression mechanics

In this section, we practice the mechanical properties of regression. Our baseline empirical equation will be:

$$
\log(wage)_i = \mu_0 + \mu_1 yearsch_i + e_i
$$

which relates the log wage of individual $i$ to their number of years of education plus an error term, $e_i$. Essentially, this equation asks us to predict log wages given education. We use the MORG data from the last assignment, which is loaded below from an online source:

```{r}
## load morg data
morg <- read_rds(url("https://github.com/ben-sand/ben-sand.github.io/blob/master/files/morg.rds?raw=true"))
```

## Data manipulation

Tasks:

1. Filter the data so that it contains data from the years 1980 and 2018.
2. Create a variable called `lwage` that is equal to the log of wage.
3. Construct a measure of potential work experience called `pexp` by computing `age-yearsch-6`. In most data sets, we don't observe "actual" years of work experience, but we can proxy it by `pexp`, which is the number of years since $i$ left school (and presumably joined the work force).
4. Convert `year` to a factor using `factor()`.

```{r, eval = F}
df <- morg %>%
  filter(...) %>%
  mutate(lwage = ...,
         pexp = ...,
         year = factor(...))
```

## A

Tasks:

1. Create a data frame called `df18` that contains only data from 2018.
2. Regress `lwage` on `yearsch` using only data from 2018.
3. Interpret the coefficient on `yearsch`.
4. What is the $R^2$ from the regression?

```{r, eval = F}
df18 <- df %>% ...(... == ...)
# Regression
baseline <- lm(lwage ~ yearsch, data = df18)
# output coefficients and correct SE
coeftest(baseline, sandwich)
# get R^2 from summary
summary(baseline)$...
```

## B

Tasks:

1. Add `pexp` and its square to the regression.
2. What is the marginal return to an additional year of potential experience for someone who has 10 years of experience?
3. Test that potential experience is related to log wages.

```{r, eval = F}
baseline <- lm(lwage ~ yearsch + ... + I(...^2), data = df18)
coeftest(baseline, sandwich)
```

## C

Including both `pexp` and its square in the regression above allows our predictions to be a non-linear function of potential experience. The implication is that the **marginal** impact of `pexp` on `lwage` depends on the individual's level of `pexp`. That is, those with one year of potential experience benefit more from an additional year than those with 25 years.

It can be helpful to visualize these impacts. To do so, we create a data frame with the same right-hand-side variables as the regression we wish to visualize, but allow only one variable to vary. In particular, below I create a data frame called `fig.data` that contains `yearsch` and `pexp`. However, I set `yearsch = 12` (a constant) and make `pexp` a vector from 1 to 45. Thus, the predictions are for different values of `pexp`, fixing the number of years of education at 12.

Tasks:

1. Complete the code below.
2. Change `yearsch` to 16 and re-run. What happens? Is this expected?

```{r, eval = F}
fig.data <- data.frame(yearsch = 12, pexp = 1:45)
fig.data <- fig.data %>%
  mutate(yhat = predict(baseline, newdata = fig.data))
ggplot(fig.data, aes(y = ..., x = ...)) +
  geom_line()
```

## D

The coefficient on `yearsch` changes from part **A** to part **B**. The reason is that we added regressors. Understanding why coefficients change when regressors are added is extremely important for understanding regression coefficients, especially later on in the course. The only reason why the coefficient on `yearsch` changes between the two regressions is that `yearsch` and `pexp` are related.

Tasks:

1. To see this, plot `yearsch` against `pexp` using `df18` and `geom_smooth()`.
2. What does your figure imply?

```{r, eval = F}
ggplot(df18, aes(y = ..., x = ...)) +
  geom_smooth()
```

## E

When we control for `pexp`, the hypothetical thought experiment is that we want to find two people with the same number of years of potential experience but whose years of education differ. Then we compare their wages. We don't **actually** do this in practice.