Instructions: 17 questions worth 84 points (16 baseline points) and 1 extra credit question worth 4 points. Write your answers directly on the exam. It would be quicker for me to grade if you used a different font color for your answers. Thanks!
Upload the document to Canvas by the deadline.
Treatment Effect Precision (3 points)
1) Consider an experiment where treatment is randomly assigned and we estimate the average effect of treatment on an outcome variable using a simple linear regression:
Which of the following can increase the precision (i.e. reduce the uncertainty) of the treatment effect estimate? Check all that apply.
a) more observations
b) a more even split of observations across treatment status
c) adding control variables to the regression
d) none of the above
Returns to a Degree (12 points)
Returns to a College Degree
Assume an individual’s hourly wage is fully determined by the following equation:
where is a dummy variable for whether individual i has a college degree, is a dummy variable for whether person was born with high ability and is a random error term that has a mean of 0. This equation makes it clear that the average treatment effect of getting a college degree on one’s wage is an additional $20/hour.
2) What is the average wage of an individual with a college degree and low ability?
Consider a sample of 2,000 middle-aged employed adults. Assume the composition of this sample is as follows:
Table 2: Observations by type
3) Are the college-degree and high-ability dummy variables correlated in this sample?
a) yes, they are positively correlated
b) yes, they are negatively correlated
c) no, they are uncorrelated
Suppose our goal is to estimate the causal effect of a college degree on an individual’s wage from this sample, but we do not directly observe an individual’s natural born ability. Using ordinary least-squares we estimate the regression:
4) Do you expect the estimated coefficient on the college-degree dummy to be larger or smaller than the true average treatment effect? Briefly explain your reasoning.
Stats Workshop Experiment (30 points)
Suppose a new rule was being considered that would require all ECO 231 students to take a week-long workshop immediately prior to the start of the course to review basic concepts. A trial run of the workshop was conducted last year. 50% of students were randomly assigned to attend the workshop (the treatment group) and the remaining 50% of students were not permitted to attend (the control group). After the course, we assembled a data set for all ECO 231 students in that year with the following variables.
· is a dummy variable indicating whether student participated in the workshop
· is student ’s GPA at the end of the previous semester.
· is student ’s score on the first exam in ECO 231 out of a possible 100 points
· is student ’s final grade in ECO 231 out of a possible 100 points
We are interested in estimating the average treatment effect (the average change to a student’s grade caused by attending the workshop). We estimate the following regression:
5) What is the average grade for students who did not attend the workshop?
6) What is the difference in the average final grade for the treatment group (workshop participants) relative to the control group (students excluded from the workshop)? Is this difference statistically significant by conventional standards?
7) Evaluate the correctness of this statement and explain your reasoning: If we add to the regression model as a control variable the estimated coefficient on the workshop dummy is likely to decrease.
8) Suppose we add as a control variable (in place of ). How would you expect the estimated coefficient on the workshop dummy to change from the original model, if at all? Briefly explain.
9) In which regression model(s) do you expect the estimated coefficient on the workshop dummy to be an unbiased estimate of the average treatment effect? Check all that apply and briefly defend why you chose some models and not others.
d. None of the above
10) External Validity: Suppose we are convinced that our workshop effect estimate is internally valid. Should we be confident that this estimate represents a good estimate of the average effect we would expect to see if we required all students in future 231 classes to attend the workshop?[footnoteRef:1] In other words, is the analysis externally valid for the population of interest? Why or why not? [1: Assume final grades are not curved so that a grade of 80 in one year represents the same level of performance as a grade of 80 in any other year.]
Microeconomics Grade (10 points)
Consider the example from class where we looked at student performance in a Principles of Microeconomics course. Variables in the data set include each student’s final exam grade, the number of days they attended class in the semester, cumulative GPA prior to the term (pGPA), ACT score (ACT), and class year (freshman, sophomore, junior, senior) when they took the course. Consider the following four models and the regression results reporting their coefficient estimates.
Regression Results: Models 1-4
Outcome: Final Exam Score
Classes attended (out of 32)
Cumulative GPA prior to term
Class year dummies
Notes: Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1
11) Which model do you think gives the clearest picture of the average effect of attendance on a student's final exam score? Briefly defend your answer.
Central Bank Intervention (4 points)
The figure below plots the number of banks in business for each district before and after the Caldwell Bank collapse. District 6 aggressively lent funds to distressed banks and District 8 did not.
One attempt to estimate the effect of District 6’s aggressive lending policy would be to estimate the following regression:
where the district variable is a dummy variable equal to 1 if the observation is from District 6 and 0 if from District 8, and the year variable is a dummy variable equal to 1 if the observation is from 1931 and 0 if from XXXXXXXXXX. The estimated coefficient on the interaction term would serve as our treatment effect estimate.
12) Would the estimated coefficient on the interaction term in this regression be positive or negative? (Couldn’t hurt to show your work.)
a) Positive
b) Negative
c) Not enough information to determine
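As a reminder of the mechanics, a regression with an intercept, a group dummy, a period dummy, and their interaction has one parameter for each of the four group-by-period cells, so its OLS coefficients can be read directly off the four cell averages. The minimal Python sketch below illustrates this. The function name, the labels b0 through b3, the "treated" and "control" labels, and the bank counts are all assumptions made up for illustration; they are not the values plotted in the figure.

```python
def did_regression_coefficients(control_pre, control_post, treated_pre, treated_post):
    """Map four group-by-period means to the OLS coefficients of
    y = b0 + b1*treated + b2*post + b3*(treated*post) + e.

    The model has one parameter per cell, so OLS reproduces each cell mean exactly.
    """
    b0 = control_pre                      # control group, earlier period
    b1 = treated_pre - control_pre        # baseline gap between the groups
    b2 = control_post - control_pre       # change over time in the control group
    # Interaction: change in the treated group minus change in the control group.
    b3 = (treated_post - treated_pre) - (control_post - control_pre)
    return b0, b1, b2, b3


# Made-up bank counts for illustration only (NOT the values in the exam figure).
b0, b1, b2, b3 = did_regression_coefficients(
    control_pre=70, control_post=65, treated_pre=50, treated_post=40
)
print(f"interaction coefficient (difference-in-differences): {b3}")
```

With these placeholder numbers, the four coefficients add back up to the treated group's later-period mean, which is a quick check that the mapping is right.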
Toxic Spill and Housing Values (10 points)
Counties A and B are both located in industrial areas. At the beginning of 2010, there was a toxic chemical spill in County A. County B was unaffected by the spill. Below are average housing prices in 2009 and 2010 for Counties A and B:
13) Calculate the difference-in-differences (DD) estimate. What does this value represent in words?
In order to interpret the difference-in-differences estimate as the causal effect of the spill, we need to assume that County A would have followed the same time trend as County B if the spill never occurred. This is often called the “common trends” assumption. To assess this claim we expand our data set to include mean housing prices from 2007 and 2008 for each county.
14) Does this new data provide support for the common trends assumption? Briefly explain (1-3 sentences)
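The two calculations in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical and the prices are invented placeholders (in thousands of dollars), not the values in the exam tables. The first function computes a difference-in-differences estimate; the second tracks how the gap between the two counties changes across the pre-spill years, where changes near zero are consistent with common trends.

```python
def did_estimate(treated, control, pre_year, post_year):
    """Change in the treated county minus change in the comparison county."""
    treated_change = treated[post_year] - treated[pre_year]
    control_change = control[post_year] - control[pre_year]
    return treated_change - control_change


def pre_period_gap_changes(treated, control, pre_years):
    """Year-to-year change in the treated-minus-control gap over pre-treatment years."""
    gaps = [treated[year] - control[year] for year in pre_years]
    return [later - earlier for earlier, later in zip(gaps, gaps[1:])]


# Invented average prices in $1,000s, for illustration only (NOT the exam's data).
county_a = {2007: 200, 2008: 210, 2009: 220, 2010: 205}
county_b = {2007: 150, 2008: 160, 2009: 170, 2010: 180}

dd = did_estimate(county_a, county_b, pre_year=2009, post_year=2010)
gap_changes = pre_period_gap_changes(county_a, county_b, [2007, 2008, 2009])

print("DD estimate:", dd)
print("Pre-period changes in the A-minus-B gap:", gap_changes)
```

In this made-up example the gap between the counties is constant before 2010, so the pre-period changes are all zero. With real data the changes will rarely be exactly zero, and judging whether they are small enough is the substance of the common trends assessment.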
Medical Marijuana Policy (15 points)
Over the last two decades, many states have adopted medical marijuana laws (MMLs) that allow people over the age of 18 to legally acquire a prescription for marijuana for medicinal purposes (e.g. anti-seizure medication). Though there is evidence marijuana has costs and benefits for health, its downsides are thought to be more severe before adulthood. A common argument against the legalization of medical marijuana (let alone recreational marijuana) is that the passage of such laws encourages teenagers to use the drug. For example, the former director of the US Office of National Drug Control Policy (ONDCP), R. Gil Kerlikowske, is on record saying, “Well, if it’s called medicine and it’s given to patients by caregivers, then that’s really the wrong message to send to high school students.”
Around that same time, the ONDCP posted this to their official Twitter account.
The graph makes it clear that reported teen usage tends to be higher in states where medical marijuana is legal, and this correlation is unlikely to be due to random chance. Presumably the ONDCP tweeted this state-by-state comparison to suggest that medical marijuana laws have caused an increase in teen usage of marijuana.
Though their data come from the National Survey on Drug Use and Health of 12-17 year-olds, the Youth Risk Behavior Surveillance System (YRBSS) Survey is an easier dataset to access and asks 14-18 year-olds a similar question about marijuana usage in the past month (this is the survey data we used in class to analyze this question). To match the time of the data in the tweeted graph, I restrict the dataset to the year 2011 only and calculate, for each state, the proportion of teens using marijuana in the past 30 days and a dummy variable for whether or not medical marijuana was legal in that state in XXXXXXXXXX. I use this cross-section of data to estimate the following regression.
The coefficient estimates for this regression are below:
15) Do these regression results coincide with the basic pattern shown in the graph? That is, is reported teen usage clearly higher in MML states in 2011? Briefly explain your answer.
16) Is this relationship strong evidence that medical marijuana laws have caused an increase in teen usage of marijuana? Are there other plausible explanations for these results? Briefly defend your answers.
17) Montana legalized medical marijuana in 2004. However, the states bordering it, Idaho (ID), North Dakota (ND), South Dakota (SD), and Wyoming (WY), did not adopt such laws prior to 2016.[footnoteRef:2] Below, I plot the teen usage rates in Montana and its bordering states every two years[footnoteRef:3] from XXXXXXXXXX. Does this graph provide evidence that teen usage has increased because of medical marijuana legalization in Montana? Briefly explain. [2: Medical marijuana is currently illegal in South Dakota, Wyoming and Idaho. North Dakota legalized medical marijuana in November 2016.] [3: The YRBSS is a biennial survey.]
Extra Credit (4 points)
In class, we used the full YRBSS state-year panel dataset to estimate a regression model of the teen usage rate on MML, controlling for state and year effects [footnoteRef:4] [4: Or in compact notation, the model could be written as: .]
where indexes the state and indexes the year of the observation, the s represent state effects and the s represent year effects. The following table reports the estimate of and its standard error from this regression.
Table 1: Estimated Impact of Medical Marijuana Legalization on Teen Usage Rate
MML (0=illegal, 1=legal)
State Fixed Effects
Notes: Table 1 reports estimates for the regression slope on the variable MML controlling for state and year effects. Standard errors, corrected for clustering at the state level, are reported below the coefficient estimates in parentheses. Marginal statistical significance levels of coefficient estimates are indicated with asterisks: *** p<0.01, ** p<0.05, * p<0.1.
Suppose we use this analysis to conclude that there is little evidence that teen usage increased in states following the legalization of medical marijuana. Someone reads our analysis and raises the following concern.
It seems possible that the legalization of medical marijuana in one state could lead to an increase in teen usage across all states (perhaps each time a law is passed, it helps promote or reinforce among teenagers throughout the country the notion that marijuana use is safe). For instance, a law passed in Montana may increase usage not only in Montana but in all other states as well. Wouldn’t your analysis miss this type of effect if it were present?
Is the critique correct? Briefly explain your reasoning.
Figure (Central Bank Intervention): number of banks in business, District 6 and District 8.
Figure (Medical Marijuana Policy, question 17): teen marijuana usage rate, Montana and border states (ID, ND, SD, WY). Dashed line indicates year Montana legalized medical marijuana.
. regress grade231 workshop, robust noheader
grade XXXXXXXXXX Coef. Std. Err. t P>|t| [95% Conf. Interval]
workshop XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX
_cons XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX XXXXXXXXXX