Rules & Guidelines

Ground rules

This assignment counts for 30% of your final grade. You have to work through a set of tasks using R and write up your answers using Word, LaTeX, or R Markdown. The rules are as follows:

• Below you will find a set of tasks. Please answer all questions and work through all tasks. There is no word or page limit, but please be concise.
• The deadline is November 15, 2021, at 11:59:59 pm.
• For late submissions, UCD’s Late Submission of Coursework policy applies.
• Papers are to be submitted on Brightspace → Assessment → Assignments.
• Submissions should be in one pdf and should include: 1) the write-up of the assignment, 2) the R code.
• Students are allowed to work in groups of up to five. If students work in a group, only one group member should submit the paper on Brightspace. The first page of the paper should clearly state that this was a group project and give the names and student numbers of the group members.
• UCD’s Student Plagiarism Policy will apply. I reserve the right to run plagiarism checks on Brightspace.
• Questions should be posted on Brightspace.
• A solution will not be provided after the deadline.

Grading

Students will receive a letter grade for this assignment. Grading is based on the following criteria:

• Correctness of the analysis and interpretations
• Writing (clear and concise)
• Exposition: are graphs and tables done well? They don’t need to look fancy, but it has to be clear what is shown. For regression tables, please use stargazer or alternative packages that give you nicely formatted regression tables.
• Bonus: a higher grade (1 notch, e.g. from B+ to A-) is given if all of the following are done:
  1. the project is written with R Markdown (can be done via RStudio); please indicate on the first page if you do so;
  2. all graphs and tables have been programmed with R, i.e. no copy & paste anywhere;
  3. all graphs are done with ggplot (but not with the default grey background);
  4. tidyverse functions (especially the pipe operator) are frequently used.

Some tips

The aim of this assignment is to get students to “figure things out.” In the tutorials, clear instructions and coding examples were given along with a clean data set. However, this is far removed from the work data analysts actually do. Their projects typically have a clear goal, but the data are often messy and it is unclear how to reach the goal of the analysis. Simply put, the analyst has to “figure things out”: how to best clean the data set, how to best visualise data, how to bring the data into a format that is suitable for visualisation and regression analysis, etc. If you’re working in a company, you neither refuse to do a project because “we haven’t learned about a certain procedure in class”, nor can you run to your manager with every little error message you encounter. Ultimately, data analysts are paid for solving problems themselves or collaboratively with team members. The sooner you get into that mindset, the better. This assignment is similar to a project one would encounter in a data analytics job.

How to figure things out?

• Google is your friend. Get a strange error message? Type it into Google; chances are someone else had the problem before. You can also search StackOverflow, the forum for all things programming (R, Python, C++, etc.).
• If one solution doesn’t work, try another one. Solving problems is often frustrating; it takes time and a decent bit of grit. So if you encounter a problem, solve it or find a way around it. There is always a solution!

Preparation

For some of the tasks below, you will need to know how to incorporate binary variables into a regression. Once you know how regression works, this is pretty straightforward. Here are some sources you may want to consult:

• When a regressor is a dummy: Chapter 5.3 in Stock & Watson; here is a good video
• When the dependent variable is a dummy (also called linear probability model): Chapter 11.1 in Stock & Watson. See also this video, this video and this video. The latter video is based on Stock & Watson’s materials.

A. Theory Tasks

Suppose you want to quantify the extent of discrimination in public service provision in the U.S. Your company has collected data on email queries of citizens to public offices (libraries, police, registry), whereby citizens were asked in a survey to provide the relevant information. In particular, the survey includes data on 1) a person’s race – whether the person is white or belongs to a minority (Black or Latinx), and 2) how many hours it takes for a public office to respond.

1. Suppose you want to estimate the effect of minority status (i.e. a dummy that equals one if a person belongs to a minority, either Black or Latinx, and zero if the person is white) on the response time to an email. Write down a regression equation that would allow you to estimate this effect.
2. Explain what parameter you are interested in estimating and provide an interpretation of this parameter.
3. Discuss the random sampling assumption and the conditional independence assumption (in the lecture it was E(u|X) = 0). Are these assumptions fulfilled in this case (explain why or why not)? Explain intuitively the likely consequences of these assumptions (not) being fulfilled for estimating the effect of interest.
4. If you could run an experiment (regardless of ethical considerations) to estimate the effect of interest, what would this experiment look like and why? (N.B.: the ideal experiment asked for here is different from the experiment described further below.)

B. Empirical Analysis

Introduction

Corrado Giulietti, Mirco Tonin and Michael Vlassopoulos ran an experiment to quantify the extent of racial discrimination in local public services in the U.S. The experiment is a so-called correspondence study, whereby the researchers sent email queries to over 19,000 public service providers (school districts, local libraries, sheriff offices, county treasurers, job center veteran representatives, and county clerks). In each query, they asked for simple information, and they randomised whether the (fake) person writing it had a distinctively white-sounding name (Jake Mueller, Greg Walsh) or a distinctively black-sounding name (DeShawn Jackson, Tyrone Washington). After the emails were sent, the researchers took several measures of providers’ response behaviour, such as whether they wrote a response, how long it took to respond, and the length of the response.

Paper and data

You can find the paper here and on Brightspace:

• Corrado Giulietti, Mirco Tonin, Michael Vlassopoulos, Racial Discrimination in Local Public Services: A Field Experiment in the United States, Journal of the European Economic Association, Volume 17, Issue 1, February 2019, Pages 165–204, https://doi.org/10.1093/jeea/jvx045

Along with the assignment on Brightspace, you will find the dataset data_giulietti_etal.dta, which is in Stata .dta format. We will use this dataset for the analysis to follow. Each observation is one email that was sent. The main variables for our analysis are shown in Table 1.

Table 1: Main Variables

Variable name    Content
reply            dummy: 1 if email received a reply, 0 if not
complexity       dummy: 1 if the email was complex, 0 if not complex
cordial_reply    dummy: 1 if the reply was cordial (such as name used, signed, etc), 0 if not
length_reply     length of the reply in words
delay_reply      response time in hours
race             dummy: 1 if black, 0 if white
sender           1: Tyrone Washington, 2: DeShawn Jackson, 3: Greg Walsh, 4: Jake Mueller
recipient        1: school district, 2: library, 3: sheriff, 4: treasurer, 5: job center, 6: county clerk

Tasks

1. Load the dataset into R and produce a table of summary statistics (number of observations, mean, sd, median, min, max, number of missing observations) for the variables listed in Table 1. Interpret the mean of reply, complexity, cordial_reply, length_reply and race.
2. Produce a frequency table for the number of emails sent to each type of recipient. The table should include the number as well as the share of emails sent to each recipient.
3. Produce a frequency table with senders on the horizontal and recipients on the vertical axis. Each cell should show the share of all emails that were sent by a given sender to a given recipient (hint: search for cross tabulation). What does the result tell you about the quality of the randomisation in the experiment?
4. Run t-tests comparing the difference in means between Whites and Blacks for the following variables: reply, cordial_reply, length_reply, delay_reply. The results of the t-tests should be presented in a table that shows the following: each row is a variable; columns: mean of the variable for Whites, mean of the variable for Blacks, difference in means between Whites and Blacks, p-value of the t-test. Interpret your findings regarding magnitude and statistical significance.
5. Regress the dummy reply on the dummy race. Interpret the coefficients of the slope and intercept, comment on statistical significance, and compare your results to those in the table produced in 4.
6. Another way of analysing the results of an experiment like this is through bar charts with error bars. You plot the means for the treatment and control group and attach to each bar a so-called error bar (y ± sd(y)). The error bars give an indication of the variation in each group. Produce such a chart (separate bars for Black and White) for the following outcomes: reply, cordial_reply, length_reply.
7. Not only did the researchers randomise whether the sender has a black-sounding name, but they also randomised whether an email had complex or simple content. Run a regression of complexity on race and interpret your result. Comment on the meaning of this result for the experimental design.
8. Now regress delay_reply on race and interpret the coefficients in terms of magnitude and statistical significance. Explain why your estimates are likely biased despite having run a clean experiment.

C. Monte Carlo Simulation

Consider the following data-generating process:

Y = β1 × X + u

1. Simulate 1,000 samples of size n = 100 with β1 = 2, X ∼ N(100, 15) and u ∼ N(0, 8).
2. In each sample, run a regression of Y on X and obtain the estimate β̂1. Calculate the 95% confidence interval for β1.
3. Among the 1,000 confidence intervals you produced, calculate the share that include 2 and interpret your results.
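
The general shape of a confidence-interval coverage check like the one in Part C can be sketched in R as follows. This is a minimal sketch and not a model solution. The function name coverage_share, its argument names, and the parameter values in the usage line are hypothetical and deliberately differ from the values in the task. The sketch also assumes that the spread parameter is passed as a standard deviation, which the assignment's N(·, ·) notation does not specify.

```r
# Minimal sketch of a Monte Carlo coverage check for an OLS slope.
# coverage_share() and its argument names are hypothetical (not from the assignment).
coverage_share <- function(n_sims, n, beta1, x_mean, x_sd, u_sd, level = 0.95) {
  covered <- logical(n_sims)
  for (s in seq_len(n_sims)) {
    # Draw one sample from the data-generating process Y = beta1 * X + u
    x <- rnorm(n, mean = x_mean, sd = x_sd)  # assumption: x_sd is a standard deviation
    u <- rnorm(n, mean = 0, sd = u_sd)       # assumption: u_sd is a standard deviation
    y <- beta1 * x + u

    # Regress Y on X; lm() includes an intercept by default
    fit <- lm(y ~ x)

    # Confidence interval for the slope coefficient (1 x 2 matrix: lower, upper)
    ci <- confint(fit, parm = "x", level = level)

    # Record whether the interval contains the true slope
    covered[s] <- ci[1, 1] <= beta1 && beta1 <= ci[1, 2]
  }
  # Share of simulated intervals that contain the true slope
  mean(covered)
}

# Illustrative values only; these are not the values from the task
set.seed(1)
share <- coverage_share(n_sims = 200, n = 50, beta1 = 1,
                        x_mean = 0, x_sd = 1, u_sd = 1)
print(share)
```

When the regression model matches the data-generating process, the share should be close to the nominal confidence level. A share far from that level usually points to a coding error or to too few simulated samples.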
Nov 12, 2021