FinalProblemSet

This is the last problem set for my Data Sciences class. I need a workbook with completed code as well as a report (both detailed in the attached instructions).


Answered 5 days after May 04, 2021

Answer To: FinalProblemSet

Uttam answered on May 10 2021
Final Problem Set: Due Tuesday, May 10 at 5:00 pm EDT
Important: You need basque.RData to complete this set. The file is located on eLC.
Answer the questions below. This notebook will be your workspace so add cells as you please. When you are finished, you will submit two objects to eLC:
        **Written pdf Report.** This should be short. You should include:
        The research question.
        The figure from your Exploratory Data Analysis along with discussion. Be sure to note the difference in per capita GDP pre and post 1975.
        Difference-in-differences result. Be sure to include the figure comparing the Basque country and the control region. Discuss how the control group satisfies our criteria. Be sure to discuss identification (see "interpretation" of DiD below) and results relative to the research question. Ideally, you should work your estimation result into the figure so the reader can quickly observe the "answer" to the research question.
        Synthetic Control result. Be sure to include the figure comparing the Basque country and the synthetic control region. Be sure to discuss identification (see "interpretation" of DiD below) and results relative to the research question. Ideally, you should work your estimation result into the figure so the reader can quickly observe the "answer" to the research question.
Your analysis should be professional: i.e., well-written, clear, and concise. Figures should be incorporated in your analysis. For example, it would be useful to set the title of each figure in this notebook (e.g., "Figure 1: Per Capita GDP in the Basque Region") so in the final report you can reference the figure in the appropriate commentary. Save your report as a pdf ("File/Save As Adobe PDF") with the naming convention 'Final_Report_[insert last name]'. For example, 'Final_Report_Thurk.pdf'.
        **Jupyter Notebook.** Print the notebook as a pdf. [To print: From the file menu, choose 'print preview'. A new tab will open with the notebook presented as html. Print as a pdf.] Save your pdf notebook with the naming convention 'Final_Workbook_[insert last name]'. For example, 'Final_Workbook_Thurk.pdf'. Think of the notebook as your opportunity to show your work.
Grading: The problem set is worth 90 points and partial credit is indicated for each exercise. I will grade your report (1) and use your notebook (2) to assign partial credit in the event there are errors.
        You may use your notes, books, and the internet.
        Do not consult with other people. This work should be entirely your own.
Exercise 1: Exploratory Data Analysis [30 points]
We're going to estimate the economic impact of terrorism in the Basque region, an autonomous community in Spain. To do this, we'll use information from 16 other Spanish regions. Our underlying assumption is that terrorism affected economic activity in the Basque region but not elsewhere, whereas other economic shocks are aggregate and affect all of Spain.
Here are the details:
        Coverage from 1955–1997 for 18 Spanish regions. One of the data "regions" is all of Spain which we won't use.
        The treatment region is “Basque Country (Pais Vasco)”.
        The "treatment" year is 1975 since there were several bombings around that year.
        We will measure "economic impact" via GDP per capita (in thousands).
Background: Euskadi Ta Askatasuna (ETA) was an armed leftist Basque nationalist and separatist organization in the Basque Country (in northern Spain and southwestern France). The group was founded in 1959 and later evolved from a group promoting traditional Basque culture to a paramilitary group engaged in a violent campaign of bombing, assassinations and kidnappings in the Southern Basque Country and throughout Spanish territory. Its goal was gaining independence for the Basque Country. ETA was the main group within the Basque National Liberation Movement and was the most important Basque participant in the Basque conflict. While its terrorist activities spanned several decades, the death of Spanish dictator Francisco Franco in 1975 led to a substantial increase in bombings. (Wikipedia)
Part A: Load Data
Thus far we've learned how to load data from csv, excel, and stata. Another popular programming language (and to some degree a rival of python) is the open-source language R. The data we want to access is from the following academic paper:
Abadie, A. and Gardeazabal, J. (2003) "The Economic Costs of Conflict: A Case Study of the Basque Country." American Economic Review 93 (1) 113-132.
The data was saved in the R programming language. To read it, we'll leverage python's own open-source nature and use a user-created package called pyreadr, which we install as usual in the terminal:
pip install pyreadr
We then use pyreadr to load basque.RData and access the data as follows:
result = pyreadr.read_r('basque.RData')
data = result['basque'] # extract the pandas dataframe
In [1]:

import pyreadr, pandas as pd
pd.set_option('display.max_columns', 20)
result = pyreadr.read_r('\\basque-pehqaa4x.RData')
result = [pd.DataFrame(i) for i in result.values()][0]
result


Out[1]:
                regionno        regionname        year        gdpcap        sec.agriculture        sec.energy        sec.industry        sec.construction        sec.services.venta        sec.services.nonventa        school.illit        school.prim        school.med        school.high        school.post.high        popdens        invest
        rownames                                                                                                                                        
        1        1.0        Spain (Espana)        1955.0        2.354542        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        2        1.0        Spain (Espana)        1956.0        2.480149        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        3        1.0        Spain (Espana)        1957.0        2.603613        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        4        1.0        Spain (Espana)        1958.0        2.637104        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        5        1.0        Spain (Espana)        1959.0        2.669880        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...
        770        18.0        Rioja (La)        1993.0        9.132391        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        16.765787
        771        18.0        Rioja (La)        1994.0        9.498000        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        16.469452
        772        18.0        Rioja (La)        1995.0        9.752213        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        20.275650
        773        18.0        Rioja (La)        1996.0        10.056413        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        774        18.0        Rioja (La)        1997.0        10.476292        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
774 rows × 17 columns
Part B: Drop Spain
Drop the region "Spain (Espana)".
In [2]:

result = result[result['regionname'] != 'Spain (Espana)'] # Drop Spain

Part C: Plot GDP Per Capita Over Time
Before proceeding to the econometrics, it's useful to graph the data variation we're most interested in. We're interested in evaluating the effect of terrorism on per capita GDP, where we're using the bombings in 1975 as a "natural experiment." Plot per capita GDP (i.e., gdpcap) in the Basque region across time. Put a vertical dashed line at 1975 and indicate Pre/Post periods.
In [3]:

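import matplotlib.pyplot as plt
# Keep only the Basque region, as the exercise asks
basque = result[result['regionname'] == 'Basque Country (Pais Vasco)']
plt.plot(basque['year'], basque['gdpcap'])
plt.axvline(1975, color='k', linestyle='--')  # dashed line at the treatment year
plt.text(1965, basque['gdpcap'].max(), 'Pre', ha='center')
plt.text(1986, basque['gdpcap'].max(), 'Post', ha='center')
plt.title('Figure 1: Per Capita GDP in the Basque Region')
plt.xlabel('year')
plt.ylabel('real per-capita GDP (1986 USD, thousand)')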


Out[3]:
Text(0, 0.5, 'real per-capita GDP (1986 USD, thousand)')


We observe that per capita GDP decreases after the 1975 increase in ETA terrorist bombings and then increases. How much of that decrease is due to terrorism and how much is due to general economic uncertainty from Franco's death?
Exercise 2: Difference-in-Differences [30 points]
Part A: Comparing Before and After
We'll begin by comparing per capita GDP before and after the 1975 treatment year. Compute average per capita GDP in the Basque region before (and including) 1975 and after 1975 (i.e., two numbers).
In [126]:

# Work on an explicit copy so the assignment below does not trigger SettingWithCopyWarning
compare = compare.copy()
compare['year'] = compare['year'].astype(int)
In [125]:

compare.plot(x="year", y="regionname", kind="scatter")


Out[125]:



In [124]:

compare[compare['regionname'] == 'Basque Country (Pais Vasco)']


Out[124]:
                regionname        year        gdpcap
        rownames                        
        689        Basque Country (Pais Vasco)        1955.0        3.853185
        690        Basque Country (Pais Vasco)        1956.0        3.945658
        691        Basque Country (Pais Vasco)        1957.0        4.033562
        692        Basque Country (Pais Vasco)        1958.0        4.023422
        693        Basque Country (Pais Vasco)        1959.0        4.013782
        694        Basque Country (Pais Vasco)        1960.0        4.285918
        695        Basque Country (Pais Vasco)        1961.0        4.574336
        696        Basque Country (Pais Vasco)        1962.0        4.898957
        697        Basque Country (Pais Vasco)        1963.0        5.197015
        698        Basque Country (Pais Vasco)        1964.0        5.338903
        699        Basque Country (Pais Vasco)        1965.0        5.465153
        700        Basque Country (Pais Vasco)        1966.0        5.545916
        701        Basque Country (Pais Vasco)        1967.0        5.614896
        702        Basque Country (Pais Vasco)        1968.0        5.852185
        703        Basque Country (Pais Vasco)        1969.0        6.081405
        704        Basque Country (Pais Vasco)        1970.0        6.170094
        705        Basque Country (Pais Vasco)        1971.0        6.283633
        706        Basque Country (Pais Vasco)        1972.0        6.555555
        707        Basque Country (Pais Vasco)        1973.0        6.810769
        708        Basque Country (Pais Vasco)        1974.0        7.105184
        709        Basque Country (Pais Vasco)        1975.0        7.377892
        710        Basque Country (Pais Vasco)        1976.0        7.232934
        711        Basque Country (Pais Vasco)        1977.0        7.089831
        712        Basque Country (Pais Vasco)        1978.0        6.786704
        713        Basque Country (Pais Vasco)        1979.0        6.639817
        714        Basque Country (Pais Vasco)        1980.0        6.562839
        715        Basque Country (Pais Vasco)        1981.0        6.500785
        716        Basque Country (Pais Vasco)        1982.0        6.545059
        717        Basque Country (Pais Vasco)        1983.0        6.595330
        718        Basque Country (Pais Vasco)        1984.0        6.761497
        719        Basque Country (Pais Vasco)        1985.0        6.937161
        720        Basque Country (Pais Vasco)        1986.0        7.332191
        721        Basque Country (Pais Vasco)        1987.0        7.742788
        722        Basque Country (Pais Vasco)        1988.0        8.120537
        723        Basque Country (Pais Vasco)        1989.0        8.509711
        724        Basque Country (Pais Vasco)        1990.0        8.776778
        725        Basque Country (Pais Vasco)        1991.0        9.025279
        726        Basque Country (Pais Vasco)        1992.0        8.873893
        727        Basque Country (Pais Vasco)        1993.0        8.718224
        728        Basque Country (Pais Vasco)        1994.0        9.018138
        729        Basque Country (Pais Vasco)        1995.0        9.440874
        730        Basque Country (Pais Vasco)        1996.0        9.686518
        731        Basque Country (Pais Vasco)        1997.0        10.170666
In [129]:

import matplotlib.pyplot as plt
plt.plot(compare["regionname"], compare["gdpcap"])
plt.show()



In [147]:

compare.groupby("year").apply(lambda s: pd.Series({"gdpSum": s["gdpcap"].sum()})).plot()


Out[147]:



In [122]:

compare = result[['regionname','year','gdpcap']]
compare_below_1975 = compare[compare['year'] <= 1975]  # pre-period, includes 1975
compare_above_1975 = compare[compare['year'] > 1975]   # post-period, strictly after 1975
compare


Out[122]:
                regionname        year        gdpcap
        rownames                        
        44        Andalucia        1955.0        1.688732
        45        Andalucia        1956.0        1.758498
        46        Andalucia        1957.0        1.827621
        47        Andalucia        1958.0        1.852756
        48        Andalucia        1959.0        1.878035
        ...        ...        ...        ...
        770        Rioja (La)        1993.0        9.132391
        771        Rioja (La)        1994.0        9.498000
        772        Rioja (La)        1995.0        9.752213
        773        Rioja (La)        1996.0        10.056413
        774        Rioja (La)        1997.0        10.476292
731 rows × 3 columns
In [127]:

compare_groupby = compare.groupby("year")["gdpcap"].sum().sort_values()
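
None of the cells above produces the two averages that Part A asks for. A minimal sketch of that calculation follows, using the Basque series exactly as printed in Out[124]. The names `basque_gdp`, `pre_mean`, and `post_mean` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Basque per capita GDP, 1955-1997, copied from the Out[124] listing
basque_gdp = pd.Series(
    [3.853185, 3.945658, 4.033562, 4.023422, 4.013782, 4.285918, 4.574336,
     4.898957, 5.197015, 5.338903, 5.465153, 5.545916, 5.614896, 5.852185,
     6.081405, 6.170094, 6.283633, 6.555555, 6.810769, 7.105184, 7.377892,
     7.232934, 7.089831, 6.786704, 6.639817, 6.562839, 6.500785, 6.545059,
     6.595330, 6.761497, 6.937161, 7.332191, 7.742788, 8.120537, 8.509711,
     8.776778, 9.025279, 8.873893, 8.718224, 9.018138, 9.440874, 9.686518,
     10.170666],
    index=range(1955, 1998), name="gdpcap")

# "Before" includes 1975, so "after" must start strictly later
pre_mean = basque_gdp[basque_gdp.index <= 1975].mean()
post_mean = basque_gdp[basque_gdp.index > 1975].mean()

print(f"mean gdpcap, 1955-1975: {pre_mean:.3f}")
print(f"mean gdpcap, 1976-1997: {post_mean:.3f}")
print(f"difference (post - pre): {post_mean - pre_mean:.3f}")
```

In the notebook itself the same two numbers could be taken from the `compare` frame by filtering on `regionname` before splitting on `year`.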

Part B: First-Difference Regression
We can evaluate the effect of the ETA by constructing a "first difference" equation with gdpcap as the dependent variable and a post indicator as the independent variable; i.e.,
$$\text{Per Capita GDP}_t = \beta_0 + \beta_1 P_t + \epsilon_t$$
where $P_t$ is equal to one for all years after 1975 and zero otherwise. Run the above regression using only information for the Basque region.
In [134]:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # Closed-form simple OLS: slope = SS_xy / SS_xx, intercept = mean(y) - slope * mean(x)
    n = np.size(x)

    m_x = np.mean(x)
    m_y = np.mean(y)

    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # Scatter the observations and overlay the fitted line
    plt.scatter(x, y, color = "m",
                marker = "o", s = 30)

    y_pred = b[0] + b[1]*x

    plt.plot(x, y_pred, color = "g")

    plt.xlabel('x')
    plt.ylabel('y')

    plt.show()

def main():
    # As written, this regresses year on gdpcap over every region in `compare`;
    # it does not use the post-1975 indicator or restrict to the Basque region.
    x = np.array(compare['gdpcap'])
    y = np.array(compare['year'])
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
\nb_1 = {}".format(b[0], b[1]))

    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()



Estimated coefficients:
b_0 = 1952.1932542750592
b_1 = 4.411319205122376


What effect did the ETA bombings have on Basque per capita GDP? Compare your results from (A) and (B).
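
The regression specified in Part B can be sketched as follows. This is an illustrative version that uses NumPy's least-squares solver on the Basque series printed in Out[124]; it is not the notebook author's code, and `post`, `X`, `beta0`, and `beta1` are names introduced here. With a single dummy regressor, the intercept equals the pre-period mean and the slope equals the post-minus-pre difference in means, which is the link between Parts A and B.

```python
import numpy as np

years = np.arange(1955, 1998)
# Basque per capita GDP, 1955-1997, copied from the Out[124] listing
gdpcap = np.array(
    [3.853185, 3.945658, 4.033562, 4.023422, 4.013782, 4.285918, 4.574336,
     4.898957, 5.197015, 5.338903, 5.465153, 5.545916, 5.614896, 5.852185,
     6.081405, 6.170094, 6.283633, 6.555555, 6.810769, 7.105184, 7.377892,
     7.232934, 7.089831, 6.786704, 6.639817, 6.562839, 6.500785, 6.545059,
     6.595330, 6.761497, 6.937161, 7.332191, 7.742788, 8.120537, 8.509711,
     8.776778, 9.025279, 8.873893, 8.718224, 9.018138, 9.440874, 9.686518,
     10.170666])

# P_t: one for all years after 1975, zero otherwise
post = (years > 1975).astype(float)

# Design matrix with a constant and the post indicator
X = np.column_stack([np.ones(len(years)), post])
beta, *_ = np.linalg.lstsq(X, gdpcap, rcond=None)
beta0, beta1 = beta

print(f"beta0 (pre-period mean): {beta0:.3f}")
print(f"beta1 (post minus pre):  {beta1:.3f}")
```

The slope is only a before/after comparison: it also absorbs any Spain-wide change over the same years, which is why Part C brings in a control region.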
Part C: Difference-in-Differences (DiD)
We'll establish causality by comparing the trend of a "control" group to the Basque region. This is like conducting an experiment where you give some patients a vaccine, drug, etc. while you give other patients a placebo. In that experimental setting, we design the experiment such that the control and treatment groups are identical but for the treatment applied. We then observe whether or not outcomes-of-interest are significantly different between the two groups.
The same rationale holds here, but of course we don't have an experimental setting, so we choose a control group after the fact under the assumption that this group provides an adequate proxy for the trend that would have been observed in the treatment group in the absence of treatment. If true, the difference in the change of slope would be the actual treatment effect. Hence the "difference in differences" terminology. Note that our "identification strategy" requires that the treatment and control groups follow the same pre-period (i.e., pre-treatment) trend, which is something we can easily show.
In fact, we'll choose our control group using the following criteria:
        High correlation between our control region's per capita GDP and the per capita GDP in the Basque country before 1975.
        The control region is not located close to the Basque region. Implicitly, this criterion is based on our assumption that ETA bombings did not affect the control region's per capita GDP.
Solve for the correlation in per capita GDP between the different regions and the Basque country using only the pre-1975 period. Your output should be a table of correlations between per capita GDP of non-Basque regions (e.g., "Asturias") and the Basque region. Choose a control region which satisfies the above two criteria.
In [140]:

compare.groupby('year')[['regionname','gdpcap']].corr()


Out[140]:
                        gdpcap
        year                
        1955        gdpcap        1.0
        1956        gdpcap        1.0
        1957        gdpcap        1.0
        1958        gdpcap        1.0
        1959        gdpcap        1.0
        1960        gdpcap        1.0
        1961        gdpcap        1.0
        1962        gdpcap        1.0
        1963        gdpcap        1.0
        1964        gdpcap        1.0
        1965        gdpcap        1.0
        1966        gdpcap        1.0
        1967        gdpcap        1.0
        1968        gdpcap        1.0
        1969        gdpcap        1.0
        1970        gdpcap        1.0
        1971        gdpcap        1.0
        1972        gdpcap        1.0
        1973        gdpcap        1.0
        1974        gdpcap        1.0
        1975        gdpcap        1.0
        1976        gdpcap        1.0
        1977        gdpcap        1.0
        1978        gdpcap        1.0
        1979        gdpcap        1.0
        1980        gdpcap        1.0
        1981        gdpcap        1.0
        1982        gdpcap        1.0
        1983        gdpcap        1.0
        1984        gdpcap        1.0
        1985        gdpcap        1.0
        1986        gdpcap        1.0
        1987        gdpcap        1.0
        1988        gdpcap        1.0
        1989        gdpcap        1.0
        1990        gdpcap        1.0
        1991        gdpcap        1.0
        1992        gdpcap        1.0
        1993        gdpcap        1.0
        1994        gdpcap        1.0
        1995        gdpcap        1.0
        1996        gdpcap        1.0
        1997        gdpcap        1.0
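
The groupby output above correlates gdpcap with itself within each year, so every entry is 1.0. One way to get the table the exercise describes is to reshape the data to one column per region and correlate each column with the Basque column. The sketch below is illustrative only: it uses four regions and the years 1955-1968, the only pre-1975 values visible in the Out[188] listing, and the names `wide`, `pre`, and `corr_with_basque` are introduced here.

```python
import pandas as pd

basque = "Basque Country (Pais Vasco)"

# Per capita GDP for 1955-1968, copied from the Out[188] listing.
# With the full data: wide = compare.pivot(index="year", columns="regionname", values="gdpcap")
wide = pd.DataFrame({
    basque: [3.853185, 3.945658, 4.033562, 4.023422, 4.013782, 4.285918, 4.574336,
             4.898957, 5.197015, 5.338903, 5.465153, 5.545916, 5.614896, 5.852185],
    "Andalucia": [1.688732, 1.758498, 1.827621, 1.852756, 1.878035, 2.010140, 2.129177,
                  2.280348, 2.431020, 2.508855, 2.584690, 2.694444, 2.802342, 2.987361],
    "Cataluna": [3.546630, 3.690446, 3.826835, 3.875678, 3.921737, 4.241788, 4.575335,
                 4.838046, 5.081334, 5.158098, 5.223651, 5.332477, 5.429449, 5.674379],
    "Madrid (Comunidad De)": [4.594473, 4.786632, 4.963439, 4.906170, 4.846401, 5.161097,
                              5.632605, 5.840831, 6.024493, 6.099329, 6.152028, 6.110469,
                              6.057341, 6.253142],
}, index=range(1955, 1969))

# Restrict to the pre-treatment period before computing correlations
pre = wide[wide.index < 1975]

# Correlate every non-Basque region with the Basque series, highest first
corr_with_basque = (pre.drop(columns=basque)
                       .corrwith(pre[basque])
                       .sort_values(ascending=False))
print(corr_with_basque)
```

The correlation ranking addresses only the first criterion; the second (distance from the Basque region) still has to be judged from a map.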
Plot per capita GDP for the Basque country and your choice of control region (i.e., the region which best satisfies the above criteria). Put a vertical dashed line at 1975 and indicate Pre/Post periods.
In [142]:

def histogram_intersection(a, b):
    v = np.minimum(a, b).sum().round(decimals=1)
    return v
compare.corr(method=histogram_intersection)


Out[142]:
                year        gdpcap
        year        1.0        3945.0
        gdpcap        3945.0        1.0
In [152]:

import numpy as np
compare.groupby("year").apply(lambda s: pd.Series({"gdpSum": s["gdpcap"].sum()})).plot()


Out[152]:



Run the following difference-in-differences regression:
$$\text{Per Capita GDP}_{it} = \beta_0 + \beta_1 P_t + \beta_2 T_i + \beta_3 P_t \times T_i + \epsilon_{it}$$
where $P_t$ is the "period" indicator equal to one for all years after 1975 and zero otherwise; $T_i$ is the "treatment" group indicator equal to one when the region is the Basque region; and $i$ is the region. Note the regression only uses observations from the treatment (Basque region) and control group.
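
A rough sketch of this regression is given below, under two loudly stated assumptions. First, Cataluna stands in as the control region purely for illustration; this is not a claim that it best satisfies the two criteria. Second, only the years visible in the Out[188] listing are used (1955-1968 as "pre", 1983-1997 as "post"), because 1969-1982 are elided there, so the printed estimate is not the answer to the exercise. All variable names are introduced here. In this saturated two-by-two design, the interaction coefficient equals the Basque post-minus-pre change less the control's post-minus-pre change.

```python
import numpy as np

# Values copied from the Out[188] listing; 1969-1982 are elided there and omitted here
basque_pre = [3.853185, 3.945658, 4.033562, 4.023422, 4.013782, 4.285918, 4.574336,
              4.898957, 5.197015, 5.338903, 5.465153, 5.545916, 5.614896, 5.852185]
basque_post = [6.595330, 6.761497, 6.937161, 7.332191, 7.742788, 8.120537, 8.509711,
               8.776778, 9.025279, 8.873893, 8.718224, 9.018138, 9.440874, 9.686518,
               10.170666]
control_pre = [3.546630, 3.690446, 3.826835, 3.875678, 3.921737, 4.241788, 4.575335,
               4.838046, 5.081334, 5.158098, 5.223651, 5.332477, 5.429449, 5.674379]
control_post = [7.397886, 7.484290, 7.569980, 8.077692, 8.583976, 9.057412, 9.525850,
                9.785062, 10.050700, 9.837903, 9.625107, 10.006427, 10.339903,
                10.576264, 11.045416]

n_pre, n_post = len(basque_pre), len(basque_post)

# Stack observations: Basque rows first, then the illustrative control
y = np.array(basque_pre + basque_post + control_pre + control_post)
P = np.tile(np.r_[np.zeros(n_pre), np.ones(n_post)], 2)             # period indicator P_t
T = np.r_[np.ones(n_pre + n_post), np.zeros(n_pre + n_post)]        # treatment indicator T_i

# Columns: constant, P_t, T_i, P_t x T_i
X = np.column_stack([np.ones_like(y), P, T, P * T])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, value in zip(["beta0", "beta1", "beta2", "beta3 (DiD)"], beta):
    print(f"{name:12s} {value: .3f}")
```

With the full 1955-1997 series from `result`, the same design matrix can be built by setting `P = (year > 1975)` and `T = (regionname == 'Basque Country (Pais Vasco)')` on the rows for the two chosen regions.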
In [164]:

result


Out[164]:
                regionno        regionname        year        gdpcap        sec.agriculture        sec.energy        sec.industry        sec.construction        sec.services.venta        sec.services.nonventa        school.illit        school.prim        school.med        school.high        school.post.high        popdens        invest
        rownames                                                                                                                                        
        44        2.0        Andalucia        1955.0        1.688732        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        45        2.0        Andalucia        1956.0        1.758498        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        46        2.0        Andalucia        1957.0        1.827621        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        47        2.0        Andalucia        1958.0        1.852756        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        48        2.0        Andalucia        1959.0        1.878035        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...        ...
        770        18.0        Rioja (La)        1993.0        9.132391        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        16.765787
        771        18.0        Rioja (La)        1994.0        9.498000        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        16.469452
        772        18.0        Rioja (La)        1995.0        9.752213        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        20.275650
        773        18.0        Rioja (La)        1996.0        10.056413        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
        774        18.0        Rioja (La)        1997.0        10.476292        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN
731 rows × 17 columns
In [188]:

pd.set_option('display.max_columns', 30)
df = result[result['regionname'] != 'Spain (Espana)']
pivot = df.pivot_table(values='gdpcap', index='regionname', columns=['year'])
dfProp99 = pd.DataFrame(pivot.to_records())
allColumns = dfProp99.columns.values
dfProp99


Out[188]:
                regionname        1955.0        1956.0        1957.0        1958.0        1959.0        1960.0        1961.0        1962.0        1963.0        1964.0        1965.0        1966.0        1967.0        1968.0        ...        1983.0        1984.0        1985.0        1986.0        1987.0        1988.0        1989.0        1990.0        1991.0        1992.0        1993.0        1994.0        1995.0        1996.0        1997.0
        0        Andalucia        1.688732        1.758498        1.827621        1.852756        1.878035        2.010140        2.129177        2.280348        2.431020        2.508855        2.584690        2.694444        2.802342        2.987361        ...        4.291631        4.358683        4.426593        4.663239        4.900671        5.159597        5.417738        5.585261        5.749214        5.641245        5.534918        5.638817        5.720723        5.995930        6.300986
        1        Aragon        2.288775        2.445159        2.603399        2.639032        2.677092        2.881462        3.099543        3.359183        3.614182        3.680091        3.745287        3.883319        4.016138        4.243645        ...        6.260854        6.372894        6.495501        6.926521        7.358612        7.802770        8.242645        8.458298        8.668238        8.466866        8.256927        8.573979        8.846758        9.096687        9.518709
        2        Baleares (Islas)        3.143959        3.347758        3.549629        3.642673        3.734862        4.058841        4.360254        4.646173        4.911525        5.050700        5.184662        5.466795        5.737646        6.161454        ...        8.925307        9.275921        9.652242        10.257783        10.823336        11.120395        11.408169        11.512425        11.679520        11.319623        10.969723        11.419594        11.773779        11.926592        12.350043
        3        Basque Country (Pais Vasco)        3.853185        3.945658        4.033562        4.023422        4.013782        4.285918        4.574336        4.898957        5.197015        5.338903        5.465153        5.545916        5.614896        5.852185        ...        6.595330        6.761497        6.937161        7.332191        7.742788        8.120537        8.509711        8.776778        9.025279        8.873893        8.718224        9.018138        9.440874        9.686518        10.170666
        4        Canarias        1.914382        2.071837        2.226078        2.220866        2.213439        2.357684        2.445730        2.648243        2.844759        2.951157        3.054199        3.231791        3.403385        3.660312        ...        5.719866        5.801342        5.885604        6.256784        6.612682        6.977007        7.337903        7.345044        7.347187        7.220080        7.092188        7.410740        7.616395        7.817052        8.060554
        5        Cantabria        2.559412        2.693873        2.820337        2.879035        2.943730        3.137032        3.327621        3.555341        3.771423        3.839403        3.906098        4.032133        4.155955        4.375893        ...        5.941660        6.028849        6.138318        6.420451        6.713225        7.023422        7.333619        7.450729        7.596401        7.462154        7.327906        7.550700        7.777064        7.907741        8.226935
        6        Castilla Y Leon        1.729149        1.838332        1.947658        1.971365        1.995144        2.138817        2.239503        2.454227        2.672237        2.777778        2.882176        2.988075        3.094544        3.302271        ...        4.970722        5.113468        5.261997        5.647315        6.044630        6.354399        6.674022        6.870323        7.063196        7.045487        7.027635        7.074264        7.282919        7.611397        7.888460
        7        Castilla-La Mancha        1.327764        1.415096        1.503570        1.531420        1.559340        1.667524        1.752428        1.920451        2.091902        2.182591        2.274707        2.378392        2.482362        2.709083        ...        4.424664        4.550057        4.677664        4.980648        5.295559        5.677878        6.065339        6.279420        6.474507        6.330691        6.188589        6.230934        6.328763        6.614396        6.865396
        8        Cataluna        3.546630        3.690446        3.826835        3.875678        3.921737        4.241788        4.575335        4.838046        5.081334        5.158098        5.223651        5.332477        5.429449        5.674379        ...        7.397886        7.484290        7.569980        8.077692        8.583976        9.057412        9.525850        9.785062        10.050700        9.837903        9.625107        10.006427        10.339903        10.576264        11.045416
        9        Comunidad Valenciana        2.575978        2.738503        2.899886        2.963510        3.026207        3.219294        3.362468        3.569980        3.765210        3.823693        3.874179        3.978149        4.073408        4.279777        ...        6.139817        6.236861        6.336118        6.739360        7.144387        7.560697        7.969152        8.138389        8.306198        8.080548        7.857041        8.068409        8.289061        8.429734        8.725364
        10        Extremadura        1.243430        1.332548        1.422451        1.440231        1.458083        1.535847        1.596258        1.705584        1.817695        1.882819        1.948872        2.032633        2.117609        2.245501        ...        3.648957        3.737146        3.828478        4.161739        4.498572        4.769494        5.051414        5.234076        5.398315        5.365467        5.332619        5.439874        5.501357        5.905813        6.224579
        11        Galicia        1.634676        1.725578        1.816481        1.840903        1.865396        1.983290        2.005784        2.185661        2.366395        2.458797        2.549700        2.669666        2.787846        2.978363        ...        4.806341        4.899314        4.997786        5.277921        5.565767        5.909526        6.254499        6.453228        6.641603        6.544487        6.447515        6.556484        6.688660        6.862468        7.138532
        12        Madrid (Comunidad De)        4.594473        4.786632        4.963439        4.906170        4.846401        5.161097        5.632605        5.840831        6.024493        6.099329        6.152028        6.110469        6.057341        6.253142        ...        7.558555        7.699228        7.839189        8.347615        8.849615        9.254499        9.657955        9.806484        9.963582        9.840046        9.718652        9.882177        10.098543        10.322765        10.732648
        13        Murcia (Region de)        1.679520        1.764282        1.850328        1.887389        1.924093        2.118609        2.305484        2.521422        2.739074        2.851257        2.965938        3.099186        3.227292        3.461154        ...        4.906884        5.031991        5.154599        5.471508        5.788560        6.124893        6.450443        6.578192        6.710368        6.662882        6.616324        6.784847        6.885818        7.045416        7.295058
        14        Navarra (Comunidad Foral De)        2.555127        2.698158        2.839831        2.881891        2.930877        3.163525        3.335904        3.623393        3.894816        3.985147        4.072979        4.210011        4.352399        4.556984        ...        6.544702        6.797201        7.047772        7.449300        7.879178        8.349758        8.803913        9.197372        9.591545        9.345187        9.117395        9.365895        9.758640        10.060697        10.522708
        15        Principado De Asturias        2.502928        2.615538        2.725793        2.751857        2.777421        2.967295        3.143887        3.373536        3.597258        3.672594        3.743359        3.909383        4.073122        4.308626        ...        5.769066        5.887318        6.011283        6.234790        6.465296        6.688160        6.913525        6.983148        7.040631        6.922808        6.798343        6.954156        7.116467        7.217224        7.475721
        16        Rioja (La)        2.390460        2.535204        2.680020        2.726435        2.772851        2.969866        3.153171        3.404384        3.669238        3.803985        3.921808        4.032705        4.160311        4.373036        ...        6.502142        6.626893        6.775564        7.165096        7.580691        8.002713        8.453299        8.858897        9.229506        9.180948        9.132391        9.498000        9.752213        10.056413        10.476292
17 rows × 44 columns
In [ ]:

In [179]:

import numpy as np
states = list(np.unique(dfProp99['regionname']))
years = np.delete(allColumns, [0])
caStateKey = 'Basque Country (Pais Vasco)'
states.remove(caStateKey)
otherStates = states
yearStart = 1955
yearTrainEnd = 1975
yearTestEnd = 1998  # range() excludes the end value, so the test years run through 1997
p = 1.0

In [ ]:

import copy
import pandas as pd
from tslib.src import tsUtils  # assumed import path for the tslib helper module

# Column labels for the pre-treatment (training) and post-treatment (test) years.
trainingYears = [str(i) for i in range(yearStart, yearTrainEnd)]
testYears = [str(i) for i in range(yearTrainEnd, yearTestEnd)]
trainDataMasterDict = {}
trainDataDict = {}
testDataDict = {}
for key in otherStates:
    series = dfProp99.loc[dfProp99['regionname'] == key]
    trainDataMasterDict.update({key: series[trainingYears].values[0]})
    # With p = 1.0 no observations are hidden.
    (trainData, pObservation) = tsUtils.randomlyHideValues(copy.deepcopy(trainDataMasterDict[key]), p)
    trainDataDict.update({key: trainData})
    testDataDict.update({key: series[testYears].values[0]})
# The treated region is added last and never has values hidden.
series = dfProp99[dfProp99['regionname'] == caStateKey]
trainDataMasterDict.update({caStateKey: series[trainingYears].values[0]})
trainDataDict.update({caStateKey: series[trainingYears].values[0]})
testDataDict.update({caStateKey: series[testYears].values[0]})
trainMasterDF = pd.DataFrame(data=trainDataMasterDict)
trainDF = pd.DataFrame(data=trainDataDict)
testDF = pd.DataFrame(data=testDataDict)

In [ ]:

trainDF.head()

Discuss the results. The interpretation of the coefficients is:
        $\beta_0$: Average per capita GDP (y) for the control group in the first (pre-treatment) period.
        $\beta_1$: Average change in per capita GDP (y) from the first to the second time period that is common to both groups.
        $\beta_2$: Average difference in per capita GDP (y) between the two groups that is common to both time periods.
        $\beta_3$: Average differential change in per capita GDP (y) from the first to the second time period of the treatment group relative to the control group.
Our interest is $\beta_3$. For you visual learners, the following image will give you some intuition as to how to interpret the results (and why this is called difference-in-differences):
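To see why the interaction coefficient is a "difference in differences," it can help to write out the four group-by-period means implied by a regression of y on a constant, a post-period indicator, a treatment-group indicator, and their interaction. The $\bar{y}$ symbols below are notation introduced here for illustration (C = control, T = treatment); this is a sketch of the standard algebra, not a result taken from the data.

$$
\begin{aligned}
\bar{y}_{C,\text{pre}} &= \beta_0 \\
\bar{y}_{C,\text{post}} &= \beta_0 + \beta_1 \\
\bar{y}_{T,\text{pre}} &= \beta_0 + \beta_2 \\
\bar{y}_{T,\text{post}} &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
\left(\bar{y}_{T,\text{post}} - \bar{y}_{T,\text{pre}}\right) - \left(\bar{y}_{C,\text{post}} - \bar{y}_{C,\text{pre}}\right) &= (\beta_1 + \beta_3) - \beta_1 = \beta_3
\end{aligned}
$$

The first difference removes anything fixed within a group; the second removes any change common to both groups, leaving only the differential change for the treated group.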
Exercise 3: Synthetic Control [30 points]
The "synthetic control" (aka synthetic difference-in-differences) method allows for estimation in settings where a single unit (a state, country, firm, etc.) is exposed to an event or intervention but it is not obvious who or what the control group should be. Specifically, synthetic control provides a data-driven procedure to construct a control group using a convex combination of comparison units. The idea is that this generated (i.e., synthetic) control group approximates the characteristics of the unit of interest prior to treatment. The hope is that a combination of comparison units provides a better comparison for the unit exposed to the intervention than any single comparison unit.
In our example above, we identified a control group by generating a metric to assess pre-period trends. In many (most?) cases, we will have too much data to identify a control group by "eyeballing" it, and we will likely not be able to satisfy the parallel pre-period trend assumption with any single candidate. What if instead we could create a control group via a convex combination of potential candidates? That's the idea behind "synthetic controls": we create a control group.
Synthetic control is very similar to DiD in how it estimates the impact of a treatment. Both methods use control groups to construct a counterfactual for the treated group, giving us an idea of what the trend would have been had the treatment not happened. The counterfactual GDP of the treated group is predicted from the GDP of the control regions and, possibly, other covariates of those regions.
Where DiD uses a single control, "synthetic control" predicts the counterfactual by assigning weights to the candidate control regions (the regressors), so each region's influence on the prediction is identified from the data. Ultimately, the causal impact is the difference between actual GDP and the counterfactual GDP had the treatment not happened, which is the same idea as DiD.
As always, let's look at the raw data before proceeding.
Part A: Visualization
Create a figure of per capita GDP over time with different lines for each region. Put a dashed vertical line at 1975 and indicate pre and post periods.
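A minimal sketch of such a figure follows. The three series in `gdp_by_region` are made-up placeholders (an assumption for illustration only); with the real data, each region's `gdpcap` series would take their place.

```python
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(1955, 1998)

# Hypothetical per capita GDP series; replace with the real region data.
rng = np.random.default_rng(0)
gdp_by_region = {
    name: np.linspace(start, start + 6.0, len(years)) + rng.normal(0, 0.1, len(years))
    for name, start in [("Region A", 2.0), ("Region B", 3.0), ("Basque", 4.0)]
}

fig, ax = plt.subplots(figsize=(8, 5))
for name, gdp in gdp_by_region.items():
    ax.plot(years, gdp, label=name)  # one line per region

# Dashed vertical line at the treatment year, with Pre/Post labels on either side.
ax.axvline(x=1975, color="k", linestyle="--")
y_top = ax.get_ylim()[1]
ax.text(1965, 0.95 * y_top, "Pre", ha="center", va="top")
ax.text(1986, 0.95 * y_top, "Post", ha="center", va="top")

ax.set_xlabel("Year")
ax.set_ylabel("Per capita GDP (thousands)")
ax.set_title("Per Capita GDP by Region")
ax.legend()
```

With 17 regions the legend gets crowded; drawing the donor regions in a light gray and the Basque region in a strong color is one way to keep the figure readable.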
In [ ]:

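import matplotlib.pyplot as plt

# Singular value spectrum of the column-demeaned pre-treatment (training) data.
(U, s, Vh) = np.linalg.svd(trainDF - trainDF.mean())
s2 = np.power(s, 2)
spectrum = np.cumsum(s2)/np.sum(s2)
# Cumulative share of total energy captured by the leading singular values.
plt.plot(spectrum)
plt.grid()
plt.title("Cumulative Energy of Singular Values (Per Capita GDP)")
plt.figure()
plt.plot(s2)
plt.grid()
plt.xlabel("Ordered Singular Values")
plt.ylabel("Energy")
plt.title("Singular Value Spectrum")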

Idea: We pick the convex combination of regions that best matches the evolution of per capita GDP in the Basque region prior to the increase in ETA bombings (i.e., before 1975). We then hold these weights fixed and generate per capita GDP for the synthetic control group in the post-treatment period. That's it. Identifying the causal effect then follows our DiD regression where we use the synthetic control rather than a specific control region. A nice feature of this approach is that we can be hands-off in selecting our control group. Plus, construction of our synthetic control is not based on the treatment period, so there's minimal risk of inadvertently baking in our results.
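The idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the donor and treated series are simulated, the helper names `project_to_simplex` and `fit_convex_weights` are hypothetical, and projected gradient descent is just one simple way to impose the convexity constraint (non-negative weights that sum to one).

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

def fit_convex_weights(X_pre, y_pre, n_iter=5000):
    """Projected gradient descent for min ||X_pre w - y_pre||^2 over the simplex."""
    n_donors = X_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)        # start from equal weights
    step = 1.0 / np.linalg.norm(X_pre, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X_pre.T @ (X_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w

# Simulated data (hypothetical): three donor regions and one treated region.
rng = np.random.default_rng(0)
years = np.arange(1955, 1998)
pre = years <= 1975
donors = np.column_stack(
    [np.linspace(a, b, len(years)) for a, b in [(2.0, 8.0), (3.0, 9.0), (4.5, 11.0)]]
) + rng.normal(0, 0.05, (len(years), 3))
treated = 0.5 * donors[:, 1] + 0.5 * donors[:, 2] + rng.normal(0, 0.05, len(years))
treated[~pre] -= 0.8  # made-up post-1975 treatment effect

w = fit_convex_weights(donors[pre], treated[pre])  # weights use pre-period data only
synthetic = donors @ w                             # weights held fixed for all years
gap = treated - synthetic                          # post-period gap estimates the effect
```

Because the weights are chosen without looking at any post-1975 observation, the post-period gap is not mechanically forced toward zero.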
Part B: Construct a Synthetic Control Group
Use Lasso to regress per capita GDP of the Basque country (i.e., this is the y) on per capita GDP of the other regions for the years prior to (and including) 1975. Be sure to use cross-validation (leave-one-out) to choose the best penalty parameter. Since there are 17 other regions, there are potentially 17 variables and one constant in the regression. You can use the pandas pivot method to reshape the main data frame. I say "potentially" because you should exclude regions which violate criterion two.
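One way this step might look, assuming scikit-learn is available, is sketched below. The long-format frame `long_df` and its region names are simulated stand-ins (assumptions for illustration); only the column names `regionname`, `year`, and `gdpcap` follow the data set described in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.model_selection import LeaveOneOut

# Hypothetical long-format panel standing in for the real data.
rng = np.random.default_rng(0)
years = np.arange(1955, 1998)
trend = np.linspace(2.0, 10.0, len(years))
rows = []
for j, name in enumerate(["Basque", "A", "B", "C"]):
    gdp = trend * (1 + 0.1 * j) + rng.normal(0, 0.05, len(years))
    rows += [{"regionname": name, "year": int(y), "gdpcap": g} for y, g in zip(years, gdp)]
long_df = pd.DataFrame(rows)

# Reshape to one row per year and one column per region.
wide = long_df.pivot(index="year", columns="regionname", values="gdpcap")
pre = wide.loc[wide.index <= 1975]                   # pre-treatment years only
donors = [c for c in wide.columns if c != "Basque"]  # candidate control regions

# Leave-one-out cross-validation picks the Lasso penalty (alpha).
lasso = LassoCV(cv=LeaveOneOut(), max_iter=50_000).fit(pre[donors], pre["Basque"])

# Apply the fitted coefficients to every year to get the synthetic control series.
synthetic = pd.Series(lasso.predict(wide[donors]), index=wide.index, name="synthetic")
```

Note that Lasso does not constrain the coefficients to be non-negative or to sum to one, so the result is a penalized linear combination rather than a strict convex combination.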
In [ ]:

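# Assumed import path for the tslib package that provides RobustSyntheticControl.
from tslib.src.synthcontrol.syntheticControl import RobustSyntheticControl

singvals = 4  # number of singular values kept when de-noising the donor data
rscModel = RobustSyntheticControl(caStateKey, singvals, len(trainDF), probObservation=1.0, modelType='svd', svdMethod='numpy', otherSeriesKeysArray=otherStates)
rscModel.fit(trainDF)
denoisedDF = rscModel.model.denoisedDF()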

Use the model to generate (predict) a control group for all the years.
In [15]:

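# Counterfactual for the post-period: donor regions weighted by the fitted model.
predictions = np.dot(testDF[otherStates], rscModel.model.weights)
# Observed series for the treated region (year columns only).
actual = dfProp99.loc[dfProp99['regionname'] == caStateKey]
actual = actual.drop('regionname', axis=1)
actual = actual.iloc[0]
# In-sample fit over the training (pre-treatment) years.
model_fit = np.dot(trainDF[otherStates][:], rscModel.model.weights)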

Graph the per capita GDP series for the Basque country and your synthetic control group. Put a dashed vertical line at 1975 and indicate pre and post periods.
In [ ]:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

fig, ax = plt.subplots(1,1)
tick_spacing = 5
label_markings = np.insert(years[::tick_spacing], 0, 'dummy')
ax.set_xticks(np.arange(len(label_markings)))
ax.set_xticklabels(label_markings, rotation=45)
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
plt.plot(years, actual ,label='actual')
plt.xlabel('Year')
plt.ylabel('Per capita GDP (thousands)')
plt.plot(trainingYears, model_fit, label='fitted model')
plt.plot(testYears, predictions, label='counterfactual')
plt.title(caStateKey+', Singular Values used: '+str(singvals))
# Dashed vertical line marks the treatment year (end of the pre-period).
plt.axvline(x=str(yearTrainEnd), color='k', linestyle='--', linewidth=4)
plt.grid()
plt.legend()

Part C: Synthetic Difference-in-Differences
Run the following synthetic difference-in-differences regression:
$$\text{Per Capita GDP}_{it} = \beta_0 + \beta_1 P_t + \beta_2 T_i + \beta_3 P_t \times T_i + \epsilon_t$$
where $P_t$ is the "period" indicator equal to one for all years after 1975 and zero otherwise; $T_i$ is the "treatment" group indicator equal to one when the region is the Basque region; and $i$ is the region. Note the regression only uses observations from the treatment (Basque region) and synthetic control group.
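As a rough sketch of how this regression could be set up, the example below stacks a control series and a treated series and fits the four coefficients with ordinary least squares. Both series are simulated (an assumption for illustration), and plain NumPy least squares is used in place of a regression package, so no standard errors are produced.

```python
import numpy as np

years = np.arange(1955, 1998)
post = (years > 1975).astype(float)  # P_t: one for all years after 1975

# Simulated series (hypothetical): the treated unit sits 1.0 above the control
# in every year and drops by 0.7 relative to it after 1975.
rng = np.random.default_rng(1)
control = np.linspace(3.0, 9.0, len(years)) + rng.normal(0, 0.05, len(years))
treated = control + 1.0 - 0.7 * post

# Stack the two units: control observations first, then treated.
y = np.concatenate([control, treated])
P = np.concatenate([post, post])
T = np.concatenate([np.zeros(len(years)), np.ones(len(years))])  # T_i

# Design matrix columns: constant, P_t, T_i, P_t x T_i.
X = np.column_stack([np.ones_like(y), P, T, P * T])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

b0, b1, b2, b3 = beta  # b3 is the difference-in-differences estimate
```

By construction of the simulated data, `b2` recovers the level gap of 1.0 and `b3` the post-period effect of -0.7; with the real data, `control` would be the synthetic control series and `treated` the Basque series.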
In [ ]:

Discuss the results. What effect did the ETA bombings have on per capita GDP in the Basque region?
In [ ]:

Final/finalproblemset-gexy0jl5.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Final Problem Set: Due Tuesday, May 10 at 5:00 pm EDT\n",
"\n",
"**Important:** You need `basque.RData` to complete this set. All are located on eLC.\n",
"\n",
"Answer the questions below. This notebook will be your workspace so add cells as you please. When you are finished, you will submit two objects to eLC:\n",
"\n",
"1. **Written pdf Report.** This should be short. You should include:\n",
" * **The research question.**\n",
" \n",
" * **The figure from your Exploratory Data Analysis along with discussion.** Be sure to note the difference in per capita GDP pre and post 1975.\n",
" \n",
" * **Difference-in-differences result.** Be sure to include the figure comparing the Basque country and the control region. Discuss how the control group satisfies our criteria. Be sure to discuss identification (see \"interpretation\" of DiD below) and results relative to the research question. Ideally, you should work your estimation result into the figure so the reader can quickly observe the \"answer\" to the research question. \n",
" \n",
" * **Synthetic Control result.** Be sure to include the figure comparing the Basque country and the synthetic control region. Be sure to discuss identification (see \"interpretation\" of DiD below) and results relative to the research question. Ideally, you should work your estimation result into the figure so the reader can quickly observe the \"answer\" to the research question.\n",
"\n",
" Your analysis should be professional: i.e., well-written, clear, and concise. Figures should be incorporated in your analysis. For example, it would be useful to set the title of each figure in this notebook (e.g., \"Figure 1: Per Capita GDP in the Basque Region\") so in the final report you can reference the figure in the appropriate commentary. Save your report as a pdf (\"File/Save As Adobe PDF\") with the naming convention **'Final_Report_[insert last name]'**. For example, 'Final_Report_Thurk.pdf'. \n",
"\n",
"\n",
"2. **Jupyter Notebook.** Print the notebook as a pdf. [To print: From the file menu, choose 'print preview'. A new tab will open with the notebook presented as html. Print as a pdf.] Save your pdf notebook with the naming convention **'Final_Workbook_[insert last name]'**. For example, 'Final_Workbook_Thurk.pdf'. Think of the notebook as your opportunity to show your work.\n",
"\n",
"**Grading:** The problem set is worth **90 points** and partial credit is indicated for each exercise. I will grade your report (1) and use your notebook (2) to assign partial credit in the event there are errors.\n",
"\n",
"\n",
"* You may use your notes, books, and the internet.\n",
"* Do not consult with other people. This work should be entirely your own. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 1: Exploratory Data Analysis [30 points]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're going to estimate the economic impact of terrorism in the Basque region -- an autonomous community in Spain. To do this, we'll use information from 17 other Spanish regions where our underlying assumption will be that terrorism affected economic activity in the Basque region but not elsewhere whereas other economic shocks are aggregate and affect all of Spain.\n",
"\n",
"Here are the details:\n",
"\n",
"* Coverage from 1955–1997 for 18 Spanish regions. One of the data \"regions\" is all of Spain which we won't use.\n",
"* The treatment region is “Basque Country (Pais Vasco)”.\n",
"* The \"treatment\" year is 1975 since there were several bombings around that year.\n",
"* We will measure \"economic impact\" via GDP per capita (in thousands).\n",
"\n",
"__Background:__ Euskadi Ta Askatasuna (ETA), was an armed leftist Basque nationalist and separatist organization in the Basque Country (in northern Spain and southwestern France). The group was founded in 1959 and later evolved from a group promoting traditional Basque culture to a paramilitary group engaged in a violent campaign of bombing, assassinations and kidnappings in the Southern Basque Country and throughout Spanish territory. Its goal was gaining independence for the Basque Country. ETA was the main group within the Basque National Liberation Movement and was the most important Basque participant in the Basque conflict. While its terrorist activities spanned several decades, the death of Spanish dictator Francisco Franco in 1975 led to a substantial increase in bombings. (Wikipedia)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part A: Load Data\n",
"\n",
"Thus far we've learned how to load data from CSV, Excel, and Stata. Another popular programming language (and to some degree a rival of Python) is the open-source language R. The data we want to access is from the following academic paper \n",
"\n",
"> Abadie, A. and Gardeazabal, J. (2003) \"Economic Costs of Conflict: A Case Study of the Basque Country.\" American Economic Review 93 (1) 113--132.\n",
"\n",
"where the data were saved in R's native format. We'll load it by leveraging Python's own open-source nature: a user-created package called `pyreadr`, which we install as usual in the terminal:\n",
"\n",
"`pip install pyreadr`\n",
"\n",
"We then use `pyreadr` to load `basque.RData` and access the data as follows:\n",
"\n",
"```python\n",
"result = pyreadr.read_r('basque.RData')\n",
"data = result['basque'] # extract the pandas dataframe\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" regionno regionname year gdpcap sec.agriculture \\\n",
"rownames \n",
"1 1.0 Spain (Espana) 1955.0 2.354542 NaN \n",
"2 1.0 Spain (Espana) 1956.0 2.480149 NaN \n",
"3 1.0 Spain (Espana) 1957.0 2.603613 NaN \n",
"4 1.0 Spain (Espana) 1958.0 2.637104 NaN \n",
"5 1.0 Spain (Espana) 1959.0 2.669880 NaN \n",
"... ... ... ... ... ... \n",
"770 18.0 Rioja (La) 1993.0 9.132391 NaN \n",
"771 18.0 Rioja (La) 1994.0 9.498000 NaN \n",
"772 18.0 Rioja (La) 1995.0 9.752213 NaN \n",
"773 18.0 Rioja (La) 1996.0 10.056413 NaN \n",
"774 18.0 Rioja (La) 1997.0 10.476292 NaN \n",
"\n",
" sec.energy sec.industry sec.construction sec.services.venta \\\n",
"rownames \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"5 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"770 NaN NaN NaN NaN \n",
"771 NaN NaN NaN NaN \n",
"772 NaN NaN NaN NaN \n",
"773 NaN NaN NaN NaN \n",
"774 NaN NaN NaN NaN \n",
"\n",
" sec.services.nonventa school.illit school.prim school.med \\\n",
"rownames \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"5 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"770 NaN NaN NaN NaN \n",
"771 NaN NaN NaN NaN \n",
"772 NaN NaN NaN NaN \n",
"773 NaN NaN NaN NaN \n",
"774 NaN NaN NaN NaN \n",
"\n",
" school.high school.post.high popdens invest \n",
"rownames \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"5 NaN NaN NaN NaN \n",
"... ... ... ... ... \n",
"770 NaN NaN NaN 16.765787 \n",
"771 NaN NaN NaN 16.469452 \n",
"772 NaN NaN NaN 20.275650 \n",
"773 NaN NaN NaN NaN \n",
"774 NaN NaN NaN NaN \n",
"\n",
"[774 rows x 17 columns]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pyreadr, pandas as pd\n",
"pd.set_option('display.max_columns', 20)\n",
"result = pyreadr.read_r('\\\\basque-pehqaa4x.RData')\n",
"result = [pd.DataFrame(i) for i in result.values()][0]\n",
"result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part B: Drop Spain\n",
"\n",
"Drop the region \"Spain (Espana)\"."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"result = result[result['regionname'] != 'Spain (Espana)'] # Drop Spain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part C: Plot GDP Per Capita Over Time\n",
"\n",
"Before proceeding into the econometrics, it's useful to graph the data variation we're most interested in. We're interested in evaluating the effect of terrorism on per capita GDP where we're using the bombings in 1975 as a \"natural experiment.\" Plot per capita GDP (i.e., `gdpcap`) in the Basque region across time. Put a vertical dashed line at 1975 and indicate Pre/Post periods."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'real per-capita GDP (1986 USD, thousand)')"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
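],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Keep only the treated region; plotting every region as one line zigzags.\n",
"basque = result[result['regionname'] == 'Basque Country (Pais Vasco)']\n",
"\n",
"plt.plot(basque['year'], basque['gdpcap'])\n",
"plt.axvline(x=1975, color='k', linestyle='--')  # treatment year\n",
"plt.title('Figure 1: Per Capita GDP in the Basque Region')\n",
"plt.xlabel('year')\n",
"plt.ylabel('real per-capita GDP (1986 USD, thousand)')"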
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We observe that per capita GDP decreases after the 1975 increase in ETA terrorist bombings and then increases. __How much of that decrease is due to terrorism and how much is due to general economic uncertainty from Franco's death?__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 2: Difference-in-Differences [30 points]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Part A: Comparing Before and After\n",
"\n",
"We'll begin by comparing per capita GDP before and after the 1975 treatment year. Compute average per capita GDP in the Basque region for the years up to (and including) 1975 and for the years after 1975 (i.e., two numbers)."
]
},
{
"cell_type": "raw",
"metadata": {},
"source": []
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
":2: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" compare['year'] = compare['year'].astype(int)\n"
]
}
],
"source": [
"year = compare['year']\n",
"compare['year'] = compare['year'].astype(int)"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"compare.plot(x=\"year\", y=\"regionname\", kind=\"scatter\")"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" regionname year gdpcap\n",
"rownames \n",
"689 Basque Country (Pais Vasco) 1955.0 3.853185\n",
"690 Basque Country (Pais Vasco) 1956.0 3.945658\n",
"691 Basque Country (Pais Vasco) 1957.0 4.033562\n",
"692 Basque Country (Pais Vasco) 1958.0 4.023422\n",
"693 Basque Country (Pais Vasco) 1959.0 4.013782\n",
"694 Basque Country (Pais Vasco) 1960.0 4.285918\n",
"695 Basque Country (Pais Vasco) 1961.0 4.574336\n",
"696 Basque Country (Pais Vasco) 1962.0 4.898957\n",
"697 Basque Country (Pais Vasco) 1963.0 5.197015\n",
"698 Basque Country (Pais Vasco) 1964.0 5.338903\n",
"699 Basque Country (Pais Vasco) 1965.0 5.465153\n",
"700 Basque Country (Pais Vasco) 1966.0 5.545916\n",
"701 Basque Country (Pais Vasco) 1967.0 5.614896\n",
"702 Basque Country (Pais Vasco) 1968.0 5.852185\n",
"703 Basque Country (Pais Vasco) 1969.0 6.081405\n",
"704 Basque Country (Pais Vasco) 1970.0 6.170094\n",
"705 Basque Country (Pais Vasco) 1971.0 6.283633\n",
"706 Basque Country (Pais Vasco) 1972.0 6.555555\n",
"707 Basque Country (Pais Vasco) 1973.0 6.810769\n",
"708 Basque Country (Pais Vasco) 1974.0 7.105184\n",
"709 Basque Country (Pais Vasco) 1975.0 7.377892\n",
"710 Basque Country (Pais Vasco) 1976.0 7.232934\n",
"711 Basque Country (Pais Vasco) 1977.0 7.089831\n",
"712 Basque Country (Pais Vasco) 1978.0 6.786704\n",
"713 Basque Country (Pais Vasco) 1979.0 6.639817\n",
"714 Basque Country (Pais Vasco) 1980.0 6.562839\n",
"715 Basque Country (Pais Vasco) 1981.0 6.500785\n",
"716 Basque Country (Pais Vasco) 1982.0 6.545059\n",
"717 Basque Country (Pais Vasco) 1983.0 6.595330\n",
"718 Basque Country (Pais Vasco) 1984.0 6.761497\n",
"719 Basque Country (Pais Vasco) 1985.0 6.937161\n",
"720 Basque Country (Pais Vasco) 1986.0 7.332191\n",
"721 Basque Country (Pais Vasco) 1987.0 7.742788\n",
"722 Basque Country (Pais Vasco) 1988.0 8.120537\n",
"723 Basque Country (Pais Vasco) 1989.0 8.509711\n",
"724 Basque Country (Pais Vasco) 1990.0 8.776778\n",
"725 Basque Country (Pais Vasco) 1991.0 9.025279\n",
"726 Basque Country (Pais Vasco) 1992.0 8.873893\n",
"727 Basque Country (Pais Vasco) 1993.0 8.718224\n",
"728 Basque Country (Pais Vasco) 1994.0 9.018138\n",
"729 Basque Country (Pais Vasco) 1995.0 9.440874\n",
"730 Basque Country (Pais Vasco) 1996.0 9.686518\n",
"731 Basque Country (Pais Vasco) 1997.0 10.170666"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compare[compare['regionname'] == 'Basque Country (Pais Vasco)']\n"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"data": {
"image/png":...