PROBLEM SET 4 - Empirical Asset Pricing Data Overview The data "CRSP_data.csv" contains data on monthly returns for all publicly traded stocks in the United States for the period January 2010 -...



PROBLEM SET 4 - Empirical Asset Pricing

Data Overview

The data "CRSP_data.csv" contains data on monthly returns for all publicly traded stocks in the United States for the period January 2010 - December 2020. It is taken from the Center for Research in Security Prices (CRSP) monthly stock file. CRSP is the standard source for stock return information in academic finance. It can be accessed through the Wharton Research Data Services (WRDS) online portal, which you should have access to as JHU students.

There are 10 variables in the data. The variables "cusip", "permno", "permco", and "ticker" all identify the stock. They mostly accomplish the same thing, but we will use the "permco" variable because it handles mergers and acquisitions appropriately. The variable "comnam" is the company name. The variable "ret" is the return for that month, reported in decimal. The variable "prc" is the stock price, and "shrout" is the total number of shares outstanding (in 1,000s). Finally, "sprtrn" is the return on the S&P 500 index.

Part A - Basic Data Cleaning

Before doing our empirical work, let's first clean the data a little.

1. The variable "ret" is the monthly return (in decimal) for each stock. However, CRSP denotes certain returns as "B" and "C" when the data is invalid for various reasons. So the first step is to drop those observations, and convert the variable "ret" to a numeric variable if it isn't already.
2. Next, summarize the return data. Note there are some massive outliers in the upper end of the distribution. Let's keep only observations where the monthly return is <= 100% (i.e. ret <= 1.00).
3. From the "date" variable, create variables for year, month, and day for each date.
4. In order to have more easily interpretable coefficients, multiply ret by 100 so a value of 5 means 5% (not 500%).
5. Now, let's only keep sufficiently large stocks (this is not necessary, but reduces data burden and data errors). Generate a variable called mkt_cap equal to shrout x prc x 1,000. Next, for each date, keep only stocks in the top 1,000 by market capitalization.
6. Some stocks have multiple values for a given year and month. Keep only one observation for each permco in each year and month (which one you keep is not important for this problem set).
7. We want to have some level of reliability in our estimates of factor betas. To do so, we should impose a minimum number of monthly observations. This is more art than science. Let's keep observations (at the permco level) with at least two years of data; that is, drop any stocks with fewer than 24 monthly return observations in the data.
8. Finally, some permco values have more than 132 months of data (corresponding to the 11 years of our sample). This has to do with different share classes. Let's not bother with them. Just drop any stock with more than 132 observations in the sample.

By my calculation, these cleaning steps leave us with a total of 115,953 observations.

Part B - Estimating Betas

1. Load the Fama-French 5-factor data from the file posted on Blackboard.
2. Create year, month, and day variables for the FF data.
3. Question: What are the means and variances of the FF factors?
4. Now, merge the FF data with the stock return data we created in Part A.
   (a) To be clear, in my data I have 1,270 unique firms (permco) with a total of 115,953 observations.
5. Now, for each stock, regress the stock return on the market factor (mktrf), and store the market beta from this regression ($\hat{\beta}_{mkt,i}$) and the alpha ($\alpha_i$).
   (a) Plot a histogram of market beta and alpha.
   (b) What is the average market beta? What is the average alpha?
6. Next, regress returns on the 5 Fama-French factors, and store the estimated betas (call them $\hat{\beta}_{mkt,i}$, $\hat{\beta}_{smb,i}$, $\hat{\beta}_{hml,i}$, $\hat{\beta}_{cma,i}$, $\hat{\beta}_{rmw,i}$). Note that CMA stands for "conservative minus aggressive" investment, and RMW is "robust minus weak" profitability.
   (a) How does the distribution of FF5 alphas compare to the distribution of CAPM alphas? What about the (adjusted) $R^2$? (Notes: to compare distributions you can plot histograms, or report means/medians/standard deviations; and most regression packages store the adjusted $R^2$ from a regression.) How would you evaluate the success of the two models?

Part C - Fama-MacBeth

1. Now we're going to estimate Fama-MacBeth regressions using the CAPM model. First, regress monthly returns on mktrf, but only for years 2010-2016.
2. Now, based on these estimated market betas (only use one observation for each firm), sort firms into 20 groups based on market beta. That is, the bottom 5% of betas should be in one group, the next 5% smallest betas in the second group, etc.
3. For each of these groups, calculate the average return within the group for each date (so for each date, you should have 20 average returns, one for each group).
4. Now, for the sample period 2016-2020, estimate CAPM beta for each of the 20 groups. That is, for each group, run a regression of average group return on mktrf. This will result in 20 estimates of market beta, one for each group. (Tip: once you have calculated average returns for each group, you need to (and should) keep only one observation per group per date.)
5. Now, we want to see how well these estimates of market beta explain returns. For each date (year and month), run a regression of average group return on group market beta. That is, for each date $t$ you estimate:

   $\bar{r}_{g,t} = \gamma_{0,t} + \gamma_{1,t}\,\beta_{g,mkt} + \varepsilon_{g,t}$

   A few things to note. There are 60 months, so you are running 60 regressions (one for each month). The subscript $g$ denotes the "beta group", and $t$ denotes the date. Since you have run 60 regressions, you have 60 estimates of $\hat{\gamma}_{0,t}$ and $\hat{\gamma}_{1,t}$.
6. If the CAPM is correct, what should be the average value of $\hat{\gamma}_{0,t}$ and $\hat{\gamma}_{1,t}$ across the 60 regressions? How do your estimates compare to the theory?
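The Part C procedure can be illustrated with a minimal sketch in base R. Everything here is an assumption made for illustration: the panel is simulated (200 firms, 120 months, returns generated as beta times the market factor plus noise), the first 60 months stand in for the estimation window and the last 60 for the testing window, and names such as `est_win`, `grp_ret`, and `gammas` are hypothetical. It is not the course's solution and uses no CRSP or Fama-French data.

```r
set.seed(42)
n_firms <- 200; n_months <- 120; n_groups <- 20

# Simulated market factor (in percent) and firm returns with zero alpha.
mktrf     <- rnorm(n_months, mean = 0.8, sd = 4)
true_beta <- runif(n_firms, 0.5, 1.5)
ret <- outer(mktrf, true_beta) +
  matrix(rnorm(n_months * n_firms, sd = 5), n_months, n_firms)  # ret[month, firm]

est_win  <- 1:60     # assumed estimation window
test_win <- 61:120   # assumed 60-month testing window

# Step 1: firm-level market betas from time-series regressions.
beta_i <- unname(apply(ret[est_win, ], 2,
                       function(r) coef(lm(r ~ mktrf[est_win]))[2]))

# Step 2: sort firms into 20 groups at the 5% quantile cutoffs of beta.
breaks <- quantile(beta_i, probs = seq(0, 1, length.out = n_groups + 1))
group  <- cut(beta_i, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Step 3: equal-weighted average return per group and month (months x groups).
grp_ret <- sapply(1:n_groups,
                  function(g) rowMeans(ret[, group == g, drop = FALSE]))

# Step 4: one market beta per group, estimated on the testing window.
beta_g <- unname(apply(grp_ret[test_win, ], 2,
                       function(r) coef(lm(r ~ mktrf[test_win]))[2]))

# Step 5: one cross-sectional regression per month of group return on group beta.
gammas <- t(sapply(test_win, function(m) coef(lm(grp_ret[m, ] ~ beta_g))))
colnames(gammas) <- c("gamma0", "gamma1")

# Step 6: time-series averages of the monthly coefficients, with the usual
# Fama-MacBeth standard errors (sd of the monthly estimates / sqrt(months)).
fm_mean <- colMeans(gammas)
fm_se   <- apply(gammas, 2, sd) / sqrt(nrow(gammas))
rbind(mean = fm_mean, se = fm_se)
```

Because the simulated returns have no alpha and load only on the market factor, the average of `gamma0` should be close to zero and the average of `gamma1` close to `mean(mktrf[test_win])`, up to sampling noise. With real data the same comparison is what item 6 asks about.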
Answered 1 day after Nov 18, 2021

Answer To: PROBLEM SET 4 - Empirical Asset Pricing Data Overview The data "CRSP_data.csv" contains data on...

Mohd answered on Nov 20 2021
11/19/2021
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(MASS)
library(skimr)
library(ggeffects)
Data Overview

The data CRSP_data.csv contains data on monthly returns for all publicly traded stocks in the United States for the period January 2010 - December 2020. It is taken from the Center for Research in Security Prices (CRSP) monthly stock file. CRSP is the standard source for stock return information in academic finance. It can be accessed through the Wharton Research Data Services (WRDS) online portal, which you should have access to as JHU students. There are 10 variables in the data. The variables cusip, permno, permco, and ticker all identify the stock. They mostly accomplish the same thing, but we will use the permco variable because it handles mergers and acquisitions appropriately. The variable comnam is the company name. The variable ret is the return for that month, reported in decimal. The variable prc is the stock price, and shrout is the total number of shares outstanding (in 1,000s). Finally, sprtrn is the return on the S&P 500 index.
library(readr)
crsp <- read_csv("data/crsp.csv")
View(crsp)
skim(crsp)
Data summary

Name                 crsp
Number of rows       951786
Number of columns    10

Column type frequency:
  character          4
  numeric            6

Group variables      None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
TICKER             12225           0.99    1    5      0     13415           0
COMNAM              5876           0.99    3   32      0     11463           0
CUSIP                  0           1.00    8    8      0     12848           0
RET                11057           0.99    1    9      0    321891           0

Variable type: numeric

skim_variable  n_missing  complete_rate         mean         sd           p0          p25          p50          p75         p100  hist
PERMNO                 0           1.00     58902.72   34128.13     10001.00     16137.00     78756.00     89911.00     93436.00  ▅▁▁▁▇
date                   0           1.00  20152151.71   31648.51  20100129.00  20121130.00  20150930.00  20180629.00  20201231.00  ▇▇▇▇▇
PERMCO                 0           1.00     37418.64   17696.02         5.00     20891.00     44072.00     53271.00     57668.00  ▂▃▁▃▇
PRC                17406           0.98        61.84    2680.90      -588.60         8.12        19.47        39.35    347815.00  ▇▁▁▁▁
SHROUT              6520           0.99     98130.83  369490.21         2.00      8246.00     27200.00     71969.00  29206400.00  ▇▁▁▁▁
sprtrn                 0           1.00         0.01       0.04        -0.13        -0.01         0.02         0.03         0.13  ▁▂▇▅▁
Part A - Basic Data Cleaning

Before doing our empirical work, let's first clean the data a little.

1. The variable ret is the monthly return (in decimal) for each stock. However, CRSP denotes certain returns as B and C when the data is invalid for various reasons. So the first step is to drop those observations, and convert the variable ret to a numeric variable if it isn't already.
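# Count the rows that CRSP flags as invalid ("B" or "C").
# %in% tests every row against both codes. The run that produced the output
# shown in this answer used == and !=, which recycle c("B", "C") down the
# column and match only some of the flagged rows, so the counts printed here
# are likely too low, and the leftover codes are the probable source of the
# NA's that summary() reports in the next step.
crsp %>%
  filter(RET %in% c("B", "C")) %>%
  count(RET)
## # A tibble: 2 x 2
##   RET       n
##   <chr> <int>
## 1 B      3176
## 2 C      3194
# Drop the flagged rows. The != filter also dropped missing returns
# implicitly; is.na() makes that explicit.
crsp_1 <- crsp %>%
  filter(!is.na(RET), !RET %in% c("B", "C"))
class(crsp_1$RET)
## [1] "character"
# Convert the remaining returns from text to numbers.
crsp_1$RET <- as.numeric(crsp_1$RET)
class(crsp_1$RET)
## [1] "numeric"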
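The difference between `==` and `%in%` when comparing a column against a vector of codes can be seen in a small sketch. The values in `ret_raw` are invented for illustration and are not taken from the CRSP file.

```r
# Toy return column: invented values with CRSP-style "B" and "C" flags mixed in.
ret_raw <- c("0.01", "B", "C", "-0.02", "B", "0.03", "C", "B")

# `==` recycles c("B", "C") along the vector: element 1 is compared with "B",
# element 2 with "C", element 3 with "B", and so on.
flag_eq <- ret_raw == c("B", "C")

# `%in%` tests each element against the whole set of codes.
flag_in <- ret_raw %in% c("B", "C")

sum(flag_eq)  # 1: only the "B" in position 5 lines up with a recycled "B"
sum(flag_in)  # 5: every flagged element is found

# Drop the flagged elements, then convert what is left to numeric.
ret_clean <- as.numeric(ret_raw[!flag_in])
ret_clean
```

When the column length is a multiple of the code vector's length, R gives no warning about the recycling, which makes the problem easy to miss on a large data set.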
2. Next, summarize the return data. Note there are some massive outliers in the upper end of the distribution. Let's keep only observations where the monthly return is <= 100% (i.e. ret <= 1.00).
# Distribution of returns before trimming
summary(crsp_1$RET)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##  -0.994  -0.042   0.006   0.009   0.052  19.884    6332
# Keep returns of at most 100%; filter() also drops rows where RET is NA
crsp_2 <- crsp_1 %>%
  filter(RET <= 1)
summary(crsp_2$RET)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## -0.993600 -0.042282  0.005571  0.006802  0.051241  1.000000
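3. From the date variable, create variables for year, month, and day for each date.
# date is stored as a yyyymmdd number; convert it to text and slice out the
# parts (the new columns are character strings)
crsp_2$date <- as.character(crsp_2$date)
crsp_2$Year <- substring(crsp_2$date, 1, 4)
crsp_2$month <- substring(crsp_2$date, 5, 6)
crsp_2$day <- substring(crsp_2$date, 7, 8)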
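An alternative to slicing strings is to parse the yyyymmdd values into Date objects and extract integer components. This is a sketch under assumptions: the three dates are sample values in the same layout as the date column, and the `parts` data frame and its column names are hypothetical.

```r
# Sample dates in the same yyyymmdd numeric layout as the CRSP date column.
date_num <- c(20100129, 20121130, 20201231)

# Format as plain digits (no scientific notation), then parse to Date.
d <- as.Date(sprintf("%.0f", date_num), format = "%Y%m%d")

# Extract the components as integers rather than character strings.
parts <- data.frame(
  date  = d,
  year  = as.integer(format(d, "%Y")),
  month = as.integer(format(d, "%m")),
  day   = as.integer(format(d, "%d"))
)
parts
```

Integer components allow numeric filters such as `year >= 2016` and merge cleanly with factor data whose year and month columns are numeric.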
4. In order to have more easily interpretable coefficients, multiply ret by 100 so a value of 5 means 5% (not...
# Express returns in percent
crsp_2$RET <- 100 * crsp_2$RET
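The answer preview ends at step 4. The remaining cleaning steps in the problem statement (market-cap filter, one row per permco-month, and the observation-count limits) could be sketched as follows in base R. This is a hedged illustration on an invented toy panel, not the answer's code: `toy`, `month_id`, `top_n_keep`, `min_obs`, and `max_obs` are hypothetical names, the cutoffs 4 and 30 stand in for 1,000 and 132, and taking `abs(PRC)` is an assumption based on the CRSP convention of storing a bid/ask average as a negative price (the skim output above shows a negative minimum for PRC).

```r
# Toy panel (invented numbers): 6 firms x 30 months, constant size per firm.
toy <- expand.grid(PERMCO = 1:6, month_id = 1:30)
toy$SHROUT <- toy$PERMCO * 100                  # shares outstanding, in 1,000s
toy$PRC    <- ifelse(toy$PERMCO == 5, -10, 10)  # negative = bid/ask average
toy <- toy[!(toy$PERMCO == 6 & toy$month_id > 10), ]           # firm 6: 10 months
toy <- rbind(toy, toy[toy$PERMCO == 5 & toy$month_id == 1, ])  # duplicate row

top_n_keep <- 4    # stands in for the top 1,000 stocks
min_obs    <- 24   # minimum monthly observations per firm
max_obs    <- 30   # stands in for 132 (the number of months in the sample)

# Step 5: market cap, then keep the largest firms within each date.
toy$mkt_cap  <- toy$SHROUT * abs(toy$PRC) * 1000
toy$cap_rank <- ave(-toy$mkt_cap, toy$month_id,
                    FUN = function(x) rank(x, ties.method = "first"))
toy <- toy[toy$cap_rank <= top_n_keep, ]

# Step 6: keep one row per firm and month.
toy <- toy[!duplicated(toy[, c("PERMCO", "month_id")]), ]

# Steps 7-8: keep firms whose observation count lies within the limits.
toy$n_obs <- ave(rep(1, nrow(toy)), toy$PERMCO, FUN = sum)
clean <- toy[toy$n_obs >= min_obs & toy$n_obs <= max_obs, ]

nrow(clean)                 # 89
sort(unique(clean$PERMCO))  # 3 4 5
```

Whether the instructor's count of 115,953 observations was obtained with or without the absolute value of prc is not stated in the problem, so the result on real data may differ depending on that choice.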