Problem Set 1 Problem Set 1 Statistics 100 Due June 29, 2020 at 11:59 pm Problem set policies. Please provide concise, clear answers for each question. Note that only writing the result of a...

Hello, I want solutions to two statistics problems in the R programming language. I have attached the rd file containing my work for problems 1, 2, 3, and 6, and I need solutions to problems 4 and 5. I have also attached the libraries and the questions.


Problem Set 1
Statistics 100
Due June 29, 2020 at 11:59 pm

Problem set policies. Please provide concise, clear answers for each question. Note that only writing the result of a calculation (e.g., "SD = 3.3") without explanation is not sufficient. For problems involving R, be sure to include the code in your solution.

Please submit your problem set via Canvas as a PDF, along with the R Markdown source file.

We encourage you to discuss problems with other students (and, of course, with the teaching team), but you must write your final answer in your own words. Solutions prepared "in committee" are not acceptable. If you do collaborate with classmates on a problem, please list your collaborators on your solution.

Problem 1.
For each of the following scenarios, discuss (in at most five sentences) the main issue(s) with respect to sampling or reporting bias.
a) A particular city has 14 architects who own their own firm. To select a survey sample, each architect was contacted via telephone by order of appearance in the telephone directory, then the first 8 that agreed to be interviewed formed the sample.
b) The September 1992 issue of Prevention magazine included a women’s health survey; approximately 16,500 women responded to the survey. The May 1993 issue reported on the survey results, claiming that “92% of our readers rated their health as excellent, very good, or good”.
c) Many scholars and policymakers are interested in estimating the prevalence of mental illness among the homeless population. In one study, the authors sampled homeless persons who received medical attention from a clinic that was part of the Health Care for the Homeless project, resulting in an estimated prevalence of 33%.¹ The authors maintain that selection bias is not a serious problem because the clinics are easily accessible to homeless people.

Problem 2.
A recently published analysis examined 10 studies that measured optimism and pessimism by asking participants about their level of agreement with statements like “In uncertain times, I usually expect the best,” or “I rarely expect good things to happen to me”. Optimistic people tend to expect that they will encounter favorable outcomes, whereas less optimistic people tend to expect that they will encounter unfavorable outcomes.² These studies also measured other variables on participants, including factors related to heart disease.

The analysis found that compared with pessimists, people with the most optimistic outlook had a 35% lower risk for cardiovascular events (e.g., heart attacks). The studies, on average, observed people over a 14-year period and compared the rate of cardiovascular events between those classified as optimists versus pessimists.
a) A popular newspaper reports on the analysis with the headline “Thinking Positively Improves Cardiovascular Health”. Write a short response to the editor explaining clearly why the headline is potentially misleading. Be sure to use language accessible to a general audience without a statistics background. Limit your answer to at most five sentences.
b) Briefly describe a plausible study design that has the potential to demonstrate the effect of thinking positively on cardiovascular health.
c) Suppose someone who is very optimistic reads about the analysis and concludes that the findings suggest he has a 35% lower risk for cardiovascular events than his friend who is extremely pessimistic. Explain why this is not necessarily the case.

¹ This project is a federally funded program that brings general health and mental health services to homeless people.
² Alan Rozanski, MD, et al. Association of optimism with cardiovascular events and all-cause mortality. JAMA Network Open 2019; 2(9):e1912200.

Problem 3.
The following graphs are based on data from the National Center for Health Certificates.
a) Describe what you see in the two graphs, with particular focus on the differences between the two distributions.
b) Economists are interested in the possible causes driving the shape of the age distribution in 2016.
i. Discuss a possible reason behind the discrepancy between the 1980 distribution and the 2016 distribution; i.e., what is a potential factor driving the difference in the distributions?
ii. Discuss a possible reason behind the shape of the age distribution in 2016.

Problem 4.
The Stanford Open Policing Project is a team of researchers and journalists at Stanford University working to collect and standardize data on vehicle and pedestrian stops from law enforcement departments across the country, with the goal of investigating and improving interactions between the police and the public. In a recently published analysis based on these data, the authors found that police stops and search decisions suffer from persistent racial bias.³

In this problem, you will work with data from the Stanford Open Policing Project and conduct an exploratory analysis based on approaches used by the study team.

The dataset stops.Rdata contains standardized data on police stops in Philadelphia, Pennsylvania between 2013 and 2017. Each case represents a single police stop. The variables are defined as follows:
– date: date of the stop, in YYYY-MM-DD format
– year: year of the stop
– time: 24-hour time for the stop, in HH:MM format
– location: freeform text of the location, e.g.
street number and street name
– lat: latitude of the stop
– lng: longitude of the stop
– district: police district
– service_area: police service area
– subject_age: age of the stopped subject
– subject_race: race of the stopped subject, recorded as either white, black, hispanic, asian/pacific islander, or other/unknown
– subject_sex: the recorded sex of the stopped subject
– type: type of stop, either vehicular or pedestrian
– arrest_made: recorded as TRUE if an arrest was made, and FALSE if otherwise
– outcome: strictest police action taken, either arrest, citation, warning, summons
– contraband_found: recorded as TRUE if contraband was found from a search, and FALSE if otherwise
– frisk_performed: recorded as TRUE if a frisk was performed, and FALSE if otherwise
– search_conducted: recorded as TRUE if a search was conducted, and FALSE if otherwise
– search_person: recorded as TRUE if search of a person has occurred, and FALSE if otherwise
– search_vehicle: recorded as TRUE if search of a vehicle has occurred, and FALSE if otherwise

³ E. Pierson, C. Simoiu, J. Overgoor, S. Corbett-Davies, D. Jenson, A. Shoemaker, V. Ramachandran, P. Barghouty, C. Phillips, R. Shroff, and S. Goel. A large-scale analysis of racial disparities in police stops across the United States. Nature Human Behaviour, Vol. 4, 2020.

Use these data to answer the following questions.
a) Take an initial look at the stops dataset.
i. How many police stops are represented in the data?
ii. What date range does the data cover?
iii. Of the police stops recorded, what proportion of stops occurred in 2017?
b) Describe the distribution of age of stopped subjects, referencing numerical and graphical summaries as needed.
c) To narrow the scope of the analysis, we will focus on vehicular police stops that occurred in 2017. Subset the data appropriately and name the subset stops.subset.
i.
Using numerical and graphical summaries, describe the distribution of race of stopped subjects, among vehicular stops in 2017. Does any race appear to be overrepresented?
ii. In a few sentences, briefly explain why it would be helpful to account for racial demographics in Philadelphia when interpreting the values in part i.
iii. The dataset population_2017.Rdata contains information about racial demographics in Philadelphia for 2017. Use this information to compute the “stop rate” for each group, where stop rate is defined as number of police stops per member of the population. For example, if 10 police stops occur in which the stopped subject is Asian, and there are 100 Asian members of the population, the stop rate for Asians is 10/100 = 0.10. Report the stop rate for each race group.
iv. Based on the calculations in part iii., relative to white drivers, how much more often are black drivers stopped by the police? Relative to white drivers, how much more often are Hispanic drivers stopped by the police?
d) After a driver is stopped, officers may carry out a search of the driver or vehicle if they suspect more serious criminal activity. One strategy for understanding whether data suggest biased decision-making is the outcome test, which is based on assessing the proportion of searches that successfully identify contraband. If searches of minorities are successful less often than searches of whites, this suggests that officers are searching minorities on the basis of less evidence.
i. Calculate hit rates by race in Philadelphia in 2017 for vehicular stops, where hit rate is defined as the proportion of searches in which contraband was found. Describe your findings.

It may be the case that the bar for stopping people is lower in certain police districts, and that minorities are more likely to live in neighborhoods in those districts.
The dataframe hit_rates.Rdata contains the hit rate for whites, blacks, and Hispanics in each police district in Philadelphia (for vehicular stops and searches in 2017). Information about each district is contained in two rows: one row contains the hit rates of black drivers and one row contains the hit rates of Hispanic drivers.
ii. Create a plot that summarizes the relationship between the hit rates of black drivers and the hit rates of white drivers for police districts in Philadelphia.
iii. Add a y = x line to the plot from part ii. Describe what a point on the y = x line would represent in context of the data.
iv. With reference to the y = x line, describe what you see in the plotted data. Are the results suggestive of bias against black drivers? Explain your answer.

Problem 5.
Vitamin D is essential for growth and bone health in children. It can be either obtained from dietary sources or produced by the body upon exposure of skin to ultraviolet waves (typically via sun exposure). Natural food sources rich in Vitamin D are scarce. Even in many low latitude countries where sunshine is plentiful, Vitamin D deficiency is a public health concern.

A study was conducted to evaluate Vitamin D status among schoolchildren in Thailand. The study drew data from a randomized trial conducted in rural subdistricts of a specific subregion of the country that assessed the efficacy of a seasoning powder fortified with iron, zinc, iodine, and Vitamin A for reducing anemia.

Exposure to sunlight allows the body to produce serum 25(OH)D, which is a marker of Vitamin D status. Serum 25(OH)D is then converted into a biologically active form, serum 1,25(OH)2D. Data on both serum levels were used to determine the prevalence of Vitamin D deficiency in the subpopulation under study. Vitamin D deficiency is defined as having a serum 25(OH)D level below 50 nmol/L. The file vitamin_d
Answered Same Day: Jun 26, 2021


Bezawada Arun answered on Jun 29 2021
---
title: "Problem Set 1"
author: "Ioannis Lamprou"
date: "26 June 2020 - 08:20"
output:
  pdf_document:
    fig_height: 3.5
    fig_width: 5
  word_document: default
geometry: margin=1in
fontsize: 11pt
---
## Problem 1.
a)
This is sampling bias. Architects were contacted in directory order and the sample closed once eight had agreed, so architects whose last names come earlier in alphabetical order had a higher chance of being selected than those whose names come later.
In an unbiased sample, each person in the population has an equal chance of being selected.
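The equal-chance idea can be sketched in R. This is only an illustration: the architect labels and the seed are made up, and `sample()` stands in for whatever random mechanism a real survey would use.

```r
# Minimal sketch of a simple random sample (labels are hypothetical)
set.seed(2020)
architects <- paste("Architect", 1:14)

# Draw 8 of the 14 without replacement; every architect is equally likely
srs <- sample(architects, size = 8)

# Repeat the draw many times to check each architect's inclusion frequency
inclusion <- replicate(10000, architects %in% sample(architects, size = 8))
inclusion_rate <- rowMeans(inclusion)
names(inclusion_rate) <- architects
round(inclusion_rate, 2)
```

Under simple random sampling each inclusion frequency should be close to 8/14, roughly 0.57, regardless of where the name falls in the directory.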
b)
The fact that 92% of the women who responded rated their health as excellent, very good, or good does not mean that 92% of the women who read this magazine would give these ratings, and it certainly does not mean that 92% of all the magazine's readers would.
The sample was not randomly selected from the women who read the magazine. For instance, women in excellent health may have been more likely to respond to this type of survey.
In my opinion, the magazine reported the result to its readers inaccurately.
c)
In my opinion, not every homeless person is equally likely to seek medical attention from a clinic. For instance, individuals who need general health services may be more likely to seek help from clinics than people with mental illness. So, even if the clinics are accessible, this sample probably suffers from selection bias.
## Problem 2.
a)
Dear (Editor's name),
The newspaper headline "Thinking Positively Improves Cardiovascular Health" is misleading because the studies do not show that thinking positively improves people's cardiovascular health.
The studies found that the most optimistic individuals had a 35% lower risk of cardiovascular events, such as heart attacks, than pessimists. They did not show that becoming more optimistic would improve anyone's cardiovascular health.
b)
A plausible study design that has the potential to demonstrate the effect of thinking positively on cardiovascular health is the following:
i. The study must be in two parts: a systematic review of the latest research and a meta-analysis.
ii. The study must follow a large sample of people of different ages, countries, and climates over a long period of time.
iii. The study must take into consideration other factors that affect cardiovascular health.
c)
I would explain to him that the research does not show that thinking positively improves cardiovascular health. There is a strong correlation between the two, but other factors, such as age and physical activity, need to be considered in the studies.
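The role of other factors can be illustrated with a small simulation. It is a sketch under made-up assumptions, not a model of the published studies: physical activity is taken to be the only driver of both optimism and cardiovascular events, and every probability below is hypothetical.

```r
# Hypothetical simulation: activity affects both optimism and events,
# while optimism itself has no effect on events by construction
set.seed(100)
n <- 100000
active   <- rbinom(n, 1, 0.5)
optimist <- rbinom(n, 1, ifelse(active == 1, 0.7, 0.3))
event    <- rbinom(n, 1, ifelse(active == 1, 0.05, 0.15))

# Event rate by optimism, ignoring activity
overall <- tapply(event, optimist, mean)

# Event rate by optimism within each activity group
by_activity <- tapply(event, list(active = active, optimist = optimist), mean)

overall
by_activity
```

In this setup optimists show a lower overall event rate even though optimism does nothing to the outcome, and within each activity group the two rates are about the same.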
## Problem 3.
a)
The graph on the left (1980) shows a unimodal distribution with a slight right skew and a peak at around age 20. The 2016 graph, by contrast, shows a bimodal distribution with two peaks, one at almost age 20 and a second at approximately age 29.
b)
i.

Comparing the two graphs suggests that women now have more opportunities to find jobs or pursue higher education than in the past. As a result, women may delay having children until they have reached their goals (joining the workforce, pursuing an education). This is the main difference between the two graphs, and it accounts for the shift over this time period.

ii.
A possible reason for the bimodal shape of the age distribution in 2016 is geography. For example, there are places where it is common for women to have children at around age 20, while in other places women delay having children until their late 20s.
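A bimodal shape of this kind can arise when two groups with different typical ages are pooled. The sketch below illustrates that idea with simulated values only; the group sizes, means, and standard deviations are assumptions chosen for illustration, not estimates from the actual data.

```r
# Hypothetical mixture of two groups with different typical ages
set.seed(2016)
early <- rnorm(4000, mean = 20, sd = 2.5)  # group with a peak near age 20
later <- rnorm(6000, mean = 29, sd = 3.5)  # group with a peak near age 29
ages  <- c(early, later)

# The pooled histogram shows two peaks with a dip between them
hist(ages, breaks = 40,
     main = "Simulated mixture of two groups",
     xlab = "Age (years)")
```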
\newpage
## Problem 4.
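The stop rate and hit rate defined in the problem statement can be sketched on toy data before touching the real files. Everything below is an assumption for illustration: `toy_stops`, `toy_population`, and their values are made up, and only the column names `subject_race`, `search_conducted`, and `contraband_found` come from the variable list. The actual layout of population_2017.Rdata is not shown here, so it may differ from the named vector used in this sketch.

```r
# Toy stand-in for stops.subset (values are made up)
toy_stops <- data.frame(
  subject_race = c("white", "white", "black", "black", "black",
                   "hispanic", "hispanic", "white", "black", "hispanic"),
  search_conducted = c(TRUE, FALSE, TRUE, TRUE, FALSE,
                       TRUE, FALSE, TRUE, TRUE, TRUE),
  contraband_found = c(TRUE, NA, FALSE, TRUE, NA,
                       FALSE, NA, FALSE, FALSE, TRUE)
)

# Hypothetical population counts per group
toy_population <- c(white = 40, black = 30, hispanic = 20)

# Stop rate: number of stops per member of the population
stop_counts <- sapply(split(toy_stops, toy_stops$subject_race), nrow)
stop_rate <- stop_counts[names(toy_population)] / toy_population

# Stop rate relative to white drivers
relative_rate <- stop_rate / stop_rate["white"]

# Hit rate: proportion of searches in which contraband was found
searched <- toy_stops[toy_stops$search_conducted, ]
hit_rate <- tapply(searched$contraband_found, searched$subject_race, mean)

stop_rate
relative_rate
hit_rate
```

With the real data, the same steps would be applied to the 2017 vehicular subset, with the population counts taken from the provided demographics file.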
a)
i.
```{r, warning = FALSE, message = FALSE}
# One-time setup: run these in the console, not inside the knitted document
# install.packages(c("knitr", "tinytex", "markdown", "rmarkdown"))
# tinytex::install_tinytex()
library(knitr)
library(tinytex)
# The file path is incomplete; fill in the data file name before running
# d <- read.table(file = ".txt", header = TRUE, sep = "")
# To knit from the console rather than from inside this chunk:
# rmarkdown::render("pset01summer2020i-frknptbu_updated.rmd", "pdf_document")
# load the data
load("datasets/stops.Rdata")
# number of...
```