This is a problem set for a statistics/data analysis course. The answer needs to be in R Markdown format.

Question


SIMULATION STUDY

Problem 1

People are notoriously bad at generating random numbers in their heads. In this problem, we will compare random binary sequences generated experimentally versus ones that are made up.

A. Write down a sequence of 100 binary digits (each digit is a ‘0’ or a ‘1’) that you make up off the top of your head without any experimental or computational help.

B. Find a way to generate a sequence of 100 binary digits that represent an independent sample of size 100 from a Bernoulli distribution with $p = 0.5$.

C. Let $x_i$ be the $i$-th digit in either of the sequences you generated. Compute the sample mean, $\bar{x}$, for both sequences. What is the expected value for independent Bernoulli random variables with $p = 0.5$? Compute the sample covariance between adjacent samples, $x_i$ and $x_{i+1}$, for both sequences. What is the theoretical covariance for independent Bernoulli random variables? How do the sample statistics from each sequence compare to the theoretical values?

D. A run is a sequence of adjacent ‘0’s or ‘1’s. If a sequence of random Bernoulli variables is drawn independently, after drawing a first ‘0’ or ‘1’, what is the probability that the first run will be of length 1? What is the probability that the first run will be of length 2? What is the probability that the first run will be of length $k$? Plot the empirical PMF and empirical CDF of the lengths of runs from both sequences, and compare these to the theoretical distributions.

E. Discuss how well your made-up binary sequence resembles an actual sequence of independent Bernoulli[0.5] random variables. If you wanted to determine whether a binary sequence was generated from an independent Bernoulli process or made up by a person, which statistics would you check?

F. Write code to generate 0s and 1s in the following way.
1. For the first sample, randomly select a 0 or 1 with probability 1/2 for each.
2. If the last sample was a 1, the next sample will be a 1 with probability […] and a 0 with probability […]. If the last sample was a 0, the next sample will be a 1 with probability […] and a 0 with probability […].
3. Repeat until the sample is length 100.

G. Come up with a statistic to estimate […] in the above problem by computing the number of times the following sequential pairs appear in the sequence: 00, 01, 10, 11. For example, in the sequence 0001101 you would make this table (rows give the first digit of each pair, columns the second):

         0    1
    0    2    2
    1    1    1

Use this statistic to estimate […] for your sequence from part A. Do you think the above model is a good model for your sequence? Is there something different about your sequence?

DATA ANALYSES

Problem 2

The dataset contains (made up) pilot data for a test of a new cholesterol drug. 20 high-cholesterol patients were assigned to a drug group and 20 patients were assigned to a placebo group. Their blood cholesterol levels were measured before and after a 1-month regimen of the drug or placebo. The dataset contains the patient blood triglyceride levels in mg/dL before and after the regimen. The group variable contains a ‘0’ for the placebo group and a ‘1’ for the drug group.

A. Load the data into a statistics software package (such as R or MATLAB). Visualize the data before and after the intervention. Use some descriptive statistics to describe and compare the data between the before and after periods. Describe the structure of the data in words based on your statistics and visualizations.

B. Compute the change in triglyceride level for each patient, both as a change in the raw value (in mg/dL) and as a percentage change from the period before the regimen. Visualize and use descriptive statistics to characterize features of the levels of change for both the drug and placebo groups. Also compute the fraction of patients who saw a reduction of triglyceride level from each of the drug and placebo groups. Is there evidence for an improved effect of the drug relative to a placebo?

C. When you show your results to the clinicians studying the drug, they hypothesize that some of the measurements might be abnormally high because patients did not fast before their blood was drawn. Abnormal data points are often called outliers. Is there any evidence in the data of abnormally high measurements? If so, remove these outliers and repeat the analyses from part B. How does this change your results?

D. What would you recommend to the researchers regarding this dataset? What are the potential drawbacks of removing the outlier points in part C?
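
The run lengths in part D and the pair counts in part G of Problem 1 can be sketched in base R. This is only an illustration on the short example sequence 0001101 quoted in part G, not a solution for a 100-digit sequence. The variable names are made up for this sketch, and the theoretical PMF assumes independent Bernoulli(0.5) draws, for which a run of length k needs k - 1 repeats followed by one switch.

```{r}
# Toy input: the example sequence quoted in part G of the problem statement.
x <- c(0, 0, 0, 1, 1, 0, 1)

# Part D: lengths of consecutive runs, using base R's rle().
run_lengths <- rle(x)$lengths

# Empirical PMF of the run lengths over k = 1, ..., max observed length.
k <- seq_len(max(run_lengths))
empirical_pmf <- as.numeric(table(factor(run_lengths, levels = k))) /
  length(run_lengths)

# PMF expected under independent Bernoulli(0.5) draws (assumption of this
# sketch): P(run length = k) = (1/2)^(k - 1) * (1/2) = (1/2)^k.
theoretical_pmf <- 0.5^k

# Part G: 2x2 table of adjacent pairs.
# Rows are the first digit of each pair, columns the second digit.
pair_counts <- table(first  = factor(head(x, -1), levels = 0:1),
                     second = factor(tail(x, -1), levels = 0:1))
print(pair_counts)
```

Replacing `x` with a 100-digit vector gives the quantities needed for the PMF and CDF plots; `cumsum(empirical_pmf)` would give the empirical CDF.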

pset-1-2evayn4o.pdf cholesterol-j42j44cx.csv

Mohd · Accepted Answer

---
title: "Cholesterol Data Analysis"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Importing the data and computing the raw and percentage change
```{r}
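library(readr)
library(dplyr)  # provides %>%, as_tibble() and mutate()
# NOTE: the original assignment was lost in extraction. The read_csv() call is
# an assumed reconstruction; the file name comes from the attachment list.
cholesterol_j42j44cx <- read_csv("cholesterol-j42j44cx.csv")
# Assumed: placebo2 is the loaded table plus the two change columns.
placebo2 <- cholesterol_j42j44cx %>% as_tibble() %>% mutate(
  Differ = After - Before,                          # raw change in mg/dL
  percent_diff = ((After - Before) / Before) * 100  # change relative to Before, in %
)
head(placebo2)  # View() needs an interactive session, so print the first rows instead
summary(placebo2)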
```
Visualize placebo group 
```{r}
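# Two panels side by side: raw change and percentage change per patient.
par(mfrow = c(1, 2))
# `cnt` is not defined anywhere in the recovered code, so each change is
# plotted against the row index instead.
plot(placebo2$Differ, type = 'b',
     col = 'red', lwd = 2,
     xlab = 'patient index',
     ylab = 'change in triglyceride level (mg/dL)',
     main = 'Differ')
plot(placebo2$percent_diff, type = 'b',
     col = 'red', lwd = 2,
     xlab = 'patient index',
     ylab = 'change in triglyceride level (%)',
     main = 'percent_diff')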
```
80% of the patients in the drug group recorded a decrease in triglyceride levels.
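
A per-group fraction like the one quoted above could be computed along the following lines. This is a minimal sketch on a small made-up data frame, not the real cholesterol file, so its numbers say nothing about the actual study. The column name `Group` is an assumption (only `Before` and `After` appear in the code above), coded 0 for placebo and 1 for drug as in the problem statement.

```{r}
# Hypothetical toy data for illustration only (NOT the real dataset).
# `Group` is an assumed column name: 0 = placebo, 1 = drug.
toy <- data.frame(
  Group  = c(0, 0, 0, 0, 1, 1, 1, 1),
  Before = c(200, 210, 190, 220, 205, 215, 195, 225),
  After  = c(198, 215, 192, 210, 180, 190, 200, 200)
)

# Raw change per patient; a negative value means the level went down.
toy$Differ <- toy$After - toy$Before

# Fraction of patients in each group whose level decreased.
frac_reduced <- tapply(toy$Differ < 0, toy$Group, mean)
print(frac_reduced)
```

With the real data, the same `tapply()` call on `placebo2$Differ < 0` and the actual group column would give the fractions for the drug and placebo groups side by side, which is the comparison part B asks for.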
