Lab 3: Linear Regression, Probability, and Sampling
Stats 10: Introduction to Statistical Reasoning
Winter 2020

All rights reserved, Adam Chaffee and Michael Tsiang, 2017-2020. Do not post, share, or distribute anywhere or with anyone without explicit permission.

Objectives
1. Understand linear regression in R and verify linear regression assumptions
2. Plot time series to analyze trends
3. Use R for sampling and simulation
4. Calculate theory-based probabilities for normal and binomial distributions

Collaboration Policy
In lab you are encouraged to work in pairs or small groups to discuss the concepts on the assignments. However, DO NOT copy each other’s work, as this constitutes cheating. The work you submit must be entirely your own. If you get stuck in lab, feel free to reach out to other groups or talk to your TA.

Linear Regression in R
You learned about linear regression in lecture. Now, we will learn how to code it in lab. We will be using the following new commands:
• lm() will create an output containing the equation of the regression line, correlation coefficient, p-values, residuals, and much more. We typically create an object to store the results of the lm() function (see example below).
• abline() is a command to draw a line of the form y = a + bx, such as a regression line, on an existing plot. It requires arguments for a and b, or linear model results (see example below).

Load the NCbirths data and try running the example code below. We will discuss what each line does in section.
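As an aside on the y = a + bx form that abline() expects, the sketch below pulls the intercept a and slope b out of a fitted model and uses them for a prediction. It assumes R's built-in cars data set (columns speed and dist), which is not part of this lab, and the object names fit, by_hand, and by_predict are hypothetical. The lab's own NCbirths example follows it.

```r
## Assumption: the built-in 'cars' data (speed, dist) stands in for the lab data
fit <- lm(dist ~ speed, data = cars)

## coef() returns the intercept (a) and slope (b) of the fitted line y = a + bx
a <- coef(fit)[["(Intercept)"]]
b <- coef(fit)[["speed"]]

## Predicted stopping distance at speed = 15, by hand and with predict()
by_hand <- a + b * 15
by_predict <- predict(fit, newdata = data.frame(speed = 15))

## R-squared: the share of the variation in dist explained by speed
r_squared <- summary(fit)$r.squared

## abline() accepts either a and b or the fitted model; both draw the same line
plot(dist ~ speed, data = cars, main = "Stopping distance vs. speed")
abline(a = a, b = b, col = "red", lwd = 2)
```

The by-hand value and the predict() value should agree, which is a quick way to check that the equation was read off the summary output correctly.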
    ## Run the linear model of weight against Mom's age and print a summary
    linear_model <- lm(NCbirths$weight ~ NCbirths$mage)
    summary(linear_model)

    ## Create a plot of the data, and draw the regression line using abline
    plot(NCbirths$weight ~ NCbirths$mage, xlab = "Mom Age", ylab = "Weight", main = "Regression of Weight on Mother's Age")
    abline(linear_model, col = "red", lwd = 2)

    ## Create a plot of the residuals to assess regression assumptions
    plot(linear_model$residuals ~ NCbirths$mage, main = "Residuals plot")

    ## Add a line of y = 0 to help visualize the residuals
    abline(a = 0, b = 0, col = "red", lwd = 2)

Exercise 1
We will be working with some soil mining data and are interested in looking at some of the relationships between metal concentrations (in ppm). Use the line below to obtain the data:

    soil <- read.table("http://www.stat.ucla.edu/~nchristo/statistics_c173_c273/soil_complete.txt", header = TRUE)

a. Run a linear regression of lead against zinc concentrations (treat lead as the response variable). Use the summary function just like in the example above and paste the output into your report.
b. Plot the lead and zinc data, then use the abline() function to overlay the regression line onto the data.
c. In a separate plot, plot the residuals of the regression from (a), and again use the abline() function to overlay a horizontal line.

Parts d-h can be answered by hand, using a calculator, or any R functions of your choice.

d. Based on the output from (a), what is the equation of the linear regression line?
e. Imagine we have a new data point. We find out that the zinc concentration at this point is 1,000 ppm. What would we expect the lead concentration at this point to be?
f. Imagine two locations (A and B) for which we only observe zinc concentrations. Location A contains 100 ppm higher concentration of zinc than location B. How much higher would we expect the lead concentration to be in location A compared to location B?
g. Report the R-squared value and explain in words what it means in context.
h. Comment on whether you believe the three main assumptions (linearity, symmetry, equal variance) for linear regression are met for this data. List any concerns you have.

Exercise 2
Our next data set is what is known as a time series, or data in time. It contains the measurements via satellite imagery of sea ice extent in millions of square kilometers for each month from 1988 to 2011. Please download the “sea_ice” data from CCLE and read it into R. If you have your working directory properly set, you can use the line below:

    ice <- read.csv("sea_ice.csv", header = TRUE)

Note that currently R does not know what class the date column is. We need to convert the date column into class “Date” using the following line:
    ice$date <- as.Date(ice$date, "%m/%d/%y")

a. Produce a summary of a linear model of sea ice extent against time.
b. Plot the data and overlay the regression line. Does there seem to be a trend in this data?
c. Plot the residuals of the model over time and include a horizontal line. What assumption(s) about the linear model should we be concerned about?

Sampling and Simulating in R
We can use the sample() function to sample data from a vector, and the replicate() function to simulate random events many times over. Note that the computer is not truly random, but it is close enough for our purposes to consider it random. We also rely on the set.seed() function to make our “random” results reproducible. Try the following examples in R and see the ## comments for descriptions.

    ## Set seed for reproducibility
    set.seed(1335)

    ## Create a names vector
    names = c("Leslie", "Ron", "Andy", "April", "Tom", "Ben", "Jerry")

    ## Sample 3 of the names with replacement
    sample(names, 3, replace = TRUE)

    ## Sample 3 of the names without replacement
    sample(names, 3, replace = FALSE)

    ## Create a vector from 1 to 10
    numbers = 1:10

    ## Simulate sampling 3 different numbers at random, 10 times
    replicate(10, sample(numbers, 3, replace = FALSE))

    ## One more time, but save as an object
    rand_draws = replicate(10, sample(numbers, 3, replace = FALSE))

    ## Perform analysis on the random draws
    colMeans(rand_draws) ## takes the mean of each sample
    colSums(rand_draws) ## takes the sum of each sample

Exercise 3
One of Adam’s favorite casino games is called “craps”. In the first round of this game, two fair 6-sided dice are rolled. If the sum of the two dice equals 7 or 11, Adam doubles his money! If a 2, 3, or 12 is rolled, Adam loses all the money he bets.

a. Based on your lecture notes, what is the chance Adam will double his money in the first round of the game? What is the chance Adam will lose his money in the first round of the game?
b. Let’s now approximate the results in (a) by simulation.
First, set the seed to 123. Then, create an object that contains 5,000 sample first round craps outcomes (simulate the sum of 2 dice, 5,000 times). Use the appropriate function to visualize the distribution of these outcomes (hint: are the outcomes discrete or continuous?).
c. Imagine these sample results happened in real life for Adam. Using R functions of your choice, calculate the percentage of time Adam doubled his money. Calculate the percentage of time Adam lost his money.
d. Adam winning money and Adam losing money can both be considered events. Are these two events independent, disjoint, or both? Explain why.
e. Using a calculator, quickly verify mathematically whether these events are independent, using part (a) and what you learned in lecture. Show your work.

Calculating Normal and Binomial Distribution Probabilities
The functions pnorm(), dbinom(), and pbinom() allow us to calculate theoretical probabilities using certain assumptions about the distribution. The pnorm() function assumes a normal distribution; it takes a mean, an sd, and the observation for which we want a probability. The binom functions assume a binomial distribution with sample size n, the probability of success p, and some observation. Try running the examples below.

    ## Coin flipping scenario. Probability of getting 4 heads when 7 coins are tossed
    dbinom(4, size = 7, prob = 0.5)

    ## Probability of getting 4 heads or fewer when 7 coins are tossed
    pbinom(4, size = 7, prob = 0.5)

    ## Probability of getting a number less than 4 from a normal distribution with mean 2, sd of 0.7
    pnorm(4, mean = 2, sd = 0.7)

Feel free to use the help screen, or Google, for more examples.

Exercise 4
Tenorio National Park in Costa Rica has a roughly consistent year-round climate. On any given day, we assume there is a 40% chance of heavy rain.

a. We are interested in forecasting the number of heavy rain days during 2019. Write down n and p if we are to use the binomial distribution for this forecast.
b. Calculate the mean and standard deviation of heavy rain days in 2019 using the binomial model. Use R or a calculator.
c. Find the probability that the park will experience exactly 145 days of heavy rain in 2019.
d. Find the probability that the park will see between 125 and 175 days of heavy rain in 2019.
e. The yearly amount of rainfall in the park is normally distributed with mean 200 inches and standard deviation of 20 inches. Find the probability that the park will experience more than 230 inches of rain in 2019.

Figure: Celeste Falls, Tenorio National Park
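The pnorm(), dbinom(), and pbinom() functions from the section on normal and binomial probabilities can also answer "between" and "more than" questions by adding or subtracting probabilities. The sketch below is a minimal illustration under assumed numbers: 10 fair coin tosses for the binomial part, and the mean 2, sd 0.7 normal from the earlier example. The variable names are hypothetical and the numbers are not taken from any exercise.

```r
## Hypothetical scenario (not from the lab exercises): 10 fair coin tosses
n <- 10
p <- 0.5

## Binomial mean and standard deviation: n*p and sqrt(n*p*(1-p))
binom_mean <- n * p
binom_sd <- sqrt(n * p * (1 - p))

## P(X <= 4) two ways: pbinom() directly, or summing dbinom() over 0 to 4
p_at_most_4 <- pbinom(4, size = n, prob = p)
p_at_most_4_sum <- sum(dbinom(0:4, size = n, prob = p))

## P(3 <= X <= 6): subtract P(X <= 2) from P(X <= 6)
p_between <- pbinom(6, size = n, prob = p) - pbinom(2, size = n, prob = p)

## Upper tail of a normal: P(Y > 4) when Y is normal with mean 2 and sd 0.7
p_upper <- 1 - pnorm(4, mean = 2, sd = 0.7)
p_upper_alt <- pnorm(4, mean = 2, sd = 0.7, lower.tail = FALSE)

print(c(mean = binom_mean, sd = binom_sd))
print(c(at_most_4 = p_at_most_4, between_3_and_6 = p_between, upper_tail = p_upper))
```

Note that pbinom() includes its first argument, so a range that starts at 3 subtracts pbinom(2, ...), not pbinom(3, ...). For a continuous normal distribution this distinction does not matter.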
Feb 15, 2021