Do the codes in RStudio


Project 1: Heating in Sønderborg

Formalities, structure and expectations for the first mandatory project

The assignment consists of two parts. The first part focuses on descriptive analysis of the data. The second part is primarily about confidence intervals and hypothesis tests. The assignment is formulated in such a way that it can be solved in small “easy” steps.

In practice, the assignment must be solved using the statistical software R. Some R code is provided in order to make it easy to get started with the project. However, the code is not complete, and you are encouraged to explore new features in R while working on the project. For example, you could add suitable titles to the plots, or use R’s built-in functions for computing confidence intervals and testing hypotheses.

The results of the analysis must be documented in the report using tables, figures, mathematical notation, and explanatory text. Relevant figures and tables must be included within the text, not in the appendix. Present the results of your analysis as you would when explaining them to one of your peers.

Divide the report into subsections, one for each of the questions to be answered.

The report must be handed in as a pdf file. R code should not be included in the report itself but must be handed in as an appendix (a .R-file). The report and appendix must be handed in under Assignments on Learn at: Projekt 1: Varmeforbrug i Sønderborg

The report text should not exceed 6 pages (excluding figures, tables, and the appendix). A normal page contains 2400 characters.

Figures and tables cannot stand alone - it is important that you describe and explain the R output in words.

Figures and tables are not included in the assessment of the length of the report. However, it is not in itself an advantage to include many figures, if they are not relevant!

You may work together in groups, but the report must be written individually.
Questions about the project can be addressed to the teaching assistants, see the guidelines on the Projects page of the course website.

Introduction

This project focuses on the daily heat consumption for four houses in Sønderborg during the period from October 2008 to June 2011. The relation between the surrounding climate and the heat consumption can give insight into the actual level of insulation of the houses. E.g. the relation between wind speed and heat consumption is an indicator of how airtight/exposed the houses are. Ideally, this kind of data can be used to make an empirical energy signature for houses.

As long as the residents’ behaviour doesn’t change too much during the period of observation, the analysis will be independent of the indoor temperature. The houses are detached single-family houses.

Reading the data into R

Make a folder for the project on your computer. Download the project material from Learn and unzip it to the folder that you just made.

Then, open the data file soenderborg1_data.csv (e.g., in RStudio, File → Open File) in order to see the contents of the file. Note that the first row (referred to as a header) contains variable names, and that the subsequent rows contain the actual observations. Variable names and observations of the individual variables are separated by a ’;’ (therefore .csv: “comma separated values”, though here it is a semi-colon).

The data consists of daily observations of the heat consumption in four houses together with daily averages of centrally observed climatic variables. The file contains the following columns/variables:

    Variable  Explanation
    t         Date
    Ta        Ambient air temperature (°C)
    G         Global radiation (W/m²)
    Ws        Wind speed (m/s)
    Q1        Heat consumption in House 1 (kW/day)
    Q2        Heat consumption in House 2 (kW/day)
    Q3        Heat consumption in House 3 (kW/day)
    Q4        Heat consumption in House 4 (kW/day)

Open the file soenderborg1_english.R, which contains some R code that can be used for the analysis.
First, the "working directory" must be set to the directory on the computer that contains the files for the project:

    ## In RStudio the working directory is easily set via the menu
    ## "Session -> Set Working Directory -> To Source File Location"
    ## Note: In R only "/" is used for separating in paths
    ## (i.e. no backslash).
    setwd("Replace with path to directory containing project files.")

Now the data may be read into R using the following code:

    ## Read data from soenderborg1_data.csv
    D <- read.table("soenderborg1_data.csv", header=TRUE, sep=";", as.is=TRUE)

D becomes a "data.frame" (a kind of table), which contains the data that was read into R (see the introduction to R in Section 1.5 of the book).

Book, Section 1.5: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#dv:sec:r

Descriptive analysis

The purpose of the first part of the project is to carry out a descriptive analysis of the data. In a report it is important to present the data and describe it to the reader. For example, this can be done using summary statistics and suitable figures.

Start by running the following commands to get a simple overview of the data:

    ## Dimensions of D (number of rows and columns)
    dim(D)
    ## Column/variable names
    names(D)
    ## The first rows/observations
    head(D)
    ## The last rows/observations
    tail(D)
    ## Selected summary statistics
    summary(D)
    ## Another type of summary of the dataset
    str(D)

a) Write a short description of the data. Which variables are included in the dataset? Are the variables quantitative and/or categorized (or date variables)? (Categorized variables are only introduced in Chapter 8, but they are simply variables which divide the observations into categories/groups - e.g. three categories: low, medium, and high.) How many observations are there? Which time period is covered by the observations (date of first and last observations)? Are there any missing values?

The following code may be used to generate a "density histogram" describing the empirical density of the heat consumption of House 1 (see Section 1.6.1):

    ## Histogram describing the empirical density of the daily heat
    ## consumptions of House 1 (histogram of daily consumptions normalized
    ## to have an area of 1)
    hist(D$Q1, xlab="Heat consumption (House 1)", prob=TRUE)

b) Make a density histogram of the daily heat consumption of House 1. Use this histogram to describe the empirical distribution of the daily heat consumption. Is the empirical density symmetrical or skewed? Can the heat consumption be negative? Is there much variation to be seen in the observations?

Note: In a skewed distribution, the probability mass is not symmetrically distributed around the median. In a left-skewed distribution, the left tail is longer than the right tail (and, typically, the mean will lie to the left of the median). Similarly, in a right-skewed distribution, the right tail is the longer of the two (usually, with the mean to the right of the median).

Book, Chapter 8: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#aov:cha
Book, Section 1.6.1: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#dv:sec:freq-distr-hist

When observations are recorded regularly over time, the data is often referred to as a time series. Thus, the daily heat consumptions of each house constitute a time series. For time series, it is often relevant to make figures illustrating the data over time. Here, it is first necessary to tell R that the variable t should be treated as a date variable. This can be done using the following code:

    ## Converts the variable 't' to a date variable in R
    D$t <- as.Date(x=D$t, format="%Y-%m-%d")
    ## Checks the result
    summary(D$t)

A plot illustrating the daily heat consumption over time for each house for the period 2 October 2008 to 1 October 2010 (coloured according to house) can now be made using the following R code:

    ## Plot of heat consumption over time
    plot(D$t, D$Q1, type="l", xlim=as.Date(c("2008-10-02","2010-10-01")),
         ylim=c(0,9), xlab="Date", ylab="Heat consumption", col=2)
    lines(D$t, D$Q2, col=3)
    lines(D$t, D$Q3, col=4)
    lines(D$t, D$Q4, col=5)
    ## Add a legend
    legend("topright", legend=paste0("Q", c(1,2,3,4)), lty=1, col=2:5)

Note that the data has missing values, shown as gaps in the time series plot.

c) Make a plot illustrating the daily heat consumption over time for the period 2 October 2008 to 1 October 2010 (coloured according to house). Describe the development of the heat consumption over time in words. Is it similar across the four houses? Is the daily heat consumption stable? Is it possible to identify the heating season? Are there any time periods with unexpected levels of heat consumption?

When doing data analysis, it is often useful to be able to take subsets of the data. This can be done in R using, e.g., the subset function. See the remark on p. 11 as well.

In the further analysis, only data from the period January-February 2010 is to be used. Use the R code below to make a subset of the data which only includes the observations from these two months:

    ## Subset of the data: only Jan-Feb 2010
    Dsel <- subset(D, "2010-01-01" <= t & t < "2010-03-01")

The following R code makes a box plot of the daily heat consumption by house for the first two months of 2010:

    ## Box plot of daily heat consumption by house
    boxplot(Dsel[ ,c("Q1","Q2","Q3","Q4")],
            xlab="House", ylab="Heat consumption")

d) Make a box plot of the daily heat consumption in January-February 2010 by house. Use this plot to describe the empirical distribution of the daily heat consumption of the four houses. Are the distributions symmetrical or skewed? Does there seem to be a difference between the distributions (if so, describe the difference)?
Are there extreme observations/outliers?

The empirical distribution of the daily heat consumptions during January-February 2010 for each of the four houses may also be quantified using summary statistics as in the following table:

    House    Number of obs.  Sample mean  Sample variance  Sample std. dev.  Lower quartile  Median  Upper quartile
             n               (x̄)          (s²)             (s)               (Q1)            (Q2)    (Q3)
    House 1
    House 2
    House 3
    House 4

R code like the following may be used to fill in the empty cells in the table (see also the remark on p. 12):

    ## Total number of observations for House 1 during Jan-Feb 2010
    ## (doesn't include missing values if there are any)
    sum(!is.na(Dsel$Q1))
    ## Sample mean of daily heat consumption for House 1, Jan-Feb 2010
    mean(Dsel$Q1, na.rm=TRUE)
    ## Sample variance of daily heat consumption for House 1, Jan-Feb 2010
    var(Dsel$Q1, na.rm=TRUE)
    ## etc.
    ##
    ## The argument 'na.rm=TRUE' ensures that the statistic is
    ## computed even in cases where there are missing values.

e) Fill in the empty cells in the table above by computing the relevant summary statistics for the daily heat consumption of each of the four houses during the first two months of 2010. Which additional information may be gained from the table, compared to the box plot?

Statistical analysis

The purpose of the second part of the project is to perform a simple statistical analysis of the daily heat consumption of the houses. This includes specifying statistical models for heat consumption, estimating the parameters of these models, performing hypothesis tests, and computing confidence intervals.

Confidence intervals and hypothesis tests

The following R code may be used to make a QQ-plot. This plot can be used to investigate whether the daily heat consumptions of House 1 may be assumed to be normally distributed:

    ## QQ-plot of daily heat consumption (House 1)
    qqnorm(Dsel$Q1)
    qqline(Dsel$Q1)

f) Specify separate statistical models describing the daily heat consumption of each of the four houses (see Remark 3.2). Estimate the parameters of the four models (mean and standard deviation). Carry out model validation (see Chapter 3 and Section 3.1.8). Since, in this case, confidence intervals and hypothesis tests involve the distribution of an average, it might also be useful to include the central limit theorem (Theorem 3.14) in the discussion.

In practice, situations will arise where it is not appropriate to assume that the assumptions of a model are satisfied. In these cases, one often considers whether a transformation of the data might improve the situation. (See Chapter 3.1.9.) Note that after a transformation, the interpretation of the results on the original scale changes. In this specific project, however, the intention is not for you to transform the data.

Book, Remark 3.2: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#sna:rem:model
Book, Chapter 3: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#sna:cha
Book, Section 3.1.8: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#sna:sec:opti-models-assumpt
Book, Theorem 3.14: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#sna:the:clt
Book, Section 3.1.9: https://02323.compute.dtu.dk/filemanager/02323/sharelatex-public/files/book-introstatistics.pdf#sna:sec:transf-towards-norm

g) State the formula for a 95% confidence interval (CI) for the mean daily heat consumption of House 1 during January-February 2010 (see Section 3.1.2 of the book). Insert values and calculate the interval. Compute corresponding intervals for the three other houses and fill in the table below.

               Lower bound of CI   Upper bound of CI
    House 1
    House 2
    House 3
    House 4

Compare the CI for House 1 computed above with the result of the following R code:

    ## CI for the mean daily heat consumption of House 1
    t.test(Dsel$Q1, conf.level=0.95)$conf.int

It was estimated that the average daily heat consumption of House 1 over a full year is approximately 2.38 kW/day.

h) Carry out a hypothesis test in order to assess whether the mean daily heat consumption of House 1 during January-February 2010 is significantly different from 2.38 kW/day. This can be done by testing the following hypothesis:

    $H_0: \mu_{\text{House1}} = 2.38$,
    $H_1: \mu_{\text{House1}} \neq 2.38$.

Specify the significance level α, the formula for the test statistic, as well as the distribution of the test statistic (remember to include the degrees of freedom). Insert relevant values and compute the test statistic and p-value. Write a conclusion in words. If there is a significant difference:
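
The hand calculations asked for in g) and h) can be checked against R's built-in t.test. The following is only a minimal sketch under stated assumptions: the vector x is simulated stand-in data (not the Sønderborg measurements), and the names x, mu0, tobs and pval are chosen here for illustration. In the project, x would be replaced by the House 1 column of the January-February subset.

```r
## Sketch only: 'x' is SIMULATED stand-in data, not the real measurements.
## In the project, use the House 1 column of the Jan-Feb 2010 subset instead.
set.seed(1)
x <- c(rnorm(58, mean = 3, sd = 1), NA)  # one missing value on purpose
x <- x[!is.na(x)]                        # drop missing values

n    <- length(x)   # number of non-missing observations
xbar <- mean(x)     # sample mean
s    <- sd(x)       # sample standard deviation

## 95% CI for the mean: xbar +/- t_{1-alpha/2}(n-1) * s / sqrt(n)
alpha <- 0.05
tq <- qt(1 - alpha/2, df = n - 1)
ci <- xbar + c(-1, 1) * tq * s / sqrt(n)

## Two-sided test of H0: mu = mu0 against H1: mu != mu0
mu0  <- 2.38
tobs <- (xbar - mu0) / (s / sqrt(n))       # test statistic, t(n-1) under H0
pval <- 2 * pt(-abs(tobs), df = n - 1)     # two-sided p-value

## Compare the hand calculation with the built-in function
res <- t.test(x, mu = mu0, conf.level = 0.95)
print(ci);   print(res$conf.int)
print(tobs); print(res$statistic)
print(pval); print(res$p.value)
```

If the manual and built-in values agree, the formulas have been applied correctly; the numbers printed here carry no meaning for the actual houses, since the input is simulated.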
Mar 02, 2024