Question

One is the HW4 Rmd file, the other a PDF. Please see the attachment.

STAT 4410/8416 Homework 4
lastName firstName
Due on Nov 8, 2019

1. Exploring XML data; In this problem we will read the xml data. For this we will obtain an xml data set called olive oils from the link http://www.ggobi.org/book/data/olive.xml. Please follow the directions in each step and provide your codes and output.
a. Parse the xml data from the above link and store in an object called olive. Obtain the root of the xml file and display its name.
b. Examine the actual file by going to the link above and identify the path of categorical variables in the xml tree. Use that path to obtain the categorical variable names. Please keep the names, not nick names and store them in cvNames. Display cvNames.
c. Now examine the file by going to the link and identify the path of real variables in the xml tree. Use that path to obtain the real variable names. Please keep the names, not nick names and store them in rvNames. Display rvNames.
d. Notice the path for the data in xml file. Use that path to obtain the data and store the data in a data frame called oliveDat. Change the column names as you have obtained the column names. Display some data.
e. Generate a plot of your choice to display any feature of oliveDat data. Notice that the column names are different fatty acids. The values are % of fatty acids found in the Italian olive oils coming from different regions and areas.
f. Explain what these two lines of codes are doing.

[...]

c. Hw1 solution contains the answer of what is data science. The answer has three paragraphs. Write the three paragraphs of text about data science in three different paragraph tags. You can copy the text from hw1 solution.
d. Write “What we learnt from hw1” in second heading under [...] tag. Copy all the points we learnt in hw1 solution. List all the points under ordered list tag. Notice that each item of the list should be inside list item tag.
f. Now we want to make the text beautiful. For this we would write some CSS codes in between [...] tag under [...]. For this please refer to online (year 2014) lecture 15 slide 8. First change the fonts of the body tag to Helvetica Neue.
g. For the paragraph that contains the definition of data science, give an attribute id='dfn' and in CSS change the color of ‘dfn’ to white, background-color to olive and font to be bold.
h. For other paragraphs, give an attribute class='cls' and in CSS change the color of ‘cls’ to green.
i. Write CSS so that color of h1,h2 becomes orange.
j. Write JavaScript codes so that onClick on h1 header, it shows a message ‘Its about data science’.

5. Boston hubway data; This question will explore Boston hubway data. Please carefully answer each question below including your codes and results.
a. Obtain the compressed data, bicycle-rents.csv.zip, from Canvas and display few data rows.
b. For each day, count the number of bikes rented for that date and show the data in a time series plot.
c. Based on the rent date column, create two new columns weekDay and hourDay which represent week day name and hour of the day respectively. Store the data in myDat and display few records of the data. Hint: For weekday use function wday().
d. Summarize myDat by weekDay based on the number of rents for each weekDay and store the data in weekDat. Display some data.
e. Create a suitable plot of the data you stored in weekDay so that it displays number of bike rents for each week day.
f. Now we want to investigate what happens in each day. Summarize myDat again but this time by weekDay and hourDay and obtain the number of rents. Store the data in hourDat and display some data.
g. The dataframe hourDat is now ready for plotting. Generate line plots showing number of bike rents vs hour of the day and colored by weekDay.

6. Bonus for undergraduate (3 points) mandatory for graduate students: The following link contains the complete texts of Romeo and Juliet written by Shakespeare. Read the complete text and generate a plot similar to Romeo and Juliet case study in online (year 2014) lecture 13 (last plot).
http://shakespeare.mit.edu/romeo_juliet/full.html

7. Bonus (2 points) question for all: In the United States, a Consumer Expenditure Survey (CE) is conducted each year to collect data on expenditures, income, and demographics. These data are available as public-use microdata (PUMD) files in the following link. Download the data for the year 2016 and explore. Provide some plots and numerical summary that creates some interest about this data.
https://www.bls.gov/cex/pumd.htm

Kshitij · Accepted Answer

STAT 4410/8416 Homework 4
lastName firstName 
Due on Nov 8, 2019 
1. Exploring XML data; In this problem we will read the xml data. For this we will obtain
an xml data set called olive oils from the link http://www.ggobi.org/book/data/olive.xml.
Please follow the directions in each step and provide your codes and output.
a. Parse the xml data from the above link and store in an object called olive. Obtain the
root of the xml file and display its name.
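Steps (a) through (d) of Problem 1 can be sketched as follows. This is a minimal sketch, not the submitted solution: it assumes olive.xml nests `categoricalvariable` and `realvariable` elements (each carrying a `name` attribute) and whitespace-separated `record` elements under the root. The inline document is a made-up miniature rather than the real file, and the XPath expressions should be checked against the actual tree.

```r
library(xml2)

# Hypothetical miniature document imitating the ASSUMED layout of olive.xml
xml_txt <- '<ggobidata>
  <data name="olive">
    <variables count="4">
      <categoricalvariable name="region" nickname="reg"/>
      <categoricalvariable name="area" nickname="ar"/>
      <realvariable name="palmitic" nickname="p1"/>
      <realvariable name="oleic" nickname="o"/>
    </variables>
    <records count="2">
      <record label="a">1 1 1075 7823</record>
      <record label="b">1 2 1088 7709</record>
    </records>
  </data>
</ggobidata>'

# (a) parse the document and read the name of its root node
olive <- read_xml(xml_txt)
rootName <- xml_name(xml_root(olive))

# (b), (c) variable names come from the "name" attribute, not "nickname"
cvNames <- xml_attr(xml_find_all(olive, "//categoricalvariable"), "name")
rvNames <- xml_attr(xml_find_all(olive, "//realvariable"), "name")

# (d) each record holds whitespace-separated values; split and bind by row
recs <- xml_text(xml_find_all(olive, "//record"))
vals <- strsplit(trimws(recs), "\\s+")
oliveDat <- as.data.frame(do.call(rbind, lapply(vals, as.numeric)))
names(oliveDat) <- c(cvNames, rvNames)

head(oliveDat)
```

For the real data, replacing `xml_txt` with the URL from the assignment in the `read_xml()` call should be enough, provided the layout matches the assumption above.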
library("XML") 
library("xml2") 
library("dplyr") 
library("ggplot2") 
olive <- [...]
[...] %>% select(name, leapyears)
## # A tibble: 11 x 2 
##    name       leapyears 
##               
##  1 Eisenhower         2 
##  2 Kennedy            0 
##  3 Johnson            2 
##  4 Nixon              1 
##  5 Ford               1 
##  6 Carter             1 
##  7 Reagan             2 
##  8 Bush               1 
##  9 Clinton            2 
## 10 Bush2              2 
## 11 Obama              2 
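The first three rows of the leapyears column above can be reproduced with a short base R sketch. It assumes the count means leap years falling between the start year and the end year of each term, inclusive, which is a guess at the method rather than the answer's own code; `is_leap` and `terms` are illustrative names.

```r
# Gregorian leap-year rule
is_leap <- function(y) (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)

# Calendar years in which three of the terms began and ended
terms <- data.frame(
  name  = c("Eisenhower", "Kennedy", "Johnson"),
  start = c(1953, 1961, 1963),
  end   = c(1961, 1963, 1969)
)

# Count the leap years in start:end for every row
terms$leapyears <- mapply(function(s, e) sum(is_leap(s:e)),
                          terms$start, terms$end)

terms[, c("name", "leapyears")]
```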
b. Consider the Teams dataset from the Lahman package that provides a series of baseball 
statistics over a number of years. Note that the “H” column refers to number of home 
runs. The following outlines a procedure to follow to determine the number of home 
runs that occurred during each president’s (adjusted) time in office. 
i. First, filter the Teams dataset to only include years between 1953 and 2016. 
Teams <- Teams %>% filter(yearID %in% 1953:2016) 
ii. Next, we will partition the rows of the presidential dataset by only considering the 
year of each president’s start and end dates with the conditions that 1) if a president’s 
term did NOT start in January, then we will not include that year in their time in office, 
and 2) if a president’s term ended in January, then that ending year will also not be 
included. For example, Johnson will be considered as having a starting year of 1964 
and an ending year of 1968. 
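The two conditions above can be sketched in base R. This is one possible reading of the rule, not the answer's own `checkstartyear` code; `adjust_years`, `startyear`, and `endyear` are hypothetical names.

```r
# Return the adjusted first and last calendar year of a term.
# start, end: term dates as "YYYY-MM-DD" strings or Date objects.
adjust_years <- function(start, end) {
  start <- as.Date(start)
  end   <- as.Date(end)
  sy <- as.integer(format(start, "%Y"))
  ey <- as.integer(format(end, "%Y"))
  # Condition 1: a term that does not start in January skips its first year
  sy <- ifelse(format(start, "%m") != "01", sy + 1L, sy)
  # Condition 2: a term that ends in January does not count that final year
  ey <- ifelse(format(end, "%m") == "01", ey - 1L, ey)
  data.frame(startyear = sy, endyear = ey)
}

# Johnson: took office 1963-11-22, left office 1969-01-20
adjust_years("1963-11-22", "1969-01-20")
```

For Johnson this gives a starting year of 1964 and an ending year of 1968, matching the example in the prompt. The function is vectorized, so it can be applied to whole `start` and `end` columns at once.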
checkstartyear <- [...]
[...] %>% group_by(yearID) %>% summarise(homeruns = sum(H)) 
 
homecount <- function(x, y) {
  temp <- [...] %>% filter(yearID %in% x:y) 
  hc <- sum(temp$homeruns) 
  return(hc) 
} 
 
psd$homeruns 
## 1 Bush2 
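The per-term total can be sketched end to end on a toy table. This is a base R sketch under stated assumptions, not the answer's code: `teams` is a made-up stand-in for the filtered Teams data, and `yearly` is an illustrative name. The prompt designates `H` as the home-run column, although the Lahman documentation lists `H` as hits and `HR` as home runs, so the column choice is worth checking.

```r
# Made-up stand-in for the filtered Teams table (values are not real data)
teams <- data.frame(
  yearID = c(1964, 1964, 1965, 1966),
  H      = c(10, 20, 30, 40)
)

# Total of the H column for each year
yearly <- aggregate(H ~ yearID, data = teams, FUN = sum)
names(yearly)[2] <- "homeruns"

# Sum the yearly totals over the years x through y, inclusive
homecount <- function(x, y) {
  sum(yearly$homeruns[yearly$yearID %in% x:y])
}

homecount(1964, 1965)
```

With adjusted start and end years stored as columns of the presidents table, a call such as `mapply(homecount, startyear, endyear)` would produce one total per president.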
4. Creating HTML Page; In this problem we would like to create a basic HTML page. 
Please follow each of the steps below and finally submit your HTML file on Canvas. 
Please note that you don’t need to answer these questions here in the .Rmd file. 
a. Open a notepad or any plain text editor.
