Assignment Instructions: Your final document should be an 1) RMarkdown file and 2) HTML file knitted from R Markdown. In answering each of the following questions please include a) the question as a...

Hello, My deadline for this homework just got extended to Monday, 19th of April, 9pm EST. Could you kindly help me with this homework? I have attached instructions for an R homework I need help with again. It has to do with web scraping and SQL. This HW needs to be done in 1) an R Markdown file and 2) knitted as HTML too. My budget is $110 for this. Kind regards, RK


Assignment Instructions: Your final document should be 1) an RMarkdown file and 2) an HTML file knitted from R Markdown. In answering each of the following questions please include a) the question as a header in your Rmarkdown report, b) then include the raw code that you used to generate your results, and c) the top ten rows/values/or elements of the resulting dataframe, vector, or list created in your results (unless a lesser amount is requested). Feel free to refer to any R scripts provided throughout the course to answer the following questions:

1. Web-scrape all table data from the following web page and build a data frame in R. Limit your final table to include columns for "Package", "Item", "Title", "Rows", and "Cols". Print the first five rows of the table. http://vincentarelbundock.github.io/Rdatasets/datasets.html
2. Web-scrape the full links to every CSV file listed in the CSV column of the web page. Add a new column to your data frame that includes these links. Name the column "CSV Links". Print the first five rows of the table. Note: You may need to use string operations to recreate the full link to the csv files after they are scraped.
3. Use R code/functions to search the "Title" column to return the row of data with the title, "Violent Crime Rates by US State"
4. Import the csv file into R using the full link listed in the "CSV Links" column for this dataset. Create a new variable called "Violent_crime" that adds together data for all columns in the dataset that contain violent crime data (i.e., add data from assault, murder, and rape columns together in new column called "Violent_crime").
5. Using one or more of the following example datasets that come preloaded in R (see ?data(state) for more information), add state region codes, state divisions, and all of the variables from the "state.x77" dataset to your Violent Crime Rates data frame. Print the first five lines of your new dataset.
• state.abb, state.area, state.center, state.division, state.name, state.region
Note 1: state names can be extracted to new columns for joins from these datasets by using row.names() if needed.
Note 2: Some of these state datasets may be matrix objects or lists. Be sure to convert them to data frames before joining them if needed.
6. Calculate the average for each numeric column in the dataset.
7. Group the data by region and then calculate the average for each numeric column in the dataset per region. Which region had the highest population (data is from the late 1970s)? Which region had the most violent crime? http://vincentarelbundock.github.io/Rdatasets/datasets.html
8. Group the data by division and then calculate the average for each numeric column in the dataset per division. Which division had the highest population (data is from the late 1970s)? Which division had the most violent crime?
9. What SQL statement would you write to return two columns denoting income and Illiteracy in your state data?
10. What SQL statement would you write to return two columns denoting income and Illiteracy in your state data and sort the data from the highest to lowest income values?
11. What SQL statement would you write to return two columns denoting income and Illiteracy in your state data and sort the data from the highest to lowest income values and limit the data to incomes at or higher than 5000?
12. What SQL statement would you write to return two columns denoting income and Illiteracy in your state data and sort the data from the highest to lowest income values and limit the data to incomes at or higher than 5000 and return the top 10 rows only?
13. Create a new data frame that includes two columns from your state data denoting state names and violent crimes. Spread the state names to 50 unique columns with a single row that includes the violent crime data per state. Print the first five columns of the new dataset.
14. Take the dataset from question 13 and use a function from the apply family of functions to return a named list with each name denoting a state and each value per state indicating the square root of the value for violent crimes.
15. Subset the list you created in question 14 to extract values for Texas and New York.
Answered 1 day after Apr 16, 2021

Abr Writing answered on Apr 18 2021
assignment.html
Assignment
18/04/2021
Question 1
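Getting the HTML response from the given web page using the rvest package (dplyr is loaded for the pipe and the data-frame verbs used later).
library(rvest)
library(dplyr)
web.link <- "http://vincentarelbundock.github.io/Rdatasets/datasets.html"
web.page <- read_html(web.link)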
Extracting the second table on the page as a data frame and keeping only the requested columns
data <- web.page %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(fill = TRUE) %>%
  as.data.frame %>%
  select(Package, Item, Title, Rows, Cols)
Printing the first five rows of the table
head(data, n = 5)
Package Item
1 AER Affairs
2 AER ArgentinaCPI
3 AER BankWages
4 AER BenderlyZwick
5 AER BondYield
Title Rows Cols
1 Fair's Extramarital Affairs Data 601 9
2 Consumer Price Index in Argentina 80 2
3 Bank Wages 474 4
4 Benderly and Zwick Data: Inflation, Growth and Stock Returns 31 5
5 Bond Yield Data 60 2
Question 2
Extracting the links to all the CSV files. The code keeps every other link (the odd-numbered ones), which assumes the page lists a CSV link followed by one other link per dataset.
links <- web.page %>%
  html_nodes("a") %>%
  html_attr("href") %>%
  as.data.frame %>%
  tibble::rownames_to_column() %>%
  filter(rowname %in% seq(1, 2 * nrow(data), 2)) %>%
  select(-one_of("rowname"))
Adding a new column CSV Links to the data.
data$`CSV Links` <- links$.
Printing the first five rows of the table
head(data, n = 5)
Package Item
1 AER Affairs
2 AER ArgentinaCPI
3 AER BankWages
4 AER BenderlyZwick
5 AER BondYield
Title Rows Cols
1 Fair's Extramarital Affairs Data 601 9
2 Consumer Price Index in Argentina 80 2
3 Bank Wages 474 4
4 Benderly and Zwick Data: Inflation, Growth and Stock Returns 31 5
5 Bond Yield Data 60 2
CSV Links
1 https://vincentarelbundock.github.io/Rdatasets/csv/AER/Affairs.csv
2 https://vincentarelbundock.github.io/Rdatasets/csv/AER/ArgentinaCPI.csv
3 https://vincentarelbundock.github.io/Rdatasets/csv/AER/BankWages.csv
4 https://vincentarelbundock.github.io/Rdatasets/csv/AER/BenderlyZwick.csv
5 https://vincentarelbundock.github.io/Rdatasets/csv/AER/BondYield.csv
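The note in Question 2 mentions that string operations may be needed to rebuild full links. The following is a minimal base R sketch of that idea, not the method used in the answer above. The `hrefs` vector and `base_url` are hypothetical stand-ins for scraped values, and whether the page actually serves relative links is an assumption.

```r
# Hypothetical hrefs, standing in for values scraped with html_attr("href")
hrefs <- c(
  "csv/AER/Affairs.csv",
  "doc/AER/Affairs.html",
  "csv/AER/ArgentinaCPI.csv",
  "doc/AER/ArgentinaCPI.html"
)

# Assumed base URL of the site hosting the files
base_url <- "https://vincentarelbundock.github.io/Rdatasets/"

# Keep only hrefs that end in ".csv", instead of relying on link position
csv_hrefs <- hrefs[grepl("\\.csv$", hrefs, ignore.case = TRUE)]

# Prepend the base URL only when the href is not already absolute
is_absolute <- grepl("^https?://", csv_hrefs)
csv_links <- ifelse(is_absolute, csv_hrefs, paste0(base_url, csv_hrefs))

print(csv_links)
```

Filtering on the file extension does not depend on the order of links on the page, so it may hold up better than position-based filtering if the page layout changes.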
Question 3
violent <- data[
  data$Title == "Violent Crime Rates by US State",
]
violent
Package Item Title Rows Cols
445 datasets USArrests Violent Crime Rates by US State 50 4
CSV Links
445 https://vincentarelbundock.github.io/Rdatasets/csv/datasets/USArrests.csv
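The exact comparison with `==` works when the title is known character for character. As a hedged sketch, a substring search with `grepl()` is one alternative when only part of the title is known. The `catalog` data frame below is a hypothetical three-row stand-in for the scraped table.

```r
# Hypothetical stand-in for the scraped table
catalog <- data.frame(
  Item  = c("USArrests", "USJudgeRatings", "Affairs"),
  Title = c("Violent Crime Rates by US State",
            "Lawyers' Ratings of State Judges in the US Superior Court",
            "Fair's Extramarital Affairs Data"),
  stringsAsFactors = FALSE
)

# Exact match on the full title
exact <- catalog[catalog$Title == "Violent Crime Rates by US State", ]

# Case-insensitive substring match; fixed = TRUE treats the pattern literally
partial <- catalog[grepl("violent crime", tolower(catalog$Title), fixed = TRUE), ]

print(exact)
print(partial)
```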
Question 4
Reading the data from the CSV link and adding the Assault, Murder, and Rape columns together in a new Violent_crime column.
violent_data <- read.csv(violent$`CSV Links`)
violent_data$Violent_crime <-
  violent_data$Assault +
  violent_data$Murder +
  violent_data$Rape
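The same sum can be sketched with `rowSums()` over a vector of column names. This version uses the `USArrests` dataset that ships with R's `datasets` package rather than the downloaded CSV, which is an assumption made so the example runs offline.

```r
# USArrests ships with R's datasets package, so no download is needed
arrests <- datasets::USArrests

# Columns treated as violent crime, as listed in the question
crime_cols <- c("Murder", "Assault", "Rape")
stopifnot(all(crime_cols %in% names(arrests)))

# Row-wise total of the three columns
arrests$Violent_crime <- rowSums(arrests[, crime_cols])

head(arrests, n = 5)
```

Naming the columns in one vector makes it easier to change which columns count toward the total.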
Question 5
data(state)
# state.x77 is a matrix; convert it to a data frame before adding columns
state.x77 <- state.x77 %>%
  as.data.frame
# Keep the state names (row names) in a column X to join on
state.x77$X <- rownames(state.x77)
state.x77$state.abb <- state.abb
state.x77$state.area <- state.area
state.x77$state.center.x <- state.center$x
state.x77$state.center.y <- state.center$y
state.x77$state.division <- state.division
state.x77$state.name <- state.name
state.x77$state.region <- state.region
Merging the two data frames on the state name column X
violent_data <- merge(
  violent_data,
  state.x77,
  by = "X"
)
Printing the first five rows of the table
head(violent_data, n = 5)
X Murder.x Assault UrbanPop Rape Violent_crime Population Income
1 Alabama 13.2 236 58 21.2 270.4 3615 3624
2 Alaska 10.0 263 48 44.5 317.5 365 6315
3 Arizona 8.1 294 80 31.0 333.1 2212 4530
4 Arkansas 8.8 190 50 19.5 218.3 2110 3378
5 California 9.0 276 91 40.6 325.6 21198 5114
Illiteracy Life Exp Murder.y HS Grad Frost Area state.abb state.area
1 2.1 69.05 15.1 41.3 20 50708 AL 51609
2 1.5 69.31 11.3 66.7 152 566432 AK 589757
3 1.8 70.55 7.8 58.1 15 113417 AZ 113909
4 1.9 70.66 10.1 39.9 65 51945 AR 53104
5 1.1 71.71 10.3 62.6 20 156361 CA 158693
state.center.x state.center.y state.division state.name state.region
1 -86.7509 32.5901 East South Central Alabama South
2 -127.2500 49.2500 Pacific Alaska West
3 -111.6250 34.2192 Mountain Arizona West
4 -92.2992 34.7336 West South Central Arkansas South
5 -119.7730 36.5341 Pacific California West
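The `Murder.x` and `Murder.y` columns in the output above appear because both inputs contain a `Murder` column and `merge()` appends default suffixes. A minimal sketch, assuming the built-in `USArrests` in place of the downloaded CSV, shows how the `suffixes` argument can give those columns clearer names. The suffix strings are illustrative choices.

```r
# Built-in data used in place of the downloaded CSV (an assumption)
arrests <- datasets::USArrests
arrests$X <- rownames(arrests)

# state.x77 is a matrix, so convert it before adding columns
x77 <- as.data.frame(datasets::state.x77)
x77$X <- rownames(x77)
x77$state.region <- datasets::state.region
x77$state.division <- datasets::state.division

# Label the overlapping Murder columns by their source
merged <- merge(arrests, x77, by = "X", suffixes = c(".arrests", ".x77"))

names(merged)
```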
Question 6
violent_data %>%
  select(where(is.numeric)) %>%
  summarise_all(mean) %>%
  t %>%
  as.data.frame %>%
  rename(Mean = V1)
Mean
Murder.x 7.78800
Assault 170.76000
UrbanPop 65.54000
Rape 21.23200
Violent_crime 199.78000
Population 4246.42000
Income 4435.80000
Illiteracy 1.17000
Life Exp 70.87860
Murder.y 7.37800
HS Grad 53.10800
Frost 104.46000
Area 70735.88000
state.area 72367.98000
state.center.x -92.46414
state.center.y 39.41074
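For comparison, here is a base R sketch of the same calculation using `colMeans()` on the numeric columns only. It runs on the built-in `USArrests` data, with a `State` column added purely to show that non-numeric columns are skipped.

```r
arrests <- datasets::USArrests
arrests$State <- rownames(arrests)  # non-numeric column, excluded below

# Logical vector marking which columns are numeric
numeric_cols <- vapply(arrests, is.numeric, logical(1))

# Mean of each numeric column, as a one-column data frame
col_means <- colMeans(arrests[, numeric_cols])
means_df <- data.frame(Mean = col_means)

means_df
```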
Question 7
violent_data %>%
  group_by(state.region) %>%
  select(where(is.numeric), state.region) %>%
  summarise_all(mean) %>%
...
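Since the answer is cut off here, the following is only a sketch of a grouped mean in base R, not the continuation of the code above. It uses `aggregate()` on the built-in `state.x77` data, and it assumes that "highest population" is read as the highest mean population per region.

```r
# state.x77 is a matrix, so convert it to a data frame first
x77 <- as.data.frame(datasets::state.x77)

# Mean of every column within each region
region_means <- aggregate(
  x   = x77,
  by  = list(state.region = datasets::state.region),
  FUN = mean
)

# Region with the largest mean population in this table
top_region <- region_means$state.region[which.max(region_means$Population)]

print(region_means)
print(top_region)
```

Replacing `state.region` with `state.division` in the `by` list gives the per-division version of the same table.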