You will use R to analyse data collected that contains information on the Means of Travel, Time and...

Question

You will use R to analyse data collected that contains information on the Means of Travel, Time and Distance to Work, School or College.

You will be assigned a county for the purposes of answering the questions listed below.

Steps you will complete include:

· Import data into R

· Using R, explore the data, remove irrelevant columns and rename the remaining headings so they are clearer and more intuitive.

· Check for missing, outlier or erroneous values and take appropriate action

· Write R code to perform data analysis to support your answers to the questions listed below

· Include a brief reflection section (any challenges met, any assumptions made, and any potential issues/weaknesses associated with using the dataset for decision making purposes).
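
As a rough illustration of the heading-renaming step listed above, the sketch below shortens long census-style column names with dplyr and stringr. It runs on a small made-up data frame rather than the real file; the values, the `toy` and `clean` objects and the `shorten` helper are assumptions for illustration, while the long prefix and `_2011` suffix follow the column names used in the accepted answer.

```r
library(dplyr)
library(stringr)

# Toy data standing in for the real file (values are invented)
toy <- data.frame(
  County = c("Carlow", "Kilkenny"),
  Population_Aged_5_Over_By_Means_Of_Travel_To_Work_School_College_On_Foot_2011 = c(10, 12),
  Population_Aged_5_Over_By_Means_Of_Travel_To_Work_School_College_Bicycle_2011 = c(3, 4)
)

# Hypothetical helper: drop the shared prefix and the year suffix,
# then lower-case whatever is left
shorten <- function(x) {
  x <- str_remove(x, "^Population_Aged_5_Over_By_Means_Of_Travel_To_Work_School_College_")
  x <- str_remove(x, "_2011$")
  str_to_lower(x)
}

# rename_with() applies the helper to every column name
clean <- rename_with(toy, shorten)
names(clean)
```

Shorter names such as `on_foot` make later `select()` and `summarise()` calls much easier to read, at the cost of having to record the mapping back to the original headings.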

Questions

1. What is the most popular mode of transport nationally?

2. How does this (Answer to Q1) compare to the most popular mode of transport in your assigned county?

3. What differences are evident between the choice of transportation in the cities compared to the other regions?

4. What proportion of commuters leave home outside of the 8-9am rush hour?

5. Are commuters in your assigned county likely to travel for longer than 45 minutes each morning?

6. How does this (Answer to Q5) compare to other counties in the same NUTS III region?

7. The residents of which five counties experience the longest commute times?

8. What proportion of cars used in the morning commute contain only one person?

9. Which Electoral Division within each Planning Region do you propose should be prioritised for investment in public transportation?

Use the assigned county of Carlow.

Some packages that can be used are:

· dplyr

· stringr

· lubridate

· tidyr

Pritam · Accepted Answer

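# Load the census extract; header = TRUE keeps the original column names
data = read.csv("transport.csv", header = TRUE)
str(data)
# missing values:
tb = colMeans(is.na(data))   # share of missing values per column
which(tb >= 0.3)             # columns with 30% or more missing
sum(is.na(data))             # total number of missing cells
colSums(is.na(data))         # missing cells per column
# So, there are only two missing values in the overall data, and every variable
# has less than 1% missing values. So, no need to worry about the missing values.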
# Question 1:
# na.rm = TRUE so that a missing cell does not turn a column total into NA
t1 = sapply(data[,c(10:22)], sum, na.rm = TRUE)
which.max(t1)
# It seems that private transport comb is the most popular transport mode in the
# country.
library(dplyr)
library(stringr)
# Question 2:
d2 = filter(data, County == "Carlow")
t2 = sapply(d2[,c(10:22)], sum, na.rm = TRUE)
which.max(t2)
# The same mode of transport is also the most popular in County Carlow.
# Boxplots of each travel mode by county
d3 = data[,c(6,10:22)]
library(ggplot2)
prefix = "Population_Aged_5_Over_By_Means_Of_Travel_To_Work_School_College_"
modes = c("On_Foot", "Bicycle", "Bus_Minibus_Coach", "Train_Dart_Luas",
          "Motorcycle_Scooter", "Car_Driver", "Car_Passenger")
# One plot per mode; .data[[col]] looks the column up by name inside aes(),
# so no d3$... reference is needed alongside ggplot's data argument
for (m in modes) {
  col = paste0(prefix, m, "_2011")
  g = ggplot(d3, aes(County, .data[[col]])) +
    geom_boxplot() +
    labs(y = col)
  print(g + coord_flip())
}
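# Question 4: share of people leaving home between 8 and 9
rush = (sum(data[,27], na.rm = TRUE)+sum(data[,28], na.rm = TRUE))/sum(data[,32], na.rm = TRUE)
rush
# the question asks for the share leaving outside 8-9, which is the complement
# (this also counts any "not stated" responses if column 32 includes them)
1 - rush
# Question 5: proportion of people in Carlow who travel for more than 45 mins
(sum(d2[,36])+sum(d2[,37])+sum(d2[,38]))/sum(d2[,40])
# As evident from the result, only 12% of the commuters in my county
# travel for more than 45 mins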
table(data$County)
table(data$NUTS_III)
# Question 6: counties in the same NUTS III region as Carlow
# (the region label is looked up from the Carlow rows rather than hard-coded)
carlow_region = unique(d2$NUTS_III)
df1 =  data %>%
  filter(NUTS_III %in% carlow_region) %>%
  select(County, Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_One_And_Half_Hours_And_Over_2011,
         Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_One_Hour_To_Under_One_Hour_Thirty_Mins_2011,
         Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_Three_Quarter_Hours_To_Under_One_Hour_2011,
         Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_Total_2011)
df1 %>%
  group_by(County) %>%
  summarize(over_1_30_hr = sum(Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_One_And_Half_Hours_And_Over_2011),
            one_1_30_hr = sum(Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_One_Hour_To_Under_One_Hour_Thirty_Mins_2011),
            forty_five_1_hr = sum(Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_Three_Quarter_Hours_To_Under_One_Hour_2011),
            # assumed completion: the source listing breaks off after the previous line
            total = sum(Population_Aged_5_Over_By_Journey_Time_To_Work_School_College_Total_2011))
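
The answer above does not reach Question 7. One possible way to approach it, sketched here under stated assumptions, is to rank counties by the share of commuters whose journey takes more than 45 minutes. Treating that share as the measure of "longest commute times" is an assumption, and the data frame, the county labels and the short column names `over_45` and `total` are invented for illustration; they are not the headings of the real file.

```r
library(dplyr)

# Invented electoral-division rows; 'over_45' and 'total' are hypothetical names
toy <- data.frame(
  County  = c("A", "A", "B", "C", "C", "D", "E", "F"),
  over_45 = c(10, 20, 5, 40, 10, 8, 30, 2),
  total   = c(100, 100, 100, 100, 100, 40, 100, 100)
)

top5 <- toy %>%
  group_by(County) %>%
  # add up electoral divisions first, so large divisions weigh more
  summarise(over_45 = sum(over_45), total = sum(total), .groups = "drop") %>%
  mutate(share_over_45 = over_45 / total) %>%
  arrange(desc(share_over_45)) %>%
  slice_head(n = 5)

top5
```

Summing counts per county before dividing gives a population-weighted share; averaging the per-division shares instead would let small electoral divisions influence the ranking as much as large ones.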
