Using the dataset (avocado.csv) provided, use R to:
(i) Find relationships and interesting insights into the data provided through hypothesis exploration.
(ii) Show what you’ve found using plots and data visualisations.
(iii) Use statistical techniques to quantitatively demonstrate the probability that the relationships you’ve found are real using hypothesis confirmation.


So, you take what you’ve found (from exploration) and shown (through visualisation) and confirm it with statistics (confirmation).
The document must be written in R markdown so that you can embed your code, text and output all in one document.


R Group Assignment Instructions
David McClelland
05/07/2020

R Project Instructions

Form groups of between 2 and 5 students. Select a dataset from the ten datasets that will be provided. Then explore, visualise and test. Some of the datasets are large and complex, some are small and simple. I would encourage the groups of 4 or 5 to work with the bigger datasets. Multiple groups will be using the same datasets, so do not plagiarise each other's work (code or writing).

This project will be judged on three core research skills:
(i) How to find relationships and interesting insights into the data provided through hypothesis exploration.
(ii) How to show what you’ve found using plots and data visualisations.
(iii) Using statistical techniques to quantitatively demonstrate the probability that the relationships you’ve found are real using hypothesis confirmation.

So, you take what you’ve found (from exploration) and shown (through visualisation) and confirm it with statistics (confirmation). These three steps often overlap with each other, but please try to include three separate headings in your project, one for each section.

This all needs to be combined into an easy-to-read and fun document (I should be able to read through it in 5-10 min). The document must be written in R markdown so that you can embed your code, text and output all in one document. There will be a lesson and guide on how to write in R markdown released soon. Your code needs to be shown at every step so that I can run your code using the original data and arrive at the same conclusion.

Please feel free to ask for help from your peers and myself on Stack Overflow. But remember to explain your specific problem clearly and show what you’ve tried and what’s not working. Preferably, you should be able to provide a reproducible example (a mini version of your problem that can easily be run without accessing your data), but we can work towards that.
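The reproducible example mentioned above could be as small as the following sketch. The data frame `toy` and all of its values are made up for illustration; only the column names mirror the avocado dataset.

```r
# Hypothetical mini dataset: values are invented, not taken from avocado.csv
toy <- data.frame(
  Date = as.Date(c("2018-01-07", "2018-01-14", "2018-01-07", "2018-01-14")),
  AveragePrice = c(1.10, 1.25, 1.60, 1.72),
  type = c("conventional", "conventional", "organic", "organic")
)

# The specific step being asked about: mean price per type
mean_by_type <- aggregate(AveragePrice ~ type, data = toy, FUN = mean)
print(mean_by_type)

# dput() prints code that recreates the object, which is handy to paste into a question
dput(toy)
```

Anyone can paste this into a fresh R session and see the same behaviour without needing the original CSV.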
Hot tip: Make it easy to read, good to look at and interesting.

Dates

The data for the assignment will be released on Friday 10th July. The assignment will be due at the end of the month, Friday 31st July (3 weeks later). Submit your group members to me by Thursday evening.

Avocado Prices

Context

It is a well-known fact that Millennials LOVE Avocado Toast. It's also a well-known fact that all Millennials live in their parents' basements. Clearly, they aren't buying homes because they are buying too much Avocado Toast! But maybe there's hope… if a Millennial could find a city with cheap avocados, they could live out the Millennial American Dream.

Content

This data was downloaded from the Hass Avocado Board website in May of 2018 and compiled into a single CSV. The table below represents weekly 2018 retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLUs) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.

Some relevant columns in the dataset:
· Date - The date of the observation
· AveragePrice - The average price of a single avocado
· type - Conventional or organic
· year - The year
· Region - The city or region of the observation
· Total Volume - Total number of avocados sold
· 4046 - Total number of avocados with PLU 4046 sold
· 4225 - Total number of avocados with PLU 4225 sold
· 4770 - Total number of avocados with PLU 4770 sold
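One practical note on these column names: by default `read.csv()` passes headers through `make.names()`, so names containing spaces or starting with a digit change on import. A small sketch, assuming the CSV headers are spelled exactly as in the list above:

```r
# Column names as listed in the dataset description (assumed to match the CSV header)
raw_names <- c("Date", "AveragePrice", "Total Volume", "4046", "4225", "4770", "type", "year")

# read.csv() uses check.names = TRUE by default, which applies make.names()
r_names <- make.names(raw_names)
print(r_names)
```

This is why the answer's code refers to `Total.Volume`. Passing `check.names = FALSE` to `read.csv()` would keep the original headers, which then have to be quoted with backticks.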

Bezawada Arun answered on Jul 29 2021
---
title: "Untitled"
author: "user"
date: "29/07/2020"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# Install any packages that are missing (already installed ones are skipped)
pkgs <- c("dplyr", "tidyverse", "tidyr", "ggplot2", "lubridate",
          "factoextra", "boot", "tibbletime")
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
# Load the libraries
library(dplyr)
library(tidyverse)
library(tidyr)
library(ggplot2)
library(lubridate)
library(factoextra)
library(boot)
library(tibbletime)
```
```{r import}
# Import the avocado dataset; strings are read as factors so that levels() works
# (since R 4.0, read.csv() reads strings as character by default)
avocado <- read.csv("avocado.csv", stringsAsFactors = TRUE)
# Check the levels of avocado type
levels(avocado$type)
# Display the first six records in the data
head(avocado)
# View the structure of the dataset
str(avocado)
# Count the missing (NA) values in the data
sum(is.na(avocado))
```
```{r avocado}
# summary statistics
summary(avocado)
```
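```{r split-by-type, include=FALSE}
# Filter by type of avocado; convert Date to a real Date class
# (needed for a tibbletime index) and sort by it
organic <- avocado %>%
  select(Date, AveragePrice, type, Total.Volume) %>%
  filter(type == "organic") %>%
  mutate(Date = as.Date(Date)) %>%
  arrange(Date)
conventional <- avocado %>%
  select(Date, AveragePrice, type, Total.Volume) %>%
  filter(type == "conventional") %>%
  mutate(Date = as.Date(Date)) %>%
  arrange(Date)
# Organic avocados: reduce to monthly periodicity
organic <- as_tbl_time(organic, index = Date)
organic <- as_period(organic, "1 month")
# Conventional avocados: reduce to monthly periodicity
conventional <- as_tbl_time(conventional, index = Date)
conventional <- as_period(conventional, "1 month")
```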
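Note that `as_period()` in the chunk above changes periodicity by keeping one observation per period (by default the one at the start) rather than averaging. If a monthly mean is what the analysis needs, a base R sketch along the following lines may be closer to the intent. The `weekly` data frame and its values are invented for illustration.

```r
# Hypothetical weekly prices; values are invented for illustration
weekly <- data.frame(
  Date = as.Date(c("2018-01-07", "2018-01-14", "2018-01-21",
                   "2018-02-04", "2018-02-11")),
  AveragePrice = c(1.00, 1.20, 1.40, 1.50, 1.70)
)

# Label each row with its calendar month
weekly$month <- format(weekly$Date, "%Y-%m")

# Average every observation within the month instead of keeping a single row
monthly <- aggregate(AveragePrice ~ month, data = weekly, FUN = mean)
print(monthly)
```

Which of the two is appropriate depends on the question: a single weekly snapshot per month preserves real observed prices, while a monthly mean smooths out week-to-week noise.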
## Including Plots
```{r price-by-type, fig.width=8, fig.height=4}
# Density of average price for conventional and organic avocados
ggplot(avocado, aes(x = AveragePrice, fill = type)) +
  geom_density() +
  facet_wrap(~type) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom") +
  labs(title = "Avocado Price by Type") +
  scale_fill_brewer(palette = "Set1")
# Mean volume per type and each type's percentage share
volume <- avocado %>%
  group_by(type) %>%
  summarise(avg.vol = mean(Total.Volume)) %>%
  mutate(pct = prop.table(avg.vol) * 100)
print(volume)
```
```
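The density plot suggests a price gap between the two types, and the assignment asks for that kind of observation to be confirmed statistically. One possible confirmation step is a Welch two-sample t-test on `AveragePrice` by `type`. The sketch below runs on simulated data so that it works without the CSV; the means and standard deviation used in the simulation are assumptions, not results from the real dataset.

```r
set.seed(42)  # make the simulated draws repeatable

# Simulated stand-in for avocado.csv; means and spread are assumptions, not real results
sim <- data.frame(
  type = rep(c("conventional", "organic"), each = 50),
  AveragePrice = c(rnorm(50, mean = 1.15, sd = 0.30),
                   rnorm(50, mean = 1.65, sd = 0.30))
)

# H0: the mean price is the same for both types
# H1: the mean prices differ (two-sided)
# The formula interface of t.test() defaults to the Welch test (unequal variances allowed)
res <- t.test(AveragePrice ~ type, data = sim)
print(res)

# Pieces worth reporting alongside the plot
res$p.value
res$conf.int
```

On the real data the equivalent call would be `t.test(AveragePrice ~ type, data = avocado)`. Weekly prices within a region are correlated over time, so the p-value from a plain t-test is probably optimistic and is best read as a first check rather than a final answer.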
```{r plots, include=FALSE}
# Changing the Date column from factor to Date
avocado$Date <- as.Date(avocado$Date, "%Y-%m-%d")
# Checking the class of Date column
class(avocado$Date)
# Sort the...
```