You should use RStudio (probably with the ggplot2, tidyr, and dplyr packages) for this.




We will use a dataset from the UC-Irvine Machine Learning Data Repository.


It’s just a place to keep cool datasets. You might want to check it out sometime.




Wine quality dataset description:
http://archive.ics.uci.edu/ml/datasets/Wine+Quality


12 variables, 1599 rows of Red Wine:
http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv




Be sure to do these things for the big dataset (in the file names below, replace smithj with your own last name and first initial, unless your name really is J Smith):




  • Save the data into your own Y: Drive or GoogleDrive Space, using:
    write.csv(winequality_red, file="RedWine.csv")




  • (optional) Make a new script file for your homework called smithj-220hw5.R




  • Make an RMarkdown file called smithj-220hw5.rmd




The final column, “quality,” is a score on a 0-10 scale, where 10 means a very high quality wine (low scores are lousy).


This “quality” variable will be your “y” response variable for this assignment.






  1. Import the dataset into RStudio using readr or the Import Dataset tool.
    (Notice that the UCI file uses semicolons instead of commas as the delimiter).




  2. Using the “pairs” command, look at all the variables. Eek.




  3. Since we really only care about quality, let’s just look at that one against the others:




library(dplyr)
library(tidyr)
library(ggplot2)

winequality_red %>%
  gather(-quality, key = "var", value = "value") %>%  # one row per (wine, variable) pair
  ggplot(aes(x = value, y = quality, color = var)) +  # color by variable name, not by a constant string
  geom_point() +                # Would geom_jitter() be a better choice?
  stat_smooth(method = "lm") +  # Might loess work better here?
  facet_wrap(~ var, scales = "free")
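Steps 1 and 2 above (importing a semicolon-delimited file, then calling pairs) can be sketched as follows. This is only a minimal sketch, assuming the readr package is installed. The object toy_wine, the temporary file, and its four rows are made-up stand-ins so the example runs without a download; they are not real wine data. The commented-out line shows how the same call might be pointed at the UCI file.

```r
library(readr)

# Toy stand-in for the UCI file: semicolon-delimited, "." as the decimal mark.
# These four rows are made-up illustration values, NOT real wine data.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "density;pH;alcohol;quality",
  "0.9978;3.51;9.4;5",
  "0.9968;3.20;9.8;5",
  "0.9970;3.26;10.5;6",
  "0.9956;3.39;12.1;7"
), toy_path)

# read_delim() with delim = ";" splits on semicolons and keeps "." as the
# decimal mark. (read_csv2() also splits on ";" but expects "," decimals.)
# col_types = cols() lets readr guess the column types quietly.
toy_wine <- read_delim(toy_path, delim = ";", col_types = cols())

# For the real assignment, the same call could read the UCI file directly:
# winequality_red <- read_delim(
#   "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
#   delim = ";"
# )

# Scatterplot matrix of every variable against every other one.
pairs(toy_wine)
```

With all 12 columns of the real data, the pairs plot gets crowded, which is why step 3 narrows the view to quality against each other variable.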




  1. Perhaps “alcohol” is the best candidate. Make a scatterplot of quality against alcohol.




  2. Make a simple regression predicting quality from density. Spoiler: lm(y~x)




  3. From the simple display, what are your slope and intercept?




  4. Using “summary,” what is r^2 for your model? Which variable is the best predictor of quality?




  5. Repeat your model using pH and density as the explanatory variables for quality.




  6. Explore with a few more promising candidates, using lm and graphs




  7. In RMarkdown, write some text around your analysis to make this like a report to someone who was trying to pick a great wine for a large party, perhaps a wedding.




  8. Knit it into a .pdf file (probably as a .doc or .html first) and submit just the .pdf
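The regression steps in the list above (a simple model, its slope and intercept, r^2, and a model with two explanatory variables) can be sketched as follows. This is a hedged illustration of the mechanics only, using base R. The data frame toy_wine and every value in it are made up for the example; the column names are chosen to match the wine data, and the real analysis would use winequality_red instead.

```r
# Made-up illustration values with the same column names as the wine data.
# Swap in winequality_red for the real analysis.
toy_wine <- data.frame(
  density = c(0.9978, 0.9968, 0.9970, 0.9980, 0.9956, 0.9964, 0.9946, 0.9959),
  pH      = c(3.51, 3.20, 3.26, 3.16, 3.39, 3.30, 3.36, 3.28),
  alcohol = c(9.4, 9.8, 9.8, 9.5, 12.1, 10.5, 11.2, 10.0),
  quality = c(5, 5, 5, 6, 7, 6, 7, 5)
)

# Simple regression: quality explained by one variable, lm(y ~ x).
fit_density <- lm(quality ~ density, data = toy_wine)

# Printing the model (or calling coef) shows the intercept and the slope.
coef(fit_density)

# r^2 is stored in the summary object.
r2_density <- summary(fit_density)$r.squared
r2_alcohol <- summary(lm(quality ~ alcohol, data = toy_wine))$r.squared

# Two explanatory variables: join them with "+" in the formula.
fit_two <- lm(quality ~ pH + density, data = toy_wine)
r2_two <- summary(fit_two)$r.squared
```

Comparing r2_density and r2_alcohol is one way to answer "which variable is best"; with more than one explanatory variable, the adjusted value in summary(fit_two)$adj.r.squared may be the fairer comparison, since plain r^2 never decreases when a variable is added.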



Apr 02, 2021