For this assignment, you choose a dataset and perform EDA and exploratory data visualization to better understand the data, examine some initial questions / hypotheses that you have, and report on these questions and hypotheses at the close of your analysis. The deliverable for the assignment is a document containing captioned visualizations that conveys the information and insights (our textbook author would say “pearls”) you gained from your analysis.



DO NOT USE A DATASET FROM KAGGLE OR OTHER ONLINE CO-WORKING PLATFORMS. You need to do your own analysis and chart your own path!


Choose a dataset that will have the depth and breadth to allow you to perform a rigorous analysis, and practice using the tools you’ve learned about in this course. Below are some suggestions to get you started, although you are free to use others. If you are wondering if a dataset is too simple or inappropriate for this assignment, send me a link to the dataset and we can talk about it.




Tools to Use for The Assignment


You will use R + RStudio + ggplot2.



The Assignment Deliverable



  • You will create a document using a word processor, such as Microsoft Word, Google Docs, or similar, that will contain the visualizations you’ve created and the writing and captions for the analysis.

  • Before you begin your analysis, write some questions/hypotheses about your data that you wish to answer. You can put these in your document in bullet points or paragraph form. You should write at least three questions; more are probably a good idea.

  • Now, begin your wrangling and exploratory visualization. First, gain an overview of your data: how much is there? How much is missing? Start with simple counting and then move on to univariate and basic multivariate data charts/plots. Don’t forget to “sanity check” the data: if something looks weird, make a note! Second, start to refine your visualization. This is when you start to focus on certain variables or relationships and explore them further. You may also want to start doing small things like changing sort order or using aesthetic changes *just for informational purposes.* You’re not making explanatory visualization quality charts, just things that help *you.* You should repeat this process to examine all of your questions, and then consider reporting on other interesting things you find.


  • You should create at least 10 visualizations and each should have a caption explaining the visualization, and why it matters or what is interesting about it, and/or any insights you have as a result of it.

  • The visualizations are screen shots from the tool you use. Try to get high resolution screen shots so that the image quality is retained when you copy the photo into the document. (In Tableau, use Worksheet -> Export -> Image…)

  • The beginning of your document should explain a little bit about your data (3-5 sentences). The end of your document should summarize your findings (6 sentences at least).


  • Do not put everything you create or run in R into this document. Include only the things that are interesting and insightful.
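
The overview step described in the list above (how much data there is, how much is missing, simple counting, a first sanity check) might look like the following minimal sketch in base R. The data frame df and its columns year, group, and value are hypothetical stand-ins invented for illustration; they are not taken from any suggested dataset.

```r
# Toy data frame standing in for a real dataset.
# All column names and values here are hypothetical.
df <- data.frame(
  year  = c(2018, 2019, 2020, 2021, 2022, 2023),
  group = c("a", "b", "a", NA, "b", "a"),
  value = c(3.1, NA, 4.7, 5.2, NA, 6.0),
  stringsAsFactors = FALSE
)

# How much data is there?
n_rows <- nrow(df)
n_cols <- ncol(df)

# How much is missing, per column?
missing_per_column <- colSums(is.na(df))

# Simple counting for a categorical variable, keeping NA visible.
group_counts <- table(df$group, useNA = "ifany")

# Quick sanity check on a numeric variable: range, quartiles, NA count.
value_summary <- summary(df$value)

print(c(rows = n_rows, cols = n_cols))
print(missing_per_column)
print(group_counts)
print(value_summary)
```

With a real dataset, df would instead come from a loader such as read.csv(), and the same four checks would be the starting point before any plotting.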

Answered Same Day Mar 04, 2022

Suraj answered on Mar 05 2022
EDA-R Programming
Data set: We are working with labor force unemployment rate data covering the years 1974 to 2020. The data set contains 7 variables. Below are some hypothetical questions based on the data set:
Question: Is there a difference between the statewide unemployment rate and the national rate? If so, what kind of difference is it?
Question: Does the unemployment rate increase or decrease over the years from 1974 to 2020?
Question: Does the average labor force increase over the years from 52000?
Now, first we will check the distribution of the variables on which we will do our...
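
A distribution check of this kind could be sketched with ggplot2 roughly as follows. The answer's actual data and column names are not shown, so the data frame unemp, its columns year and unemployment_rate, and the randomly generated values are all assumptions made purely for illustration; the plots say nothing about the real data.

```r
library(ggplot2)

# Hypothetical stand-in data: one row per year from 1974 to 2020 with a
# randomly generated rate. Column names and values are assumed.
set.seed(1)
unemp <- data.frame(
  year = 1974:2020,
  unemployment_rate = round(runif(47, min = 3, max = 11), 1)
)

# Univariate view: histogram of the rate, one-percentage-point bins.
p_hist <- ggplot(unemp, aes(x = unemployment_rate)) +
  geom_histogram(binwidth = 1, fill = "grey60", colour = "white") +
  labs(
    x = "Unemployment rate (%)",
    y = "Number of years",
    title = "Distribution of the unemployment rate (toy data)"
  )

# Bivariate view for the trend question: rate against year.
p_trend <- ggplot(unemp, aes(x = year, y = unemployment_rate)) +
  geom_line() +
  labs(
    x = "Year",
    y = "Unemployment rate (%)",
    title = "Unemployment rate over time (toy data)"
  )

# The plot objects are only built here; print(p_hist) or print(p_trend)
# draws them on the active graphics device.
```

Calling print() on either object displays it in RStudio, and ggsave() can write a plot to an image file at a chosen size and resolution, which avoids low-quality screenshots.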