I need a dataset for my R programming assignment. It must contain at least 700 but fewer than 6,000 records and 8 attributes, and the data should be uncleaned. I want to load this dataset in the R console, read the file, and clean it, and I need the R code for cleaning the dataset. I also need help summarizing the data in a table and creating graphs that help visualize it, such as bar charts, histograms, and pie charts. Everything should be done in the R console.
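A minimal base-R sketch of the load, clean, summarize, and plot workflow follows. No dataset is named in the question, so the columns (`id`, `gender`, `age`, `income`, `city`) and all values are hypothetical stand-ins. They are read from an inline string so the script runs as-is. To use a real file, swap `text = raw_text` for a file path ("your_file.csv" below is a placeholder). The valid ranges (age 0 to 120, income at least 0) are also assumptions. Set them from your own data's meaning. `raw_text`, `in_range`, and `summarise_num` are illustrative names only.

```r
# ASSUMPTION: columns and values below are hypothetical stand-ins for a messy dataset.
raw_text <- "id,gender,age,income,city
1,Male,34,52000,Boston
2,female, 29 ,61000,boston
3,F,,48000,Chicago
4,M,250,55000,N/A
4,M,250,55000,N/A
5,Female,41,-100,Chicago
6,male,38,70000, Chicago "

# 1. Read the data, treating "", "NA" and "N/A" as missing and trimming spaces.
#    With a real file: raw <- read.csv("your_file.csv", na.strings = c("", "NA", "N/A"),
#                                      strip.white = TRUE, stringsAsFactors = FALSE)
raw <- read.csv(text = raw_text, stringsAsFactors = FALSE,
                na.strings = c("", "NA", "N/A"), strip.white = TRUE)

# 2. Inspect column types and count missing values per column
str(raw)
print(colSums(is.na(raw)))

# 3. Remove exact duplicate rows
clean <- raw[!duplicated(raw), ]

# 4. Standardise inconsistent category labels (m/male/M -> "Male", etc.)
g <- tolower(clean$gender)
clean$gender <- ifelse(g %in% c("m", "male"), "Male",
                ifelse(g %in% c("f", "female"), "Female", NA))

# Capitalise the first letter of each city name, leaving NAs as NA
clean$city <- ifelse(is.na(clean$city), NA,
                     paste0(toupper(substr(clean$city, 1, 1)),
                            tolower(substring(clean$city, 2))))

# 5. Keep rows inside the ASSUMED valid ranges (0 <= age <= 120, income >= 0);
#    missing values are kept here rather than counted as out of range
in_range <- with(clean, (is.na(age) | (age >= 0 & age <= 120)) &
                        (is.na(income) | income >= 0))
no_outliers <- clean[in_range, ]

# 6. Summary table for numeric columns: n, mean, median, min, max
summarise_num <- function(df) {
  sapply(df[, c("age", "income")], function(x)
    c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE),
      median = median(x, na.rm = TRUE),
      min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE)))
}
print(round(summarise_num(clean), 1))              # with out-of-range values
print(round(summarise_num(no_outliers), 1))        # without them
print(table(no_outliers$gender, useNA = "ifany"))  # frequency table for a category

# 7. Graphs: histogram (numeric), bar chart and pie chart (categorical)
par(mfrow = c(1, 3))
hist(no_outliers$age, main = "Age", xlab = "Age (years)")
barplot(table(no_outliers$gender), main = "Gender")
pie(table(no_outliers$city), main = "City")
par(mfrow = c(1, 1))
```

With this toy data, one of the two identical id 4 rows is dropped as a duplicate. The rows with age 250 and income -100 then fall outside the assumed ranges, which leaves 4 of the original 7 rows. Printing the summary with and without those rows gives the two versions of the analysis. Deleting whole rows is only one option. Setting just the offending value to `NA` keeps the rest of the record.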



ALY6000: Data Analysis

Overview and Rationale
Being able to ask appropriate questions of data is an important part of data analytics work. It is also critical to be able to interpret the results of the analysis. This assignment is intended to familiarize you with the datasets and to get you thinking about key business questions you can answer from this data.

Module Outcomes
This assignment is directly linked to the following learning outcomes:
• Investigate the impacts of big data on industry
• Describe the evolution of big data
• Analyze data to complete a data-rich and visually appealing report

Assignment Instructions
Find one dataset that is of interest to you. Some places to find datasets include:
• The R Project for Statistical Computing (https://www.r-project.org/)
• Kaggle (https://www.kaggle.com/)
• U.S. Government’s Open Data (https://www.data.gov/)
• Your own data

Your dataset should have at least 700, but fewer than 6,000, records and eight (8) attributes, and the data should not be “clean”. Part of this assignment requires you to clean the data yourself. If a Data Dictionary accompanies your chosen dataset, use it to understand the fields and values.

The assignment has three parts.

Part I
If a Data Dictionary document is provided, review it as you review the datasets. To understand the data, we first need to run some descriptive statistics on the dataset. Start by providing the following for each appropriate variable in the dataset:
1. Summarize the data in a table.
2. Create graphs that help visualize the data. These can be bar charts, histograms, pie charts, etc. Be sure the chosen graph best represents the information you want to highlight.
3. Explain the story the data is telling you.
   • What business question do your descriptive analyses answer? Provide a brief discussion of the findings.
   • If there are any unusual values, discuss them. If data values are “out of range,” clean the data as needed.
     Delete the out-of-range values and run the analysis again.
   • If you remove out-of-range values for any of the variables, present both the analysis with the out-of-range values and the analysis without them.
   • Identify additional questions that the data leads you to ask. What new attributes are needed to answer those questions?

Part II
Create new attributes based on the data and the questions you identified in Part I. For your dataset, compute differences between appropriate variable values and create a new variable. For example, if the data shows sales by month for several years, calculate the increase or decrease in sales from month to month. Then compute the mean and median of each new variable you have created.

Part III
Now that you have worked with the data, what is it telling you? What have you learned about the attributes? What follow-up questions would you like to have answered? Identify 3-5 observations or follow-up questions.

What to Submit
A presentation slide deck (5-8 slides, not including the title and reference list slides) with your findings. Submit a single file with the following filename: _FinalProject.pptx

Format
Your presentation must:
• Tell the story of your data through descriptive statistics and visualizations.
  o Remember that your visualizations are the primary vehicle you'll use to convey information in an analytics presentation.
  o Include very concise written information, closely connected to the points made in the visualizations, in the Notes section of each slide.
• Properly cite all sources using APA citation rules.

Appendix
Assignment Part I Section Example
Business Question: What is the distribution of the status of the 2017 GxP Audits?
Analysis: Descriptives Table

Audit Status            Frequency  Percent  Valid Percent
Valid  Closed                  19     19.8           19.8
       Completed                4      4.2            4.2
       In Progress             18     18.8           18.8
       Scheduled               11     11.5           11.5
       Pending                 14     14.6           14.6
       Not In Scope            26     27.1           27.1
       Cancelled                4      4.2            4.2
       Total                   96    100.0          100.0

[Figure: Audit Status Count]
[Figure: Audit Status Percentages]

Discussion: The data file includes information on 96 audits in 2017 for GxP areas. It is unclear whether the file includes all known GxP audits in 2017 or only a subset. The largest category, “Not In Scope,” accounts for 27.1% of all GxP audits. 19.8% of audits are closed and 4.2% are completed. It is unclear what the difference between “closed” and “completed” audits is. We should perhaps ask the client whether two distinct values are really needed. 18.8% of the audits are in progress, 11.5% are scheduled, and 14.6% are pending. For the pending audits, the dates of the audit process have not been established. 4.2% of the audits were cancelled. It may be useful to add a notes field recording the reasons for cancellation.
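The appendix example can be reproduced in base R. The sketch below assumes the audit statuses sit in a single column, with one row per audit. The underlying file is not provided, so that column is rebuilt with `rep()` from the counts in the example table. `audit_counts`, `status`, and `freq_table` are illustrative names. Valid Percent is omitted because it equals Percent when no status is missing, as in the example.

```r
# Counts copied from the example table; rep() expands them back into one record
# per audit, the shape a real status column in a dataset would have.
audit_counts <- c("Closed" = 19, "Completed" = 4, "In Progress" = 18,
                  "Scheduled" = 11, "Pending" = 14, "Not In Scope" = 26,
                  "Cancelled" = 4)
status <- rep(names(audit_counts), times = audit_counts)

# Frequency table, keeping the category order used in the example
freq <- table(factor(status, levels = names(audit_counts)))

# Frequency and percent per status, plus a Total row
freq_table <- data.frame(Status    = names(freq),
                         Frequency = as.vector(freq),
                         Percent   = round(100 * as.vector(freq) / sum(freq), 1),
                         stringsAsFactors = FALSE)
freq_table <- rbind(freq_table,
                    data.frame(Status = "Total", Frequency = sum(freq),
                               Percent = 100, stringsAsFactors = FALSE))
print(freq_table, row.names = FALSE)

# Bar chart of counts and pie chart labelled with percentages
barplot(freq, main = "Audit Status Count", ylab = "Number of audits", las = 2)
pie(freq,
    labels = paste0(names(freq), " (", freq_table$Percent[seq_along(freq)], "%)"),
    main = "Audit Status Percentages")
```

With a real dataset, replace `status` with the column itself, for example `df$status` for a hypothetical data frame `df`. Passing explicit `levels` to `factor()` fixes the row order of the table and charts. Without it, `table()` sorts categories alphabetically.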
Oct 30, 2021