Find data, explore data, make some pretty pictures. 1. For this project you must find some published or existing data. Possible sources include: almanacs, magazines and journal articles, textbooks,...

Find data, explore data, make some pretty pictures.

1. For this project you must find some published or existing data. Possible sources include: almanacs, magazines and journal articles, textbooks, web resources, athletic teams, newspapers, professors with experimental data, campus organizations, electronic data repositories, etc. Your dataset must have at least 250 cases, two categorical variables and two quantitative variables. It is also recommended that you pick a dataset whose material interests you.

2. Using technology, perform exploratory data analysis (EDA).
(a) For at least one of the quantitative variables, include summary statistics (mean, standard deviation, five-number summary) and graphical displays (histogram, box plot and QQ plot). Are there any outliers? Is the distribution normal, symmetric, skewed, or some other shape?
(b) Create a graphical display looking at multiple variables and their correlation.
(c) For at least one of the categorical variables, include a frequency and relative frequency table.
(d) Include a two-way table for two of the categorical variables and discuss any relevant proportions. Describe any possible relationship between the two variables.
(e) Include a side-by-side plot for at least one categorical and at least one quantitative variable. Describe any association between the two variables. Use summary statistics to compare groups.
(f) Create a visualization or perform a statistical computation that is appropriate to your data but not already included.

3. Write your report!
(a) Introduce your data set, including a reference to where it can be found. Describe all relevant variables that you will use in your analysis.
(b) Describe all steps that were necessary to clean and organize the data.
(c) Include all items requested above. Include graphs and text about each.
(d) Write a brief conclusion highlighting the most interesting features of your data.

The report will be graded by the following criteria:
- Statistical analysis - 30 points. The statistical tests are all provided.
- Graphical representations - 30 points. The requested graphical displays are made and included in the report.
- Data collection - 15 points. The data is gathered in a responsible way. The method of collection is clearly stated and the variables are all explained. If a sample of the data is used, it is done in a proper way.
- Interpretations - 15 points. The results of the statistical analysis are clearly explained and interpreted in the context of the problem. The conclusions accurately reflect the analysis and are well supported.
- Writing quality - 10 points. The paper is readable and clearly written. There are few, if any, grammatical or spelling errors and they do not interfere with the clarity of the paper.

The numbering in this document is not used in the report in any way.
Answered 2 days after Sep 14, 2021

Answer To: Find data, explore data, make some pretty pictures. 1. For this project you must find some published...

Suraj answered on Sep 16 2021
Introduction: The data used for this assignment is the well-known Titanic data set. It satisfies the requirements for categorical and quantitative variables, and it has more than 250 entries. The data set was downloaded from Kaggle. The link is as follows:
https://www.kaggle.com/brendan45774/test-file
The data set has 12 columns. The variables are described as follows:
Here, survival is the target variable and all the remaining variables are independent variables. In this data set there are 2 quantitative variables; the rest are categorical.
We will not use all the variables in the analysis. Only four variables are used: Pclass, sex, age, and fare.
b)
The first task before any analysis is to clean the data and make it suitable for analysis. The data set contains some variables that are not needed here, so we dropped (deleted) those unnecessary columns from the data set.
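The column-dropping step described above could look roughly like the sketch below. It uses only the Python standard library. The sample rows are made up for illustration, and the header names (PassengerId, Name, Pclass, Sex, Age, Fare, Cabin) are an assumption based on the usual Kaggle Titanic layout, not taken from the answer. Counting missing values is an extra check that the answer does not mention.

```python
import csv
import io

# Made-up rows for illustration only; header names are assumed, not from the answer.
SAMPLE_CSV = """PassengerId,Name,Pclass,Sex,Age,Fare,Cabin
1,Passenger A,3,male,22,7.25,
2,Passenger B,1,female,38,71.28,C85
3,Passenger C,3,female,,7.92,
4,Passenger D,2,male,35,13.00,
"""

# The four variables the analysis keeps.
KEEP = ["Pclass", "Sex", "Age", "Fare"]


def load_selected(text, keep):
    """Read CSV text and return each row restricted to the columns in `keep`."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row[col] for col in keep} for row in reader]


def count_missing(rows, columns):
    """Count empty cells per column (an extra check, not part of the answer)."""
    return {col: sum(1 for r in rows if r[col] == "") for col in columns}


rows = load_selected(SAMPLE_CSV, KEEP)
missing = count_missing(rows, KEEP)
print(list(rows[0].keys()))
print(missing)
```

With a real file, the same functions could be applied to the text read from disk. How missing values are then handled (dropping rows, imputing) is a separate decision that the report should state.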
c)
The summary statistics for the two quantitative variables are given in the following table:
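As a minimal sketch of how summary statistics like these (mean, standard deviation, five-number summary) might be computed with the Python standard library: the ages below are made-up values, not figures from the data set, and `summarize` is a hypothetical helper name. Quartile conventions differ between software packages; this sketch uses the "inclusive" method of `statistics.quantiles`, which interpolates linearly between data points.

```python
import statistics


def summarize(values):
    """Return mean, sample standard deviation and the five-number summary."""
    # n=4 gives the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Made-up ages for illustration only.
ages = [22.0, 38.0, 26.0, 35.0, 54.0]
age_summary = summarize(ages)
for name, value in age_summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The same helper could be called once per quantitative variable (for example age and fare) to fill in a two-column summary table.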
The visualizations are given as follows:
From the histogram we can see that there is a slight tail...
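A tail seen in a histogram can also be checked numerically. The sketch below is one common approach and not necessarily what the answer used: it flags outliers with the 1.5 × IQR rule and compares the mean with the median as a rough indicator of skew. The fare values are made up for illustration.

```python
import statistics

# Made-up fares for illustration only.
fares = [7.25, 7.92, 8.05, 13.0, 26.0, 30.0, 512.33]

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(fares, n=4, method="inclusive")
iqr = q3 - q1

# 1.5 * IQR fences: values outside them are flagged as potential outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in fares if x < lower_fence or x > upper_fence]

# A mean well above the median suggests a right (positive) skew,
# and a mean well below it suggests a left skew.
mean = statistics.mean(fares)

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
print("mean:", mean, "median:", median)
```

These are the same fences a standard box plot uses for its whiskers, so the flagged values should match the points drawn individually in a box plot of the same data.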