Data Analytics is a subject that is best appreciated when applied to a dataset you are familiar with. The aim of this project is to achieve that. Do not view this project as a hurdle in the course, but rather as a bridge connecting the topics you learnt to your work or subject domain. There are five main modules in this course:

  • Module 1: Normal Distribution (percentile, distribution of means, and chance of occurrence if we assume a normal distribution)

  • Module 2: Confidence Interval Estimation (including sample size determination)

  • Module 3: Inferences from data (hypothesis testing, i.e., confirming or checking a claim made about the data; in this module, we dealt with only one sample)

  • Module 4: More inferences from data (multiple samples)

  • Module 5: Regression analysis (both simple and multiple, apart from basic ANOVA)
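
As a rough illustration of the kind of calculation Modules 1 and 2 cover, here is a minimal Python sketch that uses only the standard library. The mean, standard deviation, sample mean, and sample size below are made-up numbers for illustration, not values from any course dataset, and the population standard deviation is treated as known.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Hypothetical population values, chosen only for illustration.
mu, sigma = 70.0, 3.0
population = NormalDist(mu, sigma)

# Module 1: chance of a single value above 75, and the 90th percentile.
p_above_75 = 1 - population.cdf(75)
percentile_90 = population.inv_cdf(0.90)

# Module 1: distribution of the mean of n = 36 observations.
n = 36
sampling_dist = NormalDist(mu, sigma / sqrt(n))
p_mean_above_71 = 1 - sampling_dist.cdf(71)

# Module 2: 95% confidence interval for the mean (sigma treated as known).
z = NormalDist().inv_cdf(0.975)
sample_mean = 70.4  # hypothetical sample mean
margin = z * sigma / sqrt(n)
ci = (sample_mean - margin, sample_mean + margin)

# Module 2: sample size needed for a margin of error of 0.5.
n_needed = ceil((z * sigma / 0.5) ** 2)

print(f"P(X > 75) = {p_above_75:.4f}")
print(f"90th percentile = {percentile_90:.2f}")
print(f"P(mean > 71) = {p_mean_above_71:.4f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"n needed = {n_needed}")
```

When the population standard deviation is not known, the usual textbook approach replaces it with the sample standard deviation and uses a t critical value instead of z.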


Objective

The purpose of the project is for you to apply what you learnt from at least 4 modules on your dataset and make some inferences or estimations. Remember, each Hawkes Learning quiz had 10-15 questions. Here I am asking you to do only 4 tests or analyses. But the key is that you bring the data and you come up with the questions, and each question or set of analyses represents something you learnt from the Modules (1-5). There should be four different ones. That is the best way to understand the concepts you learnt in this course. If you wish, you can use two data sources (datasets) to achieve it. It is not necessary for all of them to be done using one dataset.

Data source

There are 3 options; you can choose any one of them (there are no restrictions on that):

  1. Bring your own data from work (you can remove any private or confidential information; for example, if you are bringing sales or cost data of an item, product, or service, the name can be masked)

  2. Use data from your previous work or a company you have access to (again, you can remove any private or confidential information)

  3. Use data from the public domain. In today’s world, there is no dearth of structured data. Here are some places where you can get data from:


Grading Rubric


Total Points (Midterm and Final Report): 55 points


Midterm report:



Point Value: 15 points



Due date: April 8th, Thursday



Requirements:



  • No more than 1.5 to 2 pages.

  • You should describe your source of data (including the data fields you have) and what you want to accomplish based on the topics you learnt.

  • You can state the research hypothesis you plan to check, confidence intervals you plan to estimate, or test any relationship between variables you think is important.

  • Remember - I need at least your plan based on the first three modules (see examples). No need for analysis, just what you plan to do.
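
For orientation only, a planned one-sample hypothesis test (Module 3) could later be carried out roughly as in the Python sketch below. The data are simulated and the claimed mean of 5 days is hypothetical. The sketch uses a large-sample z approximation because the Python standard library has no t distribution; with an unknown standard deviation a t test is the textbook choice, and for samples of 30 or more the two give similar answers.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_z_test(sample, mu0):
    """Two-sided large-sample z test of H0: population mean equals mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Simulated stand-in for real data: 40 hypothetical delivery times in days.
rng = random.Random(0)
delivery_days = [rng.gauss(5.3, 0.6) for _ in range(40)]

# Claim to check (hypothetical): the mean delivery time is 5 days.
z, p_value = one_sample_z_test(delivery_days, mu0=5.0)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0 at the 5% level" if p_value < 0.05 else "Fail to reject H0")
```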


I will provide feedback to each of you within 4 days (if you submit early, you get your feedback early). If I feel any change is needed, I will indicate that.



How are the 15 points given:



  • Your Data: 5 points (Note: Remember, the sample size should be at least 30 data points to do any parametric tests)

  • Your plan of action: 10 points


Final report:



Point Value: 40 points



Due date: April 30th, Friday



Requirements:



  • No more than 1.5 to 2 pages.

  • Present the findings using the skillset acquired (topics covered) in class.

  • Also include the dataset with the analysis (this could be in Excel or any statistical package). You should provide details of the analysis in an Appendix.



How are the 40 points given: 10 for each Module you choose to apply. (For example, if you choose regression to test an association or predict an outcome, you get 10 points for that analysis.)
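
To make the regression example concrete, here is a minimal Python sketch of a simple least-squares fit using only the standard library. The advertising and sales figures are invented for illustration, and five points are far fewer than the 30 the project asks for; a real analysis would more likely be run in Excel or a statistical package, which also report standard errors and p-values.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares slope, intercept, and R^2 for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical data: advertising spend (x) and units sold (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 15.0, 19.0, 22.0, 24.0]

slope, intercept, r_squared = simple_linear_regression(ad_spend, units_sold)
predicted = intercept + slope * 6.0  # predicted units sold at a spend of 6

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")
print(f"prediction at x = 6: {predicted:.1f}")
```

The slope estimates the change in the outcome per unit change in the predictor, and R^2 is the share of the outcome's variation that the fitted line accounts for.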


Samples

These are four examples. For each sample, I am showing you how we could use lessons learnt from one or two modules.
Answered Same Day: Apr 29, 2021

Anu answered on Apr 30 2021
The data is taken from kaggle.com. It has information on the height and weight of male persons. The link to the data is given below:
https://www.kaggle.com/mustafaali96/weight-height
Normal Distribution:
The normal distribution is the most important distribution in statistics because many sampling distributions approach the normal distribution as the sample size grows. The central limit theorem likewise says that, for a large sample size, the sample mean approximately follows a normal distribution. Many statistical tests assume that the variable is normally distributed, so before performing these tests we need to check the normality of the variable. As a first step, we can make a histogram and compare its shape with that of a normal distribution, as shown in the Appendix. This gives only a rough idea. Next, we construct a QQ plot: if the points lie close to the line, we say that the variable is normally distributed, as shown in the Appendix. If it is difficult to judge the closeness by eye, we move to a numerical test, i.e., the Shapiro-Wilk test. In the Appendix table of Shapiro-Wilk...
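
The QQ-plot idea described above can be sketched in Python with only the standard library. This is not the Shapiro-Wilk test itself, which needs a statistics package (for example `scipy.stats.shapiro`). Instead, the sketch summarises how close the QQ points are to a straight line using their correlation. The height values are simulated with an assumed mean and standard deviation rather than read from the Kaggle file, and the function names are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def qq_points(sample):
    """Pair each sorted value with the standard normal quantile of its rank."""
    n = len(sample)
    ordered = sorted(sample)
    # (i - 0.5) / n is one common choice of plotting position.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered

def qq_correlation(sample):
    """Pearson correlation of the QQ points; values near 1 suggest normality."""
    t, o = qq_points(sample)
    t_bar, o_bar = mean(t), mean(o)
    num = sum((a - t_bar) * (b - o_bar) for a, b in zip(t, o))
    den = sqrt(sum((a - t_bar) ** 2 for a in t) * sum((b - o_bar) ** 2 for b in o))
    return num / den

rng = random.Random(42)
# Simulated heights (assumed mean 69 and SD 2.9), standing in for real data.
heights = [rng.gauss(69.0, 2.9) for _ in range(500)]
# A clearly non-normal, right-skewed sample for comparison.
skewed = [rng.expovariate(1.0) for _ in range(500)]

print(f"QQ correlation, simulated heights: {qq_correlation(heights):.4f}")
print(f"QQ correlation, skewed sample:     {qq_correlation(skewed):.4f}")
```

A correlation very close to 1 corresponds to QQ points hugging the reference line, while a skewed sample gives a visibly lower value.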