For this exercise, we use the Hepatitis Disease dataset from the UCI data repository. This data consists of 156 instances with 20 attributes.
Attribute information:
1. Class: DIE, LIVE
2. AGE: 10, 20, 30, 40, 50, 60, 70, 80
3. SEX: male, female
4. STEROID: no, yes
5. ANTIVIRALS: no, yes
6. FATIGUE: no, yes
7. MALAISE: no, yes
8. ANOREXIA: no, yes
9. LIVER BIG: no, yes
10. LIVER FIRM: no, yes
11. SPLEEN PALPABLE: no, yes
12. SPIDERS: no, yes
13. ASCITES: no, yes
14. VARICES: no, yes
15. BILIRUBIN: Continuous
16. ALK PHOSPHATE: Continuous
17. SGOT: Continuous
18. ALBUMIN: Continuous
19. PROTIME: Continuous
20. HISTOLOGY: no, yes
Question no 1: Use Excel and the hepatitis dataset. Answer the following questions:
(1+1+1+1+1+2=7)
a. What is the probability of a male patient being dead?
b. There is one patient whose ANOREXIA value is "?". What is the most likely value of this attribute for this patient?
c. What is the probability that a patient in the age range [10,50] uses steroids? (Replace “?” with “Yes”)
d. Which is more likely: a person with no ANTIVIRALS being alive, or a person with MALAISE being dead?
e. Which age group is more likely to be dead? What are the probabilities? (Group the ages into 3 groups: 20-40, 40-60, 60-80)
f. Is the AGE attribute normally distributed? Explain why or why not.
[For Question no 1, you are allowed to use built-in Excel functions. As an example, for the probability of a male patient being dead, I would like to see something as follows:
"This question could be answered by finding XX and doing XXX."
Show how you found XX.
Show how you did XXX.
Therefore, the answer is:]
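The "find XX, then do XXX" pattern above can be illustrated with a small count-ratio sketch. It assumes the question is read as the conditional probability P(DIE | male); the rows are toy values made up for illustration, not rows from the hepatitis dataset, and `conditional_probability` is a hypothetical helper name. In Excel, the same ratio could be obtained by dividing a `COUNTIFS` result by a `COUNTIF` result.

```python
# Toy records for illustration only; these are NOT rows from the hepatitis dataset.
records = [
    {"Class": "DIE", "SEX": "male"},
    {"Class": "LIVE", "SEX": "male"},
    {"Class": "LIVE", "SEX": "male"},
    {"Class": "LIVE", "SEX": "female"},
    {"Class": "DIE", "SEX": "male"},
]


def conditional_probability(rows, given, event):
    """Estimate P(event | given) as a ratio of counts (hypothetical helper)."""
    # Step 1 ("finding XX"): keep only the rows that satisfy the condition.
    matching_given = [
        r for r in rows if all(r[k] == v for k, v in given.items())
    ]
    if not matching_given:
        raise ValueError("no rows satisfy the condition")
    # Step 2 ("doing XXX"): among those rows, count the ones where the event holds.
    matching_both = [
        r for r in matching_given if all(r[k] == v for k, v in event.items())
    ]
    return len(matching_both) / len(matching_given)


# 4 toy male patients, 2 of them with Class == "DIE".
p_die_given_male = conditional_probability(
    records, given={"SEX": "male"}, event={"Class": "DIE"}
)
print(f"P(DIE | male) = {p_die_given_male}")
```

If the question is instead read as the joint probability of being male and dead, the denominator would be the total number of rows rather than the number of male rows.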
Question no 2: Use Excel/Python and the Hepatitis dataset: (3+2=5)
a. Create 3 different visualizations showing the mean and standard deviation (or standard error, as it is referred to in this context) of the sampling distributions of sample age for sample sizes: 2, 5, 10.
b. What happens to the mean of the sample means of age as the sample size is increased? What happens to the standard error?
[Description: In addition to doing the work in Python or Excel, you need to write a descriptive answer that summarizes your findings.]
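As a rough sketch of how the sampling distributions in Question no 2 could be simulated in Python: the `ages` list below is a synthetic placeholder (random integers, not the dataset's AGE column), the choice of 1000 repeated samples is an assumption, and samples are drawn with replacement. The standard error is estimated as the standard deviation of the sample means.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Placeholder ages; replace this list with the AGE column of the dataset.
ages = [random.randint(10, 80) for _ in range(156)]


def sampling_distribution(population, sample_size, n_samples=1000):
    """Return the means of n_samples random samples drawn with replacement."""
    return [
        statistics.mean(random.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


summary = {}
for n in (2, 5, 10):
    means = sampling_distribution(ages, n)
    # Mean of the sample means, and their standard deviation (the standard error).
    summary[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n={n:2d}  mean of means={summary[n][0]:.2f}  standard error={summary[n][1]:.2f}")
```

Each list of sample means could then be passed to a histogram function (for example matplotlib's `hist`) to produce one visualization per sample size.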
Question no 3: Use Python (1+2+2=5)
a. Generate a discrete uniform distribution of population size 100 in the interval (1,10).
b. Consider a sample size of N=10. Simulate the sampling distribution of the sample mean (repeat 100 times). Draw the visualization.
c. Consider a sample size of N=30. What are the sample mean and sample standard deviation (repeat 100 times)? Draw the visualization.
[Code + graphs]
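One possible starting point for Question no 3, using only the Python standard library. This is a sketch under stated assumptions: the interval endpoints 1 and 10 are treated as inclusive, samples are drawn from the population with replacement, and `sample_means` is a hypothetical helper name. Plotting is left out; each list of means could be drawn as a histogram.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Part a: a population of 100 values, discrete uniform on 1..10 (endpoints assumed inclusive).
population = [random.randint(1, 10) for _ in range(100)]


def sample_means(pop, sample_size, repeats=100):
    """Draw `repeats` samples (with replacement) and return the mean of each."""
    return [
        statistics.mean(random.choices(pop, k=sample_size))
        for _ in range(repeats)
    ]


# Parts b and c: sampling distribution of the sample mean for N=10 and N=30.
results = {}
for n in (10, 30):
    means = sample_means(population, n)
    results[n] = {
        "means": means,
        "mean": statistics.mean(means),
        "sd": statistics.stdev(means),
    }
    print(f"N={n}: mean of sample means={results[n]['mean']:.2f}, "
          f"standard deviation={results[n]['sd']:.2f}")
```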