
For this exercise, we use the Hepatitis Disease dataset from the UCI data repository. This data consists of 156 instances with 20 attributes.


Attribute information:

1. Class: DIE, LIVE
2. AGE: 10, 20, 30, 40, 50, 60, 70, 80
3. SEX: male, female
4. STEROID: no, yes
5. ANTIVIRALS: no, yes
6. FATIGUE: no, yes
7. MALAISE: no, yes
8. ANOREXIA: no, yes
9. LIVER BIG: no, yes
10. LIVER FIRM: no, yes
11. SPLEEN PALPABLE: no, yes
12. SPIDERS: no, yes
13. ASCITES: no, yes
14. VARICES: no, yes
15. BILIRUBIN: Continuous
16. ALK PHOSPHATE: Continuous
17. SGOT: Continuous
18. ALBUMIN: Continuous
19. PROTIME: Continuous
20. HISTOLOGY: no, yes





Question no 1: Use Excel and the hepatitis dataset. Answer the following questions:
(1+1+1+1+1+2=7)





a. What is the probability of a male patient being dead?



b. There is one patient whose ANOREXIA attribute has the value "?". What is the likely value of this attribute for this patient?



c. What is the probability that a patient in the age range [10,50] uses steroids? (Replace "?" with "Yes".)



d. Which is more likely: a person with no ANTIVIRALS being alive, or a person with MALAISE being dead?



e. Which age group is more likely to be dead? What are the probabilities? (Group the ages into 3 groups: 20-40, 40-60, 60-80.)



f. Is the age attribute normally distributed? Explain why or why not.



[For Question no 1, you are allowed to use built-in Excel functions. As an example, for the probability of a male being dead, I would like to see something as follows:

"This question could be answered by finding XX and doing XXXX."

Show how you found XX.

Show how you did XXXX.

Therefore, the answer is:]
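The "find XX, then do XXXX" pattern asked for above can also be sketched in Python, purely as a cross-check on the Excel work. The sketch below assumes that "probability of a male patient being dead" means the conditional probability P(DIE | male), and that the "likely value" of a missing attribute is the most frequent observed value; both readings are assumptions, not something the assignment states. The rows are invented toy records, not values from the hepatitis dataset, and the helper names `conditional_probability` and `most_common_value` are hypothetical.

```python
from collections import Counter

# Toy records only: invented for illustration, NOT taken from the UCI data.
rows = [
    {"Class": "DIE",  "SEX": "male",   "ANOREXIA": "no"},
    {"Class": "LIVE", "SEX": "male",   "ANOREXIA": "no"},
    {"Class": "LIVE", "SEX": "male",   "ANOREXIA": "yes"},
    {"Class": "LIVE", "SEX": "female", "ANOREXIA": "?"},
    {"Class": "LIVE", "SEX": "male",   "ANOREXIA": "no"},
]


def conditional_probability(records, event, given):
    """Estimate P(event | given) as count(event and given) / count(given)."""
    matching_given = [r for r in records if given(r)]
    if not matching_given:
        raise ValueError("no records satisfy the condition")
    return sum(1 for r in matching_given if event(r)) / len(matching_given)


def most_common_value(records, column, missing="?"):
    """Return the most frequent value of a column, ignoring the missing marker."""
    counts = Counter(r[column] for r in records if r[column] != missing)
    return counts.most_common(1)[0][0]


# "Finding XX": count the male patients. "Doing XXXX": divide the deaths among them.
p_die_given_male = conditional_probability(
    rows,
    event=lambda r: r["Class"] == "DIE",
    given=lambda r: r["SEX"] == "male",
)
likely_anorexia = most_common_value(rows, "ANOREXIA")
print(p_die_given_male, likely_anorexia)
```

In Excel, the same ratio could be formed by dividing a `COUNTIFS` over the SEX and Class columns by a `COUNTIF` over the SEX column.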




Question no 2: Use Excel/Python and the Hepatitis dataset: (3+2=5)




  1. Create 3 different visualizations showing the mean and standard deviation (or standard error, as it is referred to in this context) of the sampling distributions of sample age for sample sizes: 2, 5, 10


  2. What happens to the mean of the sample means of age as the sample size is increased? What happens to the standard error?



[Description: In addition to doing the work in Python or Excel, you need to write a descriptive answer that summarizes your findings.]
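A minimal sketch of how the sampling distributions for Question no 2 might be simulated is shown here. The `ages` list is a randomly generated stand-in for the AGE column and must be replaced with the real data. The number of repeats (1000) and sampling without replacement are assumptions, since the question does not specify either. Plotting is left out; a histogram of each list of means (for example with matplotlib) would be one way to produce the three visualizations.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the AGE column; replace with the real values.
ages = [random.randint(10, 80) for _ in range(156)]


def sampling_distribution(population, sample_size, repeats=1000):
    """Return `repeats` sample means, each from a sample drawn without replacement."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]


summary = {}
for n in (2, 5, 10):
    means = sampling_distribution(ages, n)
    # Mean of the sample means, and their standard deviation (the standard error).
    summary[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n={n}: mean of sample means={summary[n][0]:.2f}, "
          f"standard error={summary[n][1]:.2f}")
```

Under these assumptions, the mean of the sample means should stay close to the population mean for every sample size, while the standard error should shrink as the sample size grows.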





Question no 3: USE PYTHON (1+2+2)





a. Generate a discrete uniform distribution of population size 100 on the interval (1,10).




b. Consider a sample size of N=10. Simulate the sampling distribution of the sample mean (repeat 100 times). Draw the visualization.




c. Consider a sample size of N=30. What are the sample mean and sample standard deviation (repeat 100 times)? Draw the visualization.




[Code+ graphs]
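One possible starting point for Question no 3, using only the Python standard library, is sketched below. It assumes that the interval (1,10) means the integers 1 through 10 inclusive and that samples are drawn with replacement; neither is stated in the question. The helper name `sample_means` is hypothetical, and the graphs are omitted (a histogram of `means_10` and of `means_30` would be one option).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# a. Population of 100 values, discrete uniform on 1..10 (inclusive: an assumption).
population = [random.randint(1, 10) for _ in range(100)]


def sample_means(pop, n, repeats=100):
    """Draw `repeats` samples of size n (with replacement) and return their means."""
    return [statistics.mean(random.choices(pop, k=n)) for _ in range(repeats)]


# b. and c. Sampling distributions of the sample mean for N=10 and N=30.
means_10 = sample_means(population, 10)
means_30 = sample_means(population, 30)

for label, means in (("N=10", means_10), ("N=30", means_30)):
    print(label,
          "mean:", round(statistics.mean(means), 3),
          "standard deviation:", round(statistics.stdev(means), 3))
```

With these assumptions, the spread of the N=30 means should be narrower than that of the N=10 means, which is the pattern the visualizations are meant to show.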

Answered Same Day: Oct 13, 2022

Baljit answered on Oct 14 2022