Use the “TitanicPassenger.jmp” (if you are using JMP) or “TitanicPassenger.xlsx” file for this assignment.Answer each question below in a report. Include screenshots if needed.Give a brief data...

Use the “TitanicPassenger.jmp” (if you are using JMP) or “TitanicPassenger.xlsx” file for this assignment.




Answer each question below in a report. Include screenshots if needed.





Give a brief data description. How many data records are in the dataset? How many data records have missing values in the “Age” variable? What other variables have missing values? Are there any duplicate values in this dataset?





What is the proportion of survived passengers? What is the age range of the passengers? What is the age range of the majority of the Titanic passengers? Include a screenshot of the “Survived” distribution and the “Age” distribution.





What is the proportion of first-class passengers compared to the rest? What is the most common home destination of the Titanic passengers?





Build a nominal logistical regression model to predict the “Survived” variable using “Age”, “Passenger Class” and “Sex” as the predicting variables.

What is the AUC score of this model?

What is the chance that a female baby (less than 10 years old) first-class passenger survived?

What is the chance that a female baby (less than 10 years old) third-class passenger survived?

What is the chance that a 40 years old male first-class passenger survived?

What is the chance that a 40 years old male third-class passenger survived?

What do you think of these differences?





Create a Mosaic graph with “Survived” as the y-axis, “Passenger” as the x-axis, and “Sex” as the group x. What are the characteristics of the passengers (age and passenger class) that would most likely survive? What are the characteristics of the passengers (age and passenger class) that would very unlikely survive? What do you think about this insight? Include a screenshot of the Mosaic graph.






Construct a decision tree using 3 variables: “Passenger Class”, “Sex”, and “Age” to predict the “Survived” variable.

How many splits are done before the AUC score (Area under ROC curve) achieves at least 0.80?

Which variable contributes the most to predicting “Survived”?






Submission: A report in Word or PDF format with screenshots of your work in rapid miner.


Oct 17, 2022
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here