For this lab, please reference Kaggle as your primary source of information. For easy access, all information from the Overview > Evaluation section on Kaggle may be found below.


Overarching Goal


It is your job to predict if a passenger survived the sinking of the Titanic or not.
For each passenger in the test set, you must predict a 0 or 1 value for the Survived variable.


Metric


Your score is the percentage of passengers you correctly predict. This is known as accuracy.
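
As a rough illustration of the metric, the sketch below computes accuracy as the share of predictions that match the true labels. The function name and the sample labels are made up for this example; Kaggle computes the real score on its side.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration only (not real competition data).
truth = [0, 1, 0, 1, 1]
preds = [0, 1, 1, 1, 0]
print(f"accuracy = {accuracy(truth, preds):.2%}")  # 3 of 5 correct -> 60.00%
```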


Submission File Format for Predictions to Kaggle


You should submit a .csv file with exactly 418 entries plus a header row. Your submission will show an error if you have extra columns (beyond PassengerId and Survived) or rows.


The file should have exactly 2 columns:




  • PassengerId (sorted in any order)


  • Survived (contains your binary predictions: 1 for survived, 0 for deceased)


For example:



PassengerId, Survived
892, 0
893, 1
894, 0
...



You can download an example submission file (gender_submission.csv) on the Data page.
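
The format rules above can be checked before uploading. The following is a minimal sketch using only Python's standard csv module; the helper names write_submission and check_submission are hypothetical, the placeholder IDs assume consecutive values starting at 892, and the check is stricter than the sample above in that it expects no space after the comma.

```python
import csv
import io

EXPECTED_ROWS = 418  # number of test-set entries stated above
HEADER = ["PassengerId", "Survived"]


def write_submission(predictions, out):
    """Write (passenger_id, survived) pairs as a two-column CSV."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for passenger_id, survived in predictions:
        writer.writerow([passenger_id, survived])


def check_submission(text, expected_rows=EXPECTED_ROWS):
    """Return a list of format problems; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(text)))
    problems = []
    if not rows or rows[0] != HEADER:
        problems.append("header must be exactly PassengerId,Survived")
    body = rows[1:]
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    if any(len(r) != 2 for r in body):
        problems.append("every row needs exactly 2 columns")
    elif any(r[1] not in ("0", "1") for r in body):
        problems.append("Survived must be 0 or 1")
    return problems


# Placeholder predictions (all 0) with assumed consecutive IDs, for illustration only.
buffer = io.StringIO()
write_submission([(pid, 0) for pid in range(892, 892 + EXPECTED_ROWS)], buffer)
print(check_submission(buffer.getvalue()))  # []
```

To write a real file, pass an open file object (opened with newline="") instead of the in-memory buffer.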


Optimization


Once you have completed the baseline requirement, you will turn to perfecting your model. In the second week, your assignment is to optimize your model such that you are in the top 70% of either the public or private leaderboard. In other words, the "competitive" aspect of the lab takes place in the second week, and you should dedicate yourself to discovering better ways to refine your model.


Optimization Scoring


The breakdown of the scores you may receive is as follows:




  • 90 pts: The predictions from your algorithm rank your group in the top 70% of the class, either in the public or private leaderboard.


  • 75 pts: The predictions from your algorithm do not rank your group in the top 70% of the class, either in the public or private leaderboard, but your predictions do beat the baseline.


  • TA Discretion: The predictions from your algorithm do not rank your group in the top 70% of the class, either in the public or private leaderboard, and your predictions do not beat the baseline.


  • 0 pts: You did not submit anything.


There will also be an opportunity for extra credit:




  • +20 pts EC: Your team ranks amongst the top five (rank 1-5) in the private leaderboard, and at least one member of your group presents on your group's approach to the project. We will look to the private leaderboard to mitigate overfitting to the public leaderboard.


  • +10 pts EC: Your team ranks amongst the next five (rank 6-10) in the private leaderboard, and at least one member of your group presents on your group's approach to the project. We will look to the private leaderboard to mitigate overfitting to the public leaderboard.


    Welcome to the Data Mining Kaggle Competition!

    In this lab, you will get a chance to apply all the concepts you have learnt so far. This is also a very good opportunity to get your feet wet in data science competitions and familiarize yourself with Kaggle!

    The Challenge

    The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew. Here is a link if you would like to learn more. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.
    In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (e.g., name, age, gender, and socio-economic class).

    Overview of How Kaggle’s Competitions Work



    • Join the Competition
      Read about the challenge description, accept the Competition Rules and gain access to the competition dataset.

    • Get to Work
      Download the data, build models on it locally or on Kaggle, and generate a prediction file.

    • Make a Submission
      Upload your prediction as a submission on Kaggle and receive an accuracy score.

    • Check the Leaderboard
      See how your model ranks against other Kagglers on our leaderboard.

      What Data Will I Use in This Competition?

      In this competition, you’ll gain access to two similar datasets that include passenger information like name, age, gender, socio-economic class, etc. One dataset is titled train.csv and the other is titled test.csv.
      The train.csv file will contain the details of a subset of the passengers on board (891 to be exact) and importantly, will reveal whether they survived or not, also known as the “ground truth”.
      The test.csv dataset contains similar information but does not disclose the “ground truth” for each passenger. It’s your job to predict these outcomes.
      Using the patterns you find in the train.csv data, predict whether the other 418 passengers on board (found in test.csv) survived.
      Check out the “Data” tab to explore the datasets even further. Once you feel you’ve created a competitive model, submit your predictions to Kaggle to see where your model stands on our leaderboard against other Kagglers.
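
Because test.csv carries no ground truth, one common way to estimate accuracy before submitting is to hold back part of the labelled training rows for validation. The sketch below is only an illustration of that idea, not something the lab requires; the function name, the 20% fraction, and the fixed seed are all assumptions made for this example.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split off a validation portion."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in for the 891 labelled rows of train.csv (integers only, for illustration).
rows = list(range(1, 892))
train_rows, validation_rows = train_validation_split(rows)
print(len(train_rows), len(validation_rows))  # 713 178
```

A model fitted on train_rows can then be scored on validation_rows, which gives an accuracy estimate that does not depend on the public leaderboard.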


    Overview

    The data has been split into two groups:

    • training set (train.csv)
    • test set (test.csv)

    The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

    The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.

    We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
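
The rule behind gender_submission.csv (all and only female passengers survive) is simple enough to sketch directly. The example below uses a tiny made-up sample in the shape of test.csv and assumes the Sex column holds the strings "male" and "female"; the function name is hypothetical.

```python
import csv
import io

# Tiny made-up sample in the shape of test.csv (only the columns used here).
SAMPLE_TEST_CSV = """PassengerId,Pclass,Sex
892,3,male
893,3,female
894,2,male
"""


def gender_baseline(test_csv_text):
    """Predict 1 for every female passenger and 0 for everyone else."""
    reader = csv.DictReader(io.StringIO(test_csv_text))
    return [
        (int(row["PassengerId"]), 1 if row["Sex"] == "female" else 0)
        for row in reader
    ]


for passenger_id, survived in gender_baseline(SAMPLE_TEST_CSV):
    print(passenger_id, survived)
```

Any model you submit for the optimization phase can be compared against this kind of rule as a sanity check.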

    Data Dictionary


    Survived: Survival (0 = No, 1 = Yes)
    Pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
    Sex: Sex
    Age: Age in years
    SibSp: Number of siblings / spouses aboard the Titanic
    Parch: Number of parents / children aboard the Titanic
    Ticket: Ticket number
    Fare: Passenger fare
    Cabin: Cabin number
    Embarked: Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

    VARIABLE NOTES


    Pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower.
    Age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5.
    SibSp: The number of siblings/spouses.
    Parch: The number of parents/children. Some children travelled only with a nanny, therefore Parch = 0 for them.
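
As one possible illustration of feature engineering with these variables, the sketch below derives two features from SibSp, Parch, and Age. Both features, the Passenger class, and the treatment of a missing age as None are assumptions made for this example, not part of the assignment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passenger:
    sibsp: int            # siblings / spouses aboard
    parch: int            # parents / children aboard
    age: Optional[float]  # None when the age is missing (an assumption)


def family_size(p):
    """Hypothetical feature: the passenger plus relatives aboard."""
    return p.sibsp + p.parch + 1


def age_is_estimated(p):
    """Per the notes above, estimated ages have the form xx.5."""
    return p.age is not None and p.age >= 1 and p.age % 1 == 0.5


# Made-up passengers for illustration only.
child_with_nanny = Passenger(sibsp=0, parch=0, age=4.0)
parent = Passenger(sibsp=1, parch=2, age=28.5)
print(family_size(child_with_nanny), age_is_estimated(child_with_nanny))  # 1 False
print(family_size(parent), age_is_estimated(parent))  # 4 True
```

Note that a child travelling only with a nanny gets a family size of 1 here, which is the kind of edge case the Parch note warns about.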
