Question

For this lab, please reference Kaggle as your primary source of information. For easy access, all information from the Overview > Evaluation section on Kaggle may be found below.

Overarching Goal

It is your job to predict if a passenger survived the sinking of the Titanic or not.
For each PassengerId in the test set, you must predict a 0 or 1 value for the Survived variable.

Metric

Your score is the percentage of passengers you correctly predict. This is known as accuracy.
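
As a minimal sketch of the metric, accuracy is the number of correct predictions divided by the total number of predictions. The function name and the five labels below are made up for illustration; they are not part of the lab or of Kaggle's scoring code.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for five passengers (1 = survived, 0 = deceased).
true_labels = [0, 1, 1, 0, 1]
predictions = [0, 1, 0, 0, 1]

# Four of the five predictions match, so this prints 0.8.
print(accuracy(true_labels, predictions))
```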

Submission File Format for Predictions to Kaggle

You should submit a .csv file with exactly 418 entries plus a header row. Your submission will show an error if you have extra columns (beyond PassengerId and Survived) or rows.

The file should have exactly 2 columns:

PassengerId (sorted in any order)

Survived (contains your binary predictions: 1 for survived, 0 for deceased)

For example:


PassengerId, Survived
892, 0
893, 1
894, 0
...

You can download an example submission file (gender_submission.csv) on the Data page.
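
Before uploading, it can help to run a local sanity check against the format rules above. The sketch below is an assumption about how such a check might look; it is not Kaggle's validator, and the function name check_submission is hypothetical.

```python
import csv
import io

EXPECTED_HEADER = ["PassengerId", "Survived"]
EXPECTED_ROWS = 418  # entries required by the lab, not counting the header


def check_submission(csv_text, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found; an empty list means the checks passed."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [name.strip() for name in rows[0]]
    body = [row for row in rows[1:] if row]  # ignore blank lines

    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    for index, row in enumerate(body, start=1):
        if len(row) != 2:
            problems.append(f"data row {index}: expected 2 columns, found {len(row)}")
        elif row[1].strip() not in ("0", "1"):
            problems.append(f"data row {index}: Survived must be 0 or 1")
    return problems


# A three-row sample, checked against three expected rows instead of 418.
sample = "PassengerId,Survived\n892,0\n893,1\n894,0\n"
print(check_submission(sample, expected_rows=3))
```

To check a real file, read it with open("submission.csv", newline="").read() (the file name is an assumption) and call check_submission with the default of 418 rows.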

Optimization

Once you have completed the baseline requirement, you will turn to perfecting your model. In the second week, your assignment is to optimize your model such that you are in the top 70% of either the public or private leaderboard. In other words, the "competitive" aspect of the lab takes place in the second week, and you should dedicate yourself to discovering better ways to refine your model.

Optimization Scoring

The breakdown of the scores you may receive is as follows:

90 pts: The predictions from your algorithm rank your group in the top 70% of the class, either in the public or private leaderboard.

75 pts: The predictions from your algorithm do not rank your group in the top 70% of the class, either in the public or private leaderboard, but your predictions do beat the baseline.

TA Discretion: The predictions from your algorithm do not rank your group in the top 70% of the class, either in the public or private leaderboard, and your predictions do not beat the baseline.

0 pts: You did not submit anything.

There will also be an opportunity for extra credit:

+20 pts EC: Your team ranks amongst the top five (rank 1-5) in the private leaderboard, and at least one member of your group presents on your group's approach to the project. We will look to the private leaderboard to mitigate overfitting to the public leaderboard.

+10 pts EC: Your team ranks amongst the next five (rank 6-10) in the private leaderboard, and at least one member of your group presents on your group's approach to the project. We will look to the private leaderboard to mitigate overfitting to the public leaderboard.

Welcome to the Data Mining Kaggle Competition!
In this lab, you will get a chance to apply all the concepts you have learnt so far. This is also a very good opportunity to get your feet wet in data science competitions and familiarize yourself with Kaggle!
The Challenge
The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew. Here is a link if you would like to learn more. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.
In this challenge, we ask you to build a predictive model that answers the question: “what sorts of people were more likely to survive?” using passenger data (i.e., name, age, gender, socio-economic class, etc.).
Overview of How Kaggle’s Competitions Work
- Join the Competition
  Read about the challenge description, accept the Competition Rules and gain access to the competition dataset.
- Get to Work
  Download the data, build models on it locally, and generate a prediction file.
- Make a Submission
  Upload your prediction as a submission on Kaggle and receive an accuracy score.
- Check the Leaderboard
  See how your model ranks against other Kagglers on our leaderboard.
What Data Will I Use in This Competition?
In this competition, you’ll gain access to two similar datasets that include passenger information like name, age, gender, socio-economic class, etc. One dataset is titled train.csv and the other is titled test.csv.
The train.csv file will contain the details of a subset of the passengers on board (891 to be exact) and, importantly, will reveal whether they survived or not, also known as the “ground truth”.
The test.csv dataset contains similar information but does not disclose the “ground truth” for each passenger. It’s your job to predict these outcomes.
Using the patterns you find in the train.csv data, predict whether the other 418 passengers on board (found in test.csv) survived.
Check out the “Data” tab to explore the datasets even further. Once you feel you’ve created a competitive model, submit your predictions to Kaggle to see where your model stands on our leaderboard against other Kagglers.
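
Because test.csv carries no ground truth, one common way to estimate accuracy before submitting is to hold out part of the 891 training rows and score the model on them. The sketch below assumes a simple random split; the function name, the 20% fraction, and the seed are illustrative choices, not requirements of the lab.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, validation) lists."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed gives a repeatable split
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Stand-in for the 891 training rows; real rows would come from train.csv.
placeholder_rows = list(range(891))
train_rows, validation_rows = train_validation_split(placeholder_rows)
print(len(train_rows), len(validation_rows))
```

A model fitted on train_rows can then be scored on validation_rows, where the Survived values are known.
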
Submission File Format:
You should submit a csv file with exactly 418 entries plus a header row. Your submission will show an error if you have extra columns (beyond PassengerId and Survived) or rows.
The file should have exactly 2 columns:
- PassengerId (sorted in any order)
- Survived (contains your binary predictions: 1 for survived, 0 for deceased)
Overview
The data has been split into two groups:
- training set (train.csv)
- test set (test.csv)
The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.
The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
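
The rule behind gender_submission.csv (all and only female passengers survive) can be sketched in a few lines. This is an illustration, not the script Kaggle used; it assumes the Sex column holds the strings "male" and "female", and the three sample rows are illustrative stand-ins for test.csv.

```python
import csv
import io


def gender_baseline(test_rows):
    """Predict 1 (survived) for female passengers and 0 for everyone else."""
    return [
        {
            "PassengerId": row["PassengerId"],
            "Survived": 1 if row["Sex"].strip().lower() == "female" else 0,
        }
        for row in test_rows
    ]


def to_submission_csv(predictions):
    """Render predictions as CSV text with the two required columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["PassengerId", "Survived"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


# Illustrative rows in the shape of test.csv (only a few columns shown).
sample_test = io.StringIO(
    "PassengerId,Pclass,Sex,Age\n"
    "892,3,male,34.5\n"
    "893,3,female,47\n"
    "894,2,male,62\n"
)
rows = list(csv.DictReader(sample_test))
print(to_submission_csv(gender_baseline(rows)))
```

For the real data, replace sample_test with open("test.csv", newline="") and write the returned text to a .csv file.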
Data Dictionary

- Survived: Survival, 0 = No, 1 = Yes
- Pclass: Ticket class, 1 = 1st, 2 = 2nd, 3 = 3rd
- Sex: Sex
- Age: Age in years
- SibSp: Number of siblings / spouses aboard the Titanic
- Parch: Number of parents / children aboard the Titanic
- Ticket: Ticket number
- Fare: Passenger fare
- Cabin: Cabin number
- Embarked: Port of Embarkation, C = Cherbourg, Q = Queenstown, S = Southampton
VARIABLE NOTES

- Pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower
- Age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5
- SibSp: The number of siblings/spouses
- Parch: The number of parents/children. Some children travelled only with a nanny, therefore Parch = 0 for them.
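
The notes above suggest a couple of simple engineered features. The sketch below is one possible reading, not a prescribed approach: the helper names are hypothetical, family size (SibSp + Parch + 1 for the passenger) is an assumed feature that the lab does not define, and blank Age cells are assumed to be possible.

```python
def parse_age(raw):
    """Convert an Age cell to a float, or None if the cell is blank."""
    raw = raw.strip()
    return float(raw) if raw else None


def age_is_estimated(age):
    """Apply the xx.5 rule: ages of 1 or more ending in .5 are estimates.

    Ages below 1 are fractional by design, so they are never flagged.
    """
    if age is None:
        return False
    return age >= 1 and age % 1 == 0.5


def family_size(sibsp, parch):
    """Siblings/spouses plus parents/children plus the passenger themselves."""
    return int(sibsp) + int(parch) + 1


print(age_is_estimated(parse_age("34.5")))  # True: matches the xx.5 form
print(age_is_estimated(parse_age("0.42")))  # False: an infant's fractional age
print(age_is_estimated(parse_age("")))      # False: the age is missing
print(family_size("1", "0"))                # 2: the passenger and one sibling or spouse
```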

attachment1-sltl121m.zip

Robert · Accepted Answer

Answer Attached Below:
