
Select a competition from Kaggle (it can be an old competition) and submit an entry. Then write a short report outlining your approach. The report will be graded according to the following:



This should be a short report, not an essay, so use as many tables, diagrams and code examples as you can. It is not an ESSAY!


Data Preparation: 15%


Data preparation includes some exploration: what are the dimensions and levels, what is the five-number summary of each continuous variable, and so on.
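As a rough illustration of the five-number summary mentioned above, here is a minimal sketch in plain Scala (no Spark required), which can be pasted into a Scala REPL or a notebook cell. The names `FiveNumber`, `quantile` and `fiveNumberSummary`, and the `prices` data, are made up for this example. The quartiles assume linear interpolation between order statistics; other conventions give slightly different values.

```scala
// Five-number summary of one continuous variable.
// Assumption: quantiles are linearly interpolated between sorted values.
final case class FiveNumber(min: Double, q1: Double, median: Double, q3: Double, max: Double)

def quantile(sorted: Vector[Double], p: Double): Double = {
  require(sorted.nonEmpty, "need at least one value")
  val pos = p * (sorted.length - 1)   // fractional index into the sorted data
  val lo  = math.floor(pos).toInt
  val hi  = math.ceil(pos).toInt
  sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))
}

def fiveNumberSummary(values: Seq[Double]): FiveNumber = {
  val s = values.sorted.toVector
  FiveNumber(s.head, quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75), s.last)
}

// Hypothetical column of values, invented purely for illustration.
val prices = Seq(120.0, 150.0, 180.0, 200.0, 250.0, 300.0, 900.0)
println(fiveNumberSummary(prices))
```

A large gap between the third quartile and the maximum, as in this toy column, is the kind of observation worth one line in the report, for example as the reason for a log transform.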



Data Statistics
Summary Statistics

Correlations
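import org.apache.spark.mllib.stat.Statistics  // seriesX and seriesY are RDD[Double] of equal length
val correlation: Double = Statistics.corr(seriesX, seriesY, "pearson")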
Stratified Sampling - splits automatically on a given level
val approxSample = data.sampleByKey(withReplacement = false, fractions)
Hypothesis Testing
Random data generation
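import org.apache.spark.mllib.random.RandomRDDs._  // provides normalRDD
val u = normalRDD(sc, 1000000L, 10)  // 1 million standard normal draws in 10 partitions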
A simple and short exploration of the data. Databricks has a nice visualisation tool that can help with illustration.
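To make the correlation item above concrete, the following is a minimal sketch of the Pearson coefficient in plain Scala, useful as a cross-check on a small sample before running the Spark version. The function name `pearson` and the toy series are assumptions for this example, and the code assumes both series have the same length and non-zero variance.

```scala
// Pearson correlation: covariance divided by the product of standard deviations.
// Assumption: xs and ys are equal-length, non-empty, and not constant.
def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
  require(xs.length == ys.length && xs.nonEmpty, "series must be non-empty and of equal length")
  val meanX = xs.sum / xs.length
  val meanY = ys.sum / ys.length
  val cov   = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  val varX  = xs.map(x => (x - meanX) * (x - meanX)).sum
  val varY  = ys.map(y => (y - meanY) * (y - meanY)).sum
  cov / math.sqrt(varX * varY)
}

// Hypothetical toy series: y is an exact linear function of x.
val toyX = Seq(1.0, 2.0, 3.0, 4.0, 5.0)
val toyY = Seq(11.0, 22.0, 33.0, 44.0, 55.0)
println(pearson(toyX, toyY))
```

Pairs of features with a coefficient close to 1 or -1 are candidates for dropping one of the two, which is one way to give a stated reason for a feature-engineering decision.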



For a Kaggle competition, this could involve all feature engineering. This includes dropping features and transforming features, but you must give a reason for each change, not just "well, I thought it was useless".


Implementation: 28%


This is your solution: the algorithm, parameters and features.


Supervised
Classification (Binary and Multinomial)
Linear SVM
Logistic Regression
Decision trees (random forests)
Regression
Linear Regression (Linear least squares)
Lasso (Supports regularisation and feature selection)
Ridge Regression (Addresses collinearity)
Streaming linear regression (A version of online learning)
Decision Tree Regression
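The three linear regression variants in the list above differ only in the penalty added to the least-squares loss. As a rough sketch in generic notation (the symbols $X$, $y$, $w$, $n$ and $\lambda$ are assumptions introduced here, and the exact scaling constants vary between libraries):

```latex
\begin{aligned}
\text{Linear least squares:}\quad & \min_{w}\ \tfrac{1}{2n}\lVert Xw - y\rVert_2^2 \\
\text{Lasso (L1 penalty):}\quad & \min_{w}\ \tfrac{1}{2n}\lVert Xw - y\rVert_2^2 + \lambda\lVert w\rVert_1 \\
\text{Ridge (L2 penalty):}\quad & \min_{w}\ \tfrac{1}{2n}\lVert Xw - y\rVert_2^2 + \tfrac{\lambda}{2}\lVert w\rVert_2^2
\end{aligned}
```

The L1 penalty can push some coefficients to exactly zero, which is why Lasso doubles as feature selection. The L2 penalty shrinks coefficients without zeroing them, which stabilises the fit when features are collinear.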


Model evaluation: 22%


This is how you evaluate: do you use the right evaluation metric or technique, and do you apply the right measure?


Examples of evaluation techniques:


Evaluation
Classification
Confusion Matrix
Precision (Positive Predictive Value)
Recall (True Positive Rate)
F-measure
Receiver Operating Characteristic (ROC)
Area Under ROC Curve
Regression
Mean Squared Error (MSE)
Root Mean Squared Error & Mean Absolute Error
Coefficient of Determination (R²)
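Most of the metrics listed above reduce to a few lines of arithmetic. The following is a minimal sketch in plain Scala, assuming binary labels encoded as 0 and 1 and at least one positive label and one positive prediction. All names (`Confusion`, `confusion`, `mse`, `rmse`, `mae`, `r2`) and the toy data are made up for this example, and ROC and AUC are left out to keep it short.

```scala
// Classification: precision, recall and F-measure from confusion-matrix counts.
final case class Confusion(tp: Int, fp: Int, fn: Int, tn: Int) {
  def precision: Double = tp.toDouble / (tp + fp)   // positive predictive value
  def recall: Double    = tp.toDouble / (tp + fn)   // true positive rate
  def f1: Double        = 2 * precision * recall / (precision + recall)
}

def confusion(labels: Seq[Int], preds: Seq[Int]): Confusion = {
  val pairs = labels.zip(preds)
  Confusion(
    tp = pairs.count { case (l, p) => l == 1 && p == 1 },
    fp = pairs.count { case (l, p) => l == 0 && p == 1 },
    fn = pairs.count { case (l, p) => l == 1 && p == 0 },
    tn = pairs.count { case (l, p) => l == 0 && p == 0 }
  )
}

// Regression: MSE, RMSE, MAE and the coefficient of determination.
def mse(y: Seq[Double], yHat: Seq[Double]): Double =
  y.zip(yHat).map { case (a, b) => (a - b) * (a - b) }.sum / y.length

def rmse(y: Seq[Double], yHat: Seq[Double]): Double = math.sqrt(mse(y, yHat))

def mae(y: Seq[Double], yHat: Seq[Double]): Double =
  y.zip(yHat).map { case (a, b) => math.abs(a - b) }.sum / y.length

def r2(y: Seq[Double], yHat: Seq[Double]): Double = {
  val mean  = y.sum / y.length
  val ssRes = y.zip(yHat).map { case (a, b) => (a - b) * (a - b) }.sum  // residual sum of squares
  val ssTot = y.map(a => (a - mean) * (a - mean)).sum                   // total sum of squares
  1.0 - ssRes / ssTot
}

// Hypothetical labels and predictions, invented for illustration.
val cm = confusion(Seq(1, 1, 1, 0, 0, 0, 1, 0), Seq(1, 1, 0, 0, 0, 1, 1, 0))
println(s"precision=${cm.precision} recall=${cm.recall} f1=${cm.f1}")

val actual    = Seq(3.0, 5.0, 7.0, 9.0)
val predicted = Seq(2.0, 5.0, 8.0, 9.0)
println(s"mse=${mse(actual, predicted)} mae=${mae(actual, predicted)} r2=${r2(actual, predicted)}")
```

Whichever of these you report, it should match the metric the chosen Kaggle competition scores on; a table of these values per model is a compact way to present the evaluation.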







Oct 07, 2019