
This order requires you to work on model development and results for a credit fraud project. Dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud Your job is to download the dataset and use the ML algorithms Logistic Regression, Random Forest, Decision Tree, and SVM to develop models, then use an evaluation and validation method such as AUC or ROC to compare each algorithm's performance and precision. The dataset is fairly clean and organized, so you do not need to spend much time on pre-processing. Please submit a notebook for all coding and a Word document for documentation. In the notebook, please give clear comments to explain what you did. In the documentation, please write a report of about 4 pages on the results and their interpretation, including text, outputs, visualizations, etc. Thank you very much.

Answered 4 days after Dec 01, 2021


Suraj answered on Dec 04 2021
Machine Learning with Python Project
Here, we are given a credit card data set. It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
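The imbalance matters for how the results are read. As a small arithmetic sketch using only the two counts quoted above, the snippet below computes the fraud rate and the accuracy of a trivial baseline that labels every transaction as legitimate. The variable names are illustrative and are not taken from the original notebook.

```python
# Class counts quoted in the dataset description.
n_total = 284_807
n_fraud = 492

# Share of transactions that are fraudulent.
fraud_rate = n_fraud / n_total

# Accuracy of a trivial baseline that predicts "legitimate" for every row.
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"all-legitimate baseline accuracy: {baseline_accuracy:.4%}")
```

Because this baseline is already more than 99.8% accurate while catching no fraud at all, accuracy alone says little here, which is why precision, recall and AUC carry most of the weight.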
Our aim is to build different machine learning models that classify a credit card transaction as fraudulent or not. For this we use four machine learning algorithms: logistic regression, decision tree, random forest, and support vector machine (SVM).
The data set is already clean: it contains no missing or NA values. The only remaining task is to split the data into training and test sets and apply the algorithms. The main measures for judging whether a model fits well are the precision, recall, F1 score, and accuracy. We also plot the ROC curve and compute the AUC to check the model fit.
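The workflow described here (split, fit four models, compare by AUC) can be sketched as follows. This is a minimal illustration, not the notebook from the answer: it assumes scikit-learn is installed and uses a small synthetic imbalanced data set from `make_classification` as a stand-in for the Kaggle file, and the hyperparameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): 2,000 rows, ~5% positives.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# stratify=y keeps the positive-class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": SVC(random_state=0),
}

auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # AUC needs a continuous score: use class-1 probabilities when the model
    # exposes them, otherwise the signed distance from the decision boundary.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    auc_by_model[name] = roc_auc_score(y_test, scores)

for name, auc in auc_by_model.items():
    print(f"{name}: AUC = {auc:.3f}")
```

On the real data, `X` and `y` would come from the downloaded CSV instead; the AUC values printed for the synthetic data say nothing about the results reported in this answer.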
Logistic Regression:
Let’s first discuss the logistic regression model. The required output tables and ROC curve are given as follows:
Because this data set contains a very large number of observations, we use 20k observations to build the model. The only reason for choosing fewer observations is that the model takes considerably longer to train on the full data.
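How the 20k observations were drawn is not stated. One reasonable option, shown below purely as an assumption, is a stratified subsample that keeps the class ratio of the full data. The helper `stratified_subsample` is hypothetical and uses only the standard library; the labels are rebuilt from the class counts quoted in the dataset description.

```python
import random

def stratified_subsample(labels, n_samples, seed=0):
    """Return sorted row indices of a subsample that keeps the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    chosen = []
    for indices in by_class.values():
        # Each class contributes in proportion to its share of the full data,
        # with at least one row so a rare class is never dropped entirely.
        k = max(1, round(n_samples * len(indices) / len(labels)))
        chosen.extend(rng.sample(indices, min(k, len(indices))))
    return sorted(chosen)

# Labels with the quoted class counts: 492 frauds among 284,807 rows.
labels = [1] * 492 + [0] * (284_807 - 492)

subset = stratified_subsample(labels, 20_000)
n_fraud_in_subset = sum(labels[i] for i in subset)
print(len(subset), n_fraud_in_subset)
```

At this sample size only a few dozen fraud cases remain, so precision and recall measured on such a subsample can shift noticeably from one draw to the next.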
The model performance is given as follows:
Precision - Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. High precision means that few of the transactions flagged as fraud are false positives. We obtained a precision of 0.61, which is pretty good: of all the transactions the model predicted as positive, 61% were actually positive.
Precision = TP / (TP + FP)
Recall (Sensitivity) - Recall is the ratio of correctly predicted positive observations to all observations that are actually in the positive class. We obtained a recall of 0.57, which is acceptable for this model as it is above 0.5: of all the transactions that were actually positive, the model identified 57%.
Recall = TP / (TP + FN)
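The two formulas can be checked with a short sketch. The confusion-matrix counts below are hypothetical, chosen only so that the results round to the 0.61 and 0.57 reported above; they are not the actual counts from the model's output.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts (assumption): 17 frauds caught, 11 false alarms,
# 13 frauds missed.
tp, fp, fn = 17, 11, 13

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.2f}, recall = {r:.2f}")
```

The guards against a zero denominator cover the case where a model predicts no positives at all, which can happen on data this imbalanced.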
F1 score - F1 Score is the...