R programming and Python: Perform BIRCH clustering for the Loans data set. As a final step of this...

Question

R programming and Python: Perform BIRCH clustering for the Loans data set. As a final step of this assignment, make a graph of the clusters, compute the silhouette (in addition, you can make a silhouette graph) in both R and Python, and draw conclusions. Make a final report with code, outputs, graphs, captions, and basic descriptions/conclusions.

Aakarsh · Accepted Answer

Notebook
  
    
BIRCH clustering using Python on the Loan Data Set
In [437]:
    
# Import required packages
from itertools import cycle
from time import time
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import Birch
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was deprecated and later removed
import matplotlib.cm as cm
from sklearn.metrics import silhouette_samples, silhouette_score
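The imports above include `silhouette_samples` and `matplotlib.cm`, the usual ingredients of a per-sample silhouette plot, which the question lists as an optional graph. The sketch below is one way to draw it, assuming a fitted label array is already available. The helper name `silhouette_plot`, the `Agg` backend line, and the three-blob synthetic data are illustrative assumptions, not part of the original notebook. Because `silhouette_samples` is quadratic in the number of rows, plotting all ~150k loans is slow; a random subsample of a few thousand rows is a reasonable substitute.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for scripts; drop this line inside Jupyter
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples


def silhouette_plot(X, labels, ax):
    """Hypothetical helper: one band of sorted silhouette values per cluster."""
    values = silhouette_samples(X, labels)
    clusters = np.unique(labels)
    y_lower = 10
    for i, k in enumerate(clusters):
        band = np.sort(values[labels == k])
        y_upper = y_lower + band.size
        color = cm.nipy_spectral(float(i) / len(clusters))
        ax.fill_betweenx(np.arange(y_lower, y_upper), 0, band,
                         facecolor=color, edgecolor=color, alpha=0.7)
        ax.text(-0.05, y_lower + 0.5 * band.size, str(k))  # cluster label
        y_lower = y_upper + 10  # vertical gap between clusters
    avg = values.mean()  # equals silhouette_score(X, labels) without sampling
    ax.axvline(x=avg, color="red", linestyle="--")
    ax.set_xlim(-0.1, 1.0)
    ax.set_xlabel("Silhouette coefficient")
    ax.set_ylabel("Cluster label")
    ax.set_yticks([])
    return avg


# Synthetic stand-in data (NOT the loan data): three separated blobs.
X, _ = make_blobs(n_samples=600, centers=3, random_state=42)
labels = Birch(n_clusters=3).fit_predict(X)

fig, ax = plt.subplots(figsize=(6, 4))
avg = silhouette_plot(X, labels, ax)
ax.set_title(f"Silhouette plot for BIRCH labels (mean = {avg:.2f})")
```

The dashed red line marks the mean silhouette; bands that fall mostly to its left, or that extend below zero, point to clusters that are poorly separated.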
Overview of the dataset
In [343]:
    
df = pd.read_csv("loan_data.csv")  # load the data into a DataFrame using pandas
df.head()
Out[343]:
		Approval	Debt-to-Income Ratio	FICO Score	Request Amount	Interest
	0	F	0.0	397	1000	450.0
	1	F	0.0	403	500	225.0
	2	F	0.0	408	1000	450.0
	3	F	0.0	408	2000	900.0
	4	F	0.0	411	5000	2250.0
In [344]:
    
df.info()  # column names, non-null counts, and dtypes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150302 entries, 0 to 150301
Data columns (total 5 columns):
Approval                150302 non-null object
Debt-to-Income Ratio    150302 non-null float64
FICO Score              150302 non-null int64
Request Amount          150302 non-null int64
Interest                150302 non-null float64
dtypes: float64(2), int64(2), object(1)
memory usage: 5.7+ MB
The data set contains 150,302 loan records (about 150k), with no missing values in any of the five columns.
In [345]:
    
# Encode Approval as 1 for 'T' (true) and 0 for 'F' (false).
# Assigning the result avoids chained inplace replace, which recent pandas versions warn about.
df["Approval"] = df["Approval"].map({"F": 0, "T": 1})
In [346]:
    
df.describe() #statistical details of the data
Out[346]:
		Approval	Debt-to-Income Ratio	FICO Score	Request Amount	Interest
	count	150302.000000	150302.000000	150302.000000	150302.000000	150302.000000
	mean	0.500566	0.183538	672.023266	13427.080145	6042.186065
	std	0.500001	0.137226	69.129157	9468.345958	4260.755681
	min	0.000000	0.000000	371.000000	500.000000	225.000000
	25%	0.000000	0.090000	647.000000	6000.000000	2700.000000
	50%	1.000000	0.160000	684.000000	11000.000000	4950.000000
	75%	1.000000	0.240000	714.000000	19000.000000	8550.000000
	max	1.000000	1.030000	869.000000	44000.000000	19800.000000
The summary statistics cover every loan attribute. Because Approval is coded 0/1, its mean (0.50) is the approval rate: roughly half of the loans were approved.
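The table also shows that the columns live on very different scales (mean Debt-to-Income Ratio about 0.18, FICO Score about 672, Request Amount about 13,427). BIRCH groups points by Euclidean distance, so unscaled, the Request Amount column would dominate; this is why `StandardScaler` is imported. The worked example below z-scores rows 0 and 1 of `df.head()` by hand, using the means and standard deviations copied from the `describe()` output. It is a sketch of the arithmetic only; the variable names are illustrative.

```python
import numpy as np

# Means and standard deviations copied from the df.describe() output
# (columns: Debt-to-Income Ratio, FICO Score, Request Amount).
means = np.array([0.183538, 672.023266, 13427.080145])
stds = np.array([0.137226, 69.129157, 9468.345958])

# Rows 0 and 1 of df.head(): same DTI, FICO differs by 6, request by 500.
row0 = np.array([0.0, 397.0, 1000.0])
row1 = np.array([0.0, 403.0, 500.0])

raw_gap = np.abs(row0 - row1)
z_gap = np.abs((row0 - means) / stds - (row1 - means) / stds)

print("raw gap:    ", raw_gap)
print("z-score gap:", z_gap.round(3))
```

In raw units the request gap (500) swamps the FICO gap (6). After z-scoring, the FICO gap (about 0.087 standard deviations) is larger than the request gap (about 0.053). `StandardScaler` divides by the population standard deviation (ddof=0) while `describe()` reports the sample one (ddof=1); with 150,302 rows the difference is negligible.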
Correlation between columns
In [439]:
    
corr = df.corr()
corr
Out[439]:
		Approval	Debt-to-Income Ratio	FICO Score	Request Amount	Interest
	Approval	1.000000	-0.267921	0.544305	-0.045903	-0.045903
	Debt-to-Income Ratio	-0.267921	1.000000	-0.070586	0.129207	0.129207
	FICO Score	0.544305	-0.070586	1.000000	0.153920	0.153920
	Request Amount	-0.045903	0.129207	0.153920	1.000000	1.000000
	Interest	-0.045903	0.129207	0.153920	1.000000	1.000000
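Request Amount and Interest correlate at exactly 1.000000. The shown rows (450/1000, 225/500, 900/2000, 2250/5000) and the `describe()` extremes (225/500, 19800/44000) are all consistent with Interest being 0.45 × Request Amount, so keeping both would count loan size twice in the distance. The sketch below drops Interest, standardizes the remaining columns, fits scikit-learn's `Birch`, and scores the labels with `silhouette_score`. The helper `birch_loans` and its defaults (`n_clusters=3`, `threshold=0.5`, a 10,000-row silhouette sample) are assumptions for illustration, not the author's settings. The demo runs on a synthetic DataFrame with the same column names, not on `loan_data.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import Birch
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def birch_loans(df, n_clusters=3, threshold=0.5, drop=("Interest",),
                sample_size=10_000, random_state=0):
    """Hypothetical helper: scale features, run BIRCH, score with silhouette."""
    # Interest duplicates Request Amount (correlation 1.0), so drop it.
    X = StandardScaler().fit_transform(df.drop(columns=list(drop)))
    labels = Birch(threshold=threshold, n_clusters=n_clusters).fit_predict(X)
    # The exact silhouette is quadratic in the row count; score a random sample.
    score = silhouette_score(X, labels, sample_size=min(sample_size, len(X)),
                             random_state=random_state)
    return labels, score


# Synthetic stand-in with the loan columns (NOT the real loan_data.csv).
rng = np.random.default_rng(0)
n = 2_000
request = rng.choice(np.arange(500, 44_001, 500), size=n)
demo = pd.DataFrame({
    "Approval": rng.integers(0, 2, size=n),
    "Debt-to-Income Ratio": rng.uniform(0.0, 1.0, size=n).round(2),
    "FICO Score": rng.integers(371, 870, size=n),
    "Request Amount": request,
    "Interest": 0.45 * request,
})

labels, score = birch_loans(demo)
print("cluster sizes:", np.bincount(labels))
print("silhouette (sampled):", round(score, 3))
```

On the real data, `labels, score = birch_loans(df)` would run after the Approval column has been mapped to 0/1. A silhouette near 1 indicates well-separated clusters, near 0 overlapping ones, and negative values suggest points assigned to the wrong cluster.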