Major Project Jan Batch

Take any dataset of your choice, perform EDA (Exploratory Data Analysis), apply a suitable classifier, regressor, or clusterer, and calculate the accuracy of the model.


Answered 1 day after Jan 05, 2023


Pratyush answered on Jan 07 2023
1/7/23, 9:08 AM Iris Data Analysis.ipynb - Colaboratory
https://colab.research.google.com/drive/12z-t9UQzBFiqjno8iNWVVCpYakVCE1sm#scrollTo=OrsRUgvdP6JZ&printMode=true 1/8
import numpy as np
import seaborn as sns
import warnings
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
warnings.filterwarnings("ignore", category = FutureWarning)
sns.set(style = "white" , color_codes = True)
import pandas as pd
dataset = pd.read_csv('Iris.csv')
dataset.head(10)
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa
# Split by species, assuming the file is sorted with 50 rows per species.
# iloc slices are half-open, so the three blocks are rows 0-49, 50-99 and 100-149.
iris_setosa_data = dataset.iloc[0:50, :]
iris_versicolor_data = dataset.iloc[50:100, :]
iris_virginica_data = dataset.iloc[100:150, :]
Mean, Variance, Standard Deviation
Mean
The mean is the central tendency, or average value, of a set of observations. The mathematical definition of the mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
print("MEANS")
# Round to three decimals so floating-point noise does not clutter the output.
print(round(np.mean(iris_setosa_data['PetalLengthCm']), 3), "--setosa")
print(round(np.mean(iris_versicolor_data['PetalLengthCm']), 3), "--versicolor")
print(round(np.mean(iris_virginica_data['PetalLengthCm']), 3), "--virginica")
MEANS
1.464 --setosa
4.26 --versicolor
5.552 --virginica
MEAN
The mean already supports a first pass of EDA. For example, just by comparing the mean petal lengths, we can tell
that Iris Setosa has a much smaller petal length on average than Iris Versicolor and Iris Virginica.
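To make the definition of the mean concrete, here is a minimal sketch that computes it directly from the formula in plain Python. It uses only the ten setosa petal lengths visible in the dataset.head(10) output, so its result differs slightly from the 50-row setosa mean of 1.464. The name mean_from_definition is a hypothetical helper, not part of the original notebook.

```python
# Petal lengths (cm) of the ten Iris-setosa rows shown by dataset.head(10).
petal_lengths = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5]


def mean_from_definition(values):
    """Sum the observations and divide by their count (hypothetical helper)."""
    return sum(values) / len(values)


sample_mean = mean_from_definition(petal_lengths)
print(round(sample_mean, 3))  # 1.45
```

For a pandas column, np.mean as used in the notebook performs the same calculation.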
Observations
#Mean with an outlier
print(np.mean(np.append(iris_setosa_data['PetalLengthCm'] , 50)))
2.4156862745098038
Even though all 50 values indicate that the petal length of setosa flowers is around 1.464 cm, a single wrong data point can shift the
mean wildly: here one bogus value of 50 moves it to about 2.416.
Such errors can come from human mistakes, data corruption, or other causes. Data points like this are called "OUTLIERS".
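The shifted mean can be checked by hand: appending one value x to n observations with mean m gives a new mean of (n*m + x) / (n + 1). The sketch below is an illustration rather than the notebook's own code. It plugs in the 50 setosa values with mean 1.464 and the bogus entry of 50; mean_after_append is a hypothetical helper name.

```python
def mean_after_append(n, current_mean, new_value):
    """Mean of n values after one extra value is appended (hypothetical helper)."""
    return (n * current_mean + new_value) / (n + 1)


# 50 setosa petal lengths averaging 1.464 cm, plus one outlier of 50.
shifted = mean_after_append(50, 1.464, 50)
print(round(shifted, 4))  # 2.4157
```

One point out of 51 raises the mean by almost a full centimetre, which is why outliers are worth checking for before relying on the mean.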
Variance
Variance represents the spread of the observations. It is the average squared distance of the observations from the mean.
The formula for the variance of a population is given by:
$$s^2 = \frac{SS}{N} = \frac{\sum (x_i - \bar{x})^2}{N}$$
Where
$SS$ is the sum of squared errors
$N$ is the number of observations in the group
$x_i$ is the $i$-th observation in the group
$\bar{x}$ is the mean of the group
...
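As a rough sketch of the population variance formula, the code below computes SS and divides by N in plain Python, then cross-checks the result against the standard library. It again uses only the ten setosa petal lengths visible in the dataset.head(10) output, so the number is illustrative rather than the variance of the full species group. The name population_variance is a hypothetical helper.

```python
import math
import statistics

# Petal lengths (cm) of the ten Iris-setosa rows shown by dataset.head(10).
petal_lengths = [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5]


def population_variance(values):
    """Average squared distance from the mean: SS / N (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared errors
    return ss / n


variance = population_variance(petal_lengths)
print(round(variance, 4))  # 0.0105

# Cross-check against the standard library's population variance.
assert math.isclose(variance, statistics.pvariance(petal_lengths), rel_tol=1e-9)
```

NumPy's np.var uses the same divide-by-N convention by default (ddof=0), so it should agree with this calculation on the same data.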