Question

Take any dataset of your choice, perform EDA (Exploratory Data Analysis), apply a suitable classifier, regressor, or clusterer, and calculate the accuracy of the model.

Major Project, Jan Batch

data-science-major-project-zc5ivrxh.pdf

Pratyush · Accepted Answer

1/7/23, 9:08 AM Iris Data Analysis.ipynb - Colaboratory
https://colab.research.google.com/drive/12z-t9UQzBFiqjno8iNWVVCpYakVCE1sm#scrollTo=OrsRUgvdP6JZ&printMode=true 1/8
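# Numerical and data-handling libraries
import numpy as np
import pandas as pd
# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D  # 3D axes support
import warnings

# Silence FutureWarning messages so they do not clutter the notebook output
warnings.filterwarnings("ignore", category=FutureWarning)
sns.set(style="white", color_codes=True)

# Load the Iris dataset (Iris.csv must be in the working directory)
dataset = pd.read_csv('Iris.csv')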
dataset.head(10)
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa
# Assumes the file is sorted by species with 50 rows each.
# iloc end indices are exclusive, so 50:100 and 100:150 cover all 50 rows;
# slices starting at 51 and 101 would silently drop one flower per species.
iris_setosa_data = dataset.iloc[0:50, :]
iris_versicolor_data = dataset.iloc[50:100, :]
iris_virginica_data = dataset.iloc[100:150, :]
Mean, Variance, Standard Deviation
Mean
The mean is a measure of central tendency: the average value of a set of observations. For n observations x_1, ..., x_n, the mathematical definition of the mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
print("MEANS")
print(np.mean(iris_setosa_data['PetalLengthCm']), "--setosa")
print(np.mean(iris_versicolor_data['PetalLengthCm']), "--versicolor")
print(np.mean(iris_virginica_data['PetalLengthCm']), "--virginica")
MEANS
1.464 --setosa
4.26 --versicolor
5.552 --virginica
MEAN
The mean already supports a first round of EDA. For example, just by comparing the mean petal lengths, we can tell that Iris setosa has a much smaller petal length on average than Iris versicolor and Iris virginica.
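The per-species means above depend on positional slices, which only work if the rows are sorted by species. As a minimal sketch of an alternative, the rows can be grouped by the Species label instead. The column names Species and PetalLengthCm come from the notebook; the toy values and the helper name mean_by_species are hypothetical and only stand in for Iris.csv.

```python
import pandas as pd

# Hypothetical toy rows standing in for Iris.csv; only the column names
# (Species, PetalLengthCm) are taken from the notebook.
toy = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
    "PetalLengthCm": [1.4, 1.6, 4.0, 4.4, 5.5, 5.9],
})


def mean_by_species(df, column="PetalLengthCm"):
    """Return the mean of `column` for each distinct Species label."""
    return df.groupby("Species")[column].mean()


means = mean_by_species(toy)
print(means)
```

With the real data, mean_by_species(dataset) should give one mean per species without any assumption about row order.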
Observations
#Mean with an outlier
print(np.mean(np.append(iris_setosa_data['PetalLengthCm'] , 50)))
2.4156862745098038
Even though all 50 measurements put the petal length of setosa flowers at around 1.464 cm, a single wrong value (here, 50) shifts the mean to about 2.416 cm.
Such errors can arise from human mistakes, data corruption, or other causes. Data points like these are called "outliers".
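To make the sensitivity of the mean concrete, here is a small sketch that compares it with the median, a statistic the passage above does not use but that is commonly chosen when outliers are suspected. The ten values are the PetalLengthCm entries from the dataset.head(10) output; the outlier value 50 matches the notebook's example.

```python
import numpy as np

# PetalLengthCm for the first ten setosa rows shown in dataset.head(10)
petal_lengths = np.array([1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5])

# Same data with one erroneous reading of 50 appended, as in the notebook
with_outlier = np.append(petal_lengths, 50)

mean_clean = np.mean(petal_lengths)
mean_dirty = np.mean(with_outlier)
median_clean = np.median(petal_lengths)
median_dirty = np.median(with_outlier)

print("mean:  ", mean_clean, "->", mean_dirty)
print("median:", median_clean, "->", median_dirty)
```

On this small sample the mean jumps by several centimetres while the median does not move, which is why the median is often reported alongside the mean during EDA.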
Variance
Variance measures the spread of the given observations. It is the average squared distance of the observations from the mean.
The formula for the variance of a population of n observations with mean \mu is given by:
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$$
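As a rough illustration of the definition, the sketch below computes the population variance by hand (average squared distance from the mean) and checks it against np.var. The ten values are the PetalLengthCm entries from the dataset.head(10) output, used here only as a small sample; the variable names are illustrative.

```python
import numpy as np

# PetalLengthCm for the first ten setosa rows shown in dataset.head(10)
petal_lengths = np.array([1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5])

# Average squared distance of each observation from the mean
mu = petal_lengths.mean()
manual_variance = np.sum((petal_lengths - mu) ** 2) / len(petal_lengths)

# np.var uses ddof=0 by default, i.e. it divides by n (population variance)
numpy_variance = np.var(petal_lengths)

print(manual_variance, numpy_variance)
```

Passing ddof=1 to np.var divides by n - 1 instead, which gives the sample variance rather than the population variance.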
