The Abalone Data Set was acquired from the open-source UCI Machine Learning repository https://archive.ics.uci.edu/ml/datasets/Abalone, which was provided by the Department of Primary Industry and...

I need a PPT of 10 slides. Please make the presentation visual and include everything that is specifically required.


The Abalone Data Set was acquired from the open-source UCI Machine Learning repository (https://archive.ics.uci.edu/ml/datasets/Abalone) and was provided by the Department of Primary Industry and Fisheries, Tasmania. It contains 4177 recordings with 9 features.

    Data Set Features    Number of recordings    Number of Attributes    Attribute Characteristics
    Multivariate    4177    9    Categorical, Integer, Real

In detail, the data can be summarized as follows:

    Attribute    Data Type    Units    Description
    Sex    nominal    N/A    M, F, and I (infant)
    Length    continuous    mm    Longest shell measurement
    Diameter    continuous    mm    perpendicular to length
    Height    continuous    mm    with meat in shell
    Whole weight    continuous    grams    whole abalone
    Shucked weight    continuous    grams    weight of meat
    Viscera weight    continuous    grams    gut weight (after bleeding)
    Shell weight    continuous    grams    after being dried
    Rings    integer    N/A    +1.5 gives the age in years

After inspecting the dataset, we can test various hypotheses based on the visualizations and arrive at conclusions such as the following:

Hypothesis 1: "the mean of length is the same for Male and Female" is the null hypothesis.
Hypothesis 2: "the mean of length is not the same for Male and Female" is the alternative hypothesis and the original claim.
Using a boxplot, an ANOVA test, and a t-test, with α = 0.05 and p-value = 8.987874966189928e-07: because the p-value is below α, we reject the null hypothesis and conclude that the mean length is not the same for the two sexes of abalone, which suggests that the growth pattern differs between males and females.
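The text above reports one p-value for both the ANOVA test and the t-test. That is expected: with exactly two groups, one-way ANOVA and the pooled-variance (Student's) t-test are equivalent, with F = t². The sketch below checks this identity using only the Python standard library. The function names and the length values are hypothetical and are not taken from the abalone data.

```python
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_f_statistic(*groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical shell lengths, NOT taken from the abalone data.
male_lengths = [0.455, 0.350, 0.440, 0.475, 0.430, 0.490]
female_lengths = [0.530, 0.545, 0.550, 0.525, 0.535, 0.470]

t_stat = pooled_t_statistic(male_lengths, female_lengths)
f_stat = one_way_f_statistic(male_lengths, female_lengths)
print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

Because the two statistics carry the same information, running both tests on two groups confirms the result rather than adding independent evidence.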
Hypothesis: "the median of Rings of female and male is the same" is the null hypothesis.
Using the Mann-Whitney U test, median_test, and a histogram plot, with α = 0.05, p-value for the Mann-Whitney U test = 6.689638084926974e-05, and p-value for median_test = 0.0032854772243561072: because both p-values are below α, we reject the null hypothesis and conclude that, although the sample medians look the same, the medians of Rings for males and females are in fact not the same, which suggests that the age distributions of males and females differ.

Hypothesis: When the rings of an abalone are fewer than the median rings of infants, the abalone's length, height, and weight increase as rings increase. When the rings of an abalone exceed the median rings of infants, the abalone's length, height, and weight are less likely to increase as rings increase.
Necessary numbers: pearsonr, p-value
For Length (less than the median rings of infant): pearsonr=0.74; p=1.1e-243
For Length (larger than the median rings of infant): pearsonr=0.14; p=1.2e-13
For Height (less than the median rings of infant): pearsonr=0.54; p=3.2e-107
For Height (larger than the median rings of infant): pearsonr=0.27; p=5.6e-46
For Whole weight (less than the median rings of infant): pearsonr=0.62; p=4.1e-148
For Whole weight (larger than the median rings of infant): pearsonr=0.2; p=3e-26
Because the pearsonr values for "larger than the median rings of infant" are all smaller than those for "less than the median rings of infant", and the low p-values indicate that these correlations are statistically significant, we could not reject H0. This suggests that abalones grow (in length, height, and weight) until a certain age (the median rings of infant), after which their growth slows dramatically.

Analysis: Which elements in the dataset are likely to have a linear relationship with Rings?
Conclusion from Analysis: Because Length has the largest p-value, it is the only element likely to have a linear relationship with Rings.

H0: "Length and Rings have linear relationship" is the null hypothesis.
Using Linear Regression with significance level α = 0.05 and a p-value of 0: because the p-value is below α, we reject H0 and conclude that Length and Rings do not have a linear relationship.

H0: "Length and Rings have linear relationship for Infant" is the null hypothesis.
Using Linear Regression with significance level α = 0.05 and a p-value of 0, we conclude that for infants, too, Length and Rings do not have a linear relationship.

H0: "Height for infant is Gaussian distribution" is the null hypothesis.
Using a QQ plot and normaltest with significance level α = 0.05 and p-value = 0.5639596947723188: because the p-value is larger than α, we do not reject H0 and treat Height for infants as Gaussian.

References:
https://archive.ics.uci.edu/ml/datasets/Abalone
https://rpubs.com/AlistairGJ/Abalone
https://datahub.io/machine-learning/abalone

Appendix:

    # imports used throughout the appendix
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import scipy.stats as stats
    import statsmodels.api as sm
    from scipy.stats import ttest_ind, median_test
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    # read in and parse column headers
    url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
    abalone = pd.read_csv(url, header=None)
    abalone.columns = ["Sex", "Length", "Diameter", "Height", "Whole weight",
                       "Shucked weight", "Viscera weight", "Shell weight", "Rings"]
    abalone['Sex'] = abalone['Sex'].map({'M': 2, 'F': 1, 'I': 0})
    sexes = pd.unique(abalone.Sex.values)
    aba_data = {sex: abalone['Length'][abalone.Sex == sex] for sex in sexes}
    # truncate the male sample to 1307 rows so both columns have the same length
    aba_df = pd.DataFrame({"Male": aba_data[2].tolist()[0:1307], "Female": aba_data[1].tolist()})
    aba_df

    # draw boxplot to show the shape of distribution
    boxplot = aba_df.boxplot(column=['Male', 'Female'])
    sex_val = {'Male_mean': aba_data[2].mean(), 'Male_std': aba_data[2].std(),
               'Female_mean': aba_data[1].mean(), 'Female_std': aba_data[1].std()}
    df = pd.DataFrame(data=sex_val, index=[0])

    # use Anova test to test for p value
    f, p = stats.f_oneway(aba_data[2], aba_data[1])
    print("p-value for significance is: ", p)
    if p < 0.05:
        print("reject null hypothesis")
    else:
        print("accept null hypothesis")

    # use t test to reprove the result
    ttest, pval = ttest_ind(aba_data[2], aba_data[1])
    print("p-value for significance is: ", pval)
    #print(aba_data[2].mean(),aba_data[1].mean(),aba_data[2].std(),aba_data[1].std())
    if pval < 0.05:
        print("we reject null hypothesis")
    else:
        print("we accept null hypothesis")

    # calculate the median rings of both sexes
    aba_rings = {sex: abalone['Rings'][abalone.Sex == sex] for sex in sexes}
    aba_rings[1].median()
    aba_rings[2].median()
    aba_male = abalone[abalone['Sex'] == 2]
    aba_female = abalone[abalone['Sex'] == 1]
    plt.rcParams["figure.figsize"] = (12, 7)
    aba_male["Rings"].plot(kind='hist', legend=True, title='Male hist')
    aba_female["Rings"].plot(kind='hist', legend=True, title='Female hist')

    u_statistic, pval = stats.mannwhitneyu(aba_male['Rings'], aba_female['Rings'])
    print("p value is:", pval)
    if pval < 0.05:  # alpha value is 0.05 or 5%
        print(" we are rejecting null hypothesis")
    else:
        print("we are accepting null hypothesis")

    stat, p, med, tbl = median_test(aba_male['Rings'], aba_female['Rings'])
    print("p value is:", p, "and the median is", med)
    if p < 0.05:  # alpha value is 0.05 or 5%; compare the median_test p-value, not pval
        print(" we are rejecting null hypothesis")
    else:
        print("we are accepting null hypothesis")

    # KDE joint plots of Rings against Length, Height and Whole weight
    plot_columns = [('Length', None), ('Height', 'k'), ('Whole weight', None)]
    for col, color in plot_columns:
        aba_sns = sns.jointplot(data=abalone, x='Rings', y=col, kind='kde', color=color)
        aba_sns.fig.set_figwidth(10)
        aba_sns.fig.set_figheight(10)

    # calculate the median rings of infant and split the data at that value
    aba_rings = {sex: abalone['Rings'][abalone.Sex == sex] for sex in sexes}
    aba_rings[0].median()
    abalone_s = abalone[abalone['Rings'] <= aba_rings[0].median()]
    abalone_l = abalone[abalone['Rings'] > aba_rings[0].median()]

    # residual joint plots with Pearson r for each subset
    # (JointGrid.annotate is only available in older seaborn releases)
    for subset in (abalone_s, abalone_l):
        for col, color in plot_columns:
            aba_sns = sns.jointplot(data=subset, x='Rings', y=col, kind='resid', color=color)
            aba_sns.annotate(stats.pearsonr)
            aba_sns.fig.set_figwidth(10)
            aba_sns.fig.set_figheight(10)

    # try to use Multiple Linear Regression
    maba_y = pd.DataFrame(abalone.Rings)
    maba_x = pd.DataFrame(abalone[abalone.columns[0:7]])

    # show the separate relationship figures of each element with Rings
    fig = plt.figure(figsize=(15, 10))
    i = 1
    for col in maba_x.columns:
        ax = fig.add_subplot(2, 4, i)  # 7 predictors, so a 2x4 grid
        i = i + 1
        plt.scatter(maba_x[col], maba_y.Rings)
        ax.set_title(maba_x[col].name)

    # instantiate the model
    model = LinearRegression(fit_intercept=True, n_jobs=4)
    # train
    mabax_train, mabax_test, mabay_train, mabay_test = train_test_split(
        maba_x, maba_y, test_size=0.2, random_state=42)
    fit = model.fit(mabax_train, mabay_train)
    # make predictions
    mpreds = model.predict(mabax_test)
    # plot predicted vs actual
    plt.figure(figsize=(10, 10))
    plt.scatter(mabay_test, mpreds)
    plt.xlabel("True Values")
    plt.ylabel("Predictions")

    X2 = sm.add_constant(maba_x)
    est = sm.OLS(maba_y, X2)
    est2 = est.fit()
    print(est2.summary())

    # Mean Absolute Error
    MAE = mean_absolute_error(mabay_test, mpreds)
    MAE
    # RMSE
    MSE = mean_squared_error(mabay_test, mpreds)
    RMSE = np.sqrt(MSE)
    RMSE
    # R2 Score - Coefficient of Determination
    r2_score(mabay_test, mpreds)

    # Creating a Simple Linear Regression
    r_sq = abalone[["Length", "Rings"]].corr()
    # Calculating Slope (B1)
    B1 = r_sq.values[0][1] * (np.std(abalone.Rings) / np.std(abalone["Length"]))
    print("For 1 unit of change in Length, we can predict {} units of change in Rings".format(B1))
    # Calculating the Intercept
    B0 = abalone.Rings.mean() - (B1 * abalone["Length"].mean())
    B0
    # Plotting the line of best fit
    plt.rcParams["figure.figsize"] = (12, 7)
    abalone["Rings_line"] = B0 + (B1 * abalone["Length"])
    plt.scatter(abalone["Length"], abalone.Rings)  # create the main scatter plot
    plt.plot(abalone["Length"], abalone.Rings_line)  # plot the regression line
    plt.ylabel("Dependent Variable")
    plt.xlabel("Independent Variable")

    # Split into Training and Test Sets
    aba_y = pd.DataFrame(abalone.Rings)
    aba_x = pd.DataFrame(abalone["Length"])
    abax_train, abax_test, abay_train, abay_test = train_test_split(
        aba_x, aba_y, test_size=0.2, random_state=42)
    # Instantiating the linear model
    lr = LinearRegression(fit_intercept=True, n_jobs=4)
    fit = lr.fit(abax_train, abay_train)
    # intercept
    lr.intercept_
    # Coefficients
    coef_aba = pd.DataFrame({"feature": "Length",
Answered Same Day: Jun 03, 2021

Answer To: The Abalone Data Set was acquired from the open-source UCI Machine Learning repository...

Pooja answered on Jun 03 2021
141 Votes
Abalone Data
Student ID:
DATA
    Attribute    Data Type    Units    Description
    Sex    nominal    N/A    M, F, and I (infant)
    Length    continuous    mm    Longest shell measurement
    Diameter    continuous    mm    perpendicular to length
    Height    continuous    mm    with meat in shell
    Whole weight    continuous    grams    whole abalone
    Shucked weight    continuous    grams    weight of meat
    Viscera weight    continuous    grams    gut weight (after bleeding)
    Shell weight    continuous    grams    after being dried
    Rings    integer    N/A    +1.5 gives the age in years
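The Rings row above states that adding 1.5 to the ring count gives the age in years. As a minimal sketch of that arithmetic (the function name and the sample ring counts are hypothetical):

```python
def rings_to_age(rings):
    """Estimate age in years as rings + 1.5, per the attribute table."""
    return rings + 1.5


# Hypothetical ring counts, NOT taken from the abalone data.
for rings in (5, 10, 15):
    print(rings, "rings ->", rings_to_age(rings), "years")
```

Because the conversion is a constant shift, any test or correlation computed on Rings gives the same result when computed on age.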
Length of Males and Females
There is not much difference in the median longest shell measurement (Length) of males and females.
Outliers are present in both data sets.
With p < 0.05, we reject H0 and conclude that the mean length is not the same for the two sexes of abalone.
There is sufficient evidence to conclude that the growth pattern differs between males and females.
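Assuming the outliers noted above come from a boxplot with the conventional 1.5 × IQR whiskers (the matplotlib default), they can also be listed directly. The sketch below uses only the Python standard library. The function name and the length values are hypothetical, and the "inclusive" quartile method is an assumption that may differ slightly from the plotting library's interpolation.

```python
from statistics import quantiles


def boxplot_outliers(values, whisker=1.5):
    """Return the values lying more than whisker * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical shell lengths, NOT taken from the abalone data.
lengths = [0.15, 0.52, 0.55, 0.56, 0.58, 0.60, 0.61, 0.63, 0.65]
print(boxplot_outliers(lengths))  # prints [0.15]
```

Listing the flagged values makes it easier to decide whether they are measurement errors or simply very small or very large animals.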
Rings of female and male
The distribution of rings for males and females is skewed to the right. There are very few males and females with high values of rings.
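The right skew described above can be quantified with the sample skewness, which is positive when the long tail is on the high side. The sketch below computes the Fisher-Pearson coefficient g1 = m3 / m2^1.5 with the standard library only. The function name and the ring counts are hypothetical, not the abalone data.

```python
from statistics import mean


def sample_skewness(values):
    """Fisher-Pearson skewness g1 = m3 / m2**1.5 using biased central moments."""
    n = len(values)
    mu = mean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical ring counts with a long right tail, NOT the abalone data.
rings = [7, 8, 8, 9, 9, 9, 10, 10, 11, 12, 15, 21]
print(f"skewness = {sample_skewness(rings):.3f}")
```

A clearly positive value for each sex would support the use of rank-based tests, which do not assume symmetric or normal data.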
With Whitney...