This is a Data Science assignment consisting mostly of math-related questions. Please show all the calculations for each question; no coding is required.

1) We would like to know if the age of a child is related to the number of cavities he or she has. The data are shown below. If there is a significant relationship, predict the number of cavities for a child of 11. (20 points)

Age of child (x)      6   8   9   10   12   14
No. of cavities (y)   2   1   3    4    6    5

2) Assume we gathered a random sample of the following dataset, where the independent variable (x) represents the number of hours a student studies, and the dependent variable (y) represents the exam score of the student. Is there a correlation between the two variables, and if so, how strong is it? (20 points)

Hours of study (X)   Exam score (Y)
6                    40
10                   50
18                   100
15                   80
12                   65
16                   90

3) The average age of a vehicle registered in the United States is 8 years, or 96 months. Assume that the standard deviation is 16 months. If a random sample of 36 vehicles is selected, find the probability that the mean of their ages is between 90 and 100 months. (10 points)

Hint: use the concept of the normal distribution and the z-score.

4) Assume we gathered a random sample of the following dataset. Each column represents the weekly sales of one of two stores. We would like to decide which store (A or B) is more likely to be able to predict its weekly sales with more certainty. (20 points)

Store A   Store B
2000      2500
4500      6500
3000      2000
1500      5000
6000      1200
4200      7000

5) Assume we gathered a random sample of the following dataset. We are trying to predict the body fat % of a person based on his or her weight in kg. (30 points)

[Table: data provided as an image; not available in the source text]

a) Find the best-fitted line for the given data above.
b) Find the R-squared value.
c) Find the F value of the best-fitted line.
d) Why does your best-fitted line predict better than the line Y = 0.5x + 3?

6) Build a decision tree classifier based on the following dataset. There are three independent variables (a1, a2, a3) that will help with the prediction, and the 'Classification' column is the dependent variable. (40 points)

[Table: data provided as an image; not available in the source text]

7) Consider the following confusion matrix: (10 points)

             Predicted Yes   Predicted No
Actual Yes   95              5
Actual No    5               45

a) Calculate the sensitivity, precision, and accuracy of the confusion matrix.
b) Define (give the values of) type I and type II errors in the given confusion matrix and explain the difference between the two.

Answered 3 days after Feb 26, 2023

Shubham answered on Mar 01 2023
31 Votes
Question 1
To test whether the age of a child is related to the number of cavities, we fit a simple linear regression. The regression equation is:
y = b0 + b1*x
where y is the number of cavities, x is the age of the child, b0 is the y-intercept, and b1 is the slope of the line. Estimating b0 and b1 by the least squares method from the six data points (Σx = 59, Σy = 21, Σxy = 229, Σx² = 621, Σy² = 91) gives b1 = 135/245 ≈ 0.551 and b0 ≈ -1.918, so the fitted equation is:
y = -1.918 + 0.551*x
The correlation coefficient is r ≈ 0.842, which exceeds the critical value of 0.811 for n = 6 at the 5% significance level, so the relationship is significant. The coefficient of determination (R-squared) is 0.708, which means that about 70.8% of the variation in the number of cavities can be explained by the age of the child.
To predict the number of cavities for a child of age 11, substitute x = 11 into the regression equation:
y = -1.918 + 0.551*11 ≈ 4.14
Therefore, a child of age 11 is predicted to have approximately 4 cavities.
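As a cross-check of the least squares fit described above, here is a minimal sketch in plain Python that recomputes the slope, intercept, correlation coefficient, and the prediction for age 11 directly from the table in Question 1. The function name fit_line and the variable names are illustrative choices, not part of the original answer.

```python
import math

# Data from Question 1: age of child (x) and number of cavities (y).
ages = [6, 8, 9, 10, 12, 14]
cavities = [2, 1, 3, 4, 6, 5]


def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line through the points."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sxy = n * sum_xy - sum_x * sum_y      # n*Σxy - Σx*Σy
    sxx = n * sum_xx - sum_x ** 2         # n*Σx² - (Σx)²
    syy = n * sum_yy - sum_y ** 2         # n*Σy² - (Σy)²
    slope = sxy / sxx
    intercept = (sum_y - slope * sum_x) / n
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


b0, b1, r = fit_line(ages, cavities)

# t statistic for H0: no linear relationship, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(len(ages) - 2) / math.sqrt(1 - r ** 2)
predicted_at_11 = b0 + b1 * 11

print(f"slope={b1:.3f} intercept={b0:.3f} r={r:.3f} r2={r * r:.3f}")
print(f"t={t_stat:.3f} prediction for age 11={predicted_at_11:.2f}")
```

The t statistic can be compared with the two-tailed critical value for 4 degrees of freedom at the 5% level (2.776) to judge whether the relationship is significant before using the line for prediction.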
Question 2
There is a correlation between the number of hours a student studies and their exam score. The correlation coefficient is a measure of the strength and direction of the linear relationship between two variables. The value of r ranges from -1 to 1, with values closer to -1 or 1 indicating a stronger linear relationship, and values closer to 0 indicating a weaker linear relationship.
For the six data points (Σx = 77, Σy = 425, Σxy = 5960, Σx² = 1085, Σy² = 32825), the correlation coefficient is:
r = 3035 / sqrt(581 * 16325) ≈ 0.985
The correlation coefficient is positive, indicating a positive linear relationship between the number of hours studied and the exam score (Liang et al. 2019). Additionally, the value of r is close to 1, indicating a strong linear relationship between the two variables. Therefore, we can conclude that there is a strong positive correlation between the number of hours a student studies and their exam score.
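The correlation coefficient discussed above can be recomputed from the Question 2 table with a short sketch. It uses the mean-centered form of Pearson's formula; the helper name pearson_r is an illustrative choice, not something from the original answer.

```python
import math

# Data from Question 2: hours of study (X) and exam score (Y).
hours = [6, 10, 18, 15, 12, 16]
scores = [40, 50, 100, 80, 65, 90]


def pearson_r(xs, ys):
    """Pearson correlation coefficient using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_study = pearson_r(hours, scores)
print(f"r = {r_study:.3f}, r^2 = {r_study ** 2:.3f}")
```

The sign of the result gives the direction of the relationship, and its closeness to 1 in absolute value gives the strength.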
Question 3
This question uses the Central Limit Theorem (CLT) and the properties of the normal distribution. The CLT states that, for a large enough sample size, the distribution of the sample mean is approximately normal, regardless of the distribution of the population.
The population mean is μ = 96 months and the population standard deviation is σ = 16 months. In the probability that the mean of a sample of 36 vehicles is...
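The z-score calculation that Question 3 calls for can be sketched as follows, assuming the sample mean is approximately normal with mean μ and standard error σ/√n as the CLT suggests. The helper normal_cdf is an illustrative function built on math.erf and is not part of the original answer.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu = 96.0      # population mean age in months
sigma = 16.0   # population standard deviation in months
n = 36         # sample size

# Standard error of the sample mean under the CLT.
standard_error = sigma / math.sqrt(n)

# z-scores for the bounds 90 and 100 months.
z_low = (90 - mu) / standard_error
z_high = (100 - mu) / standard_error

# P(90 < sample mean < 100) = Phi(z_high) - Phi(z_low)
probability = normal_cdf(z_high) - normal_cdf(z_low)

print(f"SE={standard_error:.4f} z_low={z_low:.2f} z_high={z_high:.2f}")
print(f"P(90 < mean < 100) = {probability:.4f}")
```

The same two z-scores can be looked up in a standard normal table when the calculation is done by hand.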