7CS039 Assessment
Statistics for AI & Data Science

Assessment Overview
Your assessment takes the form of an exploratory data analysis using R. You must include screenshots of your R code and you must include charts and graphics produced in R as appropriate. Your assessment should be at least five pages in length, including images, but it should not be more than ten pages. The assessment should be typed and Harvard style referencing should be used where appropriate.

The Data
For this assessment you should use the “Survival from Malignant Melanoma” data set which is available on canvas. The data consists of measurements made on patients with malignant melanoma. Each patient had their tumour removed by surgery at the Department of Plastic Surgery, University Hospital of Odense, Denmark during the period 1962 to 1977. The surgery consisted of complete removal of the tumour together with about 2.5 cm of the surrounding skin. Among the measurements taken were the thickness of the tumour and whether it was ulcerated or not. These are thought to be important prognostic variables in that patients with a thick and/or ulcerated tumour have an increased chance of death from melanoma. Patients were followed until the end of 1977.

The data frame contains the following columns.
• time - Survival time in days since the operation.
• status - The patient's status at the end of the study. 1 indicates that they had died from melanoma, 2 indicates that they were still alive and 3 indicates that they had died from causes unrelated to their melanoma.
• sex - The patient's sex; 1=male, 0=female.
• age - Age in years at the time of the operation.
• year - Year of operation.
• thickness - Tumour thickness in mm.
• ulcer - Indicator of ulceration; 1=present, 0=absent.

Outline
Your assessment should include, at least, the following:
(i) Appropriate summary statistics for each of the variables in the data set and a commentary on the values of these statistics.
(ii) Appropriate graphical summaries of each of the variables in the data set and a commentary on any emerging aspects or trends.
(iii) A regression analysis and appropriate correlation computations for the relationship between the following variables:
time ∼ thickness
time ∼ age
thickness ∼ age
(iv) A commentary on any observed relationships between the variables in part (iii).
(v) Appropriate two sample significance tests for the three pairs of variables in part (iii) grouped by gender.
(vi) For the three variables in part (iii) (grouped by gender as appropriate), QQ-plots and a commentary about the underlying distribution of the variables.
(vii) A discussion of the insights generated from the data as well as recommendations on any aspects that should be investigated in more detail.

Submission
You should upload your complete assessment to the portal on canvas before the due date shown on your own canvas. I do not reproduce the due date here because for some of you it will be different due to extensions, extenuating circumstances etc.
Answered 2 days after Feb 08, 2021

Suraj answered on Feb 11 2021
Assignment
Topic: Statistics for AI & Data Science using R
Submitted By:
Submitted To:
Date: 11/02/2021
(i)
The first task is to load the CSV file into R and run the following commands to obtain the summary statistics:
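# Load the melanoma data (the path is specific to the author's machine)
df <- read.csv("C:/Users/Hp/Downloads/data.csv")
# Minimum, quartiles, median, mean and maximum for every column
summary(df)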
The output is given as follows:
For the time variable, the minimum value is 10 days and the maximum is 5565 days. The mean time since the operation is 2153 days. For patients who were still alive at the end of the study (status 2), this is a follow-up time rather than a time to death.
For the age variable, the minimum value is 4 years and the maximum is 95 years. The mean age at the time of the operation is 52.46 years.
Tumour thickness has a minimum value of 0.10 mm and a maximum of 17.42 mm. The average tumour thickness is 2.92 mm.
The 1st quartile is the value below which 25% of the observations fall, and the 3rd quartile is the value below which 75% of the observations fall.
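The commentary above covers only the continuous variables. summary() also reports means and quartiles for the coded columns status, sex and ulcer, but those figures are not meaningful for categories. A minimal sketch of frequency tables for these columns follows. It assumes that the `melanoma` data frame shipped with the `boot` package has the same columns as the CSV file on canvas; the factor labels are taken from the column descriptions in the brief.

```r
# Assumption: boot::melanoma has the same columns as the canvas CSV.
data(melanoma, package = "boot")
df <- melanoma

# Recode the numeric codes as labelled factors, following the brief.
df$status <- factor(df$status, levels = c(1, 2, 3),
                    labels = c("died of melanoma", "alive", "died, other cause"))
df$sex    <- factor(df$sex, levels = c(0, 1), labels = c("female", "male"))
df$ulcer  <- factor(df$ulcer, levels = c(0, 1), labels = c("absent", "present"))

# Counts and proportions for a categorical variable.
status_counts <- table(df$status)
print(status_counts)
print(round(prop.table(status_counts), 3))

# Cross-tabulation of two categorical variables.
print(table(df$sex, df$ulcer, dnn = c("sex", "ulcer")))
```

Counts and proportions are the natural summary for these three columns; year could be treated either way depending on the question being asked.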
(ii)
Plots are given for the three main continuous variables: time, age and thickness. The R code is as follows:
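# Scatter plots for each pair of continuous variables
plot(df$time, df$age, xlab = "Time (days)", ylab = "Age (years)")
plot(df$time, df$thickness, xlab = "Time (days)", ylab = "Thickness (mm)")
plot(df$age, df$thickness, xlab = "Age (years)", ylab = "Thickness (mm)")
# Histograms of each continuous variable
hist(df$time, main = "Time", xlab = "Days")
hist(df$age, main = "Age", xlab = "Years")
hist(df$thickness, main = "Thickness", xlab = "mm")
# Boxplot of time to check for outliers
boxplot(df$time, ylab = "Time (days)")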
First, the histograms show the distribution of each variable:
The time variable appears approximately normally distributed.
The age variable appears approximately normally distributed.
The thickness variable is positively skewed.
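These visual impressions can be checked numerically. The sketch below compares the mean with the median and computes a moment-based sample skewness for each variable. It assumes that `melanoma` from the `boot` package matches the canvas file; `skewness_g1` is a hypothetical helper written here for illustration, not a base R function.

```r
# Assumption: boot::melanoma has the same columns as the canvas CSV.
data(melanoma, package = "boot")
df <- melanoma

# Hypothetical helper: moment-based sample skewness (third central
# moment divided by the second central moment to the power 1.5).
skewness_g1 <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

vars <- c("time", "age", "thickness")
shape <- data.frame(
  mean     = sapply(df[vars], mean),
  median   = sapply(df[vars], median),
  skewness = sapply(df[vars], skewness_g1)
)
print(round(shape, 2))
```

A mean well above the median together with a clearly positive skewness supports a reading of positive skew; values near zero are consistent with a roughly symmetric distribution but do not by themselves establish normality.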
A scatter plot is an appropriate way to see the relationship between two variables. The scatter plots are drawn as follows:
A boxplot is a good way to detect outliers in the data set. It is plotted for the time variable as follows:
The plot shows one outlier in the time variable.
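By default, boxplot() draws a point as an outlier when it lies more than 1.5 times the interquartile range beyond the hinges. A minimal sketch of listing those points with boxplot.stats(), which applies the same rule, is given below. It assumes that `melanoma` from the `boot` package matches the canvas file.

```r
# Assumption: boot::melanoma has the same columns as the canvas CSV.
data(melanoma, package = "boot")
df <- melanoma

# boxplot.stats() uses the same 1.5 * IQR rule (coef = 1.5) as boxplot().
bs <- boxplot.stats(df$time)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
print(bs$stats)

# Values that would be drawn as individual points.
print(bs$out)

# Full records of the flagged patients, for inspection rather than
# automatic removal.
print(df[df$time %in% bs$out, ])
```

Looking at the whole record for a flagged patient helps decide whether the value is a data error or simply a long follow-up.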
(iii)
A regression analysis builds a model that predicts the value of a dependent variable from one or more independent variables. Here, we will...
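A minimal sketch of a regression and correlation computation for one of the pairs in part (iii), time ∼ thickness, is given below. It is an illustration under stated assumptions, not a reproduction of the answer above: it assumes that `melanoma` from the `boot` package matches the canvas file, and it uses a simple linear model with Pearson correlation.

```r
# Assumption: boot::melanoma has the same columns as the canvas CSV.
data(melanoma, package = "boot")
df <- melanoma

# Simple linear regression with time as the response.
fit <- lm(time ~ thickness, data = df)
print(summary(fit)$coefficients)

# Pearson correlation with a test of the null hypothesis rho = 0.
ct <- cor.test(df$time, df$thickness, method = "pearson")
print(ct$estimate)
print(ct$p.value)

# Scatter plot with the fitted line overlaid.
plot(df$thickness, df$time,
     xlab = "Tumour thickness (mm)", ylab = "Time (days)")
abline(fit)
```

The same pattern applies to time ∼ age and thickness ∼ age by changing the formula. Because many patients were still alive at the end of follow-up, time is censored, so a straight-line fit is only a descriptive first look.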