[Title of your report] Introduction: Provides clear and concise context for the report, introducing the purpose of the analyses that follow. As a guideline, one paragraph will be sufficient....

This is part of order 101341. I need a different person because I got bad feedback and many questions were left unanswered. Please be thorough: this is my final and it is worth 40%. I need the R code.



[Title of your report]

Introduction
Provides clear and concise context for the report, introducing the purpose of the analyses that follow. As a guideline, one paragraph will be sufficient. [Delete instruction text before submitting]
[Type your introduction here]

Motivation and Methodology
Describe the motivation for the analysis methods and tools that you have used in each section. This section must answer the questions what you did, why you did that and how you did it. As a guideline, maximum two paragraphs will be sufficient. [Delete instruction text before submitting]
[Type your description of methods here]

Results & Discussion
Summarise the main results of your analyses in each section I to IV. You may use subsections, tables etc. as you see fit. Present and discuss results in a clear and simple way: Present findings of statistical analyses in a logical sequence. Do not include code or dumps of R output. Results should either be incorporated into sentences or formatted appropriately to be neatly presented. Interpret your findings by discussing their practical significance. Discuss shortcomings, if any. As a guideline, maximum three paragraphs will be sufficient. [Delete instruction text before submitting]
[Type your results and discussion here]

Recommendations & Conclusions
Based on your analysis, provide a brief overall discussion summarising/interpreting the results of the analyses you performed and final conclusions based on the hypothesis tested. As a guideline, one paragraph will be sufficient. Do not introduce any new information in this section, and do not simply repeat statements made elsewhere in your report! [Delete instruction text before submitting]
[Type your recommendations and conclusions here]

MATH 1081 UO Mathematical Methods for Data Analytics 2
Assessment 2.2: Project Part B

Instructions:
• Structure of the assessment: This assessment is worth 35% of your final grade and is due no later than 5 pm on Friday, Week 10. This assessment consists of 20 questions under 4 sections to answer and a report writing. Your submission will be marked out of 100.
• Use of R: This project is a guided case study. It is important that you follow any instructions or guidance in the questions, such as “Use R” where required. You must provide your R codes to get full marks wherever you use R to answer the questions. Upload your R script and screenshot the R codes and outputs in your answer sheet.
• Save your work: Save your answer sheet as a pdf named “your student ID Assessment 2.2 MATH1081.pdf”.
• Show your work: Show all necessary steps so that the reader can follow your solution procedure.
• Submit your work: Create a folder with 1. your answer sheet 2. your R script and 3. the final dataset you used for the analysis in “.csv” format. Name your folder with your student ID and upload it as a zip file.
• Acknowledgement of work: When submitting online, you acknowledge that the submitted assignment is your own work unless otherwise stated.
• Academic integrity: The University’s policy on academic misconduct will be strictly applied. Here are some tips to avoid academic misconduct:
  – Do not copy from any printed or electronic source or from any person.
  – Write your own solutions. You may discuss your work with others, but you must write up your solutions yourself. You are not allowed to use someone else’s written work when writing up your submission.
  – Do not give inappropriate help. Giving inappropriate help is just as serious as receiving it and will have the same consequences. Do not show your completed exercise to others. Dispose of drafts so that no one can access them.
  – Acknowledge help and joint work. If you receive any help from another source (for example, students, tutors, friends, internet), you must make a note of it on your submission.
• Late submission: Any late submission will attract a penalty of 5 marks available per day for five days. The cut-off time is 5 pm each day. After five days from the assessment due date, no submissions will be marked, and zero marks will be granted.
Assessment Task Overview

Photo by Luke van Zyl on Unsplash (https://unsplash.com/)

This assessment is based on the data in Melbourne housing.csv file. It contains residential building data, including construction cost, sales prices, some project variables, and some economic variables corresponding to real estate in Melbourne, Australia. The objective is to understand, analyse and develop a model to predict the sales price (Price). A brief description of variables is provided below.

Data dictionary

Variable        Description
Suburb          Suburb
Address         Street address
Rooms           Number of Rooms
Type            Type of Housing
Price           Actual sales price (local currency)
Method          S - property sold; SP - property sold prior; PI - property passed in; PN - sold prior not disclosed; SN - sold not disclosed; NB - no bid; VB - vendor bid; W - withdrawn prior to auction; SA - sold after auction; SS - sold after auction price not disclosed; N/A - price NA.
Type            br - bedroom(s); h - house, cottage, villa, semi, terrace; u - unit, duplex; t - townhouse; dev site - development site; o res - other residential.
SellerG         Real Estate Agent
Date            Date sold
Distance        Distance from CBD in Kilometres
Regionname      General Region (West, North West, North, North east . . . etc)
Propertycount   Number of properties that exist in the suburb.
Bedroom2        Scraped # of Bedrooms (from different source)
Bathroom        Number of Bathrooms
Car             Number of carspots
Landsize        Land Size in Metres
BuildingArea    Building Size in Metres
YearBuilt       Year the house was built
CouncilArea     Governing council for the area
Lattitude       Self explanatory
Longtitude      Self explanatory

Table 1: Data dictionary Melbourne Housing.csv

Assessment Task Details

You have to complete this assessment in two sections.
1. A list of questions to answer that comprising of 72% of the total grade (72 marks). Write your answers clearly in a well-organised manner with accurate notations. Label the questions and sub-questions.
2. A report summarising your analysis in Section 1 that comprising of 28% of the total grade (28 marks). A guide for the project report is provided in learnonline.

Section 1: Questions

[I] Descriptive Statistics & Exploratory Analysis:
The data is not always cleaned and presented in a working manner. There are some unnecessary columns and variables which do not have full completed entries. In addition, you might have errors in this dataset, and you have to fix them before you start analysing. You can do data cleansing in R or Excel.

(a). Choose & filter a single house ‘Type’. Use this for the remainder of the assignment as completed in Project Part A. Create a subset dataset of size at least 250 with the continuous variables and ‘Postcode’ and ‘Year= 2018’. Hint: Use na.omit function. For full marks, provide a screenshot of the first 30 row entries of the cleaned dataset in R. [2 marks]

(b). Use R to produce histograms of all the possible continuous variables. [4 marks]

(c). Use R to produce descriptive statistics for all the variables in part (a). [4 marks]

(d). Use R to produce boxplots describing the continuous variables side by side. This should be a picture of one plot. [2 marks]

(e). Using your outputs from (a) to (d), comment on the shape of the distribution for each variable. In particular, briefly describe in a table form:
• Whether there is one peak, or multiple peaks, in the distribution;
• The shape of the distribution (skewed or symmetric);
• Whether there appear to be any outliers.
[5 marks]
Example table layout (columns): Variable; Number of peaks in the distribution (One/multiple); Shape of the distribution (Left-skewed/Right-skewed/Symmetric); Outliers present (Yes/No)

(f). Which central tendency (mean/median) and dispersion (standard deviation/inter quartile range) measures are the most appropriate to summarise the variables numerically? Justify your choice of measures. Provide your answers in a table form.
For full marks, provide the general interpretation for the listed summary measures. [4 marks]
Example table layout (columns): Variable; Measure of central tendency (mean/median); Measure of dispersion (SD/IQR); Justification

(g). Use R to test the variables for Normality. Briefly describe whether the data follows a Normal distribution. Tabulate your answer. [4 marks]
Example table layout (columns): Variable; P-value; Reject H0 (Yes/No); Normally distributed (Yes/No)

[25 marks]

[II] Normal Distribution & Central Limit Theorem:

(h). Use R to calculate the probability that the average house (unit) Price will be more than $1,000,000 ($600,000) using the provided data. For full marks, clearly state the distribution of average sales price and the correct probability statement. Interpret your final answer. [5 marks]

(i). Use R to calculate the probability that the average house (unit) Price will be less than $1,000,000 ($600,000). Clearly state the correct probability statement. Provide an interpretation to your final answer. [2 marks]

(j). What is the cut off for the probability of an average Price higher than the cutoff would be 5%? For full marks, provide a correct probability statement. [3 marks]

(l). Using R, produce a random sample of size 30 for variable BuildingArea by randomly selecting 30 values without replacement from the BuildingArea variable in the provided data. Repeat the same for Landsize variable. For full marks, provide a screenshot of your samples in a table format. Hint: Use data.frame() to tabulate samples [4 marks]

(k). Use R to produce the descriptive statistics for each sample in part (l), and store the information in another table, please ensure you state the mean and standard deviation of each sample. [2 marks]

(m). Determine the sampling distribution of means for BuildingArea and Landsize and state the parameters based off your samples in part (l). Justify your answer, quoting any theorems you used. [3 marks]

(n). Calculate the probability that the average Landsize is greater than 650 based on your sampling distribution of the means from part (m). For full marks, provide a correct probability statement and interpret the final answer. [3 marks]

[22 marks]

[III] Estimating & determining the population mean:

(o). Manually construct 95% confidence interval for the population mean for BuildingArea and Landsize based on the sampled data in part (l). Use R to verify the results. Interpret your confidence interval. Hint: t_{29, 0.025} = 2.045 [3 marks]

(p). Repeat the previous question for a 99% confidence interval for the population mean of the same variables based on the sampled data in part (l). Hint: t_{29, 0.005} = 2.756 [3 marks]

(q). Compare and contrast the 99% confidence intervals for the two variables in part (p), and comment whether the means of the original dataset Melbourne Housing for BuildingArea and Landsize are included in these interval estimates. Justify your answer. [2 marks]

[8 marks]

[IV] Testing claims & Hypothesis Tests:
Hint: Use the whole dataset to answer the question in this section. For full marks, define the parameters of interest appropriately, set-up of the null and alternative hypotheses, clearly state the decision and the conclusion of the test

(r). The project management team of these housing projects is debating that there is no difference between the variables BuildingArea and Landsize. Use R to statistically test at a 5% level of significance if there is a difference in the average of BuildingArea and Landsize. Give a verdict and conclusion to your analysis. [5 marks]

(s). They further claim that there is a difference between the variables BuildingArea and Landsize. Use R to statistically test at a 1% level of significance if there is a difference in the average of BuildingArea and Landsize. [5 marks]

(t). Another claim the project management team is making is that ideally the average house (unit) Price should be greater than $1,000,000 ($600,000) using R. Statistically test at a 10% level of significance whether the average house (unit) price is greater than $1,000,000 ($600,000). Include a diagram for the hypothesis test. [5 marks]

(u). What does it mean by
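Sections [II] to [IV] all rest on the same few base R calls, so a rough sketch may help. It runs on simulated stand-in data, not the Melbourne file: the data frame `housing` and its simulated columns are assumptions made for illustration. Treating BuildingArea and Landsize as paired (both are measured on the same property) is also an assumption; an unpaired Welch test is the alternative if that reading is not accepted.

```r
# Simulated stand-in data (hypothetical); swap in the cleaned columns of the real dataset.
set.seed(42)
housing <- data.frame(
  Price        = rlnorm(400, meanlog = 13.9, sdlog = 0.5),
  Landsize     = rgamma(400, shape = 4, scale = 130),
  BuildingArea = rgamma(400, shape = 6, scale = 27)
)

# (h)-(j): by the Central Limit Theorem the sample mean of Price is
# approximately Normal with mean x_bar and standard error s / sqrt(n).
n     <- nrow(housing)
x_bar <- mean(housing$Price)
se    <- sd(housing$Price) / sqrt(n)
p_above <- pnorm(1e6, mean = x_bar, sd = se, lower.tail = FALSE)  # P(mean Price > 1,000,000)
p_below <- pnorm(1e6, mean = x_bar, sd = se)                      # P(mean Price < 1,000,000)
cutoff  <- qnorm(0.95, mean = x_bar, sd = se)                     # P(mean Price > cutoff) = 0.05

# (l), (o): sample 30 values without replacement, then build a 95% CI by hand
# and compare it with the interval reported by t.test().
land30    <- sample(housing$Landsize, size = 30, replace = FALSE)
t_crit    <- qt(0.975, df = 29)  # about 2.045, the value given in the hint
ci_manual <- mean(land30) + c(-1, 1) * t_crit * sd(land30) / sqrt(30)
ci_r      <- t.test(land30, conf.level = 0.95)$conf.int

# (r): two-sided test for a difference in means (paired, by assumption).
test_diff <- t.test(housing$BuildingArea, housing$Landsize, paired = TRUE)

# (t): one-sided test of H0: mu <= 1,000,000 against H1: mu > 1,000,000.
test_price <- t.test(housing$Price, mu = 1e6, alternative = "greater")
```

The decision rule is the usual one: reject H0 when `test_diff$p.value` or `test_price$p.value` falls below the stated significance level (0.05, 0.01 or 0.10 depending on the part).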
Answered 6 days after Mar 15, 2023 · University of South Australia


Mohd answered on Mar 20 2023
2023-03-20
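# Packages for reading the CSV and for the pipe / filter verbs
library(readr)
library(magrittr)
library(dplyr)
# Read the dataset, parsing Date as day/month/year
finaldata <- read_csv("finaldata.csv", col_types = cols(Date = col_date(format = "%d/%m/%Y")))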
Section 1: Questions [I] Descriptive Statistics & Exploratory Analysis: The data is not always cleaned and presented in a working manner. There are some unnecessary columns and variables which do not have full completed entries. In addition, you might have errors in this dataset, and you have to fix them before you start analysing. You can do data cleansing in R or Excel.
(a). Choose & filter a single house ‘Type’. Use this for the remainder of the assignment as completed in Project Part A. Create a subset dataset of size at least 250 with the continuous variables and ‘Postcode’ and ‘Year= 2018’. Hint: Use na.omit function. For full marks, provide a screenshot of the first 30 row entries of the cleaned dataset in R. [2 marks]
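# Drop rows with any missing value, then keep houses only (Type "h")
finaldata <- na.omit(finaldata)
finaldata1 <- finaldata %>%
  filter(Type == "h")
head(finaldata1, 30)  # first 30 rows of the cleaned data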
## # A tibble: 30 × 21
## Suburb Address Rooms Type Price Method SellerG Date Dista…¹ Postc…²
## <chr> <chr> <dbl> <chr> <dbl> <chr> <chr> <date> <dbl> <dbl>
## 1 Abbotsf… 25 Blo… 2 h 1.03e6 S Biggin 2016-02-04 2.5 3067
## 2 Abbotsf… 5 Char… 3 h 1.46e6 SP Biggin 2017-03-04 2.5 3067
## 3 Abbotsf… 55a Pa… 4 h 1.6 e6 VB Nelson 2016-06-04 2.5 3067
## 4 Abbotsf… 124 Ya… 3 h 1.88e6 S Nelson 2016-05-07 2.5 3067
## 5 Abbotsf… 98 Cha… 2 h 1.64e6 S Nelson 2016-10-08 2.5 3067
## 6 Abbotsf… 10 Val… 2 h 1.10e6 S Biggin 2016-10-08 2.5 3067
## 7 Abbotsf… 40 Nic… 3 h 1.35e6 VB Nelson 2016-11-12 2.5 3067
## 8 Abbotsf… 16 Wil… 2 h 1.31e6 S Jellis 2016-10-15 2.5 3067
## 9 Abbotsf… 42 Hen… 3 h 1.2 e6 S Jellis 2016-07-16 2.5 3067
## 10 Abbotsf… 78 Yar… 3 h 1.18e6 S LITTLE 2016-07-16 2.5 3067
## # … with 20 more rows, 11 more variables: Bedroom2 <dbl>, Bathroom <dbl>,
## # Car <dbl>, Landsize <dbl>, BuildingArea <dbl>, YearBuilt <dbl>,
## # CouncilArea <chr>, Lattitude <dbl>, Longtitude <dbl>, Regionname <chr>,
## # Propertycount <dbl>, and abbreviated variable names ¹Distance, ²Postcode
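Part (a) asks for a subset containing only the continuous variables plus Postcode, whereas the output above still carries all 21 columns. One possible way to narrow it is sketched below in base R on a small made-up data frame; `toy` and the `keep` vector are hypothetical names, and which columns count as continuous is a judgment call. The sketch picks the columns before calling na.omit(), so a missing value in an unused column does not cost a row. No year filter is shown, because how 'Year= 2018' should be read for this dataset is left open here.

```r
# Made-up rows standing in for the real file (hypothetical values).
toy <- data.frame(
  Suburb       = c("Abbotsford", "Abbotsford", "Richmond", "Kew", "Abbotsford"),
  Type         = c("h", "u", "h", "t", "h"),
  Price        = c(1030000, 650000, NA, 900000, 1460000),
  Landsize     = c(202, 0, 156, 120, 134),
  BuildingArea = c(79, 60, NA, 110, 150),
  Postcode     = c(3067, 3067, 3121, 3101, 3067)
)

# Columns to keep: an assumed reading of "continuous variables and Postcode".
keep <- c("Price", "Landsize", "BuildingArea", "Postcode")

# Keep houses only, restrict to the chosen columns, then drop incomplete rows.
houses <- toy[toy$Type == "h", keep]
houses <- na.omit(houses)

head(houses, 30)  # shows up to the first 30 rows
nrow(houses)      # the assignment requires at least 250 rows on the real data
```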
(b). Use R to produce histograms of all the possible continuous variables. [4 marks]
# Arrange the plots in a 3 x 4 grid
par(mfrow = c(3, 4))
hist(finaldata1$Rooms, col = "blue", main = "Rooms")
hist(finaldata1$Price, col = "blue", main = "Price")
hist(finaldata1$Distance, col = "blue", main = "Distance")
hist(finaldata1$Postcode, col = "blue", main = "Postcode")
hist(finaldata1$Bedroom2, col = "green", main = "Bedroom2")
hist(finaldata1$Bathroom, col = "green", main = "Bathroom")
hist(finaldata1$Car, col = "green", main = "Car")
hist(finaldata1$Landsize, col = "green", main = "Landsize")
hist(finaldata1$BuildingArea, col = "red", main = "BuildingArea")
hist(finaldata1$YearBuilt, col = "red", main = "YearBuilt")
hist(finaldata1$Propertycount, col = "red", main = "Propertycount")
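The eleven hist() calls differ only in the column name, so the same grid can be produced with a loop over the numeric columns. The sketch below uses simulated data; `toy` and its columns are hypothetical stand-ins. Note that counts such as Rooms, Bathroom and Car are discrete, and Postcode is a label, so whether they belong among "all the possible continuous variables" is a judgment call.

```r
# Simulated stand-in data (hypothetical), with one non-numeric column.
set.seed(1)
toy <- data.frame(
  Price    = rlnorm(200, meanlog = 13.9, sdlog = 0.5),
  Distance = rgamma(200, shape = 3, scale = 3.5),
  Landsize = rgamma(200, shape = 4, scale = 130),
  Suburb   = sample(letters, 200, replace = TRUE)
)

# Pick the numeric columns automatically.
num_cols <- names(toy)[sapply(toy, is.numeric)]

# One panel per variable; restore the previous layout afterwards.
old_par <- par(mfrow = c(1, length(num_cols)))
for (v in num_cols) {
  hist(toy[[v]], main = v, xlab = v, col = "steelblue")
}
par(old_par)
```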
(c). Use R to produce descriptive statistics for all the variables in part (a). [4 marks]
skimr::skim(finaldata1)
Data summary
Name                     finaldata1
Number of rows           4088
Number of columns        21
Column type frequency:
  character              7
  Date                   1
  numeric                13
Group variables          None

Variable type: character
skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
Suburb         0          1              3    18   0      280       0
Address        0          1              8    22   0      4037      0
Type           0          1              1    1    0      1         0
Method         0          1              1    2    0      5         0
SellerG        0          1              1    17   0      172       0
CouncilArea    0          1              4    17   0      31        0
Regionname     0          1              16   26   0      8         0

Variable type: Date
skim_variable  n_missing  complete_rate  min         max         median      n_unique
Date           0          1              2016-02-04  2017-08-12  2016-11-27  51

Variable type: numeric
skim_variable  n_missing  complete_rate  mean        sd         p0         p25        p50         p75         p100        hist
Rooms          0          1              3.31        0.85       1.00       3.00       3.00        4.00        8.00        ▂▇▆▁▁
Price          0          1              1273016.20  720060.06  131000.00  785750.00  1100000.00  1555000.00  9000000.00  ▇▁▁▁▁
Distance       0          1              10.57       5.96       1.30       6.68       9.70        13.10       47.40       ▇▆▁▁▁
Postcode       0          1              3100.99     94.27      3002.00    3042.00    3079.00     3145.00     3977.00     ▇▁▁▁▁
Bedroom2       0          1              3.27        0.86       0.00       3.00       3.00        4.00        9.00        ▁▇▅▁▁
Bathroom       0          1              1.69        0.76       1.00       1.00       2.00        2.00        8.00        ▇▁▁▁▁
Car            0          1              1.75        1.04       0.00       1.00       2.00        2.00        10.00       ▇▁▁▁▁
Landsize       0          1              513.57      324.69     0.00       305.00     541.00      665.00      8216.00     ▇▁▁▁▁
BuildingArea   0          1              165.50      96.20      1.00       112.00     143.00      194.00      3112.00     ▇▁▁▁▁
YearBuilt      0          1              1953.13     38.63      1196.00    1925.00    1955.00     1980.00     2018.00     ▁▁▁▁▇
Lattitude      0          1              -37.80      0.08       -38.16     -37.85     -37.79      -37.75      -37.46      ▁▂▇▂▁
Longtitude     0          1              144.99      0.11       144.54     144.91     145.00      145.06      145.53      ▁▃▇▁▁
Propertycount  0          1              7164.28     4242.28    389.00     3873.00    6380.00     9264.00     21650.00    ▆▇▃▂▁
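Parts (f) and (g) need mean, median, SD, IQR and a normality test side by side for each variable. A minimal base R sketch follows, run on simulated data; `toy`, its columns and the helper `summarise_var` are hypothetical names. Shapiro-Wilk is used here as one common choice of normality test, not necessarily the one the assignment intends, and shapiro.test() only accepts between 3 and 5000 values.

```r
# Simulated stand-in data (hypothetical): two right-skewed columns, one symmetric.
set.seed(7)
toy <- data.frame(
  Price     = rlnorm(300, meanlog = 13.9, sdlog = 0.5),
  Landsize  = rgamma(300, shape = 4, scale = 130),
  Symmetric = rnorm(300, mean = 150, sd = 20)
)

# Centre, spread and Shapiro-Wilk p-value for one numeric vector.
summarise_var <- function(x) {
  c(mean      = mean(x),
    median    = median(x),
    sd        = sd(x),
    IQR       = IQR(x),
    shapiro_p = shapiro.test(x)$p.value)
}

# One row per variable, one column per statistic.
stats <- t(sapply(toy, summarise_var))
round(stats, 4)
```

A p-value below the chosen significance level leads to rejecting H0 (normality). When the mean sits well above the median, as it does for Price in the summary above (1273016.20 against 1100000.00), the distribution is right-skewed and the median with the IQR is usually the safer pair of summary measures.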
(d). Use R to produce boxplots describing the continuous variables side by side. This should be a picture of one plot. [2 marks]
# Arrange the boxplots in a 3 x 4 grid
par(mfrow = c(3, 4))
boxplot(finaldata1$Rooms, col = "blue", main = "Rooms")
boxplot(finaldata1$Price, col = "blue", main = "Price")
boxplot(finaldata1$Distance, col = "blue", main = "Distance")
boxplot(finaldata1$Postcode, col = "blue", main = "Postcode")
boxplot(finaldata1$Bedroom2, col = "green", main = "Bedroom2")
boxplot(finaldata1$Bathroom, col="green",main =...
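Part (d) asks for the boxplots side by side in a single plot. Because the variables sit on very different scales (Price in the millions, BuildingArea in the hundreds), one option is to standardise each column first and then pass the whole matrix to a single boxplot() call. This is a sketch on simulated data, not the approach taken in the answer above; `toy` and its columns are hypothetical.

```r
# Simulated stand-in data (hypothetical).
set.seed(3)
toy <- data.frame(
  Price        = rlnorm(200, meanlog = 13.9, sdlog = 0.5),
  Landsize     = rgamma(200, shape = 4, scale = 130),
  BuildingArea = rgamma(200, shape = 6, scale = 27)
)

# Centre each column to mean 0 and scale to SD 1 so all boxes share one axis.
z <- scale(toy)

# A matrix argument gives one box per column in a single plot.
bp <- boxplot(z, las = 2, col = "lightblue",
              main = "Standardised continuous variables", ylab = "z-score")
```

Standardising changes the axis units but not the shape, so skewness and outliers remain visible; a log scale is another option when only positive variables are involved.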