In this assignment, you will work with a dataset of housing prices in King County, WA (file linked in the original assignment). As an up-and-coming real estate agent, you want to know the best price for every house in the area. Knowing this will give you an upper hand in helping your customers get the best price. To that end, you will test whether data mining techniques can help you achieve this goal.

  • Build a multiple linear regression model (all variables except ID and date) and interpret the key learnings from the model.
  • Build a clustering model (k-means) with k = 3 clusters (for R, follow the module video; for Python, see the assignment's link and https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans). Interpret what you see in the results.
  • Build a PCA model with N = 3 components (for R, follow the module video; for Python, see https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60).
  • Compare and contrast the results from the models. What key learnings do you now have about houses in the area as a result of the modeling? Are you going to use any of these models in your work?
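
The three modeling tasks above can be sketched in Python with scikit-learn. This is only an illustrative outline under stated assumptions, not the assignment's reference solution: the data are synthetic stand-ins generated with NumPy (not the real housing file), only four of the predictors are imitated, the predictors are standardized before clustering and PCA (a common choice that the assignment does not prescribe), and the clustering step uses `KMeans`, the class that the Python link above points to.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data, NOT the real housing file.
# Column names mirror four of the dataset's variables for readability only.
rng = np.random.default_rng(0)
n = 500
sqft_living = rng.normal(2000, 500, n)
bedrooms = rng.integers(1, 6, n).astype(float)
grade = rng.integers(5, 11, n).astype(float)
yr_built = rng.integers(1900, 2016, n).astype(float)
# Hypothetical price rule so the regression has something to recover.
price = 150 * sqft_living + 20000 * grade + rng.normal(0, 50000, n)

feature_names = ["sqft_living", "bedrooms", "grade", "yr_built"]
X = np.column_stack([sqft_living, bedrooms, grade, yr_built])

# 1. Multiple linear regression of price on all predictors.
reg = LinearRegression().fit(X, price)
r_squared = reg.score(X, price)  # in-sample R^2
coefficients = dict(zip(feature_names, reg.coef_))

# Standardize so large-valued columns (e.g. square footage) do not
# dominate the distances used by k-means or the variances used by PCA.
X_scaled = StandardScaler().fit_transform(X)

# 2. k-means clustering with k = 3.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
cluster_sizes = np.bincount(kmeans.labels_, minlength=3)

# 3. PCA keeping 3 components.
pca = PCA(n_components=3).fit(X_scaled)
scores = pca.transform(X_scaled)  # one row of 3 component scores per house
explained = pca.explained_variance_ratio_

print("R^2:", round(r_squared, 3))
print("Coefficients:", coefficients)
print("Cluster sizes:", cluster_sizes)
print("Explained variance ratio:", explained)
```

With the real data, `X` would hold every column except the ID and date, and `price` would be the regression target. The regression coefficients are read per predictor, the cluster centers (`kmeans.cluster_centers_`) describe typical house profiles, and the PCA loadings (`pca.components_`) show which original variables drive each component, which is where the interpretability of the three models starts to differ.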

You will do this work and submit the report independently.

  • Analysis: Based on the output, analyze and compare the results of each model.
    • EDA & data cleansing
    • Model building
    • What are the key insights from each model?
    • How does interpretability change between the models?
    • What is different about the results of each model?
  • Recommendations: Provide recommendations for actions to be taken based on your interpretation. Support those with the data. Explain which explicit variables you suggest incorporating, and why.

Your report must be a 5-page Word document (no shorter, no longer). Include an appendix of visualizations and attach code segments as needed. The report should follow APA formatting (12 pt, Times New Roman font).

Mohd answered on Aug 02 2021
Untitled
8/2/2021
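# Load packages (tidyverse already attaches readr, dplyr, and ggplot2)
library(readr)
library(magrittr)
library(dplyr)
library(ggplot2)
library(rmarkdown)
library(tidyverse)
# Read the housing data
kings <- read_csv("kings.csv")
# Distributions of bedroom count and living area
hist(kings$bedrooms)
hist(kings$sqft_living)
# Living area against price, one panel per bedroom count.
# Use bare column names inside aes() so each facet gets its own rows.
ggplot(data = kings, aes(price, sqft_living)) +
  geom_point() +
  facet_wrap(~bedrooms) +
  ggtitle("Scatter plot of price vs living area")
# Per-column summary statistics
skimr::skim(kings)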
Data summary
    Name                    kings
    Number of rows          21613
    Number of columns       16
    Column type frequency:
      numeric               15
      POSIXct               1
    Group variables         None

Variable type: numeric
    skim_variable    n_missing  complete_rate  mean          sd            p0       p25           p50          p75         p100          hist
    id               0          1              4.580302e+09  2.876566e+09  1000102  2.123049e+09  3.90493e+09  7.3089e+09  9900000190.0  ▇▇▃▆▅
    price            0          1              5.401822e+05  3.673622e+05  75000    3.219500e+05  4.50000e+05  6.4500e+05  7700000.0     ▇▁▁▁▁
    bedrooms         0          1              3.370000e+00  9.300000e-01  0        3.000000e+00  3.00000e+00  4.0000e+00  33.0          ▇▁▁▁▁
    bathrooms        0          1              2.110000e+00  7.700000e-01  0        1.750000e+00  2.25000e+00  2.5000e+00  8.0           ▃▇▁▁▁
    sqft_living      0          1              2.079900e+03  9.184400e+02  290      1.427000e+03  1.91000e+03  2.5500e+03  13540.0       ▇▂▁▁▁
    sqft_lot         0          1              1.510697e+04  4.142051e+04  520      5.040000e+03  7.61800e+03  1.0688e+04  1651359.0     ▇▁▁▁▁
    floors           0          1              1.490000e+00  5.400000e-01  1        1.000000e+00  1.50000e+00  2.0000e+00  3.5           ▇▅▁▁▁
    waterfront       0          1              1.000000e-02  9.000000e-02  0        0.000000e+00  0.00000e+00  0.0000e+00  1.0           ▇▁▁▁▁
    view             0          1              2.300000e-01  7.700000e-01  0        0.000000e+00  0.00000e+00  0.0000e+00  4.0           ▇▁▁▁▁
    condition        0          1              3.410000e+00  6.500000e-01  1        3.000000e+00  3.00000e+00  4.0000e+00  5.0           ▁▁▇▃▁
    grade            0          1              7.660000e+00  1.180000e+00  1        7.000000e+00  7.00000e+00  8.0000e+00  13.0          ▁▁▇▂▁
    sqft_above_main  0          1              1.788390e+03  8.280900e+02  290      1.190000e+03  1.56000e+03  2.2100e+03  9410.0        ▇▃▁▁▁
    sqft_basement    0          1              2.915100e+02  4.425800e+02  0        0.000000e+00  0.00000e+00  5.6000e+02  4820.0        ▇▁▁▁▁
    yr_built         0          1              1.971010e+03  2.937000e+01  1900     1.951000e+03  1.97500e+03  1.9970e+03  2015.0        ▂▃▇▇▇
    yr_renovated     0          1              8.440000e+01  4.016800e+02  0        0.000000e+00  0.00000e+00  0.0000e+00  2015.0        ▇▁▁▁▁

Variable type: POSIXct
    skim_variable    n_missing  complete_rate  min         max         median      n_unique
    date             0          1              2014-05-02  2015-05-27  2014-10-16  372
summary(kings)
## id ...