Overview and Rationale



Spark is intended for use with data lakes, which were discussed previously. It is important to be able to process these large data sets effectively with Spark. This assignment gives you experience and practice in using Spark to analyze a large data set.



Course Outcomes



This assignment is directly linked to the following key learning outcomes from the course syllabus:



  • Use methodologies to analyze big data sets and determine insights, including the use of Spark



Assignment Summary


For this assignment, you will download two of the following datasets, OR two datasets your team selected for the final project, and process them with Spark. Note, however, that this is an individual assignment. If you choose your project dataset, your group may choose to incorporate your work into their assignment, but that is not required.


Links to the datasets are in the PDF "ALY6110_Spark Practice_with rubric-1.pdf".

Dataset [File Format]

  • Annual House Price Indexes (see Working Papers 16-01, 16-02, and 16-04) - Five-Digit ZIP Codes (Developmental Index; Not Seasonally Adjusted) [XLSX]

  • Annual House Price Indexes (see Working Papers 16-01, 16-02, and 16-04) - Three-Digit ZIP Codes (Developmental Index; Not Seasonally Adjusted) [XLSX]

  • Median Home Value – Zillow Home Value Index (ZHVI) by Zip Code [CSV]





Instructions


Select two datasets, load them with Spark, and explore them for insights using different types of graphs and charts.


Write a 3-5 page report that includes a section for each data set you choose to analyze.


For each data set include:



  • A description of the steps you took to perform the analysis, with screenshots

  • Results of your analysis

  • Your insights based on your analysis
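The load-and-explore step described in these instructions could be sketched roughly as follows for a wide CSV such as the ZHVI file. This is only an illustration under stated assumptions: the file is assumed to have one column per month with headers like `2020-01-31`, and the names `split_columns`, `build_stack_expr`, `load_long`, and `yearly_average` are hypothetical helpers, not part of the assignment. PySpark is imported inside the functions that need it, so the column helpers can be tried without a Spark installation.

```python
import re
from typing import List, Tuple

# Assumption: monthly value columns have headers shaped like 2020-01-31.
DATE_COL = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def split_columns(columns: List[str]) -> Tuple[List[str], List[str]]:
    """Separate identifier columns from wide-format date columns."""
    date_cols = [c for c in columns if DATE_COL.match(c)]
    id_cols = [c for c in columns if not DATE_COL.match(c)]
    return id_cols, date_cols


def build_stack_expr(date_cols: List[str],
                     name_col: str = "month",
                     value_col: str = "value") -> str:
    """Build a Spark SQL stack() expression that turns date columns into rows.

    Every value is cast to double because stack() needs one common type.
    """
    pairs = ", ".join(f"'{c}', cast(`{c}` as double)" for c in date_cols)
    return f"stack({len(date_cols)}, {pairs}) as ({name_col}, {value_col})"


def load_long(spark, path: str):
    """Read a wide CSV and return a long DataFrame: id columns, month, value."""
    wide = spark.read.csv(path, header=True, inferSchema=True)
    id_cols, date_cols = split_columns(wide.columns)
    quoted_ids = [f"`{c}`" for c in id_cols]
    return wide.selectExpr(*quoted_ids, build_stack_expr(date_cols))


def yearly_average(long_df):
    """Average the value per calendar year, ready to hand to a plotting library."""
    from pyspark.sql import functions as F  # lazy import: needs pyspark and Java

    return (long_df
            .where(F.col("value").isNotNull())
            .withColumn("year", F.substring("month", 1, 4).cast("int"))
            .groupBy("year")
            .agg(F.avg("value").alias("avg_value"))
            .orderBy("year"))
```

With a `SparkSession` named `spark` and a local copy of the file, `yearly_average(load_long(spark, "your_file.csv")).toPandas()` would give a small pandas table that can be charted as a line plot. Aggregating in Spark first and converting only the summary keeps the large data set out of driver memory.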



Format & Guidelines



The paper should use the following format:




  • Introduction

    Provide a short description of the dataset you analyzed and the purpose of the analysis. Identify the questions you are attempting to answer, or the insights you want to gain, from the analysis.


  • Analysis and results

    Outline your steps, with screenshots, and provide the results of your analysis. Connect the results and your analysis to the purpose described in the introduction. Be specific.


  • Insights

    Provide your insights based on your analysis. Connect your insights to the purpose of the analysis.

Answered 2 days after Feb 03, 2022

Nithin answered on Feb 05 2022
Cryptocurrency Analysis using PySpark
------------------------------------------------------------
Installing PySpark first
Importing Necessary Modules
Creating Spark Session
Importing and Showcasing Dataset
Preprocessing
Data Visualization
(Performing Correlation Plotting against Bitcoin Dataset)
(Performing Scatter Plot against Bitcoin Cryptocurrency)
(Performing Line Plot against Bitcoin Cryptocurrency)
(Histogram analysis against Bitcoin)
(Determining weighted price per hour on Bitcoin)
Performing Feature Engineering
We have generated enough information...
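The code for these steps is not shown in this preview, so the following is only one possible reading of the "weighted price per hour" step: a volume-weighted average price per hour. The column names `Timestamp`, `Close`, and `Volume` and both function names are assumptions for illustration, not taken from the answer. A plain-Python reference is included so the aggregation can be checked on a handful of rows without starting Spark.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def hourly_vwap_reference(rows: Iterable[Tuple[int, float, float]]) -> Dict[str, float]:
    """Plain-Python reference. Each row is (unix_seconds, price, volume).

    Returns {hour label in UTC: sum(price * volume) / sum(volume)}.
    """
    weighted = defaultdict(float)
    volume_total = defaultdict(float)
    for ts, price, volume in rows:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        weighted[hour] += price * volume
        volume_total[hour] += volume
    # Hours with no traded volume are skipped to avoid dividing by zero.
    return {h: weighted[h] / volume_total[h] for h in weighted if volume_total[h] > 0}


def hourly_vwap_spark(df, ts_col="Timestamp", price_col="Close", volume_col="Volume"):
    """Same aggregation on a Spark DataFrame (column names are assumed)."""
    from pyspark.sql import functions as F  # lazy import: needs pyspark and Java

    return (df
            .dropna(subset=[ts_col, price_col, volume_col])
            .where(F.col(volume_col) > 0)
            # timestamp_seconds turns Unix seconds into a timestamp (Spark 3.1+).
            .withColumn("hour", F.date_trunc("hour", F.timestamp_seconds(F.col(ts_col))))
            .groupBy("hour")
            .agg((F.sum(F.col(price_col) * F.col(volume_col))
                  / F.sum(F.col(volume_col))).alias("vwap"))
            .orderBy("hour"))
```

The Spark version buckets hours in the session time zone, so setting `spark.sql.session.timeZone` to `UTC` would be needed for its hour labels to line up with the reference function.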