Read "Big Data Processing with Apache Spark – Part 1: Introduction":

https://www.infoq.com/articles/apache-spark-introduction/ (use this link)


The “Sample Spark Application” section has instructions for installing the JDK and Spark. The “Word Count Application” section has instructions for writing the word count code.



  1. Following the instructions provided, install the JDK and Spark

  2. Following the instructions provided, cache the README.md file

  3. Count the words in the README.md file and answer the set of questions below.

    • How many times is the word "Hadoop" counted when the tutorial has printed out all the word counts?

    • Which is the most common word used in the file? How many times does the word occur?

    • Which word occurs the fewest times? How many times does the word occur?



  4. Then look at the web console (http://<>:4040/jobs). How many seconds did it take to complete the word count job?
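Steps 2 and 3 might look roughly like the following in the Spark Scala shell. This is a sketch along the lines of the tutorial's word count, not a definitive solution. It assumes the code is typed into `spark-shell` (where `sc` is predefined) and that the shell was started from the Spark installation directory so that `README.md` resolves. The names `txtFile`, `txtData`, and `wcData` are illustrative.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
// Assumption: the shell was started from the directory that contains README.md.
val txtFile = "README.md"
val txtData = sc.textFile(txtFile)

// Mark the RDD for caching; it is materialized by the first action below.
txtData.cache()

// First action: counts the lines and populates the cache.
println(s"Lines: ${txtData.count()}")

// Split each line on single spaces, pair every token with 1, and sum per token.
val wcData = txtData
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Print every (word, count) pair; acceptable for a small file like README.md.
wcData.collect().foreach(println)
```

Because the split is on single spaces and is case-sensitive, tokens such as `Hadoop` and `Hadoop,` are counted separately, and the empty string can appear as a token. The exact counts also depend on which Spark release's README.md is used. While the shell is running, the Jobs page of the web console on port 4040 lists each job with its duration, which is where the figure for step 4 can be read.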


Submit the code, the output, and an explanation of your steps, with screenshots that show the date and time.



Answered 10 days after Feb 10, 2022

Sandeep Kumar answered on Feb 21 2022
Step 1: Install and run Java and Spark.
Step 2: Write the Scala word count program.
The word "Hadoop" was counted 9 times.
The most frequent word was "spark", which occurred 45 times.
"Variable" was among the least frequent words, occurring only once.
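One way to read these three answers off the output, rather than scanning the printed list by eye, is sketched below. It assumes `counts` holds the (word, count) pairs collected from the word count job, for example `wcData.collect().toSeq` if the RDD is named `wcData` in the shell session. `WordCountAnswers` and its methods are hypothetical helpers, not part of Spark or the tutorial.

```scala
// Hypothetical helpers for answering the three questions from collected counts.
// `counts` stands for the local (word, count) pairs produced by the job.
object WordCountAnswers {
  // Count for an exact, case-sensitive token; 0 if the token never appears.
  def countOf(counts: Seq[(String, Int)], word: String): Int =
    counts.collectFirst { case (w, n) if w == word => n }.getOrElse(0)

  // Drop the empty-string token that splitting on single spaces can produce.
  private def realWords(counts: Seq[(String, Int)]): Seq[(String, Int)] =
    counts.filter { case (w, _) => w.nonEmpty }

  // The most frequent non-empty token, if there is one.
  def mostCommon(counts: Seq[(String, Int)]): Option[(String, Int)] =
    realWords(counts).sortBy { case (_, n) => -n }.headOption

  // Every non-empty token tied for the lowest count.
  def leastCommon(counts: Seq[(String, Int)]): Seq[(String, Int)] = {
    val words = realWords(counts)
    if (words.isEmpty) Seq.empty
    else {
      val min = words.map(_._2).min
      words.filter(_._2 == min)
    }
  }
}
```

`leastCommon` returns a list rather than a single word because many words in a README typically occur exactly once, so any one of them is only one of several tied answers. Calling `WordCountAnswers.countOf(counts, "Hadoop")` looks up the exact token, so variants with attached punctuation are not included in that figure.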