CIC362: Cloud Computing - BMCC Fall 2021
Professor Byron
Program #2 specifications: Elastic MapReduce – 15 points
Due: 11:59 pm Thu 12-9-2021. No credit will be given for late submissions.

Your task for this assignment is to use AWS Elastic MapReduce (EMR) to process and analyze health data stored in your AWS S3 bucket with a big data framework cluster. You will launch a cluster running Spark and run a simple PySpark script stored in your AWS S3 bucket.

1. Accept the invitation from AWS Academy to set up an Academic account.
2. Run the AWS EMR tutorial using your AWS Academy account: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html
3. Obtain a screenshot (#1) showing that the EMR job completed:
   Amazon EMR page > Steps tab; the status must be "Completed" for all steps.
4. Copy and paste the stderr log into a file named cis362_prog2_stderr_lastname.txt:
   Amazon EMR page > Steps tab > View logs > stderr
5. Locate the result report in Amazon S3. Use the AWS console or the AWS CLI to transfer the result report from the S3 bucket to your local host, and rename it cis362_prog2_results_lastname.txt. Obtain a screenshot (#2) of the AWS console transfer or of the AWS CLI command being run.
6. Create a README.txt file with the following information:
   a. course ID and section
   b. your full name
   c. the program assignment number and due date
   d. the program purpose
   e. the contents of the zip file
7. Paste your 2 screenshots into a Word-type document named cis362_prog2_screenshots_lastname.docx. Store the Word-type document, the stderr log file, the result report file and the README.txt file in a zip file named CIS362_prog2_lastname.zip.
8. Submit your zip file as an attachment to an email message to [email protected] with a subject in this form: “csc362_prog2_lastname”. Visit https://www.7-zip.org/ for an open source tool to create a zip archive. Here is a sample command:
   7za a myzip.zip mydir/*
9. Do your own work. All students submitting copies of the same program will receive grades of zero for the assignment and may be referred to the Dean of Students, as described in the BMCC plagiarism policy.
10. Grading rubric. Partial or extra credit will be awarded as follows:
   a. 80%: Run the EMR tutorial above “as is” to find the top 10 RED violations.
   b. 100%: Part a, plus modify the script and re-run the tutorial to find the top 10 BLUE violations. Use file naming conventions similar to part a, and store the Word-type document showing your 2 screenshots, the stderr log file and the result report file, with an updated README file, in the same zip file as part a.
   c. 120%: Part b, plus use EMR to run the Wordcount application described in https://github.com/Aliga8or/csds-spark-emr. Use file naming conventions similar to part a; take screenshots of the cluster page and the S3 page and paste them into a Word-type document; store the Word-type document, the input text file and the output result file, with an updated README file, in the same zip file as part a.
   Note:
   • Choose an approved “big data” document to analyze.
   • Replace each non-alphabetic character with a space, for example:
       Get-Content my_file.txt | Foreach {$_ -replace "[^a-zA-Z]", " "} | Set-Content "my_file_2.txt"   (PowerShell)
       cat my_file.txt | sed 's/[^[:alpha:]]/ /g' > my_file_2.txt   (Linux)
   • Convert each lowercase letter to uppercase, for example:
       (Get-Content my_file_2.txt -Raw).ToUpper() | Out-File my_file_3.txt   (PowerShell)
       cat my_file_2.txt | tr '[:lower:]' '[:upper:]' > my_file_3.txt   (Linux)
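Steps 2 and 3 use the EMR console, but the same work can be done from the AWS CLI. The sketch below is a hedged illustration, not part of the assignment: the helper names are hypothetical, every cluster ID and S3 URI is a placeholder, and the --data_source and --output_uri argument names assume the tutorial's health_violations.py script (check your copy).

```shell
# Hypothetical helpers wrapping the AWS CLI; every argument is a placeholder
# for a value from your own cluster and bucket.

# Submit the tutorial's PySpark script as a Spark step on a running cluster.
submit_spark_step() {
  local cluster_id=$1  # e.g. j-XXXXXXXXXXXXX (placeholder)
  local script_uri=$2  # S3 URI of the PySpark script
  local data_uri=$3    # S3 URI of the input CSV data
  local output_uri=$4  # S3 prefix where Spark writes its results
  # The whole --steps value is quoted so the shell leaves the [...] list alone.
  aws emr add-steps --cluster-id "$cluster_id" \
    --steps "Type=Spark,Name=HealthViolations,ActionOnFailure=CONTINUE,Args=[$script_uri,--data_source,$data_uri,--output_uri,$output_uri]"
}

# Print one line per step with its name and state; step 3 expects
# COMPLETED for every step.
list_step_states() {
  aws emr list-steps --cluster-id "$1" \
    --query 'Steps[*].[Name,Status.State]' --output text
}
```

The console screenshot is still what step 3 asks for; list_step_states is only a quick way to confirm the states before taking it.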
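For step 5, the CLI route is to list the output prefix and then copy the result object under the required file name. A minimal sketch, assuming the tutorial's Spark job wrote its report as a part-*.csv object under your output prefix; the helper names and all S3 paths are hypothetical placeholders.

```shell
# Hypothetical helpers for step 5; bucket names and keys are placeholders.

# List what Spark wrote under the output prefix.
list_results() {
  aws s3 ls "$1"   # e.g. s3://DOC-EXAMPLE-BUCKET/MyOutputFolder/
}

# Copy one result object to the local file name required by step 5 and
# print that name on success.
fetch_result() {
  local s3_uri=$1 lastname=$2
  local dest="cis362_prog2_results_${lastname}.txt"
  aws s3 cp "$s3_uri" "$dest" && printf '%s\n' "$dest"
}
```

Typical use: run list_results on your output prefix, copy the exact part-*.csv key it shows, then run fetch_result with that key and your last name. A terminal showing these commands would serve as screenshot #2.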
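Step 7 lists exactly which files belong in the part a zip, so a small check before zipping can catch a missing or misnamed file. This is a hedged sketch with a hypothetical helper name; it assumes 7za is on your PATH and covers part a only (parts b and c add more files to the same zip).

```shell
# Hypothetical helper for step 7: confirm the four required part a files
# exist in the current directory, then build the zip with 7za.
package_submission() {
  local lastname=$1 f
  # Reuse the positional parameters as the list of required files.
  set -- "cis362_prog2_screenshots_${lastname}.docx" \
         "cis362_prog2_stderr_${lastname}.txt" \
         "cis362_prog2_results_${lastname}.txt" \
         README.txt
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  7za a "CIS362_prog2_${lastname}.zip" "$@"
}
```

Run it from the folder holding your files, for example package_submission smith, which would produce CIS362_prog2_smith.zip only if nothing is missing.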
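Rubric part b asks you to modify the script to report BLUE instead of RED violations. Assuming your copy of the tutorial script filters with the string literal 'RED' in its Spark SQL query (verify this before relying on it), the change can be made with a single substitution. The helper name is hypothetical.

```shell
# Hypothetical helper for rubric part b: write a copy of the tutorial script
# with the 'RED' literal replaced by 'BLUE'. Refuses to run if the script
# contains no 'RED' literal, since then the assumption does not hold.
make_blue_script() {
  local src=$1 dest=$2
  if ! grep -q "'RED'" "$src"; then
    echo "no 'RED' literal found in $src" >&2
    return 1
  fi
  sed "s/'RED'/'BLUE'/g" "$src" > "$dest"
}
```

After generating the new script, upload it to your bucket (for example with aws s3 cp) and submit it as a new step. Pointing that step at a different output prefix keeps the part a results intact. Column aliases such as a RED-specific name are left unchanged by this substitution.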
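The two Linux preprocessing commands in the part c Note can be chained into one pipeline, and a local word count gives you something to compare against the EMR Wordcount output. This is a sketch only: the helper names are hypothetical, and count_words is a plain awk tally, not the code from the csds-spark-emr repository, whose output format may differ.

```shell
# Both preprocessing steps from the Note in one pipeline.
# LC_ALL=C restricts [:alpha:] and [:lower:] to ASCII letters,
# matching the PowerShell pattern [^a-zA-Z].
normalize_text() {
  LC_ALL=C sed 's/[^[:alpha:]]/ /g' "$1" | LC_ALL=C tr '[:lower:]' '[:upper:]'
}

# Local word count for cross-checking: one "count word" line per distinct
# word, highest count first, ties in alphabetical order.
count_words() {
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print count[w], w }' "$1" |
    LC_ALL=C sort -k1,1nr -k2,2
}
```

Usage: normalize_text my_file.txt > my_file_3.txt, then count_words my_file_3.txt | head -n 10 to see the most frequent words before uploading my_file_3.txt as the EMR input.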
Nov 09, 2021