In Part A your task is to answer a question about the data in an unprocessed text file using Hive. When you click the panel on the right you'll get a connection to a server that has, in your home...

1 answer below »

In Part A your task is to answer a question about the data in an unprocessed text file using Hive. When you click the panel on the right you'll get a connection to a server that has, in your home directory, a text file called "walden.txt", containing some sample text (feel free to open the file and explore its contents)(it's an extract from Walden, by Henry David Thoreau).



In this text file, each line is a
sentence. It is worth noting that there are
multiple spaces
at the end of each line in this unprocessed text file.



Your task is to
find the average word lengths according to the first letters of sentences. For example, given a toy input file as shown below:




Aaa bbb cc. Ab b.




The output should be:




Letter A 2.6




Because, for A, we have (3*3+2*2)/5 = 2.6.



You can assume that sentences are separated by full stops, and words are separated by spaces. For simplicity, we
include all punctuations, like ',' and '.', when calculating word length, like what we did in
example1
and
example2. (So, the length of 'cc.' is 3 instead of 2.) The case of letters can be ignored.



Given the walden.txt file as input, the format of the output is "letter:
avg_word_length" (The result should be rounded to two decimal places, with round(x,2) ), as shown below:




Letter A 4.17 Letter B 4.82 Letter F 4.18 Letter I 4.16 Letter S 4.32 Letter T 4.09 Letter W 4.89





Write a Hive script to do this. A file called "script.hql" has been created for you - you just need to fill in the details. You should be able to modify Hive scripts that you have already seen in this week's content. You might use some User-Defined Functions (UDFs) which can be found
here.

Answered 1 days AfterAug 12, 2022

Answer To: In Part A your task is to answer a question about the data in an unprocessed text file using Hive....

Karthi answered on Aug 13 2022
61 Votes
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here