Cleaning up the data (corpus) based on the provided instruction below. 1) Remove all the numbers (EXCEPT for 19 ) 2) Remove the dates/month names (January, February, etc...) 3) Remove fund names...

Clean up the data (corpus) according to the instructions below.

1) Remove all numbers (EXCEPT for 19).

2) Remove dates and month names (January, February, etc.).

3) Remove fund names (e.g., invesco, pimco, blackrock, TIAA, vanguard).

4) There are many joined words, for example "localregional" where there should be a space: "local regional". Correct these joined-word errors using one or more of the following approaches (or any other approach):




Generate a dictionary from the corpus and pass it (or perhaps the whole corpus) through a script that either identifies misspellings and similar errors or compares each word against an English-language dictionary. A quick search for corpus-cleaning scripts suggested this as one possible option:
https://predictivehacks.com/languagetool-grammar-and-spell-checker-in-python/#:~:text=LanguageTool%20is%20an%20open%2Dsource,through%20a%20command%2Dline%20interface.
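
For illustration, here is a minimal sketch of how the four steps above might be approached using only the Python standard library. It is not the LanguageTool approach from the link, and everything named here is an assumption: the `MONTHS`, `FUND_NAMES` and `VOCAB` lists are tiny hypothetical placeholders (a real run would load a full English word list and the complete set of fund names), and the helper names `remove_terms`, `remove_numbers_except`, `split_joined` and `clean` are made up for this example. The month "May" is deliberately left out because it collides with the ordinary verb "may".

```python
import re

# Hypothetical placeholder lists; extend them for a real corpus.
MONTHS = ["january", "february", "march", "april", "june", "july",
          "august", "september", "october", "november", "december"]
FUND_NAMES = ["invesco", "pimco", "blackrock", "tiaa", "vanguard"]
VOCAB = {"local", "regional", "bond", "fund", "market", "equity"}

# Numeric dates such as 04/20/2021 or 2021-04-20.
DATE_RE = re.compile(r"\b\d{1,4}[/-]\d{1,2}[/-]\d{1,4}\b")


def remove_terms(text, terms):
    """Replace whole-word, case-insensitive matches of `terms` with a space."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    return pattern.sub(" ", text)


def remove_numbers_except(text, keep="19"):
    """Remove every number (including 1,000 or 3.5) unless it is exactly `keep`."""
    return re.sub(
        r"\d+(?:[.,]\d+)*",
        lambda m: m.group(0) if m.group(0) == keep else " ",
        text,
    )


def split_joined(token, vocab):
    """Split an unknown alphabetic token into the fewest vocabulary words.

    Returns the token unchanged if it is already known, is not purely
    alphabetic, or cannot be fully covered by vocabulary words.
    Split results are lowercased.
    """
    lower = token.lower()
    if lower in vocab or not lower.isalpha():
        return token
    n = len(lower)
    best = [None] * (n + 1)  # best[i] = fewest words covering lower[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and lower[j:i] in vocab:
                candidate = best[j] + [lower[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return " ".join(best[n]) if best[n] else token


def clean(text, vocab=VOCAB):
    """Apply steps 1 to 4 in an order that removes dates before bare numbers."""
    text = DATE_RE.sub(" ", text)
    text = remove_terms(text, MONTHS + FUND_NAMES)
    text = remove_numbers_except(text, keep="19")
    return " ".join(split_joined(tok, vocab) for tok in text.split())


if __name__ == "__main__":
    sample = "In January 2021 vanguard localregional funds grew 12 percent during covid-19"
    print(clean(sample))
```

With a small vocabulary the splitter is conservative, since it only touches tokens it can cover completely. With a large dictionary it may over-split genuine words that are missing from the list, so the proposed splits are probably worth reviewing before they are applied to the whole corpus.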

Answered Same Day, Apr 20, 2021


Sandeep Kumar answered on Apr 21 2021
