There are two python scripts uploaded. The function does the job to search news articles on Google and return the information of title, timestamp, url, source within the given date time range for the...


Two Python scripts are uploaded. The function searches for news articles on Google and returns the title, timestamp, url, and source of each article within the given date-time range for the SP500 stock names. For example, it searches for articles related to Apple or AAPL (company name or ticker) and returns the information in a CSV. There are 500 stocks/tickers in the list. Run the code and you will see.

Now, your task in this assignment is to look into the topic of entity extraction, that is, how to identify the subject an article is talking about. We can then map that subject to our knowledge graph to bring in more entity information (right now we only have SP500 companies, with only company name and ticker, but we will enhance it at a later stage). Please research this and make use of the different libraries available. Please make the notebook illustrative and informative, with appropriate comments and markdown cells. The eventual goal is for a first-time reader to understand the mechanism behind entity extraction after reading the notebook.
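To make the "identify the subject, then map it to the knowledge graph" idea concrete, here is a minimal sketch using dictionary (gazetteer) matching with only the standard library. It is not a statistical NER model of the kind the assignment asks you to research; it only illustrates the mapping step. The `KNOWLEDGE_GRAPH` contents and the function names are hypothetical and stand in for the real SP500 list.

```python
import re

# Hypothetical miniature knowledge graph: ticker -> company name.
# The real list would hold all SP500 companies.
KNOWLEDGE_GRAPH = {
    "AAPL": "Apple",
    "MSFT": "Microsoft",
    "AMZN": "Amazon",
}


def build_patterns(graph):
    """Compile one (name, ticker) pair of whole-word patterns per company."""
    patterns = {}
    for ticker, name in graph.items():
        patterns[ticker] = (
            # Company names are matched case-insensitively.
            re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE),
            # Tickers must appear in upper case to count as a match.
            re.compile(rf"\b{re.escape(ticker)}\b"),
        )
    return patterns


PATTERNS = build_patterns(KNOWLEDGE_GRAPH)


def extract_entities(title):
    """Return the company names from the knowledge graph mentioned in a title."""
    found = []
    for ticker, (name_re, ticker_re) in PATTERNS.items():
        if name_re.search(title) or ticker_re.search(title):
            found.append(KNOWLEDGE_GRAPH[ticker])
    return found
```

A known limitation of this approach is ambiguity: a title about apple pie would also match "Apple". That is one reason to compare it against a trained NER model from a library such as spaCy or NLTK.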

If you run the notebook news.ipynb, you will get the following output. As you can see, the output is in CSV format and has four attributes: title, timestamp, url, source. This takes a long time to run, as the function scrapes 100 articles for each SP500 stock, for a total of over 490,000 articles. Below is part of the generated CSV:

You should be able to get a copy of this file if you run the script; make sure your working directory is the one where you put the script.

Now there are four columns, as you can see. As I mentioned earlier, this assignment is about entity extraction. Please do some experiments and use the necessary libraries to extract the entity for each article. Make this the 5th column, right next to Source, so that it eventually looks like this:

By analyzing the title, we only extract the entity, meaning we only care about which entity is being mentioned or discussed. This is just an example; please do this for the entire CSV, not just Apple. Thank you.

There is one more task, where you need to amend the code. Nothing you can’t handle. I want you to modify the code so that the Source column displays only the source name. Instead of showing {'href': 'https://www.apple.com', 'title': 'Apple'}, it should show only Apple. I’m expecting to see a final output CSV like this:
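The two changes requested above (reducing Source to the source name and appending an entity column) can be sketched as follows. This is a minimal illustration and not the uploaded script: the function names `clean_source` and `add_entity_column` are hypothetical, the header names `title` and `source` are assumed to be lower case as in the attribute list, and the entity extractor is passed in as a parameter so any approach can be plugged in.

```python
import ast
import csv


def clean_source(raw):
    """Return only the source name from a stringified dict such as
    "{'href': 'https://www.apple.com', 'title': 'Apple'}".
    Values that are not a dict literal are returned unchanged."""
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # plain text such as "Reuters"
    if isinstance(parsed, dict):
        return parsed.get("title", raw)
    return raw


def add_entity_column(in_file, out_file, extract):
    """Copy a CSV from in_file to out_file, cleaning the source column
    and appending an entity column computed from each title.

    `extract` is any function that maps a title to a list of entity names.
    """
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames) + ["entity"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["source"] = clean_source(row["source"])
        # Several entities in one title are joined with "; ".
        row["entity"] = "; ".join(extract(row["title"]))
        writer.writerow(row)
```

Both arguments are open file objects, so the function can be called with files opened via `open(..., newline="")` or with `io.StringIO` buffers when experimenting in the notebook.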

news-yqerais5.ipynb sp500-stock-jczltx10.py instructions-update-v3xk3ccb-vj45rlgj.pdf

Answered 4 days after Nov 30, 2022

Answer To: There are two python scripts uploaded. The function does the job to search news articles on Google...

Mohd answered on Dec 05 2022


SOLUTION.PDF
