MG5008 – Social Media Insights Tutorial – GNIP Dataset

The programs needed for this tutorial are R and RStudio, Gephi and Tableau. Once the assignment is agreed upon, access to an email account and to the BigQuery dataset will be provided.


Identify most active users:
• BigQuery (SQL)
  § Query table:
    SELECT actor_preferredUsername, count(body) Number_of_Tweets
    FROM Table Name
    WHERE body IS NOT NULL
    GROUP BY actor_preferredUsername
    ORDER BY Number_of_Tweets DESC
    LIMIT No. of observations (optional)
  § Download as CSV

Identify most visible users:
• BigQuery (SQL)
  § Query table
  § Retrieve retweets:
    SELECT actor_preferredUsername userscreenname, body
    FROM Table Name
    WHERE body IS NOT NULL AND body LIKE "RT%"
  § Download as CSV
  § Save the file in your Working Directory
  § Rename the file as "retweets.csv"
  § Retrieve replies:
    SELECT actor_preferredUsername userscreenname, body
    FROM Table Name
    WHERE body IS NOT NULL AND body LIKE "@%"
  § Download as CSV
  § Save the file in your Working Directory
  § Rename the file as "replies.csv"
• RStudio
  § Set the folder that contains the CSV files generated from BigQuery as your Working Directory: setwd("directory")
  § Save the R script named "find_VisibleUsers.R" in your Working Directory
  § Open RStudio
  § Type source("find_VisibleUsers.R") in the console
• Results
  § A file named "Visible_Users.csv" will be saved in your Working Directory
  § The file contains the following fields:
    a) Target (username)
    b) Number_of_Retweets_Received (1)
    c) Number_of_Replies_Received (2)
    d) Visibility (= (1) + (2))

Content analysis – Most Recurring Words:
• BigQuery (SQL)
  § Query table:
    SELECT body tweet
    FROM Table Name
    WHERE body IS NOT NULL
  § Download as CSV
  § Save the file in your Working Directory
  § Rename the file as "text.csv"
• RStudio
  § Set the folder that contains the CSV file generated from BigQuery as your Working Directory: setwd("directory")
  § Save the following R scripts in your Working Directory:
    a) tweets_Cleaning.R
    b) find_FrequentWords_DataFrame.R
    c) Frequent_Occurring_Words.R
  § Open RStudio
  § Type source("Frequent_Occurring_Words.R") in the console
• Results
  § R will save two files in your Working Directory:
    a) "Word Cloud.jpg", which contains a visualisation of the results
    b) "Frequent Terms.csv", which contains the list of words and their corresponding frequencies

Content analysis – Most Co-Occurring Words:
• BigQuery (SQL)
  § Query table:
    SELECT body tweet
    FROM Table Name
    WHERE body IS NOT NULL
  § Download as CSV
  § Save the file in your Working Directory
  § Rename the file as "text.csv"
• RStudio
  § Set the folder that contains the CSV file generated from BigQuery as your Working Directory: setwd("directory")
  § Save the following R scripts in your Working Directory:
    a) tweets_Cleaning.R
    b) find_Frequent_Cooccurring_Words_DataFrame.R
    c) Frequent_Cooccurring_Words.R
  § Open RStudio
  § Type source("Frequent_Cooccurring_Words.R") in the console
• Results
  § R will save four files in your Working Directory:
    a) "Frequent Bi-grams.csv", which contains the frequency of each 2-word combination
    b) "Frequent Tri-grams.csv", which contains the frequency of each 3-word combination
    c) "Frequent Quad-grams.csv", which contains the frequency of each 4-word combination
    d) "Frequent Penta-grams.csv", which contains the frequency of each 5-word combination
  § Note: if some of these files are empty and you see no error in the R console, it means that none of the combinations exceeds the minimum threshold. The threshold is defined in Frequent_Cooccurring_Words.R and is set to 20; you can open the script and change it if needed. A sketch of this kind of n-gram counting is given below.
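For orientation, the following is a minimal sketch of the kind of bi-gram counting described above, written in base R. It is not the course script find_Frequent_Cooccurring_Words_DataFrame.R: only the input file name, the column name tweet and the threshold of 20 come from the steps above, while the cleaning rule and the pairing logic are illustrative assumptions.

# Illustrative sketch only - not the course script.
# Assumes text.csv has a single column named "tweet".
tweets <- read.csv("text.csv", stringsAsFactors = FALSE)

# Basic cleaning: lower-case, keep only letters, digits, @ and #
clean <- gsub("[^a-z0-9@# ]", " ", tolower(tweets$tweet))
words_per_tweet <- strsplit(clean, "\\s+")

# Build consecutive word pairs (bi-grams) within each tweet
bigrams <- unlist(lapply(words_per_tweet, function(w) {
  w <- w[w != ""]
  if (length(w) < 2) return(character(0))
  paste(head(w, -1), tail(w, -1))
}))

# Count each pair and keep only those above the threshold (20, as in the note above)
freq <- sort(table(bigrams), decreasing = TRUE)
freq <- freq[freq >= 20]

write.csv(data.frame(bigram = names(freq), frequency = as.integer(freq)),
          "Frequent Bi-grams.csv", row.names = FALSE)

The same pairing step extended to three, four or five consecutive words would produce the tri-gram, quad-gram and penta-gram files listed above.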
Sentiment analysis:
• BigQuery (SQL)
  § Query table:
    SELECT body tweet
    FROM Table Name
    WHERE body IS NOT NULL
  § Download as CSV
  § Save the file in your Working Directory
  § Rename the file as "text.csv"
• RStudio
  § Set the folder that contains the CSV file generated from BigQuery as your Working Directory: setwd("directory")
  § Save the following R scripts and word lists in your Working Directory:
    a) tweets_Cleaning.R
    b) Sentiment_Analysis.R
    c) Run_Sentiment_Analysis.R
    d) positive-words.txt
    e) negative-words.txt
  § Open RStudio
  § Type source("Run_Sentiment_Analysis.R") in the console
• Results
  § R will save three files in your Working Directory:
    a) "Tweets with Sentiment Score.csv", which contains all tweets with their corresponding sentiment score
    b) "Sentiment Distribution.csv", which contains the frequency distribution of the sentiment score
    c) "Sentiment Distribution.jpeg", which contains a bar chart of the distribution
  § A sketch of this kind of lexicon-based scoring is given after the Network Analysis section below.

Network Analysis:
• BigQuery (SQL)
  § Query table:
    SELECT actor_preferredUsername userscreenname, body
    FROM Table Name
    WHERE verb LIKE 'post' AND body LIKE '@%'
  § Download as CSV
  § Save the file in your Working Directory
  § Rename the file as "reply_network.csv"
• RStudio
  § Set the folder that contains the CSV file generated from BigQuery as your Working Directory: setwd("directory")
  § Save the following R scripts in your Working Directory:
    a) Construct_ReplyNetwork.R
    b) Reply_Network.R
  § Open RStudio
  § Type source("Reply_Network.R") in the console
• Results
  § R will save one file in your Working Directory:
    a) "Reply Network.csv", which contains two variables named "Source" and "Target"
• Gephi
  § Import the file generated in R:
    a) File – Import
    b) Data importer (co-occurrences)
    c) Select the file "Reply Network.csv"
    d) This type of agent: Source is connected to This type of agent: Target
    e) Next
    f) Next
    g) Select "create links between Source…"
    h) Select "remove self-loops: …"
    i) Finish
  § Compute statistics:
    a) Overview tab
    b) Statistics tab (right-hand side of the screen)
    c) Average Degree – Run
    d) Network Diameter – Run
    e) Graph Density – Run
    f) PageRank – Run – select "use edge weight"
    g) Eigenvector Centrality – Run
    h) Avg. Path Length – Run
  § Export statistics:
    a) Data Laboratory tab
    b) Export Table
  § Identify sub-communities:
    a) Statistics
    b) Modularity – Run
    c) Select "randomise"
    d) Select "use weights"
    e) Output:
      • Modularity ranges from 0 to 1; the closer to 1, the better the detection.
      • Number of detected communities.
  § Identify communities with different colours in the graph:
    a) Partition tab (left-hand side of the screen)
    b) Choose an attribute – Modularity Class
    c) Click on "Palette…" if you want to change the colours
    d) Apply
  § Create a graphical representation of the network:
    a) Layout tab (left-hand side of the screen)
    b) Choose a layout – ForceAtlas 2 (recommended, since it considers the strength of the relationships)
    c) Run – Note: this is a very computationally intensive process. You will notice that the graph becomes more and more "stable", so you may decide to stop the process after a while, once changes in the layout are minimal. You can stop it by clicking the × in the lower right-hand corner of the screen.
  § To identify users within each community and the corresponding statistics:
    a) Overview tab
    b) Filters tab (right-hand side of the screen)
    c) Attributes
    d) Partition
    e) Modularity Class
    f) Select the communities you want to visualise (they are stored in descending order)
    g) Once a filter is applied, it is possible to estimate statistics and visualise tables for each sub-community.
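As promised in the Sentiment analysis section, here is a minimal sketch of lexicon-based scoring in base R. It is not the course's Sentiment_Analysis.R: it only assumes that text.csv has a column named tweet and that positive-words.txt and negative-words.txt contain one word per line; the cleaning rule and the helper function score_tweet are illustrative assumptions.

# Illustrative sketch only - not the course script Sentiment_Analysis.R.
# Assumes text.csv has a column named "tweet" and that the two word lists
# contain one word per line.
tweets <- read.csv("text.csv", stringsAsFactors = FALSE)
pos <- readLines("positive-words.txt")
neg <- readLines("negative-words.txt")

# Score = number of positive words minus number of negative words in the tweet
score_tweet <- function(txt) {
  words <- unlist(strsplit(gsub("[[:punct:]]", " ", tolower(txt)), "\\s+"))
  sum(words %in% pos) - sum(words %in% neg)
}
tweets$sentiment_score <- sapply(tweets$tweet, score_tweet)

# Frequency distribution of the scores and a simple bar chart,
# mirroring the three output files described above
dist <- as.data.frame(table(tweets$sentiment_score))
names(dist) <- c("Sentiment_Score", "Frequency")

write.csv(tweets, "Tweets with Sentiment Score.csv", row.names = FALSE)
write.csv(dist, "Sentiment Distribution.csv", row.names = FALSE)
jpeg("Sentiment Distribution.jpeg")
barplot(dist$Frequency, names.arg = as.character(dist$Sentiment_Score),
        xlab = "Sentiment score", ylab = "Number of tweets")
dev.off()

The network statistics and community detection are carried out in Gephi as described above. If you want to cross-check the results in R, the igraph package provides equivalent functions; the snippet below is a sketch under the assumption that igraph is installed and that "Reply Network.csv" contains the Source and Target columns produced by Reply_Network.R.

# Optional cross-check of the Gephi statistics in R (sketch only).
# Assumes the igraph package is installed.
library(igraph)

edges <- read.csv("Reply Network.csv", stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges[, c("Source", "Target")], directed = TRUE)

mean(degree(g))                                            # average degree
diameter(g)                                                # network diameter
edge_density(g)                                            # graph density
mean_distance(g)                                           # average path length
head(sort(page_rank(g)$vector, decreasing = TRUE))         # top PageRank users
head(sort(eigen_centrality(g)$vector, decreasing = TRUE))  # top eigenvector centrality

# Community detection; Louvain modularity works on undirected graphs
communities <- cluster_louvain(as.undirected(g))
modularity(communities)          # closer to 1 = clearer community structure
table(membership(communities))   # size of each detected community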