For this assignment, you will create the following for your research project using the CRISP-DM project management methodology: · A PERT chart that shows all the activities, paths, etc. Reading: ·...

Please create a PERT chart that shows all activities in this analytical report, such as Modeling, Documentation, Deliver report, etc. Using the same format you submitted earlier, please give an ETA for each activity. Submit one single pdf file. All necessary materials are uploaded for your reference. Please use a similar format and style as the "PERT" pdf file.


For this assignment, you will create the following for your research project using the CRISP-DM project management methodology:

· A PERT chart that shows all the activities, paths, etc.

Reading:
· How to perform Data Analysis using the CRISP-DM approach? https://towardsdatascience.com/how-to-perform-data-analysis-using-the-crisp-dm-approach-201708f220b2
· What is a Data Science R&D Approach? https://www.datascience-pm.com/research-and-development/

The topic is Fake News Detection Using Machine Learning in Python. Please refer to the document attached on the website as a guideline and proposal. You can follow the flowchart above to make a new flowchart for your topic; make sure to give explanations for each stage/phase in the process. Please consolidate all of these into a single pdf file and submit it.

[Flowchart: Fake News Detection Using Machine Learning in Python (Microsoft PowerPoint - SOlution). Legend: Activity, Critical Activity, explanations for each stage/phase in the process.]

Stages and explanations recovered from the flowchart:
• Import Data for NEWS: The imports necessary for detection are made.
• Read Data: The imported data is read into a DataFrame.
• Obtain Data Sets: The shape of the data and a certain number of initial records are obtained. Labels (FAKE, REAL) are gathered from the DataFrame.
• Dataset is split: The data is divided into a Training SET and a Testing SET.
• Initialize Vectoriser (TfidfVectorizer): It is initialized with English stop words and a maximum document frequency. The vectorizer converts the collection of raw documents into a matrix of TF-IDF features. TF-IDF weights the count of each word in a document by how rare the word is across the collection and stores the result in a document matrix.
• Fit Vectorizer on Training SET, Transform Vectorizer on Training SET (TF-IDF Matrix): The vectorizer is fitted on the training set and then used to transform the test set.
• Initiate PassiveAggressiveClassifier: The passive aggressive classifier algorithm is used to classify large volumes of streaming data and is efficient in making updates to correct the loss. It remains passive as long as an example is classified correctly but turns aggressive when a prediction is wrong, updating the model to correct the mistake.
• Prediction: Prediction is done on the test set from TfidfVectorizer, the accuracy is calculated with accuracy_score(), and the news is classified as FAKE or REAL.
• Build Confusion Matrix: The accuracy is checked and a confusion matrix is built. It gives the number of true and false positives and negatives.

Fake News Detection Using Machine Learning in Python

1. Introduction
We rely greatly on online sources for news today, but not all sources are reliable. There is a large inflow of fake news from various sources, so the need to differentiate between genuine news sources and fake ones is extremely high. One’s perspective on life and society relies heavily on the news to which one is exposed. To stop the plague of fake news from consuming people’s minds we should use every resource available, and one of the greatest resources we have today is the advancement in data science and the use of Python. It is imperative that we use machine learning and Python to the best of their abilities to quell the endless flow of fake news.

2. Proposed Research Problem and Some Background Information

2.1 Fake News Detection
Fake news is a very common term today. It refers to news shared from questionable sources on web-based platforms or on social media that is sensationalized or, in a number of cases, outright false. This kind of news generally serves one political ideology or another. Sometimes such news is also spread to create an untrue image of products (De Beer & Matthee, 2020). These sources tend to be highly radicalized, and consumers of such fake news are exposed to these radicalized lies. That is a very dangerous reality, which needs to be fought against.
Whatever one’s political ideology or choice of commercial products, it should be based on true events.

3. Fake News Detection using TFIDF Vectorizer

3.1 TF (Term Frequency): The number of times a particular term appears in a document is known as the term frequency. A higher value indicates that the document repeats the word or phrase more often than other documents do. If that word or phrase is part of the search parameters, there is a good chance that the document with the higher TF is the match.

3.2 IDF (Inverse Document Frequency): Words or phrases that are common across all the documents in a corpus, not just one, may not be relevant. IDF measures how significant a word or phrase is over an entire collection of documents. The TFIDF Vectorizer weights the raw text in documents by term frequency and by significance over the corpus (Singh & Shashi, 2019).

3.3 Passive Aggressive Classifier: This algorithm remains passive as long as an example is classified correctly but turns aggressive when a prediction is wrong. It is built so that it updates the model and corrects the mistake.

3.4 Natural Language Processing: This is a combination of linguistics, programming, and artificial intelligence (AI) used to detect the language and speech patterns of people (Gilda, 2017). It is the method used to process and analyze raw language data.

3.5 Machine Learning Classification: Machine learning classification uses algorithms that learn from labeled examples to assign new data to categories, which gives more accurate results when analyzing raw data.

3.6 Findings
Various projects have tried to create algorithms for fake news detection, and most of them use the TFIDF Vectorizer. Their efficiency varies but is still quite high. One used a political data set with the TFIDF Vectorizer and the Passive Aggressive Classifier, and its efficiency for removing documents with fake news was 92.82% (Gilda, 2017).
Another used the LIAR dataset with the TFIDF Vectorizer, the Passive Aggressive Classifier and machine learning classification, and its efficiency for removing fake news documents was 92% (Choudhary & Arora, 2021). One of the most important publications in this area is by Khanam, Alwasel, Sirafi and Rashid (2021). It uses a different approach to the Naive Bayes Classifier and increases its efficiency by almost twenty percent. Another important read in this context is De Beer and Matthee (2020), a comprehensive review of the earlier research and projects on identifying fake news using Python and machine learning.

3.7 Data Sets
Some of the most commonly used data sets are:
LIAR: One of the most commonly used political data sets, created using politifact.com as the raw data input.
Fake News Net: A frequently used data set consisting of news content, social media content and spatiotemporal data. It was created using various reliable sources of information (Albahr & Albahar, 2020).
There are more data sets available, tailored to various categories depending on the focus of the different projects.

4. Conclusion
A considerable amount of research has been done on the detection of fake news, and its efficiency is considerably high. Nevertheless, deployment of this research on the web and on social media is practically nil. The research has not yet provided absolute detection; new programs and methods need to be developed to increase the efficiency of fake news detection. Real-world deployment of these methods on the internet is just as important.

References
Albahr, A., & Albahar, M. (2020). An empirical comparison of fake news detection using different machine learning algorithms. IJACSA.
Choudhary, A., & Arora, A. (2021). Linguistic feature based learning model for fake news detection and classification. Expert Systems with Applications, 169, 114171.
De Beer, D., & Matthee, M. (2020). Approaches to identify fake news: a systematic literature review. In International Conference on Integrated Science, 13-22.
Gilda, S. (2017). Notice of violation of IEEE publication principles: Evaluating machine learning algorithms for fake news detection. In 2017 IEEE 15th Student Conference on Research and Development (SCOReD), 110-115.
Khanam, Z., Alwasel, B. N., Sirafi, H., & Rashid, M. (2021). Fake News Detection Using Machine Learning Approaches. In IOP Conference Series: Materials Science and Engineering, 1099(1), 012040.
Singh, A. K., & Shashi, M. (2019). Vectorization of text documents for identifying unifiable news articles. International Journal of Advanced Computer Science Application, 10.

Fake News Detection Using Machine Learning by Python

The WBS involves the steps for detecting fake news using ML with Python. All the imports that are necessary for the detection are made. The imported data is read into a DataFrame, from which the shape of the data and a certain number of initial records are obtained. Labels are gathered from the DataFrame. The detection process classifies the news as FAKE or REAL. It uses a dataset that is vectorized with TfidfVectorizer, and a PassiveAggressiveClassifier is initialized and fitted as the model.

The dataset is split into a training set and a testing set. The vectorizer (TfidfVectorizer) is initialized with English stop words and a maximum document frequency. The vectorizer converts the collection of raw documents into a matrix of TF-IDF features. TF-IDF weights the count of each word in a document by how rare the word is across the collection and stores the result in a document matrix. The vectorizer is fitted on the training set and then used to transform the test set. The PassiveAggressiveClassifier is initialized and then fit on tfidf_train and y_train. Prediction is done on the test set from TfidfVectorizer and accuracy is calculated with accuracy_score().
The accuracy is checked and a confusion matrix is built. It gives the number of true and false positives and negatives.

Python is used with the scikit-learn library and its extensions for machine learning. scikit-learn is a good source of ready-made ML algorithms for Python, which makes evaluation faster. The passive aggressive classifier algorithm is used to classify large volumes of streaming data. It is efficient in making updates to correct the loss.

Reference
Khanam, Z., Alwasel, B. N., Sirafi, H., & Rashid, M. (2020). Fake News Detection Using Machine Learning Approaches. IOP Conference Series: Materials Science and Engineering, Volume 1099, International Conference on Applied Scientific Computational Intelligence using Data Science (ASCI 2020), 22nd-23rd December. Published under licence by IOP Publishing Ltd.
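
The WBS steps described above can be sketched in a few lines of scikit-learn. This is only a minimal illustration under stated assumptions, not the project's actual code: the eight headlines are an invented toy corpus standing in for the news DataFrame, plain Python lists replace the DataFrame columns, "certain English words" is read as stop_words="english", and the values of max_df, test_size, max_iter and random_state are arbitrary choices made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus (invented for illustration, not real data).
# In the WBS these would be the text and label columns of the DataFrame.
texts = [
    "Scientists confirm chocolate cures every known disease overnight",
    "Secret moon base discovered by amateur astronomer with phone camera",
    "Celebrity reveals miracle pill that doubles lifespan in one week",
    "Government admits birds are robots used for spying on citizens",
    "City council approves budget for new public library branch",
    "Central bank keeps interest rates unchanged after quarterly meeting",
    "Local hospital opens new wing for pediatric emergency care",
    "University researchers publish study on regional rainfall trends",
]
labels = ["FAKE", "FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL"]

# Split the dataset into a training set and a testing set.
# stratify keeps one FAKE and one REAL example in the small test set.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=7, stratify=labels
)

# Initialize the vectorizer with English stop words and a maximum
# document frequency (0.7 is an assumed value).
tfidf_vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)

# Fit the vectorizer on the training set, then reuse the fitted
# vocabulary to transform the test set.
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

# Initialize the classifier and fit it on tfidf_train and y_train.
pac = PassiveAggressiveClassifier(max_iter=50, random_state=7)
pac.fit(tfidf_train, y_train)

# Predict on the test set and calculate the accuracy.
y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)

# Build the confusion matrix; rows are true labels, columns are predictions.
matrix = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])

print(f"Accuracy: {score:.2%}")
print(matrix)
```

With a corpus this small the accuracy figure is not meaningful; the sketch only shows the order of the steps. On a real labeled news dataset the same calls apply unchanged once texts and labels are read from the DataFrame.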
Answered Same Day: Oct 15, 2021

Neha answered on Oct 16 2021