The course is data mining. This project has 2 parts, project presentation PowerPoint slides...

Question

The course is data mining. This project has 2 parts: project presentation PowerPoint slides (10 slides) plus the code, due on Monday (5/2/22), and a project paper report (4 pages) plus code, due on Wednesday (5/4/22). Please answer all the questions briefly. Carefully read the attached proposal document and base your work on it.

COMP Project Description, Spring 2022

This project has 2 parts: project presentation PowerPoint slides (10 slides) plus the code, due on Monday (5/2/22), and a project paper report (4 pages) plus code, due on Wednesday (5/4/22). Please answer all the questions briefly. Carefully read the attached proposal document and base your work on it.

Project Presentation: 10 slides
Create PowerPoint slides to present your data mining techniques, evaluation, and results to the class.
· Data mining, prepare data, confusion matrix, evaluation.
· Make slides.
· What is your project?
· Explain the technique you use. Please don’t explain the definition of the technique; instead, explain what you do and how you do it.
· How much data do you have?
· What kind of algorithms do you have, and how do you apply them?
· What evaluation technique do you use? Run the algorithm and get the evaluation number.
· Conclusion.
· How much accuracy do you get?
· Compare the two algorithms to each other and write which one is better and which one is more difficult, what the problem was, and what your suggestion is for each one in the future.
· Code: write the code in R.
· Based on your code, include all the graphs from data preparation and all results on the PowerPoint slides. Explain what your result is, explain each graph, and say what you did to get the graph.
· What was the problem, how did you solve it, and what is the conclusion?
· How good is the model?
· Make a model and predict. Measure the accuracy and compare the models to each other. Report not only accuracy but also F1, F2, recall, or any other measure.
· Work on the confusion matrix.
· Generally, show the result, model, accuracy or recall, and confusion matrix.

Project Report
It should include:
· Introduction: Describe the problem, data, data preparation, …
· Related work: Go to the school library databases, find resources/magazines, and pick 3 articles that address the same problem.
Explain if there is a relevant paper that worked on the same problem (2-3 paragraphs).
· Methods: Describe the methods and algorithms that you used to solve the problem.
· Evaluation: Compare the methods you used.
· Conclusion: Write a conclusion and future work for your project.
· Code: R code. All the code you wrote, with comments that explain your implementation.
Resources: use 3 resources from the school library link: library.csun.edu

Comp 541: Project Proposal
Chronic Homelessness

Summary of the proposed project: This work summarizes the challenges offered by homelessness service provision tasks, as well as the problems and the opportunities that exist for advancing both data science and human services. The problem is to understand the social characteristics of homeless people in order to help them as well as possible, and to determine whether homeless people can be identified from the other data in the dataset.

Research technique: We will use R to analyze data about the homeless. We chose two different techniques to address the problem, Random Forest and Logistic Regression, and we will use both to see if we can predict whether a person is homeless. When we say multiple models are trained on a dataset, we mean that either the same model with different hyperparameters or different models can be trained on the training dataset. Random Forest combines the output of multiple decision trees and then produces its own output. It works on the same principle as Decision Trees; however, it does not select all the data points and variables in each of the trees. It randomly samples data points and variables in each of the trees that it creates and then combines the output at the end. This reduces the bias that a single decision tree model might introduce and improves the predictive power significantly. We will see this in the next section when we take a sample data set and compare the accuracy of Random Forest and Decision Tree.
The following steps:
· Load and summarize the data.
· By deleting rows with missing values, we have incidentally eliminated outliers as well. Therefore, no further treatment of the null values is required.
· Make a plot to compare Veteran and No Veteran.
· Make a frequency plot for quantitative variables, split by Veteran.
· Make frequency plots for qualitative variables, split by Veteran.
· Create the training set and fit the final model to the training data. Generate predictions and bind them together with the true values from the test data.

Now, we will create a Random Forest model with default parameters and then fine-tune the model by changing ‘mtry’. We can tune the random forest model by changing the number of trees (ntree) and the number of variables randomly sampled at each stage (mtry). According to the Random Forest package description:
ntree: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times.
mtry: Number of variables randomly sampled as candidates at each split. Note that the default values are different for classification (sqrt(p), where p is the number of variables in x) and regression (p/3).

· Use a loop to identify the right mtry for the Random Forest model.
· We will use an mtry of 5 to have a more accurate model. We will test the random forest model on our training set.
· On our training set, the model produces no errors. We will see if the same holds for our test set.
· This graphic shows the most important variables for the Random Forest model.

We will compare the Random Forest model with the Logistic Regression model. We implement the two algorithms to build classification models. We will use the ROC curve for the two models to check how well they fit the data. The model with the highest area under the ROC curve (AUC) is considered the best model. Here, we create the data sets using the bootstrap method, generating data sets of the same size as the original data set. We use the model to predict the homeless rate in the testing data.
We will use the best model as an example but should compare the performance of all models.

Conclusion: We highlight the power of ensembling and the advantage of using Random Forest over Decision Trees. Although Random Forest has its own inherent limitations (in terms of the number of factor levels a categorical variable can have), it is still one of the best models that can be used for classification. It is easy to use and tune compared to some of the other complex models, and still provides us with a satisfactory level of accuracy in this scenario.

COMP 542 Handout
Melissa Cober, Computer Science Librarian
[email protected]

COMP 542 | Library Research Workshop

Resources
● COMP 542 Library Research Guide: libguides.csun.edu/comp542
● University Library Website: library.csun.edu
  o Type keywords into OneSearch to search the Library catalog
  o Good for finding books (including eBooks), journal articles, datasets, etc.
● Library Databases: libguides.csun.edu/az.php?s=66108
  o Databases → Choose Subject → Computer Science
  o Good for finding articles, conference papers and proceedings, and datasets
  o Helpful library databases for this assignment:
    ▪ ACM Digital Library: https://dl-acm-org.libproxy.csun.edu/
    ▪ IEEE Xplore: https://ieeexplore-ieee-org.libproxy.csun.edu/Xplore/dynhome.jsp?tag=1
    ▪ O’Reilly Online Learning E-books (previously called Safari Tech Books Online): https://www.oreilly.com/library/view/temporary-access/
● Google Scholar
  o scholar.google.com
  o Good for finding journal articles

IEEE Citations
● IEEE format is used in engineering, computer science, and information technology
● Guide to citing in IEEE format: libguides.csun.edu/comp282/citeyoursources
  o IEEE Examples: libguides.murdoch.edu.au/IEEE/all
  o Citing images, figures, & tables: guides.lib.monash.edu/c.php?g=219786&p=6610144

The Basics
● In-Text Citations:
  Figure 1 In-text citation example. Source: [1]
● References List:
  o A list of numerically-sorted full citations including complete and accurate information for each source
  o Goes at the end of your paper or presentation
  o Sample full citation:
  Figure 2 Full IEEE citation example. Source: [1]

Since somebody else created the two images above, we give credit by using in-text citations to indicate the source, and then include a full citation in our References List, as shown below:

References
[1] Murdoch University Library, “IEEE - Referencing Guide,” Murdoch University Library. [Online]. Available: https://libguides.murdoch.edu.au/IEEE. [Accessed: Aug. 25, 2021].

Converting from Chicago to IEEE
IEEE is based on Chicago style. When in doubt, export citations in Chicago and modify from there. See examples below; changes are underlined and highlighted.
● Key differences:
  o Add Reference Number in square brackets, e.g. [3]
    ▪ References List is sorted numerically, not alphabetically
  o Abbreviate names and reverse order
    ▪ Ex. Smith, Jane → J. Smith
  o Change format of volume/issue for journals
  o For e-resources, add: [Format]. Available: Database name, internet address. [Accessed: Date of access].

Example: Journal Article
● Chicago: Shrestha, Yash Raj, and Yongjie Yang. “Fairness in Algorithmic Decision-Making: Applications in Multi-Winner Voting, Machine Learning, and Recommender Systems.” Algorithms 12, no. 9 (September 2019): 199. https://doi.org/10.3390/a12090199.
● IEEE: [2] Y. R. Shrestha and Y. Yang, “Fairness in Algorithmic Decision-Making: Applications in Multi-Winner Voting, Machine Learning, and Recommender Systems,” Algorithms, vol. 12, no. 9, p. 199, Sep. 2019. [Online]. Available: mdpi.com. [Accessed August 10, 2021].

Example: Conference Paper
● Chicago: Hajian, Sara, Francesco Bonchi, and Carlos Castillo. “Algorithmic Bias: From Discrimination Discovery to Fairness-Aware Data Mining.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2125–26. ACM, 2016. https://doi.org/10.1145/2939672.2945386.
● IEEE: [3] S. Hajian, F. Bonchi, and C. Castillo, “Algorithmic Bias: From Discrimination Discovery to Fairness-Aware Data

datamininggroupprojectspring2022-1-selm-5iiiy3wo.docx comp-541-project-proposal-2proj-5i1owjtb.docx comp-542-library-research-workshop-handout-1-1-daeft3bj.pdf

Mohd · Accepted Answer

Introduction:
Our primary objective is to predict our response variable (required) using a generalized linear model (logistic regression) and the Linear Discriminant Analysis (LDA) algorithm. We compared these models to find the one with the highest accuracy. We conducted exploratory data analysis before model building to better understand the dataset. We treated “required” as the response variable and the remaining variables as explanatory variables. We also used the stepAIC method for feature selection in the logistic model.
Methods:
Logistic Regression:
This kind of statistical analysis (also known as a logit model) is frequently used for predictive analytics and modeling, and extends to applications in machine learning. In this approach, the response variable is finite or categorical: either A or B (binary) or a range of finite options A, B, C, or D (multinomial response). It is used in statistical software to understand the relationship between the response variable and one or more explanatory variables by estimating probabilities using a logistic regression model.
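For the binary case, the probability estimate described above can be written out as follows. The notation is an assumption for illustration and does not come from the original answer: Y is the binary response, x_1, ..., x_k are the explanatory variables, and the betas are the fitted coefficients.

```latex
% Assumed notation (not in the original answer): Y is the binary response,
% x_1..x_k the explanatory variables, \beta_0..\beta_k the fitted coefficients.
\[
  P(Y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)}}
\]
% Equivalent form: the log-odds are linear in the explanatory variables.
\[
  \log \frac{P(Y = 1 \mid x_1, \dots, x_k)}{1 - P(Y = 1 \mid x_1, \dots, x_k)}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\]
```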
This kind of analysis can help you predict the probability of an event occurring or a choice being made. For example, you might want to know the probability of a visitor accepting an offer made on your website, or not (response variable). Your analysis can look at known attributes of the observations (explanatory variables), such as Income, to predict Required. Logistic GLM models help you determine the probability of what kind of visitors are likely to accept the offer, or not. As a result, you can make better decisions about promoting your offer or about the offer itself.
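A minimal R sketch of the logistic regression workflow described above: fit a GLM, select features with stepAIC, and evaluate with a confusion matrix. The data are simulated because the actual dataset is not shown here. The column names (required, Income, Age, Noise), the 70/30 split, and the 0.5 threshold are assumptions made for illustration, not the author's actual setup.

```r
# Sketch only: simulated data; column names are hypothetical stand-ins.
library(MASS)  # provides stepAIC(); MASS ships with standard R installations

set.seed(42)
n <- 500
dat <- data.frame(
  Income = rnorm(n, mean = 50, sd = 15),
  Age    = rnorm(n, mean = 40, sd = 12),
  Noise  = rnorm(n)                      # unrelated to the response
)
# Simulate a binary response that depends on Income and Age only
p <- plogis(-4 + 0.05 * dat$Income + 0.04 * dat$Age)
dat$required <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# 70/30 train/test split (assumed proportions)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit the full logistic model, then let stepAIC add/drop terms by AIC
full_fit <- glm(required ~ ., data = train, family = binomial)
step_fit <- stepAIC(full_fit, direction = "both", trace = FALSE)

# Predicted probability of "yes" on the test set, thresholded at 0.5
prob <- predict(step_fit, newdata = test, type = "response")
pred <- factor(ifelse(prob > 0.5, "yes", "no"), levels = c("no", "yes"))

# Confusion matrix: rows = predicted class, columns = actual class
cm <- table(Predicted = pred, Actual = test$required)

tp <- cm["yes", "yes"]; fp <- cm["yes", "no"]
fn <- cm["no", "yes"];  tn <- cm["no", "no"]

accuracy  <- (tp + tn) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

# F-beta score: beta = 1 gives F1, beta = 2 weights recall more heavily (F2)
f_beta <- function(beta) {
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

print(cm)
print(round(c(accuracy = accuracy, precision = precision,
              recall = recall, F1 = f_beta(1), F2 = f_beta(2)), 3))
```

With real data, replace the simulated `dat` with the loaded dataset and keep the rest of the steps unchanged. Reporting recall and F-scores alongside accuracy matters most when the classes are imbalanced.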
Linear Discriminant Analysis:
Linear discriminant analysis is used as a tool for classification and data visualization. It has been around for quite some time. Despite its simplicity, LDA often produces robust, decent, and interpretable classification results.
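A minimal R sketch of how LDA could be fitted and compared with logistic regression on the same train/test split. As before, the data are simulated and the column names (required, Income, Age) are hypothetical. The comparison uses accuracy only, to keep the example short.

```r
# Sketch only: simulated data; column names are hypothetical stand-ins.
library(MASS)  # provides lda()

set.seed(42)
n <- 500
dat <- data.frame(Income = rnorm(n, 50, 15), Age = rnorm(n, 40, 12))
p <- plogis(-4 + 0.05 * dat$Income + 0.04 * dat$Age)
dat$required <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# Same assumed 70/30 split for both models so the comparison is fair
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Linear discriminant analysis: predict() returns a list; $class holds the labels
lda_fit  <- lda(required ~ Income + Age, data = train)
lda_pred <- predict(lda_fit, newdata = test)$class

# Logistic regression on the same predictors, thresholded at 0.5
glm_fit  <- glm(required ~ Income + Age, data = train, family = binomial)
glm_prob <- predict(glm_fit, newdata = test, type = "response")
glm_pred <- factor(ifelse(glm_prob > 0.5, "yes", "no"), levels = c("no", "yes"))

# Confusion matrices: rows = predicted class, columns = actual class
cm_lda <- table(Predicted = lda_pred, Actual = test$required)
cm_glm <- table(Predicted = glm_pred, Actual = test$required)

# Accuracy = correctly classified cases (the diagonal) / all cases
acc <- c(LDA      = sum(diag(cm_lda)) / sum(cm_lda),
         Logistic = sum(diag(cm_glm)) / sum(cm_glm))

print(cm_lda)
print(cm_glm)
print(round(acc, 3))
```

On data like this, where the classes are separated by a roughly linear boundary, the two models usually give similar accuracy. Any difference seen on the real dataset should be checked with the confusion matrices, not read from the accuracy figures alone.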
