Association r programming


MIS 545 Lab 5 Assignment

For this assignment, you need to use association rules. For each question, turn in the R code. The data file Congressional Voting Records.csv is provided at D2L > Labs > Lab 5.

Guidelines
· You must write your answers in a single Word document. Your source code must be in text, NOT a screenshot. The output can be either text or a screenshot.
· You must include your name at the top of the Word document.
· Your Word document should be uploaded at D2L.
· It must be sent by the deadline. No late submission will be accepted.
If you do not follow these instructions, you will lose 10 points for your assignment. This is an independent assignment, so please do not get answers from anyone else, including the Internet, or provide your work to others.

Evaluation
Your submission will be graded on two factors:
· Correctly written R code that answers the question.
· The correct output that the code provides.

Problem Description
The Congressional Voting Records dataset includes the votes of each U.S. House of Representatives Congressman on 16 votes, such as the Handicapped Infants and Toddler Act (handicapped-infants), Cost Sharing for Federal Water Projects (water-project-cost-sharing), Adoption of the Budget Resolution (adoption-of-the-budget-resolution), etc. There are three possible vote results: yea, represented as ‘y’; nay, represented as ‘n’; and unknown disposition, represented as ‘?’. Unknown votes are treated as missing data. Use the Apriori algorithm to find the voting preferences of both Republicans and Democrats. E.g.:
{budget resolution = no, MX-missile = no, aid to El Salvador = yes} => {Party = Republican}

Questions:
1. For each of the following questions, please include the R source code that helped you get the result. Please include only the most relevant code.
2. List the top 5 frequent association rules sorted by Support, regardless of party. Also show the rules in a two-key plot.
3. List the two most frequent association rules sorted by Confidence for each party.
4. Generate the parallel coordinates plot for the Question 2 result.
5. Generate the association rules network for the Question 2 result. Please use both the igraph and visNetwork packages.
6. Briefly discuss how to evaluate association rules. (One paragraph should be sufficient.)
7. In what scenario could an association rule with high confidence be a misleading rule?

Lab 5_Association Rules.docx

MIS 545 Lab 5: Association Rules: Apriori Algorithm
Find association rules based on transactions

1 Overview
In this lab, we will convert purchase records into basket format and apply the Apriori algorithm to two grocery data sets, which can be found under the Lab 5 module on D2L.
1. groceries.csv: This dataset is used for part 2, Data Conversion. It contains purchase records from a grocery store. There are three attributes: customer membership number (Member_number), purchase date (Date), and item description (itemDescription). Items purchased on the same date by the same person belong to the same transaction. These purchases will be converted to basket format.
2. itemSet_total.csv: This dataset is used for part 3, Basket Analysis. It contains clean data in basket format, converted from groceries.csv, ready for the Apriori algorithm.
Save them in your working directory.

2 Data Conversion
For lab 5, we will use 5 new packages to manipulate data.
arules: provides the infrastructure for representing, manipulating, and analyzing transaction data and patterns (frequent itemSets and association rules)
arulesViz: an extension of arules with various visualization techniques for association rules and itemSets
igraph: a library and R package for network analysis
visNetwork: provides an R interface to the 'vis.js' JavaScript charting library. It allows interactive visualization of networks
plyr: offers tools for splitting data into smaller pieces, applying a function to each piece, and combining the results

# Install packages
install.packages("arules")
install.packages("arulesViz", dependencies = TRUE)
install.packages("igraph")
install.packages("visNetwork")
install.packages("plyr")
library(arules)
library(arulesViz)
library(igraph)
library(visNetwork)
library(plyr)

First, use setwd() to assign your working directory. Save groceries.csv under that directory. Then load the transaction history data into memory. Note that capital %Y means a four-digit year, e.g. 2016; %m means a two-digit month, e.g. 01 for January; %d means a two-digit day, e.g. 15 for the fifteenth.

# Read in csv file groceries.csv.
groceries <- read.csv("groceries.csv")
# Correct data type
groceries$Member_number <- as.character(groceries$Member_number)
groceries$Date <- as.Date(groceries$Date, '%m/%d/%Y')
groceries$itemDescription <- as.character(groceries$itemDescription)

If you call head(groceries), you will see that each record holds only one item, a layout that does not match the requirement of basket analysis. To group items together by member ID and purchase date, we use a function called ddply() offered by package plyr.

# ddply() is used to get grouping statistics of data.frame
basket <- ddply(groceries, c("Member_number", "Date"),
                function(input) paste(input$itemDescription, collapse = ","))

Explore basket to see whether the items were converted into basket format. Save the variable carrying the itemSet information for future association rules discovery.

# Only keeping itemSet variable to get transaction list
txnList <- basket[, c(3)]
# Write into csv file
write.csv(txnList, "itemSet.csv", row.names = FALSE)

3 Basket Analysis
Load the second dataset, itemSet_total.csv. The read.transactions() method treats each record as a basket, in which every comma-separated value is detected as a single item.

# Display frequent items
itemSet <- read.transactions("itemSet_total.csv", sep = ",")
# 9835 transactions (rows) and
# 169 items (columns)
# Plot the frequent items in terms of support
itemFrequencyPlot(itemSet, support = 0.1)
# Find top 20 frequent items
itemFrequencyPlot(itemSet, topN = 20)

In general, we can observe four types of combination by support and confidence.
a) A rule that has high support and high confidence. E.g.: {milk} => {diapers}, support = 3/5 = 60%, confidence = 3/4 = 75%. Such rules are not subjectively interesting, because any household that has children would use milk and diapers.
b) A rule that has reasonably high support but low confidence. E.g.: {milk} => {cola}, support = 2/5 = 40%, confidence = 2/4 = 50%. Such a rule can be interesting beyond the fact that both products are drinkable liquids, since one is a healthy drink while the other is a high-sugar drink.
c) A rule that has low support and low confidence. E.g.: {bread} => {eggs}, support = 1/5 = 20%, confidence = 25%. Such a rule is not subjectively interesting, because both items are staple foods.
d) A rule that has low support and high confidence. E.g.: {eggs} => {diapers}, support = 1/5 = 20%, confidence = 1/1 = 100%. Such a rule is subjectively interesting because it is not expected. We would like to find associated items that are not common on average.

# Generate association rules
rules <- apriori(itemSet, parameter = list(sup = 0.005, conf = 0.5, target = "rules"))

A higher confidence indicates a higher precision rate for your prediction. Lift measures how much better your prediction performs than the baseline (the expected response rate).

# Find top 10 rules with highest confidence; it can also be support or lift.
rules <- sort(rules, decreasing = TRUE, by = "confidence")
inspect(rules[1:10])
# Sort grocery rules by lift and show the top five rules
inspect(sort(rules, by = "lift")[1:5])
# lhs                         rhs                support    confidence lift
# {curd,tropical fruit}    => {yogurt}           0.0052872  0.5148515  3.690645
# {onions,root vegetables} => {other vegetables} 0.00569395 0.6021505  3.11201
# ...

If you want to sort the association rules by support, simply specify by = "support" in the function sort().

Other than the general association rules, we would also like to know the items associated with a particular item that we are interested in.

# What are customers likely to buy before they purchase whole milk
milkRule_1 <- apriori(itemSet,
                      parameter = list(sup = 0.005, conf = 0.2, target = "rules"),
                      appearance = list(default = 'lhs', rhs = "whole milk"))
milkRule_1 <- sort(milkRule_1, decreasing = TRUE, by = "confidence")
inspect(milkRule_1[1:5])
# What are customers likely to buy if they purchase whole milk
milkRule_2 <- apriori(itemSet,
                      parameter = list(sup = 0.005, conf = 0.2, target = "rules"),
                      appearance = list(default = 'rhs', lhs = "whole milk"))
milkRule_2 <- sort(milkRule_2, decreasing = TRUE, by = "confidence")
inspect(milkRule_2)

4 Visualization
The plot() method provides multiple ways to display association rules. The first plot() call generates a scatter plot of two key criteria (support and confidence), also called a two-key plot.

# Overview of rules
plot(rules, shading = "lift", control = list(main = "Two-key plot"))
# Select top 10 rules in terms of confidence
top_rules <- sort(rules, decreasing = TRUE, by = "confidence")[1:10]
plot(top_rules, method = "paracoord", shading = "confidence")

igraph and visNetwork provide advanced features for exploring association rules via a network, in which each node represents an itemSet and the edges stand for observed rules.

# Create a basic graph structure
ig <- plot(top_rules, method = "graph")
# Use our igraph
ig_df <- get.data.frame(ig, what = "both")
# Generate nodes
nodes <- data.frame(id = ig_df$vertices$name,
                    # The size of nodes: can be lift or confidence
                    value = ig_df$vertices$support,
                    title = ifelse(ig_df$vertices$label == "",
                                   ig_df$vertices$name,
                                   ig_df$vertices$label),
                    ig_df$vertices)
# Generate edges
edges <- ig_df$edges
# Generate network with add-on features
network <- visNetwork(nodes, edges) %>%
  # Features to manipulate network
  visOptions(manipulation = TRUE) %>%
  # Directed network
  visEdges(arrows = 'to', scaling = list(min = 2, max = 2)) %>%
  # Navigation buttons
  visInteraction(navigationButtons = TRUE)
network

csfiles/home_dir/groceries.csv
"Member_number","Date","itemDescription"
1619052826499,"10/22/2012","citrus fruit"
1679031673299,"10/03/2010","tropical fruit"
1634042851699,"07/07/2011"
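The support and confidence figures quoted for the four rule types in the Basket Analysis section can be checked by hand. The sketch below uses base R only; the five baskets are hypothetical (they are not taken from groceries.csv or itemSet_total.csv) and were chosen so that the counts match the fractions quoted in the lab text. The helper names support() and rule_metrics() are likewise made up for illustration.

```r
# Hypothetical toy baskets (an assumption, not data from the lab files),
# chosen so the counts match the four example rules in the lab text.
baskets <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Fraction of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Support, confidence and lift for the rule lhs => rhs
rule_metrics <- function(lhs, rhs) {
  s <- support(c(lhs, rhs))          # baskets containing both sides
  conf <- s / support(lhs)           # of baskets with lhs, share that also have rhs
  c(support = s, confidence = conf, lift = conf / support(rhs))
}

rule_metrics("milk", "diapers")   # case a) in the lab text
rule_metrics("milk", "cola")      # case b)
rule_metrics("bread", "eggs")     # case c)
rule_metrics("eggs", "diapers")   # case d)
```

Lift is included because the lab describes it as the improvement over the baseline: it divides a rule's confidence by the support of its right-hand side, so a value near 1 means the left-hand side adds little information.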
Answered Same Day, Aug 15, 2021


Pritam Kumar answered on Aug 16 2021
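Qn1.
# Install and load the packages used in this answer
install.packages("arules")
library(arules)
install.packages("arulesViz")
library(arulesViz)

install.packages("igraph")
library(igraph)
install.packages("visNetwork")
library(visNetwork)
install.packages("plyr")
library(plyr)
install.packages("BETS")
library(BETS)
# Read the votes; "?" marks an unknown vote and is treated as missing.
# stringsAsFactors = TRUE gives apriori() factor columns to turn into items.
CVR <- read.csv("Congressional Voting Records.csv", na.strings = "?",
                stringsAsFactors = TRUE)
CVR
# Mine rules whose right-hand side is a party label
result <- apriori(CVR, parameter = list(sup = 0.35, conf = 0.8, target = "rules"),
                  appearance = list(default = 'lhs',
                                    rhs = c('party=democrat', 'party=republican')))
# Question 2 asks for the top 5 rules sorted by support
rules <- sort(result, decreasing = TRUE, by = "support")[1:5]
inspect(rules)
# Scatter plot of support against confidence, shaded by lift
plot(rules, shading = "lift", control = list(main = "Top 5"))
# Parallel coordinates plot of the same rules
plot(rules, method = "paracoord", shading = "confidence")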
result_democrat <- apriori(CVR, parameter...