COSC 2123/1285 Algorithms and Analysis Assignment 1: Word Completion Assessment Type Pair (Group of 2) Assignment. Submit online via Canvas → As- signments → Assignment 1. Clarifications/updates/FAQ...

1 answer below »
Assignment 1: Word Completion


COSC 2123/1285 Algorithms and Analysis Assignment 1: Word Completion Assessment Type Pair (Group of 2) Assignment. Submit online via Canvas → As- signments → Assignment 1. Clarifications/updates/FAQ can be found in Ed Discussion Forum → Assignment 1 General Discus- sion. Due Date Week 7, 11:59pm, Friday, April 22, 2022 Marks 30 1 Objectives There are a number of key objectives for this assignment: • Understand how a real-world problem can be implemented by different data structures and/or algorithms. • Evaluate and contrast the performance of the data structures and/or algorithms with respect to different usage scenarios and input data. In this assignment, we focus on the word completion problem. 2 Background Word/sentence (auto-)completion is a very handy feature of nowadays text editors and email browsers (you must have used it in your Outlook). While sentence completion is a much more challenging task and perhaps requires advanced learning methods, word completion is much easier to do as long as you have a dictionary available in the memory. In this assignment, we will focus on implementing a dictionary comprising of words and their frequencies that allows word completion. We will try several data structures and compare their performances. One of these data structures is the Ternary Search Tree, which is described below. Ternary Search Trees Ternary search trees (TST) is a data structure that allows memory-efficient storage of strings and fast operations such as spell checking and auto-completion. Each node of the a TST contains the following fields: • a lower-case letter from the English alphabet (‘a’ to ‘z’), • a positive integer indicating the word’s frequency (according to some dataset) if the letter is the last letter of a word in the dictionary, • a boolean variable that is True if this letter is the last letter of a word in the dictionary and False otherwise, • the left pointer points to the left child-node whose letter is smaller (in alphabetical order, i.e. ‘a’ < ‘b’),="" •="" the="" right="" pointer="" points="" to="" the="" right="" child-node="" whose="" letter="" is="" larger,="" •="" the="" middle="" pointer="" points="" to="" a="" node="" that="" stores="" the="" next="" letter="" in="" the="" word.="" as="" an="" example,="" consider="" figure="" 1.="" figure="" 1:="" an="" example="" of="" a="" ternary="" search="" tree="" storing="" five="" words="" and="" their="" frequencies.="" the="" boolean="" value="" (t)rue="" indicates="" that="" the="" letter="" is="" the="" end="" of="" a="" word.="" in="" that="" case,="" a="" frequency="" (an="" integer)="" is="" shown,="" e.g.,="" 10="" for="" ‘cut’.="" note="" that="" a="" word="" can="" be="" a="" prefix="" of="" another,="" e.g.,="" ‘cut’="" is="" a="" prefix="" of="" ‘cute’.="" construction="" a="" tst="" can="" be="" built="" by="" adding="" words="" to="" the="" tree="" one="" by="" one,="" noting="" that="" different="" orders="" of="" words="" can="" lead="" to="" different="" trees.="" one="" can="" start="" from="" the="" root="" node="" (or="" create="" one="" when="" adding="" the="" first="" word)="" and="" follow="" the="" left,="" middle,="" or="" right="" pointer="" depending="" on="" whether="" the="" current="" letter="" is="" smaller,="" equal="" to,="" or="" larger="" than="" the="" letter="" of="" the="" node="" (in="" alphabetical="" order).="" if="" the="" new="" word="" is="" a="" prefix="" of="" an="" existing="" word="" in="" the="" tree="" then="" we="" can="" simply="" change="" the="" boolean="" field="" of="" the="" node="" storing="" the="" last="" letter="" to="" true.="" for="" instance,="" in="" the="" example="" in="" figure="" 1,="" if="" (cut,="" 10)="" is="" added="" after="" (cute:="" 50),="" then="" one="" can="" simply="" change="" the="" boolean="" field="" of="" the="" node="" containing="" the="" letter="" ‘t’="" to="" true="" and="" set="" the="" frequency="" to="" 10,="" which="" signifies="" that="" now="" (cut,="" 10)="" is="" part="" of="" the="" tree.="" in="" another="" case,="" when="" (cup,="" 30)="" is="" added,="" then="" one="" new="" node="" has="" to="" be="" constructed="" to="" store="" ‘p’,="" which="" has="" a="" true="" in="" the="" boolean="" field="" and="" 20="" as="" its="" frequency.="" in="" this="" case,="" ‘cup’="" can="" reuse="" the="" two="" nodes="" 2="" storing="" ‘c’="" and="" ‘u’="" from="" the="" previous="" words.="" however,="" when="" (app,="" 20)="" and="" (farm,="" 40)="" are="" added,="" three="" and="" four="" new="" nodes="" have="" to="" be="" created,="" respectively.="" searching="" to="" search="" for="" a="" word="" (and="" to="" get="" its="" frequency)="" in="" a="" tst,="" we="" use="" its="" letters="" to="" navigate="" the="" tree.="" for="" each="" letter="" of="" the="" word="" (from="" the="" first="" to="" the="" last),="" we="" follow="" either="" the="" left,="" middle,="" or="" right="" pointer="" to="" the="" corresponding="" child="" node="" depending="" on="" whether="" this="" letter="" is="" smaller="" than,="" equal="" to,="" or="" larger="" than="" the="" letter="" stored="" at="" the="" current="" node.="" once="" we="" hit="" a="" node="" storing="" the="" same="" letter,="" we="" will="" follow="" the="" middle="" pointer="" and="" start="" using="" the="" next="" letter="" of="" the="" word="" on="" the="" middle="" child="" node="" and="" repeat="" the="" process.="" the="" search="" fails,="" that="" is,="" the="" word="" is="" not="" in="" the="" tree,="" if="" either="" a)="" the="" search="" algorithm="" hits="" a="" null="" node="" while="" there="" is="" still="" at="" least="" one="" letter="" in="" the="" word="" that="" hasn’t="" been="" matched,="" or="" b)="" it="" finds="" that="" the="" searched="" word="" is="" a="" prefix="" of="" an="" existing="" word="" in="" the="" tree="" but="" the="" boolean="" field="" of="" the="" node="" corresponding="" to="" the="" last="" letter="" is="" false.="" deletion="" the="" deletion="" succeeds="" if="" the="" word="" is="" already="" included="" in="" the="" tree.="" if="" the="" word="" is="" a="" prefix="" of="" another="" word="" then="" it="" can="" be="" deleted="" by="" simply="" setting="" the="" boolean="" field="" of="" the="" node="" storing="" the="" last="" letter="" to="" false.="" for="" example,="" if="" (cut,="" 10)="" is="" to="" be="" deleted="" from="" the="" tree="" in="" figure="" 1,="" then="" we="" only="" need="" to="" change="" the="" boolean="" field="" of="" the="" node="" storing="" ‘t’="" to="" false.="" otherwise,="" if="" the="" word="" has="" some="" unique="" suffix,="" then="" we="" need="" to="" delete="" nodes="" corresponding="" to="" the="" suffix="" only.="" for="" example,="" if="" (cup,="" 30)="" is="" to="" be="" removed,="" then="" only="" the="" node="" storing="" the="" last="" letter="" ‘p’="" needs="" to="" be="" deleted="" from="" the="" tree.="" auto-completion="" one="" simple="" way="" to="" achieve="" auto-completion="" from="" a="" partial="" word="" is="" to="" return="" three="" words="" in="" the="" dic-="" tionary="" (tree)="" of="" highest="" frequencies="" that="" have="" the="" given="" partial="" word="" as="" a="" prefix.="" for="" example,="" in="" the="" tree="" given="" in="" figure="" 1,="" •="" the="" auto-completion="" list="" for="" ‘cu’="" is:="" [(cute,="" 50),="" (cup,="" 30),="" (cut,="" 10)],="" •="" the="" auto-completion="" list="" for="" ‘f’="" is:="" [(farm,="" 40)].="" suppose="" we="" add="" one="" more="" word="" (curiosity,="" 60)="" into="" this="" tree,="" then="" the="" auto-completion="" list="" of="" ‘cu’="" will="" be="" changed="" to="" [[(curiosity,="" 60),="" (cute,="" 50),="" (cup,="" 30)].="" in="" this="" example,="" although="" ‘cut’="" contains="" ‘cu’="" as="" a="" prefix,="" it="" is="" not="" in="" the="" top="" three="" of="" the="" most="" common="" words="" having="" ‘cu’="" as="" a="" prefix.="" in="" general,="" the="" auto-completion="" list="" contains="" either="" three,="" two,="" one,="" or="" no="" words.="" they="" must="" be="" sorted="" in="" a="" decreasing="" frequencies,="" that="" is,="" the="" first="" word="" has="" the="" highest="" frequency="" and="" the="" last="" has="" the="" lowest="" frequency.="" 3="" tasks="" the="" assignment="" is="" broken="" up="" into="" a="" number="" of="" tasks,="" to="" help="" you="" progressively="" complete="" the="" assignment.="" 3.1="" task="" a:="" implement="" the="" nearest="" neighbour="" data="" structures="" and="" their="" oper-="" ations="" (12="" marks)="" in="" this="" task,="" you="" will="" implement="" a="" dictionary="" of="" english="" words="" that="" allows="" auto-completion,="" using="" three="" different="" data="" structures:="" python’s="" list="" (array),="" python’s="" dictionary="" (hash="" table),="" and="" ternary="" search="" tree.="" each="" implementation="" should="" support="" the="" following="" operations:="" 3="" •="" build="" a="" dictionary="" from="" a="" list="" of="" words="" and="" frequencies:="" create="" a="" dictionary="" that="" stores="" words="" and="" frequencies="" taken="" from="" a="" given="" list="" (using="" add="" below).="" •="" (a)dd="" a="" word="" and="" its="" frequency="" to="" the="" dictionary.="" •="" (s)earch="" for="" a="" word="" in="" a="" dictionary="" and="" return="" its="" frequency="" (return="" 0="" if="" not="" found).="" •="" (d)elete="" a="" word="" from="" the="" dictionary.="" •="" (ac)auto-complete="" a="" given="" string="" and="" return="" a="" list="" of="" three="" most="" frequent="" words="" (if="" any)="" in="" the="" dictionary="" that="" have="" the="" string="" as="" a="" prefix.="" 3.1.1="" implementation="" details="" list-based="" dictionary="" in="" this="" subtask,="" you="" will="" implement="" the="" dictionary="" using="" python’s="" lists,="" which="" are="" equivalent="" to="" arrays="" in="" other="" programming="" languages.="" in="" this="" implementation,="" all="" operations="" on="" lists="" are="" allowed.="" other="" data="" structures="" should="" not="" be="" used="" directly="" in="" the="" main="" operations="" of="" the="" list-based="" dictionary.="" hashtable-based="" dictionary="" in="" this="" subtask,="" you="" will="" use="" the="" native="" python’s="" dictionary.="" in="" this="" implementation,="" all="" standard="" python="" operations="" on="" dictionaries="" are="" allowed.="" ternary-search-tree-based="" dictionary="" in="" this="" subtask,="" you="" will="" implement="" the="" ternary="" search="" tree="" and="" use="" it="" to="" support="" auto-completion.="" both="" iterative="" or="" recursive="" implementations="" are="" acceptable.="" 3.1.2="" operations="" details="" operations="" to="" perform="" on="" the="" implementations="" are="" specified="" on="" the="" command="" file.="" they="" are="" in="" the="" following="" format:=""> [arguments] where operation is one of {S, A, D, AC} and arguments is for optional arguments of some of the operations. The operations take the following form: • S word – searches for word in the dictionary and returns its frequency (returns 0 if not found). • A word frequency – adds a new word and its frequency to the dictionary, returns True if succeeded and False if the word already exists in the dictionary. • D word – deletes word from the dictionary. If fails to delete (word is not in the dictionary), returns False. • AC partial word – returns a list of three words of highest frequencies in the dictionary that has partial word as a prefix. These words should be listed in a decreasing order of frequencies. Maximum three words and minimum zero word will be returned. As an example of the operations, consider the input and output from the provided testing files, e.g., sampleDataToy.txt, testToy.in, and the expected output, testToy.exp (Table 1). Note, you do NOT have to do the input and output reading yourself. The provided Python files will handle the necessary input and output formats. Your task is to implement the missing methods in the provided classes. 4 sampleDataToy.txt testToy.in (commands) testToy.exp (expected output) cute 10 S cute Found ‘cute’ with frequency 10 ant 20 D cute Delete ‘cute’ succeeded cut 30 S cute NOT Found ‘cute’ cuts 50 S book NOT Found ‘book’ apple 300 A book 10000 Add ’book’ succeeded cub 15 S book Found ‘book’ with frequency 10000 courage 1000 S apple Found ‘apple’ with frequency 300 annotation 5 D apple Delete ‘apple’ succeeded further 40 S apple NOT Found ‘apple’ furniture 500 D apple Delete ‘apple’ failed find 400 AC c Autocomplete for ‘c’: [ calm: 1000 cuts: 50 cut: 30 ] farm 5000 AC cut Autocomplete for ‘cut’: [ cuts: 50 cut: 30 ] farming 1000 D cut Delete ‘cut’ succeeded farmer 300 AC cut Autocomplete for ‘cut’: [ cuts: 50 ] appendix 10 AC farms Autocomplete for ‘farms’: [ ] apology 600 apologetic 1000 fur 10 fathom 40 apps 60 Table 1: The file sampleDataToy.txt provides the list of words and frequencies for the dictionary, while testToy.in and testToy.exp have the list of input commands and expected output. For instance, as ‘cute’ belongs to the dictionary, the expected outcome of “S cute” should be “Found ‘cute’ with frequency 10” and “D cute” should be successful. After deleting ‘cute’, “S cute” should return “NOT Found ‘cute’”. Note that although there are more than three words having ‘c’ as a prefix, “AC c” only
Answered Same DayApr 14, 2022

Answer To: COSC 2123/1285 Algorithms and Analysis Assignment 1: Word Completion Assessment Type Pair (Group of...

Karthi answered on Apr 14 2022
94 Votes
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here