Word Indexer
In this homework assignment you are going to implement a word indexer program. A word index keeps track of where each individual word occurs in a file (i.e., on which line numbers). You've seen these before, as they commonly appear at the end of long books for quick look-up of information. You will build the index using a combination of the STL classes you experimented with this chapter. Specifically, you will be using vector, map, and set.
I have starting code for you to download, which contains the function prototypes, an almost complete main function, and some input files for you to try out.
Submissions that do not compile will receive a zero. You cannot change the given function prototypes. This is a solo homework assignment (no partners).
tokenize
This function is responsible for breaking a string of text into its separate words, called tokens. Create a vector to store the tokens. Use the string parameter (which represents a line of text) and various methods from the string class to separate the words into their own strings.
For example, if the line contains "I found it in the room", this function should return a vector that contains the strings: "I", "found", "it", "in", "the", "room". As you can see, whitespace is not included in the words.
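As a rough illustration of the idea, here is a minimal sketch that splits a line using only string class methods (find_first_not_of, find_first_of, and substr). The signature shown is an assumption, since the real prototype comes from the starting code. The sketch also assumes that words are separated only by whitespace, so punctuation stays attached to the word it touches.

```cpp
#include <string>
#include <vector>

// Assumed signature; use the prototype from the starting code in your submission.
std::vector<std::string> tokenize(const std::string& line) {
    const std::string whitespace = " \t\r\n";
    std::vector<std::string> tokens;

    // Skip any leading whitespace to find where the first word starts.
    std::string::size_type start = line.find_first_not_of(whitespace);
    while (start != std::string::npos) {
        // The word ends at the next whitespace character, or at the end of the line.
        std::string::size_type end = line.find_first_of(whitespace, start);
        // substr clamps its count to the end of the string, so end == npos is safe.
        tokens.push_back(line.substr(start, end - start));
        // Move on to the start of the next word (npos when there are no more).
        start = line.find_first_not_of(whitespace, end);
    }
    return tokens;
}
```

Calling tokenize("I found it in the room") with this sketch yields six tokens, and a line made up only of spaces yields an empty vector.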
createMap
This function is responsible for reading input from a file stream and generating a map of words to their line numbers. One word can appear on multiple lines, so one representation we can use is a string for the keys and a set of line numbers for the values.
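To make that representation concrete, here is a small sketch of a map from string keys to set values. The alias WordIndex and the helper addOccurrence are hypothetical names used only for illustration, and storing line numbers as int is an assumption.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical alias for illustration; the starting code may spell the type out.
using WordIndex = std::map<std::string, std::set<int>>;

// Hypothetical helper: records that `word` occurs on line `lineNumber`.
void addOccurrence(WordIndex& index, const std::string& word, int lineNumber) {
    // operator[] creates an empty set the first time a word is seen.
    // set::insert ignores a line number that is already stored.
    index[word].insert(lineNumber);
}
```

A set is a convenient value type here because a word that appears twice on the same line is still recorded only once for that line, and the line numbers stay sorted.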
Write a loop that reads data from the ifstream parameter (this is the input file) line-by-line. For each line of text, call your tokenize function to get back a vector containing the separate words. For each word, update the map by inserting the current line number into that word's set.
For reference, you might want to look back at the various methods we can use with the map and set containers.
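The createMap steps can be sketched roughly as follows. The createMap signature is an assumption (keep whatever prototype the starting code declares), the tokenize shown is only a stand-in so the example compiles by itself, and numbering lines from 1 is also an assumption.

```cpp
#include <fstream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for your own tokenize so that this sketch compiles by itself.
std::vector<std::string> tokenize(const std::string& line) {
    std::istringstream words(line);
    std::vector<std::string> tokens;
    std::string word;
    while (words >> word) {
        tokens.push_back(word);
    }
    return tokens;
}

// Assumed signature; the real prototype comes from the starting code.
std::map<std::string, std::set<int>> createMap(std::ifstream& input) {
    std::map<std::string, std::set<int>> index;
    std::string line;
    int lineNumber = 0;

    // getline returns the stream, which tests false once the input runs out.
    while (std::getline(input, line)) {
        ++lineNumber;  // assumption: the first line of the file is line 1
        for (const std::string& word : tokenize(line)) {
            // Creates the word's set on first use, then records this line.
            index[word].insert(lineNumber);
        }
    }
    return index;
}
```

With an input file whose two lines are "I found it" and "it was in the room", this sketch maps "it" to the set {1, 2} and "room" to the set {2}.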
main
The majority of the main function has been completed for you. There is, however, one TODO task to complete. You will need to write a loop using iterators that will traverse the map and print out the information. See the sample runs below for how you might want to format this.
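One possible shape for the iterator loop is sketched below. The helper printIndex is a hypothetical name (in the assignment the loop would sit inside main), and the "word: line line ..." layout is only an assumed format, so match the sample runs rather than this sketch.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper for illustration; writes one line of output per word.
void printIndex(const std::map<std::string, std::set<int>>& index, std::ostream& out) {
    // The outer iterator visits each (word, set of line numbers) pair in key order.
    for (std::map<std::string, std::set<int>>::const_iterator entry = index.begin();
         entry != index.end(); ++entry) {
        out << entry->first << ":";
        // The inner iterator visits that word's line numbers in ascending order.
        for (std::set<int>::const_iterator line = entry->second.begin();
             line != entry->second.end(); ++line) {
            out << " " << *line;
        }
        out << "\n";
    }
}
```

Passing std::cout as the second argument prints to the terminal, while passing a std::ostringstream captures the text so it can be compared against an expected result.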