In this project, we’ll implement a Unix utility that recursively searches for matching words in text files. If you’ve ever used the grep command from a shell, our program will be somewhat similar, except:



  • It operates recursively by default (traversing into subdirectories)

  • Only entire words are matched, not partial words (e.g., searching for ‘the’ does not match ‘theme’)

  • The line number where the matching search term was found is printed, along with some additional file system information.


A rough approximation of these features with grep would be using the -Rnw flags, like:



grep -Rnw -e term1 -e term2 -e term3


Our version of the tool will make use of multiple threads running in parallel, so we’ll call it prep. To give you an idea of how your program will work, here’s a quick example:



# Searches for hello in all the files located in /etc. Note that case is
# ignored, and the line number where the match was found is also included.
# Line numbers start at 1, not 0.
$ ./prep -d /etc HELLO
[ /etc/services | 1118 | -r-- | hello-port 652/tcp ]
[ /etc/services | 1119 | -r-- | hello-port 652/udp ]
[ /etc/services | 2915 | -r-- | hello 1789/tcp ]
[ /etc/services | 2916 | -r-- | hello 1789/udp ]
[ /etc/services | 5001 | -r-- | aimpp-hello 2846/tcp ]
[ /etc/services | 5002 | -r-- | aimpp-hello 2846/udp ]

# With the -e flag, the match is case-sensitive. No results are returned:
$ ./prep -d /etc -e HELLO

# Here we find a name in three different files.
# Each file will be searched by a different thread:
$ ./prep -d /usr/share manoj
[ /usr/share/locale/or/LC_MESSAGES/Linux-PAM.mo | 36 | -r-- | Last-Translator: Manoj Kumar Giri ]
[ /usr/share/locale/or/LC_MESSAGES/glib20.mo | 199 | -r-- | Last-Translator: Manoj Kumar Giri ]
[ /usr/share/locale/or/LC_MESSAGES/gdk-pixbuf.mo | 17 | -r-- | Last-Translator: Manoj Kumar Giri ]
[ /usr/share/doc/flex/NEWS | 495 | -r-- | space (problem report from Manoj Srivastava ]

# We can specify multiple search terms, of course:
$ ./prep -d /usr/share/cracklib nutella stranger whitman endian kapow
[ /usr/share/cracklib/cracklib.magic | 1 | -r-- | 0 lelong 0x70775631 Cracklib password index, little endian ]
[ /usr/share/cracklib/cracklib.magic | 5 | -r-- | 0 belong 0x70775631 Cracklib password index, big endian ]
[ /usr/share/cracklib/cracklib.magic | 8 | -r-- | >4 belong 0x70775631 Cracklib password index, big endian ("64-bit") ]
[ /usr/share/cracklib/cracklib-small | 47266 | -r-- | stranger ]
[ /usr/share/cracklib/cracklib-small | 53792 | -r-- | whitman ]

# By default, prep will search the current working directory (CWD).
# The full path is always printed.
$ ./prep main
[ /home/mmalensek/prp/prep.c | 186 | orw- | int main(int argc, char *argv[]) ]

# We can 'cd' somewhere else and then run prep from there.
# This run also limits the number of threads to 2.
$ cd /etc
$ ~/P1-Solution/prep -e -t2 absolutely
[ /etc/lvm/lvm.conf | 1547 | -r-- | # you are absolutely sure about what you are doing! ]
[ /etc/lvm/lvm.conf | 1631 | -r-- | # by hand unless you are absolutely sure you know what you are doing! ]


Note that the output format is:



[ /absolute/path/to/file | line-number | permissions | the line the word was found in ]
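One way to produce a line in this format is sketched below. It also strips leading and trailing whitespace from the matched line, which the handout asks for. The names trim and format_match are hypothetical helpers invented for this illustration; they are not part of any provided starter code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: strips leading and trailing whitespace.
 * Modifies `s` in place and returns a pointer into it. */
static char *trim(char *s)
{
    while (isspace((unsigned char) *s)) {
        s++;
    }
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char) s[n - 1])) {
        n--;
    }
    s[n] = '\0';
    return s;
}

/* Hypothetical helper: writes one result line into `out`.
 * Returns the value of snprintf, so callers can detect truncation. */
int format_match(char *out, size_t size, const char *path,
                 unsigned long line_no, const char *perms, char *line)
{
    return snprintf(out, size, "[ %s | %lu | %s | %s ]",
                    path, line_no, perms, trim(line));
}
```

Building the whole line in a buffer and writing it with a single call is one possible way to keep output from different threads from interleaving mid-line.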


An absolute path starts from the root directory: /. You can tell whether a path is absolute or relative by looking at the first character: if it’s /, the path is absolute. Otherwise, it’s relative (e.g., ./blah, or even some/path/file.txt).
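As a small sketch of the distinction, the check below looks at the first character, and the conversion uses realpath from the C library. The function names is_absolute and to_absolute are made up for this example. Note that realpath also resolves symbolic links and "..", which may or may not be what you want for every path.

```c
#define _GNU_SOURCE   /* expose realpath under strict -std modes */
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if `path` starts at the root directory. */
int is_absolute(const char *path)
{
    return path != NULL && path[0] == '/';
}

/* Hypothetical helper: resolves `path` into `out`.
 * Returns 0 on success, -1 if the path cannot be resolved. */
int to_absolute(const char *path, char out[PATH_MAX])
{
    return realpath(path, out) != NULL ? 0 : -1;
}
```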


For permissions, the output is a set of character-based flags similar to the output of ls -l. For example, if the file is only readable by the owner of the prep process, then the output will be -r--. If the file is readable, writable, and executable, and owned by the owner of the prep process, then the output will be orwx.



o r w x
| | | |
| | | +-- If the process owner can execute the file, '-' otherwise
| | |
| | +---- If the file is writable by the process owner, '-' otherwise
| |
| +------ If the process owner can read the file, '-' otherwise
|
+-------- Whether the process owner owns the file, '-' otherwise


Unlike ls -l, the program will only display permissions (known as the file mode) in terms of how they apply to the current user running prep. This means that you will:



  • Check to determine if the process owner is the same as the file owner. If so, report user permissions;

  • If not, check to determine if the process owner’s group is the same as the file group. Report the group permissions if this is true, and

  • If the file is owned by some other user/group, then simply print the ‘others’ permissions (the last three bits of the file mode).
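The three checks above could be sketched as follows. The function name perm_string and its signature are assumptions made for this example; in a real program the mode and IDs would typically come from a struct stat (st_mode, st_uid, st_gid), and the process IDs from calls such as getuid() and getgid().

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: fills `out` with a string such as "orw-" or "-r--".
 * `uid` and `gid` identify the process owner. */
void perm_string(mode_t mode, uid_t file_uid, gid_t file_gid,
                 uid_t uid, gid_t gid, char out[5])
{
    mode_t r, w, x;
    int owner = (file_uid == uid);

    if (owner) {
        /* Process owner owns the file: report the user bits. */
        r = S_IRUSR; w = S_IWUSR; x = S_IXUSR;
    } else if (file_gid == gid) {
        /* Same group: report the group bits. */
        r = S_IRGRP; w = S_IWGRP; x = S_IXGRP;
    } else {
        /* Otherwise: report the 'others' bits. */
        r = S_IROTH; w = S_IWOTH; x = S_IXOTH;
    }

    out[0] = owner ? 'o' : '-';
    out[1] = (mode & r) ? 'r' : '-';
    out[2] = (mode & w) ? 'w' : '-';
    out[3] = (mode & x) ? 'x' : '-';
    out[4] = '\0';
}
```

This sketch compares only the primary group passed in as gid; whether supplementary groups should count is not stated in the handout.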


If multiple matches are present on a single line, only print it once. You should also remove punctuation when you are searching for words; the punctuation removed in the examples above is:



\t\r\n.,:?!`()[]-/\'\"<>


Spaces are removed as well. Also remove leading and trailing whitespace (spaces, tabs, etc.) from the matched lines.
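One possible way to apply these rules is to split each line on the delimiter set and compare whole tokens against the search term. The sketch below uses strtok_r (the reentrant variant, since each file is searched in its own thread) and strcasecmp for the case-insensitive default. The name line_has_word and the DELIMS macro are assumptions for this illustration.

```c
#define _GNU_SOURCE   /* expose strtok_r/strcasecmp under strict -std modes */
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Delimiters listed in the handout, plus the space character. */
#define DELIMS " \t\r\n.,:?!`()[]-/\'\"<>"

/* Hypothetical helper: returns true if `term` appears as a whole word.
 * strtok_r overwrites delimiters, so `scratch` must be a writable copy
 * of the line; keep the original for printing. */
bool line_has_word(char *scratch, const char *term, bool exact_case)
{
    char *save = NULL;
    for (char *tok = strtok_r(scratch, DELIMS, &save);
         tok != NULL;
         tok = strtok_r(NULL, DELIMS, &save)) {
        int cmp = exact_case ? strcmp(tok, term) : strcasecmp(tok, term);
        if (cmp == 0) {
            return true;   /* stop at the first match: print the line once */
        }
    }
    return false;
}
```

With several search terms, the same loop could compare each token against every term before moving on.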


Since this is a parallel search, your implementation should detect the number of cores on the machine and use this number as the default upper bound for threads launched by the program. For each file that you find (recursively), you will launch a thread that looks for occurrences of the search term(s) specified. If there are more files than threads available, then you should wait until a thread finishes before starting another. Using a semaphore from the pthreads library is a good way to accomplish this.
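A minimal sketch of this pattern, assuming a counting semaphore initialized to the thread limit: the launcher waits for a free slot before creating each thread, and every worker releases its slot when it finishes. All function and variable names here are hypothetical, and the files_searched counter only stands in for the real search work.

```c
#define _GNU_SOURCE   /* expose strdup under strict -std modes */
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sysinfo.h>

static sem_t slots;               /* counts free worker slots */
static int slot_count;            /* total slots, used by wait_for_all() */
static atomic_int files_searched; /* placeholder for real search results */

/* Worker: owns `arg` (a heap-allocated path) and frees its slot when done. */
static void *search_file(void *arg)
{
    char *path = arg;
    /* A real worker would open `path` and scan it line by line here. */
    atomic_fetch_add(&files_searched, 1);
    free(path);
    sem_post(&slots);
    return NULL;
}

/* max_threads <= 0 means "use the number of available cores". */
int init_slots(int max_threads)
{
    slot_count = (max_threads > 0) ? max_threads : get_nprocs();
    return sem_init(&slots, 0, (unsigned int) slot_count);
}

/* Blocks until a slot is free, then starts a detached worker for `path`. */
int launch_search(const char *path)
{
    char *copy = strdup(path);
    if (copy == NULL) {
        return -1;
    }
    sem_wait(&slots);
    pthread_t tid;
    if (pthread_create(&tid, NULL, search_file, copy) != 0) {
        free(copy);
        sem_post(&slots);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}

/* Waits for every worker by reclaiming all slots. */
void wait_for_all(void)
{
    for (int i = 0; i < slot_count; i++) {
        sem_wait(&slots);
    }
}
```

Reclaiming every slot before exiting is one way to avoid tracking thread IDs for pthread_join; joining explicitly is an equally reasonable design. Depending on the toolchain, the program may need to be built with -pthread.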


In this assignment, you will get experience working with:




  • opendir and readdir functions for listing directory contents

  • stat for getting file information

  • Argument parsing with getopt

  • Semaphores

  • Detecting active CPU cores on a machine (get_nprocs)
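To show how the directory functions fit together, here is a sketch of a recursive walk that invokes a callback for each regular file. The names walk, file_cb, and count_file are hypothetical. The sketch uses lstat rather than stat so that symbolic links are not followed; that is an assumption made here to avoid link loops, not a requirement stated in the handout.

```c
#define _GNU_SOURCE   /* expose lstat and PATH_MAX under strict -std modes */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical callback type: called once per regular file found. */
typedef void (*file_cb)(const char *path, const struct stat *st, void *ctx);

/* Example callback: counts files through an int pointed to by `ctx`. */
void count_file(const char *path, const struct stat *st, void *ctx)
{
    (void) path;
    (void) st;
    (*(int *) ctx)++;
}

/* Walks `dir` recursively. Returns 0 on success, -1 if `dir` cannot be
 * opened. Unreadable subdirectories are skipped silently. */
int walk(const char *dir, file_cb cb, void *ctx)
{
    DIR *d = opendir(dir);
    if (d == NULL) {
        return -1;
    }

    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        /* Skip the self and parent entries to avoid infinite recursion. */
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }

        char path[PATH_MAX];
        int n = snprintf(path, sizeof path, "%s/%s", dir, ent->d_name);
        if (n < 0 || (size_t) n >= sizeof path) {
            continue;   /* path too long for the buffer */
        }

        struct stat st;
        if (lstat(path, &st) != 0) {
            continue;
        }

        if (S_ISDIR(st.st_mode)) {
            walk(path, cb, ctx);
        } else if (S_ISREG(st.st_mode)) {
            cb(path, &st, ctx);
        }
    }

    closedir(d);
    return 0;
}
```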


There are a few other features you need to implement. We’ll let the program do the talking by printing usage information (-h option):



$ ./prep -h
Usage: ./prep [-eh] [-d directory] [-t threads] search_term1 search_term2 ... search_termN

Options:
    * -d directory    specify start directory (default: CWD)
    * -e              print exact case matches only
    * -h              show usage information
    * -t threads      set maximum threads (default: num CPUs)

# Note that ANY time the user passes in -h, you'll ignore the other options:
$ ./prep -e -t 4 -d / -h
(displays help, and exits)
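The option handling shown in the usage text could be sketched with getopt as below. The struct options type and the parse_args function are invented for this example. All flags are parsed before anything is acted on, so -h takes effect no matter where it appears on the command line.

```c
#define _GNU_SOURCE   /* expose getopt under strict -std modes */
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the parsed command line. */
struct options {
    const char *dir;   /* start directory; "." stands for the CWD */
    bool exact;        /* -e: case-sensitive matching */
    bool help;         /* -h: print usage and exit */
    int threads;       /* -t: 0 means "use the CPU count" */
    int first_term;    /* index in argv of the first search term */
};

/* Returns 0 on success, -1 on an unknown flag or invalid -t value.
 * Errors are ignored when -h is present, since help overrides the rest. */
int parse_args(int argc, char *argv[], struct options *o)
{
    o->dir = ".";
    o->exact = false;
    o->help = false;
    o->threads = 0;

    bool bad = false;
    int c;
    while ((c = getopt(argc, argv, "d:eht:")) != -1) {
        switch (c) {
        case 'd': o->dir = optarg;  break;
        case 'e': o->exact = true;  break;
        case 'h': o->help = true;   break;
        case 't':
            o->threads = atoi(optarg);
            if (o->threads <= 0) {
                bad = true;
            }
            break;
        default:
            bad = true;   /* getopt has already printed a message */
            break;
        }
    }

    o->first_term = optind;
    return (bad && !o->help) ? -1 : 0;
}
```

A caller would check the help field first and print the usage text, then treat argv[first_term] through argv[argc - 1] as the search terms.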



Grading


Check your code against the provided test cases. You should make sure your code runs on your Arch Linux VM. See the Test Cases for details.



Restrictions: you may use any standard C library functionality. External libraries are not allowed unless permission is granted in advance. Your code must compile and run on your Arch Linux VM as described in class – failure to do so will receive a grade of 0.

Dec 06, 2021