All the instructions are in the pdf attached.
Project 1: Word Analysis

Due 11:59 p.m., Sunday, October 24, 2020

IMPORTANT! This is an individual assignment. You may discuss broad issues of interpretation and understanding, as well as general approaches to a solution. However, the conversion to specific code must be your own work. The assignment is expected to be your work, designed and coded by you alone. If you need help, please consult your instructor.

Objectives

The objectives of this laboratory are
1. to learn how to work with strings and lists
2. to learn how to work with files

1. Introduction

The Moby Project (https://en.wikipedia.org/wiki/Moby_Project) is an extensive public-domain collection of lexical resources (such as words, phrases, and synonyms) started by Grady Ward in 1996. It is now part of Project Gutenberg (http://www.gutenberg.org/), an ambitious effort to digitize and archive virtually all historically important books and documents. In this project, we will play with the official list of 113,809 crosswords (i.e., words considered valid in crossword puzzles and other word games).

Preparation. Download the file crosswords.txt (https://drive.google.com/file/d/17UBg8LZy4uxwFACrkzxiE5tZzbGczhvi/view?usp=sharing). To work with a file in Python, we first create a file object by opening it:

    infile = open("crosswords.txt", "r")

You can read and print all of the words in the file, one at a time, using a for loop:

    for line in infile:
        word = line[:len(line)-1]  # remove the newline character '\n' at the end of each line
        print(word)

Make sure to close the file when done:

    infile.close()

Save this script as wordanalysis.py. Run the script, and it will print every word in the input file.

Building a list of words. Modify your script so that it builds a list of the words in the input file using the append() method, without printing them.

Words with more than 20 letters. Now, modify your script so that it prints all the words in the list with more than 20 letters.
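The two steps above (building the list, then filtering by length) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required solution: the function names load_words() and longer_than() are hypothetical, the sketch uses a with statement and rstrip("\n") instead of the explicit close and slicing shown earlier, and it runs on a tiny made-up file rather than the real crosswords.txt.

```python
import os
import tempfile


def load_words(path):
    """Return the words in the file at `path`, one word per line (hypothetical helper)."""
    words = []
    with open(path, "r") as infile:   # the file is closed automatically on exit
        for line in infile:
            word = line.rstrip("\n")  # drop the trailing newline, if present
            if word:                  # skip blank lines
                words.append(word)
    return words


def longer_than(words, n):
    """Return the words with more than `n` letters, in their original order."""
    result = []
    for word in words:
        if len(word) > n:
            result.append(word)
    return result


# Demo on a small made-up file standing in for crosswords.txt (an assumption).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample_words.txt")
    with open(sample_path, "w") as outfile:
        outfile.write("cat\nlevel\ncounterrevolutionaries\n")
    sample_words = load_words(sample_path)

for word in longer_than(sample_words, 20):
    print(word)
```

Unlike slicing off the last character, rstrip("\n") leaves the final word intact when the file does not end with a newline.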
Run the script and save the output of your test run in a text file named out1.txt.

Palindromes. A palindrome is a word that is spelled the same forward and backward. In your script, define a function named isPalindrome() that takes a word as a parameter and returns True if the given word is a palindrome and False otherwise. In the main section of your script, add code to count all the palindromes in the list, and then print the shortest and longest palindromes. Run the script and save the output in a text file named out2.txt. It should produce output in the following format:

    In the official list of 113,809 crosswords, there are ... palindromes.
    The shortest palindrome is ...
    The longest palindrome is ...

Words without ‘e’. In 1939, Ernest Vincent Wright published a 50,000-word novel titled Gadsby that does not contain the letter ‘e’. Since ‘e’ is the most common letter in English, that was not easy to do. In the main section, add a code segment to count all the words that do not contain ‘e’, and then print the shortest and longest such words. Run the script and save the output of your test run in a text file named out3.txt. It should produce output in the following format:

    In the official list of 113,809 crosswords, there are ... words that do not have 'e'.
    The shortest such word is ..., and the longest such word is ...

2. Frequency analysis

In English, certain letters are used more frequently than others. For example, more words begin with the letter ‘s’ than with any other letter. It is also well known that ‘e’ is the most frequently-used letter in English. Historically, such knowledge has played a very important role in cryptanalysis (i.e., the study of breaking ciphers). The following two exercises concern frequency analysis of the letters used in the official list of 113,809 crosswords.

The most frequently-used first letter. In this official list of 113,809 crosswords, how many words begin with the letter ‘a’? How many words begin with the letter ‘b’?
Is it true that ‘s’ is the most frequently-used first letter? In the main section, add a code segment to count, for each letter in the alphabet, the number of words that begin with that letter, and then print the most frequently-used first letter. For this, use at most two loops, not 26 separate loops to cover the alphabet. Run the script and save the output of your test run in a text file named out4.txt. It should produce output in the following format:

    In the official list of 113,809 crosswords,
    ... words begin with 'a',
    ... words begin with 'b',
    . . .
    ... words begin with 'z',
    and ... is the most frequently-used first letter.

Hint: Define a list of 26 counters, each of which keeps track of the number of words that begin with the corresponding letter.

The most frequently-used letter. In this official list of 113,809 crosswords, how many words use the letter ‘a’? How many words use the letter ‘b’? Is it true that ‘e’ is the most frequently-used letter? In the main section, add a code segment to count, for each letter in the alphabet, the number of words that use that letter, and then print the most frequently-used letter. As before, use at most two loops, not 26 separate loops to cover the alphabet. Run the script and save the output of your test run in a text file named out5.txt. It should produce output in the following format:

    In the official list of 113,809 crosswords,
    ... words use 'a',
    ... words use 'b',
    . . .
    ... words use 'z',
    and ... is the most frequently-used letter.

Hint: Python's ord() function takes a character and returns its integer Unicode code point. For example,

    >>> print(ord("a"))
    97

Python's chr() function takes an integer Unicode code point and returns the one-character string for that code point.

    >>> print(chr(97))
    a

You should keep a list of size 26 that stores the frequencies and increment the entry for the matching letter.
For example,

    l = [0]*26
    l[ord('a')-97] += 1

What to hand in

Upon completion of your project, create a folder named after your last name and copy your Python script and output files into it. Then create a zip file (your_last_name.zip) and upload it to Moodle.

● Python script: wordanalysis.py. As always, your script should be properly documented by including a header at the beginning of the script and inserting comments wherever appropriate.
● Output files: out1.txt, out2.txt, out3.txt, out4.txt, out5.txt
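
As a rough illustration of the 26-counter hint from the frequency analysis section, the sketch below maps each letter to a list index with ord(). It rests on several assumptions: the function names are hypothetical, the word list is a small made-up sample rather than the real file, all words are lowercase, and "words that use a letter" is read as counting each word at most once per letter. It is one possible approach, not the expected solution.

```python
def first_letter_counts(words):
    """Count, for each letter a-z, how many words begin with it."""
    counts = [0] * 26
    for word in words:
        if not word:                     # skip empty strings defensively
            continue
        index = ord(word[0]) - ord("a")  # 'a' -> 0, ..., 'z' -> 25
        if 0 <= index < 26:              # ignore anything outside a-z
            counts[index] += 1
    return counts


def words_using_counts(words):
    """Count, for each letter a-z, how many words contain it at least once."""
    counts = [0] * 26
    for word in words:
        for letter in set(word):         # each distinct letter counts once per word
            index = ord(letter) - ord("a")
            if 0 <= index < 26:
                counts[index] += 1
    return counts


def most_frequent_letter(counts):
    """Return the letter with the largest count (the earliest letter wins ties)."""
    best = counts.index(max(counts))
    return chr(best + ord("a"))


# Made-up sample standing in for the real word list (an assumption).
sample_words = ["apple", "sun", "sea", "bee", "stem"]
first = first_letter_counts(sample_words)
used = words_using_counts(sample_words)
print(first[ord("s") - ord("a")], "words begin with 's'")
print(most_frequent_letter(first), "is the most frequently-used first letter")
print(used[ord("e") - ord("a")], "words use 'e'")
print(most_frequent_letter(used), "is the most frequently-used letter")
```

Each function uses at most two loops, and writing ord("a") instead of the literal 97 keeps the index arithmetic self-explanatory.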