All the instructions are in the pdf attached.


Project 1: Word Analysis

Due 11:59 p.m., Sunday, October 24, 2020

IMPORTANT! This is an individual assignment. You may discuss broad issues of interpretation and understanding and general approaches to a solution. However, conversion to specific code must be your own work. The assignment is expected to be your work, designed and coded by you alone. If you need help, please consult with your instructor.

Objectives

The objectives of this laboratory are
1. to learn how to work with strings and lists
2. to learn how to work with files

1. Introduction

The Moby project (https://en.wikipedia.org/wiki/Moby_Project) is an extensive public-domain collection of lexical resources (such as words, phrases, synonyms, etc.) started by Grady Ward in 1996. It is now part of Project Gutenberg (http://www.gutenberg.org/), an ambitious effort to digitize and archive virtually all historically important books and documents. In this project, we will play with the official list of 113,809 crosswords (i.e., words considered to be valid in crossword puzzles and other word games).

Preparation. Download the file crosswords.txt (https://drive.google.com/file/d/17UBg8LZy4uxwFACrkzxiE5tZzbGczhvi/view?usp=sharing). To work with a file in Python, we first create a file object by opening it:

    infile = open("crosswords.txt", "r")

You can read and print all of the words in the file, one at a time, using a for loop:

    for line in infile:
        word = line[:len(line)-1]  # remove the newline character '\n' at the end of each line
        print(word)

Make sure to close the file when done:

    infile.close()

Save this script as wordanalysis.py. Run the script, and it will print every word in the input file.

Building a list of words. Modify your script so that it builds a list of the words in the input file using the append() method, without printing them.

Words with more than 20 letters. Now, modify your script so that it prints all the words in the list with more than 20 letters. Run the script and save the output of your test run in a text file named out1.txt.

Palindromes. A palindrome is a word that is spelled the same forward and backward. In your script, define a function named isPalindrome() that takes a word as a parameter and returns True if the given word is a palindrome and False otherwise. In the main section of your script, add code to count the number of all the palindromes in the list, and then print the shortest and longest palindromes. Run the script and save the output in a text file named out2.txt. It should output in the following format:

    In the official list of 113,809 crosswords, there are ... palindromes.
    The shortest palindrome is ...
    The longest palindrome is ...

Words without ‘e’. In 1939, Ernest Vincent Wright published a 50,000-word novel titled Gadsby that does not contain the letter ‘e’. Since ‘e’ is the most common letter in English, that was not easy to do. In the main section, add a code segment to count the number of all the words that do not have ‘e’, and then print the shortest and longest such words. Run the script and save the output of your test run in a text file named out3.txt. It should output in the following format:

    In the official list of 113,809 crosswords, there are ... words that do not have 'e'.
    The shortest such word is ..., and the longest such word is ...

2. Frequency analysis

In English, certain letters are used more frequently than others. For example, there are more words that begin with the letter ‘s’ than with any other letter. It is also well known that ‘e’ is the most frequently used letter in English. Historically, such knowledge has played a very important role in cryptanalysis (i.e., the study of breaking ciphers). The following two exercises concern frequency analysis of letters used in the official list of 113,809 crosswords.

The most frequently-used first letter. In this official list of 113,809 crosswords, how many words begin with the letter ‘a’? How many words begin with the letter ‘b’? Is it true that ‘s’ is the most frequently-used first letter? In the main section, add a code segment to count, for each letter in the alphabet, the number of all the words that begin with the letter, and then print the most frequently-used first letter. For this, use at most two loops, not 26 separate loops to cover the alphabet. Run the script and save the output of your test run in a text file named out4.txt. It should output in the following format:

    In the official list of 113,809 crosswords,
    ... words begin with 'a',
    ... words begin with 'b',
    . . .
    ... words begin with 'z',
    and ... is the most frequently-used first letter.

Hint: Define a list of 26 counters, each of which keeps track of the number of words that begin with one letter.

The most frequently-used letter. In this official list of 113,809 crosswords, how many words use the letter ‘a’? How many words use the letter ‘b’? Is it true that ‘e’ is the most frequently-used letter? In the main section, add a code segment to count, for each letter in the alphabet, the number of all the words that use the letter, and then print the most frequently-used letter. As before, use at most two loops, not 26 separate loops to cover the alphabet. Run the script and save the output of your test run in a text file named out5.txt. It should output in the following format:

    In the official list of 113,809 crosswords,
    ... words use 'a',
    ... words use 'b',
    . . .
    ... words use 'z',
    and ... is the most frequently-used letter.

Hint: Python's ord() function takes a character and returns its integer Unicode code. For example,

    >>> print(ord("a"))
    97

Python's chr() function takes an integer Unicode code and returns the string representing the character at that code.

    >>> print(chr(97))
    a

You should have a list of size 26 that stores the frequencies, and you should increment the matching entry of that list. For example,

    l = [0]*26
    l[ord('a')-97] += 1

What to hand in

Upon completion of your project, create a folder named with your last name and copy your Python script and output files into the folder. Then create a zip file (your_last_name.zip) and upload the zip file to Moodle.
● Python script: wordanalysis.py. As always, your script should be properly documented by including a header at the beginning of the script and inserting comments wherever appropriate.
● Output files: out1.txt, out2.txt, out3.txt, out4.txt, out5.txt
Answered 6 days after Oct 18, 2021


Darshan answered on Oct 25 2021
out1.txt
counterdemonstrations
hyperaggressivenesses
microminiaturizations
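The answerer's script is not shown on this page, so the following is only a minimal sketch of how the out1.txt listing above could be produced. The helper names load_words and longer_than are illustrative assumptions, not names from the assignment or the answer.

```python
def load_words(path):
    """Build a list of the words stored one per line in the file at `path`."""
    words = []
    # The with-statement closes the file automatically, even on errors.
    with open(path, "r") as infile:
        for line in infile:
            word = line.strip()      # drop the trailing newline and stray spaces
            if word:                 # skip blank lines, if any
                words.append(word)
    return words


def longer_than(words, n):
    """Return the words with more than n letters, in their original order."""
    return [word for word in words if len(word) > n]
```

With the assignment's file in the working directory, printing each word in longer_than(load_words("crosswords.txt"), 20) would give the kind of listing saved in out1.txt.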
out2.txt
In the official list of 113809 crosswords, there are 91 palindromes.
The shortest palindrome is aa
The longest palindrome is deified
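One possible way to arrive at a summary like the one above is sketched below. isPalindrome is the function name required by the assignment; palindrome_summary is a hypothetical helper introduced here for illustration, and the answerer's actual code may differ.

```python
def isPalindrome(word):
    """Return True if word reads the same forward and backward."""
    return word == word[::-1]    # word[::-1] is the reversed string


def palindrome_summary(words):
    """Return (count, shortest, longest) over the palindromes in words.

    When several palindromes share the minimum or maximum length, min() and
    max() return the first one in list order. Returns (0, None, None) if the
    list contains no palindromes.
    """
    palindromes = [word for word in words if isPalindrome(word)]
    if not palindromes:
        return 0, None, None
    shortest = min(palindromes, key=len)
    longest = max(palindromes, key=len)
    return len(palindromes), shortest, longest
```

Because ties go to the first match, which word is reported as shortest or longest depends on the order of the input list.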
out3.txt
In the official list of 113809 crosswords, there are 37641 words that do not have 'e'.
The shortest such word is aa, and the longest such word is microminiaturizations
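A count like the one above could be obtained with a sketch along these lines. The function name summarize_without and its letter parameter are assumptions made for illustration and do not come from the original answer.

```python
def summarize_without(words, letter="e"):
    """Return (count, shortest, longest) over the words that lack `letter`.

    Ties in length resolve to the first such word in list order.
    Returns (0, None, None) if every word contains the letter.
    """
    matches = [word for word in words if letter not in word]
    if not matches:
        return 0, None, None
    return len(matches), min(matches, key=len), max(matches, key=len)
```

The three returned values map directly onto the blanks in the out3.txt format given in the assignment.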
out4.txt
In the official list of 113809 crosswords,
6557 words begin with a,
6848 words begin with b,
10385 words begin with c,
6436 words begin with d,
4364 words begin with e,
4937 words begin with f,
3950 words begin with g,
4080 words begin with h,
4013 words begin with i,
1106 words begin with j,
1312 words begin with k,
3710 words begin with l,
6270 words begin with m,
2208 words begin with n,
3978 words begin with o,
8693 words begin with p,
568 words begin with q,
7141 words begin with r,
12591 words begin with s,
5951 words begin with t,
2934 words begin with u,
1932 words begin with v,
2927 words begin with w,
82 words begin with x,
438 words begin with y,
398 words begin with z,
and s is the most frequently-used first letter.
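The assignment's hint (a list of 26 counters indexed with ord()) can be sketched as follows. This is a minimal illustration that assumes lowercase words; the function names first_letter_counts and most_frequent_letter are hypothetical.

```python
def first_letter_counts(words):
    """Return a list of 26 counters; entry i counts words starting with chr(97 + i)."""
    counts = [0] * 26
    for word in words:
        if not word:
            continue                         # ignore empty strings
        index = ord(word[0]) - ord("a")      # 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
        if 0 <= index < 26:                  # ignore anything outside a-z
            counts[index] += 1
    return counts


def most_frequent_letter(counts):
    """Return the letter whose counter is largest (first one on a tie)."""
    return chr(counts.index(max(counts)) + ord("a"))
```

Printing the report then takes one more loop over range(26), which keeps the whole task within the two-loop limit set by the assignment.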
out5.txt
In the official list of 113809 crosswords,
56613 words use a,
16305 words use b,
30466 words use c,
30648 words use d,
76168 words use e,
11277 words use f,
24979 words use g,
19096 words use h,
60314 words use i,
1747 words use j,
8978 words use k,
40133...
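The listing above is cut off in the source. As a hedged sketch of how such per-letter counts might be computed, the code below counts each word at most once per letter, which agrees with the figures shown here (113809 - 37641 words without 'e' = 76168 words that use 'e'). The function names are illustrative assumptions, and lowercase input is assumed.

```python
def words_using_each_letter(words):
    """Return 26 counters; entry i counts the words containing chr(97 + i)."""
    counts = [0] * 26
    for word in words:
        # set(word) keeps each distinct letter once, so a word such as
        # "banana" adds 1 (not 3) to the counter for 'a'.
        for letter in set(word):
            index = ord(letter) - ord("a")
            if 0 <= index < 26:              # ignore anything outside a-z
                counts[index] += 1
    return counts


def most_used_letter(counts):
    """Return the letter used by the most words (first one on a tie)."""
    return chr(counts.index(max(counts)) + ord("a"))
```

The nested loop over words and their distinct letters stays within the two-loop limit, since no separate loop per letter of the alphabet is needed.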