
Hey, my course is Object Oriented Programming in C++ and this assignment is due in 2-3 days. Please let me know if you can do it or not. I am pasting all the details below. Thank you.





Semantic Descriptors


One type of question encountered in the Test of English as a Foreign Language (TOEFL) is the “Synonym Question”, where students are asked to pick a synonym of a word out of a list of alternatives. For example:



1. vexed (Answer: (a) annoyed)



(a) annoyed


(b) amused


(c) frightened


(d) excited


How would you build a system that could answer such a question intelligently?


The goal of this project is to begin to explore this idea. In order to do so, the system will approximate the semantic similarity of any pair of words. The semantic similarity between two words is the measure of the closeness of their meanings. For example, the semantic similarity between “car” and “vehicle” is high, while that between “car” and “flower” is low.


In order to answer the TOEFL question, you will compute the semantic similarity between the word you are given and all the possible answers, and pick the answer with the highest semantic similarity to the given word. More precisely, given a word w and a list of potential synonyms s1, s2, s3, s4, we compute the similarities of (w, s1), (w, s2), (w, s3), (w, s4) and choose the word whose similarity to w is the highest.


We will measure the semantic similarity of pairs of words by first computing a semantic descriptor vector of each of the words, and then taking the similarity measure to be the cosine similarity between the two vectors.


Given a text with n words denoted by (w1, w2, ..., wn) and a word w, let desc be the semantic descriptor vector of w computed using the text. desc is an n-sized vector. The i-th coordinate of desc is the number of sentences in which both w and wi occur. For efficiency's sake, we will not store the zeros that correspond to words which don't co-occur with w. For example, suppose we are given the following text (the opening of Notes from the Underground by Fyodor Dostoyevsky, translated by Constance Garnett):



I am a sick man. I am a spiteful man. I am an unattractive man. I believe my liver is diseased.


However, I know nothing at all about my disease, and do not know for certain what ails me.


The word “man” only appears in the first three sentences. Its semantic descriptor vector would be:



{"i": 3, "am": 3, "a": 2, "sick": 1, "spiteful": 1, "an": 1, "unattractive": 1}


The word “liver” only occurs in the second sentence, so its semantic descriptor vector is:



{"i": 1, "believe": 1, "my": 1, "is": 1, "diseased": 1}


We store all words in all-lowercase, since we don't consider, for example, “Man” and “man” to be different words. We do, however, consider, e.g., “believe” and “believes”, or “am” and “is”, to be different words. We discard all punctuation.


Cosine Similarity


The cosine similarity between two vectors u = {u1, u2, ..., uN} and v = {v1, v2, ..., vN} is defined as:



LaTeX: \mathrm{similarity}\left(u,v\right)=\cos\left(u,v\right)=\frac{u\cdot v}{\left|u\right|\left|v\right|}=\frac{u\cdot v}{\sqrt{u\cdot u}\cdot\sqrt{v\cdot v}}=\frac{\sum u_iv_i}{\sqrt{\left(\sum u_i^2\right)\left(\sum v_i^2\right)}}


In our code, we will use the final version of the formula on the right side. There is a lot of theory behind this formula that you will learn more about when you take multivariable calculus. You can click on the following links if you are curious. If you are not curious, that is fine, you can follow the process and it will still work.


The pictures below illustrate the major idea of how cosine similarity looks in two dimensions:



[Figure: angle between two similar vectors]
[Figure: angle between two dissimilar vectors]


If you combine the distributive property, the Law of Cosines, and the Pythagorean theorem, you can relate the cosine of the angle between two vectors in n-dimensional space to our final version of the formula that involves a computation on the elements of each vector. See the derivation. If an angle is close to 0, its cosine is close to 1, indicating that the vectors are aligned and therefore similar.


In this formula, cos(u, v) is the cosine of the angle between u and v, treating u and v as vectors in an n-dimensional coordinate space. The dot (·) represents the dot product operator, which multiplies corresponding elements of each vector and sums the products. The bars, such as |v|, represent the magnitude of the vector inside, in this case v, as a Euclidean norm, which in short is the Pythagorean theorem applied in higher dimensions.


Capital sigma (Σ) represents summation, as we would do in a for-loop.
LaTeX: \sum u_iv_i represents a for-loop that multiplies each element of vectors u and v and sums each product, such that:



LaTeX: \sum u_iv_i=u_0v_0+u_1v_1+\ldots+u_nv_n


where u and v are both assumed to have the same size n. The summation of a squared term represents a sum of products where each element is multiplied by itself:



LaTeX: \sum u_i^2=u_0u_0+u_1u_1+\ldots+u_nu_n


Thus, we can compute the cosine of the angle between two semantic descriptors using for-loops, sum of products, square roots, and division, and the closer the cosine is to 1, the more similar we can assume the descriptors to be.
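As a quick illustration of the for-loop idea (not part of the assignment files, which store sparse descriptors rather than dense vectors), a dense-vector version of the formula might look like this:

```cpp
#include <cmath>
#include <vector>

// Cosine similarity of two dense vectors of equal size, computed exactly as
// the final form of the formula: sum of products divided by the product of
// the two Euclidean norms.
double cosineSimilarity(const std::vector<double>& u, const std::vector<double>& v)
{
    double dot = 0, uu = 0, vv = 0;
    for (size_t i = 0; i < u.size(); i++)
    {
        dot += u[i] * v[i]; // sum of u_i * v_i
        uu  += u[i] * u[i]; // sum of u_i^2
        vv  += v[i] * v[i]; // sum of v_i^2
    }
    return dot / (std::sqrt(uu) * std::sqrt(vv));
}
```

For parallel vectors such as {1, 2} and {2, 4}, the angle is 0 and the result is 1.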


We cannot apply the formula directly to our semantic descriptors, since we do not store the entries which are equal to zero. However, we can still compute the cosine similarity between vectors by only considering the non-zero matching words between the vectors. This works out because a non-zero element in one vector multiplied by a zero element in the other vector would have a product of 0 and contribute nothing to the sum anyway.


For example, the cosine similarity of “man” and “liver”, given the semantic descriptors above, is



LaTeX: \frac{3\cdot1\ (\text{for the word } \text{“i”})}{\sqrt{\left(3^2+3^2+2^2+1^2+1^2+1^2+1^2\right)\left(1^2+1^2+1^2+1^2+1^2\right)}}=\frac{3}{\sqrt{130}}=0.2631\ldots
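A sketch of this sparse computation, using std::map<string, int> purely for illustration (the assignment uses a vector of ContextWord objects instead); running it on the “man” and “liver” descriptors above reproduces 3/√130 ≈ 0.2631:

```cpp
#include <cmath>
#include <map>
#include <string>

// Sparse cosine similarity: only words present in a descriptor are stored,
// and only words present in BOTH descriptors contribute to the dot product.
double sparseCosine(const std::map<std::string, int>& u,
                    const std::map<std::string, int>& v)
{
    double dot = 0, uu = 0, vv = 0;
    for (const auto& p : u)
    {
        uu += double(p.second) * p.second;     // |u|^2 term
        auto it = v.find(p.first);
        if (it != v.end())                     // matching word in both vectors
            dot += double(p.second) * it->second;
    }
    for (const auto& p : v)
        vv += double(p.second) * p.second;     // |v|^2 term
    return dot / (std::sqrt(uu) * std::sqrt(vv));
}
```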


Design


In order to review the tools we have learned so far, we will take an object-oriented approach. There is a better container for semantic descriptors, but for now we will use the std::vector to hold our information. I will provide you with main.cpp and the header files:


main.cpp



SemanticDictionary d is the object that will contain and manage a SemanticDescriptor object for each target word. main.cpp will run as follows:



  • Get a line of text that presumably contains multiple sentences

  • Call getSentenceLists. This function will return a two-dimensional vector where each row represents a sentence and each column represents a word from that sentence

  • Output sentenceLists to verify that they match the given text, using an overloaded output operator

  • For each row of sentenceLists, where each row represents one sentence, use d to process each word in that sentence as a targetWord.

    • This will build and update the semantic descriptor for each target word, incrementing its counters for each other word that appears in the sentence

  • Output the contents of d to verify that the SemanticDescriptors were built correctly, using an overloaded output operator

  • Input the TOEFL question

  • If one of the words is not found, d will throw an exception, so calculate the similarities in a try-catch block

  • Call mostSimilarWord. This will compare the given word to each possible choice and return the word that gets the highest similarity score

  • Output the chosen word and its score

  • Compare it to the given correct answer to see if it guessed correctly


#include "SemanticDictionary.h"
using namespace std;

int main()
{
    SemanticDictionary d;
    string text;
    cout << "Type a paragraph followed by a newline to build semantic descriptors:" << endl;
    getline(cin, text);

    vector<vector<string>> sentenceLists = getSentenceLists(text);
    cout << "Sentence Lists:" << endl << sentenceLists;

    for(vector<string> sentenceList : sentenceLists)
        for(string targetWord : sentenceList)
            d.processTargetWord(targetWord, sentenceList);
    cout << "Semantic Descriptors:" << endl << d;

    cout << "Enter a TOEFL question as <word> <answer> <choice1> <choice2> <choice3>:" << endl;
    string word, answer, choice1, choice2, choice3;
    cin >> word >> answer >> choice1 >> choice2 >> choice3;
    try
    {
        string s = d.mostSimilarWord(word, vector<string>{choice1, choice2, choice3});
        cout << "Most similar: " << s << endl;
        cout << "Index: " << d.getSimilarity(word, s) << endl;
        if(s == answer) cout << "Correct answer." << endl;
        else cout << "Incorrect answer." << endl;
    }
    catch(runtime_error& e)
    {
        cout << e.what() << endl;
    }
    return 0;
}

SemanticDescriptor.h


The SemanticDescriptor class will contain its targetWord and a vector of ContextWord objects. Each ContextWord object contains a count of how many times the given word showed up in the same sentence as the targetWord. It will be constructed with its targetWord and will update its vector of ContextWords each time processContextWord is called.



void processContextWord(string s)



  • Loop through contextWords. If s is found, then increment its count.

  • If s is not found, then push_back a new ContextWord object for s with a count of 1
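The two bullets above might be sketched as a free function over the vector it updates (an illustration only; in the assignment this logic lives in the SemanticDescriptor member function):

```cpp
#include <string>
#include <vector>

struct ContextWord { std::string word; int count; };

// Increment the count for s if it is already a context word;
// otherwise append a new ContextWord starting at 1.
void processContextWord(std::vector<ContextWord>& contextWords, const std::string& s)
{
    for (ContextWord& cw : contextWords)
        if (cw.word == s) { cw.count++; return; }  // found: bump its count
    contextWords.push_back(ContextWord{s, 1});     // not found: start at 1
}
```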



int operator*(const SemanticDescriptor& desc)



  • This is the dot product operator to find the dot product between two descriptors, as described above

  • Since the operator is being defined from within the class, the lvalue is this and the rvalue is desc; e.g., for u*v, u would be the current object that the code is operating within and desc would be v.

  • For each ContextWord in this, loop through each ContextWord in desc

    • If a matching word within the ContextWords is found, multiply their counts and add the product to a running sum



  • Return the sum
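The nested-loop dot product described above could be sketched as a free function (the assignment version is a member operator, with `this` playing the role of the left operand):

```cpp
#include <string>
#include <vector>

struct ContextWord { std::string word; int count; };

// For each word in the left operand, find a matching word in the right
// operand and accumulate the product of their counts. Words appearing in
// only one list contribute 0, as described in the handout.
int dotProduct(const std::vector<ContextWord>& u, const std::vector<ContextWord>& v)
{
    int sum = 0;
    for (const ContextWord& a : u)
        for (const ContextWord& b : v)
            if (a.word == b.word)
                sum += a.count * b.count;
    return sum;
}
```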



friend ostream& operator<<(ostream& out, const SemanticDescriptor& desc)



  • Output the contents of the SemanticDescriptor as shown in the sample output below

  • Since it is declared as a friend from within the class, this operator will be able to access the private variables of desc

  • In SemanticDescriptor.cpp, define this as ostream& operator<<(ostream& out, const SemanticDescriptor& desc); you do not need to use the friend keyword again



Full Header File:


#ifndef SEMANTICDESCRIPTOR_H_INCLUDED
#define SEMANTICDESCRIPTOR_H_INCLUDED
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct ContextWord
{
    string word;
    int count;
};

class SemanticDescriptor
{
private:
    string targetWord;
    vector<ContextWord> contextWords;
public:
    SemanticDescriptor(string _targetWord) : targetWord(_targetWord) {}
    string getTargetWord() {return targetWord;}
    void processContextWord(string s);
    int operator*(const SemanticDescriptor& desc);
    friend ostream& operator<<(ostream& out, const SemanticDescriptor& desc);
};
#endif // SEMANTICDESCRIPTOR_H_INCLUDED

SemanticDictionary.h


The SemanticDictionary class will contain and manage a vector of SemanticDescriptor objects. This file will also contain a function defined outside the class that will parse each sentence into a list of words.



getSentenceLists(string text)


The goal here is to create a two-dimensional vector where each row represents one sentence from text, and each column represents one word from the given sentence with any non-alphabetical characters removed. This function is much more difficult than it seems. Please use the following starter code:


vector<vector<string>> getSentenceLists(string text)
{
    vector<vector<string>> sentenceLists;
    vector<string> sentenceList;
    string word = "";
    for(size_t i = 0; i < text.size(); i++)
    {
        ///if the ith char is alphabetical, concatenate it onto word
        ///else
        ///    if the size of word > 0, push it back into sentenceList and make the word empty
        ///    if the ith char is '?', '.', or '!', AND the size of sentenceList > 0,
        ///        then push back sentenceList into sentenceLists and clear sentenceList
    }
    ///if the size of word > 0, push it back into sentenceList
    ///if the size of sentenceList > 0, push it back into sentenceLists
    return sentenceLists;
}
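For reference, one possible completion of the starter code is sketched below. The tolower conversion is an assumption based on the earlier requirement that all words be stored in all-lowercase; the unsigned char casts avoid undefined behavior for non-ASCII input:

```cpp
#include <cctype>
#include <string>
#include <vector>
using namespace std;

// Split text into sentences at '?', '.', '!' and each sentence into
// lowercase words, discarding all non-alphabetical characters.
vector<vector<string>> getSentenceLists(string text)
{
    vector<vector<string>> sentenceLists;
    vector<string> sentenceList;
    string word = "";
    for(size_t i = 0; i < text.size(); i++)
    {
        if(isalpha(static_cast<unsigned char>(text[i])))
            word += tolower(static_cast<unsigned char>(text[i])); // build word in lowercase
        else
        {
            if(word.size() > 0) { sentenceList.push_back(word); word = ""; }
            if((text[i] == '?' || text[i] == '.' || text[i] == '!') && sentenceList.size() > 0)
            {
                sentenceLists.push_back(sentenceList); // sentence ended
                sentenceList.clear();
            }
        }
    }
    if(word.size() > 0) sentenceList.push_back(word);          // flush last word
    if(sentenceList.size() > 0) sentenceLists.push_back(sentenceList); // flush last sentence
    return sentenceLists;
}
```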


int searchTargetWord(string targetWord)



  • Loop through the vector semanticDescriptors.

    • If a SemanticDescriptor is found with a matching targetWord, return its index.

    • Otherwise return -1.
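The search might be sketched like this over a plain vector of strings (an illustration only; in the assignment you would loop over the SemanticDescriptor objects and compare each element's getTargetWord() instead):

```cpp
#include <string>
#include <vector>

// Linear search: return the index of targetWord, or -1 if it is absent.
int searchTargetWord(const std::vector<std::string>& targetWords, const std::string& targetWord)
{
    for (size_t i = 0; i < targetWords.size(); i++)
        if (targetWords[i] == targetWord)
            return static_cast<int>(i); // found: return its index
    return -1;                          // not found
}
```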





void processTargetWord(string targetWord, vector<string> sentenceList)



  • Use searchTargetWord to find the index of the SemanticDescriptor with the given targetWord.

  • If the targetWord is not found, construct and push back a SemanticDescriptor to the end of the vector.

  • Loop through each string in sentenceList.

    • If the current word is not equal to the targetWord, then call processContextWord on the SemanticDescriptor at the previously determined index
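The same bookkeeping can be sketched with nested maps standing in for the SemanticDictionary and SemanticDescriptor classes (an illustration only; the assignment requires the class-based design):

```cpp
#include <map>
#include <string>
#include <vector>

// For every OTHER word in the sentence, bump the target word's
// co-occurrence count; the descriptor is created on first use.
void processTargetWord(std::map<std::string, std::map<std::string, int>>& d,
                       const std::string& targetWord,
                       const std::vector<std::string>& sentenceList)
{
    std::map<std::string, int>& desc = d[targetWord]; // operator[] inserts if missing
    for (const std::string& w : sentenceList)
        if (w != targetWord)
            desc[w]++;
}
```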





double getSimilarity(string targetWord1, string targetWord2)



  • Call searchTargetWord once for each argument and store their indices

  • If either target word is not found, throw a runtime error with the text "Target word(s) unknown"

  • Access and store the SemanticDescriptors at the given indices

  • Return the cosine similarity of the SemanticDescriptors using your * operator:


    • LaTeX: \frac{a\cdot b}{\sqrt{a\cdot a}\sqrt{b\cdot b}}

    • Be careful with integer division rounding errors here
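One way to sidestep the integer-division pitfall is to convert to double before dividing. A hypothetical helper taking the three integer dot products a·b, a·a, and b·b might look like:

```cpp
#include <cmath>

// operator* returns an int, so compute the quotient in double;
// otherwise the fraction would truncate toward zero.
double similarityFromDots(int ab, int aa, int bb)
{
    return ab / (std::sqrt(static_cast<double>(aa)) * std::sqrt(static_cast<double>(bb)));
}
```

For the “man”/“liver” example above, similarityFromDots(3, 26, 5) gives 3/√130 ≈ 0.2631 rather than the 0 that pure integer arithmetic would produce.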





string mostSimilarWord(string word, vector<string> choices)



  • Loop through choices and use getSimilarity to return the choice that is most similar to word
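A sketch of the running-maximum loop, with a std::function parameter standing in for SemanticDictionary::getSimilarity (hypothetical; the member version would call getSimilarity directly):

```cpp
#include <functional>
#include <string>
#include <vector>

// Return the choice with the highest similarity score to word.
// Assumes choices is non-empty.
std::string mostSimilarWord(const std::string& word,
                            const std::vector<std::string>& choices,
                            std::function<double(std::string, std::string)> sim)
{
    std::string best = choices[0];
    double bestScore = sim(word, choices[0]);
    for (size_t i = 1; i < choices.size(); i++)
    {
        double score = sim(word, choices[i]);
        if (score > bestScore) { bestScore = score; best = choices[i]; }
    }
    return best;
}
```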



ostream& operator<<(ostream& out, const SemanticDictionary& d)



  • Loop through and print each SemanticDescriptor, using its operator<<

  • Format the output as shown in the sample output below

  • This is declared as a friend inside the SemanticDictionary class declaration, so the function will have access to private variables

    • Define this in the .cpp file outside of the class and do not use the friend keyword again





ostream& operator<<(ostream& out, const vector<vector<string>>& sentenceLists)



  • Output sentenceLists as shown in the sample output in the next heading below



Full Header File:


#ifndef SEMANTICDICTIONARY_H_INCLUDED
#define SEMANTICDICTIONARY_H_INCLUDED
#include "SemanticDescriptor.h"
#include <iostream>
using namespace std;

vector<vector<string>> getSentenceLists(string text);

class SemanticDictionary
{
private:
    vector<SemanticDescriptor> semanticDescriptors;
    int searchTargetWord(string targetWord);
public:
    void processTargetWord(string targetWord, vector<string> sentenceList);
    double getSimilarity(string targetWord1, string targetWord2);
    string mostSimilarWord(string word, vector<string> choices);
    friend ostream& operator<<(ostream& out, const SemanticDictionary& d);
};
ostream& operator<<(ostream& out, const vector<vector<string>>& sentenceLists);
#endif // SEMANTICDICTIONARY_H_INCLUDED

Sample Output


Type a paragraph followed by a newline to build semantic descriptors:
She was vexed, irritated and angry with his actions!!! He was annoyed and irritated as well, wishing he could explain to her how angry he was. The dog was amused - gleefully tearing the couch apart and sprinting around the room... The cat was frightened, trying to hide under the couch that the dog was destroying.
Sentence Lists:
[
[ she was vexed irritated and angry with his actions ]
[ he was annoyed and irritated as well wishing he could explain to her how angry he was ]
[ the dog was amused gleefully tearing the couch apart and sprinting around the room ]
[ the cat was frightened trying to hide under the couch that the dog was destroying ]
]
Semantic Descriptors:
{
she { was 1 vexed 1 irritated 1 and 1 angry 1 with 1 his 1 actions 1 }
was { she 1 vexed 1 irritated 3 and 4 angry 3 with 1 his 1 actions 1 he 6 annoyed 2 as 2 well 2 wishing 2 could 2 explain 2 to 4 her 2 how 2 the 9 dog 3 amused 1 gleefully 1 tearing 1 couch 3 apart 1 sprinting 1 around 1 room 1 cat 2 frightened 2 trying 2 hide 2 under 2 that 2 destroying 2 }
vexed { she 1 was 1 irritated 1 and 1 angry 1 with 1 his 1 actions 1 }
irritated { she 1 was 3 vexed 1 and 2 angry 2 with 1 his 1 actions 1 he 3 annoyed 1 as 1 well 1 wishing 1 could 1 explain 1 to 1 her 1 how 1 }
and { she 1 was 4 vexed 1 irritated 2 angry 2 with 1 his 1 actions 1 he 3 annoyed 1 as 1 well 1 wishing 1 could 1 explain 1 to 1 her 1 how 1 the 3 dog 1 amused 1 gleefully 1 tearing 1 couch 1 apart 1 sprinting 1 around 1 room 1 }
angry { she 1 was 3 vexed 1 irritated 2 and 2 with 1 his 1 actions 1 he 3 annoyed 1 as 1 well 1 wishing 1 could 1 explain 1 to 1 her 1 how 1 }
with { she 1 was 1 vexed 1 irritated 1 and 1 angry 1 his 1 actions 1 }
his { she 1 was 1 vexed 1 irritated 1 and 1 angry 1 with 1 actions 1 }
actions { she 1 was 1 vexed 1 irritated 1 and 1 angry 1 with 1 his 1 }
he { was 6 annoyed 3 and 3 irritated 3 as 3 well 3 wishing 3 could 3 explain 3 to 3 her 3 how 3 angry 3 }
annoyed { he 3 was 2 and 1 irritated 1 as 1 well 1 wishing 1 could 1 explain 1 to 1 her 1 how 1 angry 1 }
as { he 3 was 2 annoyed 1 and 1 irritated 1 well 1 wishing 1 could 1 explain 1 to 1 her 1 how 1 angry 1 }
well { he 3 was 2 annoyed 1 and 1 irritated 1 as 1 wishing 1 could 1 explain 1 to 1 her 1 how 1 angry 1 }
wishing { he 3 was 2 annoyed 1 and 1 irritated 1 as 1 well 1 could 1 explain 1 to 1 her 1 how 1 angry 1 }
could { he 3 was 2 annoyed 1 and 1 irritated 1 as 1 well 1 wishing 1 explain 1 to 1 her 1 how 1 angry 1 }
explain { he 3 was 2 annoyed 1 and 1 irritated 1 as 1 well 1 wishing 1 could 1 to 1 her 1 how 1 angry 1 }
to { he 3 was 4 annoyed 1 and 1 irritated 1 as 1 well 1 wishing 1 could 1 explain 1 her 1 how 1 angry 1 the 3 cat 1 frightened 1 trying 1 hide 1 under 1 couch 1 that 1 dog 1 destroying 1 }
her { he 3 was 2 annoyed 1 and 1 irritated 1 as 1 well 1 wishing 1 could 1 explain 1 to 1 how 1 angry 1 }
how { he 3 was 2 annoyed 1 and 1 irritated 1 as 1 well 1 wishing 1 could 1 explain 1 to 1 her 1 angry 1 }
the { dog 6 was 9 amused 3 gleefully 3 tearing 3 couch 6 apart 3 and 3 sprinting 3 around 3 room 3 cat 3 frightened 3 trying 3 to 3 hide 3 under 3 that 3 destroying 3 }
dog { the 6 was 3 amused 1 gleefully 1 tearing 1 couch 2 apart 1 and 1 sprinting 1 around 1 room 1 cat 1 frightened 1 trying 1 to 1 hide 1 under 1 that 1 destroying 1 }
amused { the 3 dog 1 was 1 gleefully 1 tearing 1 couch 1 apart 1 and 1 sprinting 1 around 1 room 1 }
gleefully { the 3 dog 1 was 1 amused 1 tearing 1 couch 1 apart 1 and 1 sprinting 1 around 1 room 1 }
tearing { the 3 dog 1 was 1 amused 1 gleefully 1 couch 1 apart 1 and 1 sprinting 1 around 1 room 1 }
couch { the 6 dog 2 was 3 amused 1 gleefully 1 tearing 1 apart 1 and 1 sprinting 1 around 1 room 1 cat 1 frightened 1 trying 1 to 1 hide 1 under 1 that 1 destroying 1 }
apart { the 3 dog 1 was 1 amused 1 gleefully 1 tearing 1 couch 1 and 1 sprinting 1 around 1 room 1 }
sprinting { the 3 dog 1 was 1 amused 1 gleefully 1 tearing 1 couch 1 apart 1 and 1 around 1 room 1 }
around { the 3 dog 1 was 1 amused 1 gleefully 1 tearing 1 couch 1 apart 1 and 1 sprinting 1 room 1 }
room { the 3 dog 1 was 1 amused 1 gleefully 1 tearing 1 couch 1 apart 1 and 1 sprinting 1 around 1 }
cat { the 3 was 2 frightened 1 trying 1 to 1 hide 1 under 1 couch 1 that 1 dog 1 destroying 1 }
frightened { the 3 cat 1 was 2 trying 1 to 1 hide 1 under 1 couch 1 that 1 dog 1 destroying 1 }
trying { the 3 cat 1 was 2 frightened 1 to 1 hide 1 under 1 couch 1 that 1 dog 1 destroying 1 }
hide { the 3 cat 1 was 2 frightened 1 trying 1 to 1 under 1 couch 1 that 1 dog 1 destroying 1 }
under { the 3 cat 1 was 2 frightened 1 trying 1 to 1 hide 1 couch 1 that 1 dog 1 destroying 1 }
that { the 3 cat 1 was 2 frightened 1 trying 1 to 1 hide 1 under 1 couch 1 dog 1 destroying 1 }
destroying { the 3 cat 1 was 2 frightened 1 trying 1 to 1 hide 1 under 1 couch 1 that 1 dog 1 }
}
Enter a TOEFL question as <word> <answer> <choice1> <choice2> <choice3>:
vexed annoyed annoyed amused frightened
Most similar: annoyed
Index: 0.360844
Correct answer.
Answered Same Day, Feb 18, 2021

Pulkit answered on Feb 19 2021
import time
import math
import os

def norm(vec):
    '''Return the norm of a vector stored as a dictionary,
    as described in the handout for Project 3.
    '''
    sum_of_squares = 0.0  # floating point to handle large numbers
    for x in vec:
        sum_of_squares += vec[x] * vec[x]
    return math.sqrt(sum_of_squares)

def cosine_similarity(vec1, vec2):
    upper = 0.0
    for keys in vec1:
        if keys in vec2:
            upper += vec1[keys] * vec2[keys]
    return upper / (norm(vec1) * norm(vec2))

def build_semantic_descriptors(sentences):
    count = {}
    for i in range(len(sentences)):
        for j in range(len(sentences[i])):
            if not sentences[i][j] in count:
                count[sentences[i][j]] = {}
                for k in range(len(sentences[i])):
                    if sentences[i][j] != sentences[i][k]:
                        if sentences[i][k] in count[sentences[i][j]]:
                            count[sentences[i][j]][sentences[i][k]] += 1
                        else:
                            count[sentences[i][j]][sentences[i][k]] = 1
            else:
                for k in range(len(sentences[i])):
                    if sentences[i][j] != sentences[i][k]:
                        ...