Modify the skeleton .py file to translate words from English to Pig Latin. Please do not import anything other than what is already in the skeleton .py file. I've also attached the lecture slides related to this topic if it helps. If you can, please comment your code so that a first-year college student can understand. Thank you!
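Since the actual skeleton .py file is not shown here, below is a minimal sketch assuming one common set of Pig Latin rules (vowel-initial words get "way" appended; otherwise the leading consonant cluster moves to the end, followed by "ay"). The function names translate_word and translate_sentence are placeholders, not the skeleton's real names; the real skeleton may expect different names, rules, or structure. Only built-ins are used, so nothing extra is imported.

# Minimal sketch of a Pig Latin translator, assuming the usual rules:
# - words beginning with a vowel get "way" appended
# - otherwise the leading consonant cluster moves to the end, followed by "ay"
# The function names below are hypothetical; the actual skeleton file
# (not shown here) may expect different names or signatures.

VOWELS = "aeiou"

def translate_word(word):
    """Translate a single lowercase English word into Pig Latin."""
    if word[0] in VOWELS:
        # Vowel-initial words: just append "way"
        return word + "way"
    # Find the first vowel; everything before it is the leading consonant cluster
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:] + word[:i] + "ay"
    # No vowel at all: just append "ay"
    return word + "ay"

def translate_sentence(sentence):
    """Translate each whitespace-separated word in a sentence."""
    return " ".join(translate_word(w) for w in sentence.lower().split())

print(translate_sentence("this is pig latin"))  # -> "isthay isway igpay atinlay"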
Attached lecture slides (transcribed here for reference):

Today's Topics
• Motivation: neural machine translation for long sentences
• Decoder: attention
• Transformer overview
• Self-attention
(Slides thanks to Dana Gurari)

Converting Text to Vectors
1. Tokenize the training data, i.e., convert the data into a sequence of tokens (e.g., "This is tokenizing" -> "This", "is", "tokenizing")
2. Learn a vocabulary by identifying all unique tokens in the training data
3. Encode the data as one-hot vectors
Two common approaches, character-level and word-level vocabularies:
https://nlpiation.medium.com/how-to-use-huggingfaces-transformers-pre-trained-tokenizers-e029e8d6d1fa

Character-level vocabulary:
Token: a, b, c, ..., 0, 1, ..., !, @, ...
Index: 1, 2, 3, ..., 27, 28, ..., 119, 120, ...

Word-level vocabulary:
Token: a, an, at, ..., bat, ball, ..., zipper, zoo, ...
Index: 1, 2, 3, ..., 527, 528, ..., 9,842, 9,843, ...

One-hot encodings: an input sequence of 40 tokens representing characters or words.
https://github.com/DipLernin/Text_Generation

What are the pros and cons of using word tokens instead of character tokens?
• Pros: the input/output sequences are shorter, which simplifies learning semantics
• Cons: an "UNK" token is needed for out-of-vocabulary words; the vocabulary can be large
Word-level representations are more commonly used.

Problems with One-Hot Encoding Words
• Huge memory burden
• Computationally expensive
The dimensionality equals the vocabulary size; e.g., English has ~170,000 words, with ~10,000 commonly used.
(Kamath, Liu, and Whitaker. Deep Learning for NLP and Speech Recognition. 2019.)

Limitation of One-Hot Encoding Words
• No notion of which words are similar, yet such understanding can improve generalization
• e.g., "walking", "running", and "skipping" are all suitable for "He was ____ to school."
• With one-hot vectors (e.g., Walking, Soap, Fire, Skipping), the distance between all words is equal!

Today's Topics
• Introduction to natural language processing
• Text representation
• Neural word embeddings
• Programming tutorial

Idea: represent each word compactly in a space where vector distance indicates word similarity.
(Kamath, Liu, and Whitaker. Deep Learning for NLP and Speech Recognition. 2019.)

Inspiration: Distributional Semantics
"The distributional hypothesis says that the meaning of a word is derived from the context in which it is used, and words with similar meaning are used in similar contexts."
Origins: Harris in 1954 and Firth in 1957.
(Kamath, Liu, and Whitaker. Deep Learning for NLP and Speech Recognition. 2019.)
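To make the tokenization and one-hot encoding slides above concrete, here is a small sketch in plain Python (my own illustration, not from the slides or the skeleton file); the example sentence and the helper names build_vocab and one_hot are made up.

# Toy illustration of tokenizing text, building a word-level vocabulary,
# and one-hot encoding words (helper names and example text are invented).

def build_vocab(text):
    """Tokenize on whitespace and map each unique token to an index."""
    tokens = text.lower().split()
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def one_hot(word, vocab):
    """Return a one-hot list of length |vocab| for the given word."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

text = "many people danced around the berimbau player"
vocab = build_vocab(text)
print(one_hot("berimbau", vocab))  # -> [0, 0, 0, 0, 0, 1, 0]

# Any two distinct one-hot vectors differ in exactly two positions, so the
# distance between every pair of different words is the same: one-hot
# encoding carries no notion of word similarity.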
Inspiration: Distributional Semantics
• What is the meaning of "berimbau" based on its context?
• Idea: context makes it easier to understand a word's meaning
  - Background music from a berimbau offers a beautiful escape.
  - Many people danced around the berimbau player.
  - I practiced for many years to learn how to play the berimbau.
https://capoeirasongbook.wordpress.com/instruments/berimbau/
[Adapted from slides by Lena Voita]

• What other words could fit into these contexts?
  1. Background music from a _______ offers a beautiful escape.
  2. Many people danced around the _______ player.
  3. I practiced for many years to learn how to play the _______.

Word-context matrix (1 if the word can appear in the context, 0 otherwise):
            Context 1   Context 2   Context 3
Berimbau        1           1           1
Soap            0           0           0
Fire            0           0           0
Guitar          1           1           1
The hypothesis is that words with similar row values have similar meanings.

Approach
• Learn a dense (lower-dimensional) vector for each word by characterizing its context, which inherently reflects its similarities to and differences from other words.
• Among Berimbau, Soap, Fire, and Guitar, berimbau and guitar are the closest word pair: the distance between each pair of words differs!
• Note: there are many ways to measure distance (e.g., cosine distance).
• We embed words in a shared space so they can be compared with a few features. What features would discriminate these words? Potential, interpretable features: Wooden, Commodity, Cleaner, Food, Temperature, Noisy, Weapon.

Approach: Learn a Word Embedding Space
• An embedding space represents a finite number of words, decided during training
• A word embedding is represented as a vector indicating its context
• The dimensionality of all word embeddings in an embedding space matches
• In practice, the learned discriminating features are hard to interpret

Embedding Matrix
• The embedding matrix converts an input word into a dense vector.
• Its dimensions are the target embedding dimensionality (e.g., 5) by the size of the vocabulary (columns: Berimbau, Soap, Fire, Guitar, …).
• The one-hot encoding dictates which word embedding to use; equivalently, a word's embedding can be extracted efficiently when we know the word's index.
(Kamath, Liu, and Whitaker. Deep Learning for NLP and Speech Recognition. 2019.)
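To make the embedding-matrix and cosine-distance ideas concrete, here is a small sketch (my own illustration; the 3-d embedding values are invented, not learned). Looking up a word's dense vector by its key or index plays the role of multiplying the embedding matrix by the word's one-hot vector, and cosine similarity compares the resulting vectors.

# Toy illustration (made-up numbers): an embedding "matrix" stored as one
# dense vector per word, plus cosine similarity to compare the vectors.
import math

# Hypothetical 3-d embeddings for a 4-word vocabulary (values are invented).
embedding_matrix = {
    "berimbau": [0.9, 0.1, 0.8],
    "guitar":   [0.8, 0.2, 0.9],
    "soap":     [0.1, 0.9, 0.0],
    "fire":     [0.2, 0.7, 0.1],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (close to 1.0 = similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Looking up a word's embedding by key (or by row index) is equivalent to
# multiplying the embedding matrix by that word's one-hot vector.
b = embedding_matrix["berimbau"]
g = embedding_matrix["guitar"]
s = embedding_matrix["soap"]
print(cosine_similarity(b, g))  # high: berimbau and guitar are close
print(cosine_similarity(b, s))  # low: berimbau and soap are far apart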
Popular Word Embeddings
• Bengio method
• Word2vec (skip-gram model)
• And more…

Idea: Learn Word Embeddings That Help Predict Viable Next Words
e.g.,
1. Background music from a _______
2. Many people danced around the _______
3. I practiced for many years to learn how to play the _______
(Bengio et al. A Neural Probabilistic Language Model. JMLR 2003.)

Task: Predict the Next Word Given the Previous Ones
• e.g., a vocabulary size of 17,000 was used in the experiments
• What is the dimensionality of the output layer? (it equals the vocabulary size: 17,000)

Architecture (Bengio et al., JMLR 2003)
• An embedding matrix maps each input word to its word embedding
• e.g., a vocabulary size of 17,000 was used with embedding sizes of 30, 60, and 100 in the experiments
• Assuming a 30-d word embedding: the embedding matrix C is 30 x 17,000 (i.e., 510,000 weights), and each word embedding is 1 x 30
• A projection layer is followed by a hidden layer with a non-linearity

Training (Bengio et al., JMLR 2003)
• Input: 1, 3, 5, and 8 input words were tried, using two datasets with ~1 million and ~34 million words respectively
• Use a sliding window over the input data, e.g., 3 words at a time:
  "Background music from a berimbau offers a beautiful escape…"
• Cost function: minimize the cross-entropy loss plus regularization (L2 weight decay)
• The word embeddings are iteratively updated during training

Summary: word embeddings are learned that support predicting viable next words, e.g.,
1. Background music from a _______
2. Many people danced around the _______
3. I practiced for many years to learn how to play the _______
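To illustrate the sliding-window setup from the Training slides, here is a small sketch (my own; the helper name make_training_pairs is made up) that turns a tokenized sentence into (previous words, next word) pairs with a 3-word context, which is the kind of supervision the Bengio model trains on.

# Toy illustration of the sliding window used to create training examples
# for a next-word prediction model (e.g., the Bengio language model).
# The helper name make_training_pairs is invented for illustration.

def make_training_pairs(tokens, context_size=3):
    """Slide a fixed-size window over the tokens and pair each window of
    previous words with the word that follows it."""
    pairs = []
    for i in range(len(tokens) - context_size):
        context = tokens[i:i + context_size]   # previous words
        target = tokens[i + context_size]      # word to predict
        pairs.append((context, target))
    return pairs

sentence = "background music from a berimbau offers a beautiful escape"
tokens = sentence.split()
for context, target in make_training_pairs(tokens):
    print(context, "->", target)
# ['background', 'music', 'from'] -> a
# ['music', 'from', 'a'] -> berimbau
# ... and so on across the sentence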
Answered 1 day after May 11, 2022

Answer To: Modify the skeleton .py file to translate words from English to Pig Latin.

Dipansu answered on May 13 2022
