{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "8p0BMR7HfMH3" }, "source": [ "# Neural Networks for Text Classification\n", "\n", "In this assignment, you will implement the Naive Bayes...

Please see the attachment; this is a machine learning homework that needs to be done in Google Colab. To view the notebook, take the file I uploaded, drop it into Google Drive, then double-click it and the notebook will open in Google Colab.


{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "8p0BMR7HfMH3" }, "source": [ "# Neural Networks for Text Classification\n", "\n", "In this assignment, you will implement the Naive Bayes algorithm, a simple, but competitive neural bag-of-words model for text classification and also experiment with a state-of-the pre-trained transformer model. You will train your models on a (provided) dataset of positive and negative movie reviews and report accuracy on a test set." ] }, { "cell_type": "markdown", "metadata": { "id": "jZ03PcaBgA3m" }, "source": [ "#Download the dataset\n", "\n", "First you will need to download the IMDB dataset - to do this, simply run the cell below. We have prepared a small version of the ACL IMDB dataset for you to use to help make your experiments faster. The full dataset is available [here](https://ai.stanford.edu/~amaas/data/sentiment/), in case you are interested, but there is no need to use this for the assignment.\n", "\n", "Note that files downloaded in Colab are only saved temporariliy - if your session reconnects you will need to re-download the files. In case you need persistent storage, you can mount your Google drive folder like so:\n", "\n", "```\n", "from google.colab import drive\n", "drive.mount('/content/drive')\n", "```\n", "\n", "You can also open a command line prompt by clicking on the shell icon on the left hand side of the page, and upload/download files from your local machine by clicking on the file icon." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "oDvw3YUJgQHj", "outputId": "b40deb87-f391-4842-f811-95cc4ec3ade6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2021-10-24 22:22:34-- https://github.com/aritter/5525_sentiment/raw/master/aclImdb_small.tgz\n", "Resolving github.com (github.com)... 192.30.255.113\n", "Connecting to github.com (github.com)|192.30.255.113|:443... 
connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://raw.githubusercontent.com/aritter/5525_sentiment/master/aclImdb_small.tgz [following]\n", "--2021-10-24 22:22:35-- https://raw.githubusercontent.com/aritter/5525_sentiment/master/aclImdb_small.tgz\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 9635749 (9.2M) [application/octet-stream]\n", "Saving to: ‘aclImdb_small.tgz’\n", "\n", "aclImdb_small.tgz 100%[===================>] 9.19M --.-KB/s in 0.07s \n", "\n", "2021-10-24 22:22:35 (138 MB/s) - ‘aclImdb_small.tgz’ saved [9635749/9635749]\n", "\n" ] } ], "source": [ "#Download the data\n", "\n", "!wget https://github.com/aritter/5525_sentiment/raw/master/aclImdb_small.tgz\n", "!tar xvzf aclImdb_small.tgz > /dev/null" ] }, { "cell_type": "markdown", "metadata": { "id": "Jv4obseHgEgb" }, "source": [ "# Converting text to numbers\n", "\n", "Below is some code we are providing you to read in the IMDB dataset, perform tokenization (using `nltk`), and convert words into indices. Please don't modify this code in your submitted version. We will provide example usage below." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "WfdeHCQHfR2n", "outputId": "3354a168-ee16-4ee2-b479-adba7b2aebb0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[nltk_data] Downloading package punkt to /root/nltk_data...\n", "[nltk_data] Unzipping tokenizers/punkt.zip.\n" ] } ], "source": [ "import os\n", "import sys\n", "\n", "import nltk\n", "from nltk import word_tokenize\n", "nltk.download('punkt')\n", "import torch\n", "\n", "#Sparse matrix implementation\n", "from scipy.sparse import csr_matrix\n", "import numpy as np\n", "from collections import Counter\n", "\n", "np.random.seed(1)\n", "\n", "class Vocab:\n", " def __init__(self, vocabFile=None):\n", " self.locked = False\n", " self.nextId = 0\n", " self.word2id = {}\n", " self.id2word = {}\n", " if vocabFile:\n", " for line in open(vocabFile):\n", " line = line.rstrip('\\n')\n", " (word, wid) = line.split('\\t')\n", " self.word2id[word] = int(wid)\n", " self.id2word[wid] = word\n", " self.nextId = max(self.nextId, int(wid) + 1)\n", "\n", " def GetID(self, word):\n", " if not word in self.word2id:\n", "
Answered 9 days after Nov 02, 2021


Amar Kumar answered on Nov 11 2021