# Step 1. Import the packages that you need. You will need more than just sqlite3. # Make sure to install anything that's missing from the default Python packages. import sqlite3 import numpy as np...

The assignment only needs about 10 lines of coding



# Step 1. Import the packages that you need. You will need more than just sqlite3.
# Make sure to install anything that's missing from the default Python packages.
import sqlite3
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import math

# Step 2. Open a connection to BSAN440_MovieDB.db
conn = sqlite3.connect("yungc/downloads/BSAN440_MovieDB(1).db")
cur = conn.cursor()

# Step 3. Query the 1-hot-encoded table to get the matrix with the 0-1 values
cur.execute("")

# Step 4. Save the results from the query as an ndarray (numpy matrix).
# You don't want to use a Pandas DataFrame here because you won't be able to do the
# matrix multiplication in step 5 below with Pandas.
# add your code for step 4 here

# Step 5. Use function np.dot() to find the dot product of every movie in the 0-1 matrix with every other movie.
# https://numpy.org/doc/stable/reference/generated/numpy.dot.html
# The dot product in this case will implement matrix-matrix multiplication. The 0-1 matrix is a rectangular
# matrix (the rows are the movies and the columns are the genres' 0-1 values).
# Remember that to multiply two matrices the number of columns in the first matrix must equal the number of
# rows in the second matrix. For example, you can multiply two matrices when the first one has size 3 x 4 and
# the second one has size 4 x 7. The product of the two will be a 3 x 7 matrix. When you are specifying the parameters
# (input arguments) to the np.dot() function, you need to apply a transformation to one of the parameters.
# Think about what that transformation is.
# add your code for step 5 here

# Step 6. Make sure that the result from the step above is a square matrix where the size is the number of movies.
# Examine this matrix. Does it make sense to you?
# Can you look at the first few movies in the database and tell if it worked right?
# add your code for step 6 here

# Step 7. Save the results from the matrix as a Pandas DataFrame where the column names are the movie titles.
# You will need to get the movie titles from the database.
# add your code for step 7 here

# Step 8. Add the movie titles as a starting column to the same dataframe from step 7.
# add your code for step 8 here

# Step 9. Export the data for the first 10 movies from the DataFrame to a CSV file.
# add your code for step 9 here

# Step 10. Once again check the CSV file to make sure that the results are accurate before you submit.
# no need to add code here but think about that matrix you are submitting. You can do the comparison in your head.
# If your answers are different go back and fix the code.

Assignment 6
50 points

For this assignment we will go back to the BSAN440_MovieDB.db. We will find the similarity between the different movies based on their genres. For that we will use the 1-hot-encoded data that we created during the preprocessing at the beginning of the class. You can go back and review the slides from Week 4 – Data Preprocessing Part 4 if you cannot remember how to use the 1-hot-encoding for similarity.

Here's an example. We have two movies: Grumpier Old Men (1995) and Waiting to Exhale (1995). Grumpier Old Men has genres Comedy and Romance, so it has a value of 1 for columns comedy and romance in the 1-hot-encoded data. Waiting to Exhale has genres Comedy, Drama and Romance, so it has 1s for these 3 genre columns. Two of the genres match: comedy and romance. If we find the dot product between the 0-1 vector for the first movie and the 0-1 vector for the second movie, we should expect to see the value 2 (the number of matching genres).

I. Make sure that in BSAN440_MovieDB.db you have a table with the 0-1 vectors for each genre. If you do not have that, then re-run the script for your Assignment 2 to create the table. If you still have problems, talk to me early and I will help you.

II. (50 points) Complete the code in file Assignment6.py. Read the comments in the file carefully.

III. Submit:
1) The .py file with all the code.
2) The .csv file that your code will create. The instructions on how to create it are in the .py file. There's code in the in-class exercise that you can reuse.

Notes: The entire assignment can be completed with 10 lines of Python code. Doing this is not a requirement. However, if you feel like you wrote a ton of code and things are not working out, stop adding more, go back and edit. Use the in-class code for similarity measures as a reference. There is a lot there that you can reuse.
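To check the worked example by hand, here is a minimal sketch on toy data. It is not the assignment solution and it does not touch the database. The three genre columns, the variable names (`onehot`, `pairwise`, `table`) and the column name `title` are assumptions made for illustration; the real 1-hot-encoded table has more genre columns and one row per movie in BSAN440_MovieDB.db.

```python
import numpy as np
import pandas as pd

# Toy data only (assumption): three genre columns and the two movies
# from the example in the assignment text.
genres = ["Comedy", "Drama", "Romance"]
titles = ["Grumpier Old Men (1995)", "Waiting to Exhale (1995)"]
onehot = np.array([
    [1, 0, 1],  # Grumpier Old Men: Comedy, Romance
    [1, 1, 1],  # Waiting to Exhale: Comedy, Drama, Romance
])

# Dot product of the two 0-1 vectors counts the genres they share.
shared = np.dot(onehot[0], onehot[1])
print(shared)

# A (movies x genres) matrix times a (genres x movies) matrix gives a
# square (movies x movies) matrix of shared-genre counts.
pairwise = np.dot(onehot, onehot.T)
print(pairwise.shape)

# Label the columns with the titles and put the titles in a first column.
table = pd.DataFrame(pairwise, columns=titles)
table.insert(0, "title", titles)
print(table)
```

The diagonal of `pairwise` holds each movie's own genre count (2 and 3 here), which is a quick sanity check that the rows and columns line up with the titles.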
Oct 26, 2021