8/31/2019 cse6242-2019fall-hw1 https://docs.google.com/document/d/e/2PACX-1vQnYxNVrapPgNRkVORtouoxDDzYyjwiWpLdV4puZp6oYEqkNGD8a2VdJsuemEWNBfNaIBrLyooCZIyz/pub



Download the HW1 Skeleton (https://poloclub.github.io/cse6242-2019fall-online/hw1/hw1-skeleton.zip) before you begin.

Grading

The maximum possible score for this homework is 100 points.

Introduction

In Questions 1, 2, and 3, you will perform data collection, exploration, and visualization of the extensive LEGO database available from ‘Rebrickable’ (https://rebrickable.com/). In the following tasks, you will build both current and historical domain knowledge about LEGO themes, sets, and parts.

In Q1, we focus on collecting data using an API and then building a graph that shows the relationships between Sets and Parts. From this we can gain insights into what LEGO part is used the most frequently throughout various LEGO sets.

In Q2, you will work directly with the data files to build a portion of the Rebrickable database locally using SQLite. Next, you will explore the hierarchy of LEGO themes / sub-themes, as well as the historical growth of LEGO sets over time.

In Q3, you will visualize the growth of LEGO sets through the years. This will serve as an introduction to D3.

Q4 focuses on cleaning and preparing data for visualization.

Q1 [40 points] Collecting and visualizing Rebrickable Lego data

Q1.1 [25 points] Collecting Rebrickable Lego Data

You will use the “Rebrickable” API version 3 to: (1) download data about the Lego sets and (2) for each set, download the parts that comprise it. You will write code using Python 3.7.x in script.py in this question. You will need an API key to use the Rebrickable data. Your API key will be an input to script.py so that we can run your code with our own API key to check the results. Running the following command should generate a .gexf graph file specified in Q1.1.d.

python3 script.py

Please refer to this tutorial (https://www.tutorialspoint.com/python3/python_command_line_arguments.htm) to learn how to parse command line arguments. DO NOT leave your API key written in the code.
In general, it is good practice to not store any sensitive information like API keys and passwords as part of your code.

Note: You may only use the modules and libraries provided at the top of the script.py file included in the skeleton for Q1 and modules from the Python Standard Library (https://docs.python.org/3/library/). A module for creating a .gexf graph file is included in the skeleton and also imported at the top of the script.py file. Pandas (https://pandas.pydata.org) and Numpy (https://numpy.org) CANNOT be used. While we understand that they are useful libraries to learn, completing this question is not critically dependent on their functionality. In addition, to make grading more manageable and to enable our TAs to provide better, more consistent support to our students, we have decided to restrict the libraries accordingly.

How to use the Rebrickable API
○ Create a Rebrickable account and generate an API Key. Refer to this document (https://docs.google.com/document/d/e/2PACX-1vQkaZGXI6lvWDzrnuIRHv6SPrIxFiL_Y4-T0YZQlKnMGzlHOwcGt-LZqQoB-to6PxpDy0ZSr7GWzdUY/pub) for detailed instructions.
○ Refer to the API documentation at https://rebrickable.com/api/ as you work on this question. Within the documentation you will find a helpful ‘try-it-out’ feature for interacting with the API calls.

Note: The API allows you to make 1 request every second. Set appropriate timeout intervals in your code while making requests.
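The two requirements above (the API key arrives as an input, and requests are spaced at least one second apart) could be sketched roughly as follows, using only the standard library. The `--key` flag name and the `parse_args` and `Throttle` helpers are hypothetical names chosen for illustration; the skeleton may expect a different interface.

```python
import argparse
import time


def parse_args(argv=None):
    """Read the API key from the command line instead of hard-coding it."""
    parser = argparse.ArgumentParser(description="Rebrickable data collection")
    # "--key" is a hypothetical flag name chosen for this sketch.
    parser.add_argument("--key", required=True, help="Rebrickable API key")
    return parser.parse_args(argv)


class Throttle:
    """Sleep as needed so calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the timing logic can be tested
        self._sleep = sleep
        self._last = None        # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a script, one might call `args = parse_args()` once at startup and then call `throttle.wait()` immediately before each API request, passing `args.key` to the request rather than storing the key in the source file.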
We recommend you think about how much time your script will run for when solving this question, so you will complete it on time. You may be penalized for a runtime exceeding 10 minutes.

a. [10 points] Using the Rebrickable API, retrieve the top LEGO sets that have the most parts. Since this is a live database, the results may vary as you implement your solution. Adjust the parameters of the API calls such that you retrieve at least 270 and no more than 300 sets. The sets should be ordered by the number of parts they contain. For each set, you will need its:
● set number
● set name

Hints:
● Sorting on number of parts can be accomplished in the API call
● Adjust the min_parts parameter
● Set the page_size to be larger than the expected number of results to avoid pagination issues

Complete the following functions (necessary for us to grade your work).
● min_parts() in script.py
● lego_sets() in script.py

b. [5 points] Retrieving Parts for Lego Sets. For each set returned in part a, use the API to get a list of all inventory parts in that set. Since we are only interested in the parts that are used most frequently in a set, attempt to retrieve up to but no more than the top 20 parts for each set. For each part, you will need its:
● part color
● part quantity
● part name
● part number

To address the fact that some parts for a set have the same part_num, construct a unique id by concatenating the part number and color. e.g., A part having a part_num = “3203” and a color “C9C9C9” would be concatenated into an id = “3203_C9C9C9”. You will use this part id as the node id when you add it to the graph in part c.

Note: Not all sets have 20 different parts. It is allowable to have fewer than 20 parts for a set.

Hint: Set the page_size parameter = 1000 when retrieving parts to avoid pagination issues.

c. [10 points] Constructing a graph using the pygexf library (included in skeleton under Q1→ gexf/).
Use the pygexf module to construct a graph that can be imported into the Gephi Open Graph Viz Platform software. You can review details about the .gexf file format here (https://github.com/gephi/gexf/wiki/Basic-Concepts). You may also review a simple (https://github.com/paulgirard/pygexf/blob/master/test/test.py) and a more complex (https://github.com/paulgirard/pygexf/blob/master/test/gexf.net.dynamics_openintervals.gexf) graph created using the module.

Instantiate and construct a static, undirected graph as follows:

Note: script.py includes an import statement for the pygexf library.

d. Declare a string-valued node attribute titled ‘Type’. Add this attribute to each node in the graph. You will use node attributes to perform partitioning operations within Gephi.
○ For nodes representing a [Lego] set, set the attribute value = “set”
○ For nodes representing a [Lego] part, set the attribute value = “part”
● Each set should be added as a node to the graph
○ node id = set number retrieved in part a
○ node label = set name retrieved in part a
○ node color is set using RGB values of ‘0’, ‘0’, ‘0’
● Each part should be added as a node to the graph
○ node id = the part id you made by concatenating the part number and color in part b
○ node label = part name retrieved in part b
○ node color is set using the part color retrieved in part b.
RGB values must be converted from the original hexadecimal representation in the data.
● Add an edge between each part and the set(s) it belongs to.
○ edge id = construct a unique id of your choosing.
○ edge source = set number retrieved in part a
○ edge target = the unique part id you constructed from the part number and color retrieved in part b
○ edge weight = part quantity retrieved in part b

Note: Ensure that you do not add the same node more than once to the graph. pygexf has some functions that you can use to check this.

Use pygexf’s .write() command to output a file named bricks_graph.gexf

Complete the following function (necessary for us to grade your work).
● gexf_graph() in script.py

Note: Q1.2 builds on the results of Q1.1

Q1.2 [15 points] Visualizing a Lego Sets and Parts Graph

Using Gephi version 0.9.2, visualize the network of the Lego sets and their most-used parts. You can download Gephi here (http://gephi.org). Ensure your system fulfills all requirements (https://gephi.org/users/requirements/) for running Gephi.

a. Go through the Gephi quick-start guide (https://gephi.org/users/quick-start/).

b. [2 points] Start Gephi and then use File→ Open to open bricks_graph.gexf. Within the import report dialogue window, ensure the graph type is set to ‘undirected’. Under ‘more options...’, ensure that these boxes are checked:
● ‘Auto-Scale’
● ‘Create missing nodes’
● ‘Self-loops’
● Leave the Edges merge strategy selected to ‘Sum’.

Ignore the GEXF version 1.2 deprecation warning.

c. [8 points] Using the following guidelines, create a visually meaningful graph:
● Keep edge crossing to a minimum, and avoid as much node overlap as possible.
● Keep the graph compact and symmetric if possible.
● Whenever possible, show node labels. If showing all node labels creates too much visual complexity, try showing those for the “important” nodes. We recommend that you first run Gephi’s built-in stat functions to gain more insight about a given node.
● Use nodes’ spatial positions to convey information (e.g., “clusters” or groups).
Experiment with Gephi’s features, such as graph layouts, changing node size and label color, edge thickness, etc. The objective of this task is to familiarize yourself with Gephi; therefore this is a fairly open-ended task.

It is not possible to create a “perfect” visualization for most graph datasets. The above guidelines are ones that generally help. However, like most design tasks, creating a visualization is about making selective design compromises. Some guidelines could create competing demands, and following all guidelines may not guarantee a “perfect” design.

Hint: Install more Layout plugins/algorithms through Tools → Plugins → Available Plugins. Check and install plugins in the ‘Layout’ category.

d. [5 points] Using Gephi’s built-in functions, compute the following metrics for your graph:
● Average node degree (run the function called “Average Degree”)
● Diameter of the graph (run the function called “Network Diameter”)
● Average path length (run the function called “Avg. Path Length”)

You will learn about these metrics in the “graphs” lectures.

Complete the following functions for auto-grading purposes.
● avg_node_degree() in script.py
● graph_diameter() in script.py
● avg_path_length() in script.py

Deliverables: Place all the files listed below in the Q1 folder.
● script.py: The Python 3.7 script you write that generates bricks_graph.gexf and contains the 6 completed functions described in Q1.1
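For the part id and color conversion described in Q1.1 parts b and c, a minimal sketch might look like the following. The helper names (`part_node_id`, `hex_to_rgb`, `top_parts`) are hypothetical, the `"quantity"` dictionary key is an assumed shape for a part record, returning RGB components as strings is an assumption based on the ‘0’, ‘0’, ‘0’ values quoted above, and reading “top 20 parts” as the 20 largest quantities is one interpretation of the assignment text.

```python
def part_node_id(part_num, color_hex):
    """Join part number and color into one id, e.g. '3203_C9C9C9'."""
    return f"{part_num}_{color_hex}"


def hex_to_rgb(color_hex):
    """Convert a 6-digit hex color such as 'C9C9C9' to decimal strings."""
    value = color_hex.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {color_hex!r}")
    # Each pair of hex digits is one channel: red, green, blue.
    return tuple(str(int(value[i:i + 2], 16)) for i in (0, 2, 4))


def top_parts(parts, limit=20):
    """Return up to `limit` part records with the largest quantity.

    `parts` is assumed to be a list of dicts with a "quantity" key.
    """
    return sorted(parts, key=lambda p: p["quantity"], reverse=True)[:limit]
```

Keeping these as small pure functions makes them easy to check in isolation before wiring them into the graph construction, where the part id serves as the node id and the converted color as the node color.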
Aug 31, 2021