
Assignment must be done in Jupyter Notebook format.


{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "c256a997490a91f0708490a794d55a2b", "grade": false, "grade_id": "cell-76038a88b4d3af8c", "locked": true, "schema_version": 3, "solution": false } }, "source": [ "## Unit 9 Assignment - W200 Introduction to Data Science Programming, UC Berkeley MIDS\n", "\n", "Write code in this Jupyter Notebook to solve the following problems. Please upload this **Notebook** with your solutions to your GitHub repository in your SUBMISSIONS/week_10 folder by 11:59PM PST the night before class. Do not upload the data files or the answer .csv (we want your notebook to make the answers when we run it)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "c23f7fc9d9970d3da7cc114e7d58d8de", "grade": false, "grade_id": "cell-ae5d30c3c3f47580", "locked": true, "schema_version": 3, "solution": false } }, "source": [ "This homework assignment is assigned during Week 10 but corresponds to the Unit #9 async." 
] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "6b52c4ac0395c1e0a459473b439e2153", "grade": false, "grade_id": "cell-5977c1d8f1d55d67", "locked": true, "schema_version": 3, "solution": false } }, "source": [ "## Objectives\n", "\n", "- Demonstrate how to import different data files\n", "- Get a small glimpse of how messy data can be\n", "- Design and implement an algorithm to standardize the information and fix the messiness\n", "- Work with Python data structures to sort and output the correct information\n", "- Demonstrate how to export required information to a .csv file" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "2c14663ff2d7819ac8f8abe12c58dd5e", "grade": false, "grade_id": "cell-f3df226b1112e4f1", "locked": true, "schema_version": 3, "solution": false } }, "source": [ "## Reading and Writing Data (25 Points)\n", "\n", "In this assignment, you will be reading and writing data. Yes, finally some data science (or at least some exploratory data analysis)! In the week_10 assignment folder, there are three data files named: \n", "\n", "* data.csv\n", "* data.json\n", "* data.pkl\n", "\n", "These are three common file formats. You can run the following **on the bash command line** to see what is in each file (this will not work from a Windows prompt but will work in git bash):\n", "\n", "```sh\n", "head data.csv\n", "head data.pkl\n", "head data.json\n", "```\n", "\n", "You'll see that there is some method to the madness but that each file format has its peculiarities. Each file contains a portion of the total dataset that altogether comprises 100 records, so you need to **read in all of the files and combine them into some standard format** with which you are comfortable. Aim for something standard where each \"row\" is represented in the same format. **Name this object that contains the data for all three files combined ```full_data```**\n", "\n", "### Questions to answer (75 points: each question is worth 15 points):\n", "After you've standardized all of the data, report the following information: \n", "\n", "1. What are the unique countries in the dataset, sorted alphabetically? Write to a new file called question_1.csv.\n", "2. What are the unique complete email domains in the dataset, sorted alphabetically? Write to a new file called question_2.csv. \n", "3. What are the first names of everyone (including duplicates) who do not have a P.O. Box address, sorted alphabetically? Write to a new file called question_3.csv.\n", "4. What are the full names of the first 5 people when you sort the data alphabetically by country? Write to a new file called question_4.csv.\n", "5. What are the full names of the first 5 people when you sort the data numerically ascending by phone number? Write to a new file called question_5.csv.\n", "\n", "We will be using a script to examine and grade your .csv files so please make sure: \n", "- The answers are all in one **column** with one list item per cell, sorted as stated in the question. I.e., looking at the .csv in a spreadsheet editor like Google Sheets, all answers would be in the 'A' column, with the first entry in A1, the second in A2, etc.\n", "- Please do not include a header; just the answers to the questions.\n", "- It is strongly recommended that you open each .csv file to ensure the answers are there and displayed correctly! \n", "- Don't include quotes around the list items. I.e., strip the leading and trailing quotes, if necessary, from items when you write to the .csv files. For example, a list entry should look like ```Spain``` rather than ```\"Spain\"```. One exception: Some country names do contain commas and it is ok to have quotes: ```\"\"``` around just those country names so that they will be in one cell in the .csv. \n", "\n", "\n", "In addition, show all of your work in this **Jupyter notebook**." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "635f227045350bca94591906e3873ad2", "grade": false, "grade_id": "cell-ad4b864c26503a51", "locked": true, "schema_version": 3, "solution": false } }, "source": [ "### Assumptions\n", "\n", "- You might have to make decisions about the data. For example, what to do with ties or how to sort the phone numbers numerically. \n", "- Write your assumptions in this Jupyter notebook at the top of your code under the heading below that says ASSUMPTIONS\n", "- Please do some research before making an assumption (e.g. what is a domain name?); put your notes inside that assumption so we can understand your thought process. \n", " - NOTE: If you don't know what an email domain is - do some research and write what you found in your assumptions; there is a correct answer to this question! \n", "- This is a good habit to build as you analyze data so that you can remember why you made the decisions you did and other people can follow your analysis later!" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "57450666cda1cb410247d946aa6801fe", "grade": false, "grade_id": "cell-ac3d57f37fc71750", "locked": true, "schema_version": 3, "solution": false } }, "source": [ "### Restrictions\n", "You should use these standard library imports:\n", "\n", "```python\n", "import json\n", "import csv\n", "import pickle\n", "```\n", "\n", "Some of you may be familiar with a Python package called `pandas` which would greatly speed up this sort of file processing. The point of this homework is to do the work manually. You can use `pandas` to independently check your work if you are so inclined but do not use `pandas` as the sole solution method. Don't worry if you are not familiar with `pandas`. We will do this homework as a class exercise using `pandas` in the near future." ] }, { "cell_type": "markdown", "metadata": { "deletable": false,
Answered 2 days after Oct 30, 2021

Dinesh answered on Nov 01 2021
complete_data.csv
,Name,Phone,Address,City,Country,Email
0,Hillary Benton,1-243-669-7472,144-1225 In Road,Navsari,Togo,[email protected]
1,Morgan Y. Little,155-3483,Ap #909-6656 Ac St.,Kitimat,Nauru,[email protected]
2,Camden Z. Blair,123-5058,"P.O. Box 441, 6183 Ligula St.",Casanova Elvo,"Palestine, State of",[email protected]
3,Alexandra E. Saunders,1-637-740-7614,305-496 Morbi Rd.,Biggleswade,Malawi,[email protected]
4,Hanae P. Walsh,901-2461,7058 Dapibus St.,Dhuy,Qatar,[email protected]
5,Jescie Sargent,265-1176,421-5501 Cursus. St.,Tulsa,Holy See (Vatican City State),[email protected]
6,Kessie Morgan,945-0713,Ap #481-6631 Vehicula Rd.,Pedro Aguirre Cerda,"Bonaire, Sint Eustatius and Saba",[email protected]
7,Bevis M. Santos,227-9994,"P.O. Box 575, 4033 Mi St.",Saint-Vincent,Kuwait,[email protected]
8,Flynn Alston,398-8097,"Ap #763-5990 Nec, Av.",Tirúa,Romania,[email protected]
9,Charles F. Crawford,791-5111,Ap #841-1623 Vitae Avenue,Hindupur,South Georgia and The South Sandwich Islands,[email protected]
10,Cairo Wolfe,1-930-942-2322,9269 Libero Ave,Whitchurch,Lesotho,[email protected]
11,Elijah Myers,1-238-336-4864,"P.O. Box 677, 2311 Aliquet. Road",Port Harcourt,Kyrgyzstan,[email protected]
12,Thane Burch,1-894-978-3696,"7438 Amet, Rd.",Algeciras,Anguilla,[email protected]
13,Katelyn Munoz,220-5054,"P.O. Box 432, 9085 Nulla Ave",Requínoa,Congo (Brazzaville),[email protected]
14,Genevieve Holland,992-6968,1768 Magna. Road,Moose Jaw,Uruguay,[email protected]
15,Wesley Z. Sharp,1-960-740-2261,"P.O. Box 497, 8354 Habitant St.",Bear,Cayman Islands,[email protected]
16,Tatyana H. French,1-120-782-6047,217-9163 Lobortis Road,Salles,Eritrea,[email protected]
17,Meredith F. Clayton,425-7583,Ap #929-9420 Vivamus Rd.,Friedberg,Czech Republic,[email protected]
18,Rajah Carrillo,1-576-789-5730,910-8300 Varius Rd.,Bertiolo,Afghanistan,[email protected]
19,Gabriel Richmond,1-387-932-2096,7458 Sapien. St.,Tropea,Cambodia,[email protected]
20,Paul Merrill,1-313-739-3854,916-8087 Vehicula Rd.,Le Mans,Somalia,[email protected]
21,Brynne S. Barr,939-4818,878-2231 Suspendisse Rd.,Wilhelmshaven,Samoa,[email protected]
22,Cyrus Buckley,266-3123,"P.O. Box 572, 7680 Ullamcorper Ave",Sangli,Taiwan,[email protected]
23,Chloe Burnett,828-0406,563-4105 Donec Avenue,Wabamun,Morocco,[email protected]
24,Zachery Wilcox,1-611-756-4723,462-2112 In Rd.,Barddhaman,Hong Kong,[email protected]
25,Casey Mcgowan,1-155-558-4461,420-7327 Facilisis Street,Pfungstadt,Iran,[email protected]
26,Cole X. Hopper,1-328-505-0545,561-7476 Eget St.,Saint John,Macao,[email protected]
27,Tara Bender,1-757-378-4079,1247 Nonummy Rd.,Avellino,Dominica,[email protected]
28,Malik Grimes,793-4359,Ap #603-3303 Libero. St.,Winnipeg,Congo (Brazzaville),[email protected]
29,Ulla Russo,662-7778,"P.O. Box 975, 4593 Ante. Street",Vitória da Conquista,Slovakia,[email protected]
30,Colby Moran,1-788-230-1991,3696 Augue Ave,Hualpén,France,[email protected]
31,Maggy Wooten,912-7242,"P.O. Box 365, 6109 Metus. Rd.",Kapuskasing,Indonesia,[email protected]
32,Cameron Guthrie,988-2217,Ap #861-8699 Non Ave,Pontypridd,Turks and Caicos Islands,[email protected]
33,Gail Villarreal,1-405-823-4207,371-7266 Tortor Avenue,Saint-Remy-Geest,Marshall Islands,[email protected]
34,Harding Salinas,1-505-843-5401,4167 Nunc Ave,Arsimont,Montserrat,[email protected]
35,Idona W. Bonner,283-6921,Ap #302-2966 Cum Av.,Nieuwenrode,Faroe Islands,[email protected]
36,Warren Castillo,1-250-875-9104,Ap #275-2917 Curabitur Rd.,La Baie,Ireland,[email protected]
37,Clayton Harmon,1-609-380-9257,6930 Duis Road,College,United States,[email protected]
38,Alana Vasquez,1-853-288-4269,1511 Lobortis Ave,Richmond Hill,Israel,[email protected]
39,Mason R. Trujillo,172-5777,Ap #711-213 Sagittis Avenue,Quinta Normal,Sudan,[email protected]
40,Garrison Lindsey,420-1477,"P.O. Box 466, 7919 In Av.",Dunbar,Zambia,[email protected]
41,Jenna Mercado,102-2189,"P.O. Box 484, 9648 Sit Avenue",Pollena Trocchia,Burkina Faso,[email protected]
42,Drake Savage,1-790-105-7695,"P.O. Box 254, 2688 Luctus, Street",Hastings,Tunisia,[email protected]
43,Rana Z. Colon,486-7539,Ap #682-9992 Neque Rd.,Gespeg,Canada,[email protected]
44,Melodie Knox,1-479-861-6093,245-8811 Ut St.,Whitehorse,Norway,[email protected]
45,Cooper T. Horton,768-1000,"P.O. Box 383, 139 A Ave",Fernie,Israel,[email protected]
46,Eaton Nelson,746-8562,7989 Magna Rd.,Ludlow,Cocos (Keeling) Islands,[email protected]
47,Lucian W. Lynn,1-392-783-0634,7312 Tristique St.,Tirrases,Western Sahara,[email protected]
48,Sydney Anderson,1-610-717-0447,"P.O. Box 720, 9179 Fermentum Street",Ingolstadt,Saint Vincent and The Grenadines,[email protected]
49,Jane Joyner,1-131-574-3183,200-5702 Mollis St.,HavrŽ,Austria,[email protected]
50,Yen P. Browning,473-1433,Ap #221-1593 Fringilla St.,Gentbrugge,Isle of Man,[email protected]
51,Katell Simmons,1-647-852-3590,"P.O. Box 133, 5382 Enim Ave",Rionero in Vulture,Hong Kong,[email protected]
52,Freya B. Fischer,514-9914,Ap #869-5869 Neque Avenue,Pudahuel,Rwanda,[email protected]
53,Rama W. Mack,1-849-217-6292,2992 Vitae Rd.,Moricone,Jamaica,[email protected]
54,Lawrence Z. Carrillo,352-3711,6427 Eros Avenue,Northumberland,Latvia,[email protected]
55,Quyn Serrano,1-450-807-5530,"P.O. Box 133, 6862 Diam Road",Lagonegro,Ireland,[email protected]
56,Indira L. Mccormick,1-330-764-3846,"P.O. Box 679, 7373 Mollis Ave",Rampur,Saint Pierre and Miquelon,[email protected]
57,Rina W. Harris,760-1654,"P.O. Box 642, 2289 Volutpat. Street",Dolceacqua,Ghana,[email protected]
58,Cherokee George,1-722-165-1370,221-3908 Pellentesque Av.,Ourense,Ukraine,[email protected]
59,Michael Riddle,476-0145,581-1223 Aliquam Rd.,Logan City,Guatemala,[email protected]
60,Kay Rice,477-5481,"2398 Lectus, Road",Rutten,Isle of Man,[email protected]
61,Arden Leonard,383-6541,1274 Nullam St.,Esslingen,Italy,[email protected]
62,Chantale Sharpe,1-600-834-9076,1229 Nisl. Av.,Windsor,Mauritius,[email protected]
63,Calvin Herman,1-461-665-6848,263-4846 Sed St.,Castel Maggiore,Romania,[email protected]
64,Walter R. Gaines,370-5831,3247 Parturient Ave,Kitchener,Niger,[email protected]
65,Berk Finley,1-765-752-4793,6138 Faucibus Ave,Cavallino,Chile,[email protected]
66,Timothy Chambers,819-2872,865-2066 Vel Rd.,Dordrecht,Egypt,[email protected]
67,Ariana M. Olson,447-5000,"173-4952 Pede, Avenue",Zwevegem,Dominican Republic,[email protected]
68,Mason E. Kelly,1-896-767-7525,593 Turpis. Av.,Fraser-Fort George,Uganda,[email protected]
69,Keane Stein,457-2683,567-6664 Egestas St.,Burgos,Turks and Caicos Islands,[email protected]
70,Ginger Morse,1-228-310-1687,"P.O. Box 618, 8055 Integer St.",Bosa,Anguilla,[email protected]
71,Maggy Cotton,1-541-405-3049,7304 Euismod Avenue,Sommariva Perno,Kyrgyzstan,[email protected]
72,Talon R. May,143-7688,890-4439 Varius. Avenue,Richmond,Turkmenistan,[email protected]
73,Devin L. Boone,1-132-242-8605,1488 Dignissim Ave,Teruel,Switzerland,[email protected]
74,Orli E. Baxter,371-7491,1506 Egestas Rd.,Piła,Croatia,[email protected]
75,Wing Velazquez,354-5776,859-3576 Tincidunt Street,Oostkamp,Mayotte,[email protected]
76,Inez Simon,461-0691,Ap #788-4701 Aliquet St.,Chaitén,Antigua and Barbuda,[email protected]
77,Kyle Leonard,179-3944,685-2553 Ultrices Avenue,Placilla,Botswana,[email protected]
78,Selma Christensen,978-6407,Ap #595-189 Malesuada Road,Mira Bhayandar,Montserrat,[email protected]
79,Gwendolyn Crosby,692-9172,997 Posuere Rd.,San Miguel,Albania,[email protected]
80,Gary Alvarez,1-692-738-4449,Ap #890-8397 Euismod Ave,Westende,Dominican Republic,[email protected]
81,Knox L. Cash,535-9704,"P.O. Box 469, 4278 Condimentum Rd.",Gönen,American Samoa,[email protected]
82,Drake P. Guerrero,250-6382,6247 Aliquet Av.,Caerphilly,Guyana,[email protected]
83,Blossom Chandler,142-2607,7121 Diam. Rd.,Illapel,Czech Republic,[email protected]
84,Joan O. Ingram,1-889-203-6592,8198 Curae; Av.,Baton Rouge,Wallis and Futuna,[email protected]
85,Buffy R. Austin,413-3678,203-4493 A Road,Herentals,Iraq,[email protected]
86,Yoko M. Mcgowan,1-731-637-5890,"P.O. Box 881, 7563 Nisl. Av.",Bihain,Guinea,[email protected]
87,Walker Q. Wolfe,1-240-595-6907,482-4531 Mauris Rd.,Schepdaal,Moldova,[email protected]
88,Blake Cross,979-7498,Ap #678-3509 Nascetur Av.,Albany,Sint Maarten,[email protected]
89,Naida Guthrie,1-138-699-9182,"P.O. Box 656, 5397 Gravida. Ave",Tulita,Sudan,[email protected]
90,Yardley Singleton,945-1641,Ap #815-7648 Non Rd.,Duffel,Botswana,[email protected]
91,Lenore M. Boyer,513-0044,Ap #836-7039 Lorem Avenue,Attimis,Venezuela,[email protected]
92,Edan Cortez,1-223-433-5209,"159-6608 Eu, St.",High Level,Algeria,[email protected]
93,Quintessa T. Martinez,1-672-341-8336,8916 Pede St.,Rothesay,Tajikistan,[email protected]
94,Reuben Skinner,1-790-135-9618,Ap #810-9918 Enim Street,Essex,Bahamas,[email protected]
95,Yoshio Leblanc,1-508-613-2127,177-9263 Vitae Ave,Redwater,Bahamas,[email protected]
96,Rebecca French,397-3408,870-7596 Eros Rd.,Pocatello,Northern Mariana Islands,[email protected]
97,Shana K. Kerr,354-7392,Ap #215-8596 Cursus Ave,Toronto,Estonia,[email protected]
98,Gemma Leonard,175-7956,"P.O. Box 634, 4298 Elit. Rd.",Cuccaro Vetere,Saudi Arabia,[email protected]
99,Adara Estrada,1-893-111-1453,624-5679 Nulla. Rd.,Gijzelbrechtegem,Kyrgyzstan,[email protected]
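Once the records are combined into one table like the listing above, the five questions come down to a few set, sort, and write operations. The following is a minimal sketch, not the answerer's actual notebook code. It assumes each record is a dict keyed by the column names; the helper names (`unique_sorted`, `email_domain`, `phone_key`, `write_column`) are hypothetical, and the address passed to `email_domain` is a placeholder because the real addresses are obscured in this listing. Treating a phone number as the integer formed by its digits is one possible sorting assumption, not the only valid one.

```python
import csv
import io


def unique_sorted(rows, field):
    """Unique values of one column, sorted alphabetically."""
    return sorted({row[field] for row in rows})


def email_domain(address):
    """Complete domain: everything after the last '@'."""
    return address.rsplit("@", 1)[-1]


def phone_key(phone):
    """Assumption: compare phones as the integer formed by their digits."""
    return int("".join(ch for ch in phone if ch.isdigit()))


def write_column(handle, items):
    """Write one item per row, no header; only items with commas get quoted."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows([item] for item in items)


# Three records taken from the listing above (Email and City omitted).
rows = [
    {"Name": "Camden Z. Blair", "Phone": "123-5058",
     "Address": "P.O. Box 441, 6183 Ligula St.", "Country": "Palestine, State of"},
    {"Name": "Hillary Benton", "Phone": "1-243-669-7472",
     "Address": "144-1225 In Road", "Country": "Togo"},
    {"Name": "Hanae P. Walsh", "Phone": "901-2461",
     "Address": "7058 Dapibus St.", "Country": "Qatar"},
]

# Question 3 style: first names of people without a P.O. Box address.
first_names = sorted(
    row["Name"].split()[0] for row in rows if "P.O. Box" not in row["Address"]
)

# Question 5 style: full names ordered by numeric phone value.
by_phone = [row["Name"] for row in sorted(rows, key=lambda r: phone_key(r["Phone"]))]

# Question 1 style: write unique countries to a one-column CSV.
# A real notebook would use open("question_1.csv", "w", newline="") instead.
out = io.StringIO()
write_column(out, unique_sorted(rows, "Country"))
print(out.getvalue())
```

The default `csv.writer` quoting only wraps fields that need it, so `Palestine, State of` is quoted while `Qatar` and `Togo` are written bare, which matches the one exception the assignment allows.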
data.csv
,Name,Phone,Address,City,Country,Email
0,Hillary Benton,1-243-669-7472,144-1225 In Road,Navsari,Togo,[email protected]
1,Morgan Y. Little,155-3483,Ap #909-6656 Ac St.,Kitimat,Nauru,[email protected]
2,Camden Z. Blair,123-5058,"P.O. Box 441, 6183 Ligula St.",Casanova Elvo,"Palestine, State of",[email protected]
3,Alexandra E. Saunders,1-637-740-7614,305-496 Morbi Rd.,Biggleswade,Malawi,[email protected]
4,Hanae P. Walsh,901-2461,7058 Dapibus St.,Dhuy,Qatar,[email protected]
5,Jescie Sargent,265-1176,421-5501 Cursus. St.,Tulsa,Holy See (Vatican City State),[email protected]
6,Kessie Morgan,945-0713,Ap #481-6631 Vehicula Rd.,Pedro Aguirre Cerda,"Bonaire, Sint Eustatius and Saba",[email protected]
7,Bevis M. Santos,227-9994,"P.O. Box 575, 4033 Mi St.",Saint-Vincent,Kuwait,[email protected]
8,Flynn Alston,398-8097,"Ap #763-5990 Nec, Av.",Tirúa,Romania,[email protected]
9,Charles F. Crawford,791-5111,Ap #841-1623 Vitae Avenue,Hindupur,South Georgia and The South Sandwich Islands,[email protected]
10,Cairo Wolfe,1-930-942-2322,9269 Libero Ave,Whitchurch,Lesotho,[email protected]
11,Elijah Myers,1-238-336-4864,"P.O. Box 677, 2311 Aliquet. Road",Port Harcourt,Kyrgyzstan,[email protected]
12,Thane Burch,1-894-978-3696,"7438 Amet, Rd.",Algeciras,Anguilla,[email protected]
13,Katelyn Munoz,220-5054,"P.O. Box 432, 9085 Nulla Ave",Requínoa,Congo (Brazzaville),[email protected]
14,Genevieve Holland,992-6968,1768 Magna. Road,Moose Jaw,Uruguay,[email protected]
15,Wesley Z. Sharp,1-960-740-2261,"P.O. Box 497, 8354 Habitant St.",Bear,Cayman Islands,[email protected]
16,Tatyana H. French,1-120-782-6047,217-9163 Lobortis Road,Salles,Eritrea,[email protected]
17,Meredith F. Clayton,425-7583,Ap #929-9420 Vivamus Rd.,Friedberg,Czech Republic,[email protected]
18,Rajah Carrillo,1-576-789-5730,910-8300 Varius Rd.,Bertiolo,Afghanistan,[email protected]
19,Gabriel Richmond,1-387-932-2096,7458 Sapien. St.,Tropea,Cambodia,[email protected]
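The listing above has an unnamed first column holding the record index, and some fields contain commas inside quotes. A small sketch of reading such a file with the standard `csv` module follows. It is illustrative only: `read_csv_rows` and `FIELDS` are hypothetical names, and the email values are placeholders because the real addresses are obscured in this listing.

```python
import csv
import io

FIELDS = ["Name", "Phone", "Address", "City", "Country", "Email"]


def read_csv_rows(handle):
    """Return one dict per record, dropping the unnamed index column."""
    reader = csv.DictReader(handle)
    return [{field: row[field] for field in FIELDS} for row in reader]


# Two rows in the same shape as the listing above. A real notebook would pass
# open("data.csv", newline="", encoding="utf-8") instead of a StringIO.
sample = io.StringIO(
    ",Name,Phone,Address,City,Country,Email\n"
    '2,Camden Z. Blair,123-5058,"P.O. Box 441, 6183 Ligula St.",'
    'Casanova Elvo,"Palestine, State of",user@example.com\n'
    "4,Hanae P. Walsh,901-2461,7058 Dapibus St.,Dhuy,Qatar,user@example.org\n"
)

csv_rows = read_csv_rows(sample)
print(csv_rows[0]["Country"])  # Palestine, State of
```

Letting `csv.DictReader` handle the quoting avoids splitting on commas by hand, which would break the quoted address and country fields.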
data.json
{"Name":{"20":"Paul Merrill","21":"Brynne S. Barr","22":"Cyrus Buckley","23":"Chloe Burnett","24":"Zachery Wilcox","25":"Casey Mcgowan","26":"Cole X. Hopper","27":"Tara Bender","28":"Malik Grimes","29":"Ulla Russo","30":"Colby Moran","31":"Maggy Wooten","32":"Cameron Guthrie","33":"Gail Villarreal","34":"Harding Salinas","35":"Idona W. Bonner","36":"Warren Castillo","37":"Clayton Harmon","38":"Alana Vasquez","39":"Mason R. Trujillo"},"Phone":{"20":"1-313-739-3854","21":"939-4818","22":"266-3123","23":"828-0406","24":"1-611-756-4723","25":"1-155-558-4461","26":"1-328-505-0545","27":"1-757-378-4079","28":"793-4359","29":"662-7778","30":"1-788-230-1991","31":"912-7242","32":"988-2217","33":"1-405-823-4207","34":"1-505-843-5401","35":"283-6921","36":"1-250-875-9104","37":"1-609-380-9257","38":"1-853-288-4269","39":"172-5777"},"Address":{"20":"916-8087 Vehicula Rd.","21":"878-2231 Suspendisse Rd.","22":"P.O. Box 572, 7680 Ullamcorper Ave","23":"563-4105 Donec Avenue","24":"462-2112 In Rd.","25":"420-7327 Facilisis Street","26":"561-7476 Eget St.","27":"1247 Nonummy Rd.","28":"Ap #603-3303 Libero. St.","29":"P.O. Box 975, 4593 Ante. Street","30":"3696 Augue Ave","31":"P.O. Box 365, 6109 Metus. Rd.","32":"Ap #861-8699 Non Ave","33":"371-7266 Tortor Avenue","34":"4167 Nunc Ave","35":"Ap #302-2966 Cum Av.","36":"Ap #275-2917 Curabitur Rd.","37":"6930 Duis Road","38":"1511 Lobortis Ave","39":"Ap #711-213 Sagittis Avenue"},"City":{"20":"Le Mans","21":"Wilhelmshaven","22":"Sangli","23":"Wabamun","24":"Barddhaman","25":"Pfungstadt","26":"Saint John","27":"Avellino","28":"Winnipeg","29":"Vit\u00f3ria da Conquista","30":"Hualp\u00e9n","31":"Kapuskasing","32":"Pontypridd","33":"Saint-Remy-Geest","34":"Arsimont","35":"Nieuwenrode","36":"La Baie","37":"College","38":"Richmond Hill","39":"Quinta Normal"},"Country":{"20":"Somalia","21":"Samoa","22":"Taiwan","23":"Morocco","24":"Hong Kong","25":"Iran","26":"Macao","27":"Dominica","28":"Congo (Brazzaville)","29":"Slovakia","30":"France","31":"Indonesia","32":"Turks and Caicos Islands","33":"Marshall Islands","34":"Montserrat","35":"Faroe Islands","36":"Ireland","37":"United States","38":"Israel","39":"Sudan"},"Email":{"20":"[email protected]","21":"[email protected]","22":"[email protected]","23":"[email protected]","24":"[email protected]","25":"[email protected]","26":"[email protected]","27":"[email protected]","28":"[email protected]","29":"[email protected]","30":"[email protected]","31":"[email protected]","32":"[email protected]","33":"[email protected]","34":"[email protected]","35":"[email protected]","36":"[email protected]","37":"[email protected]","38":"[email protected]","39":"[email protected]"}}
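The JSON above is column-oriented: each top-level key is a column name, and each column maps a record index (stored as a string) to a value. To match a row-per-record layout, the columns have to be pivoted. The sketch below shows one way to do that, under the assumption that every column carries the same set of indices; `json_columns_to_rows` is a hypothetical helper name.

```python
import json


def json_columns_to_rows(text):
    """Pivot {"column": {"index": value}} JSON into a list of row dicts."""
    columns = json.loads(text)
    # Assumption: all columns share the same indices. Sort them numerically
    # so that "100" would not land before "20".
    indices = sorted(next(iter(columns.values())), key=int)
    return [
        {field: values[index] for field, values in columns.items()}
        for index in indices
    ]


# A two-record excerpt in the same shape as the file above.
sample = (
    '{"Name": {"21": "Brynne S. Barr", "20": "Paul Merrill"},'
    ' "Country": {"21": "Samoa", "20": "Somalia"}}'
)

json_rows = json_columns_to_rows(sample)
print(json_rows[0])  # {'Name': 'Paul Merrill', 'Country': 'Somalia'}
```

In a notebook, the text would come from reading `data.json` (for example with `open("data.json", encoding="utf-8").read()`), and `json.loads` also decodes escapes such as `\u00f3` into the accented characters seen in the CSV listing.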
data.pkl
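The contents of data.pkl are not shown in this listing, so its internal structure is unknown here. A reasonable first step is to load it and inspect the top-level type before deciding how to convert it to rows. The sketch below does only that; `load_pickle` and `describe` are hypothetical helper names, and the demo pickles a stand-in object into a temporary file rather than reading the real data.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load a pickle file. Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(obj):
    """Summarize the top-level shape of an unpickled object."""
    if isinstance(obj, dict):
        return f"dict with keys {list(obj)[:6]}"
    if isinstance(obj, (list, tuple)):
        first = type(obj[0]).__name__ if obj else "nothing"
        return f"{type(obj).__name__} of {len(obj)} items, first is {first}"
    return type(obj).__name__


# Demo with a stand-in object; the real call would be load_pickle("data.pkl").
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.pkl")
    with open(demo_path, "wb") as handle:
        pickle.dump([{"Name": "Jane Joyner", "Country": "Austria"}], handle)
    summary = describe(load_pickle(demo_path))

print(summary)  # list of 1 items, first is dict
```

Whatever shape `describe` reports for the real file, the goal stays the same: convert it into the same list-of-dicts layout used for the other two files before combining everything into `full_data`.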
docs/hwunit09-gde3mp5g-mzowjagy.ipynb