Question

If you have any doubts, please let me know. I'm attaching the format of the report as well.

Title of your report
Your name, your id
Course name, Assignment …
VIT address
Supervisor/Lecturer name

Abstract
A very brief summary of 5-10 lines: what analytics you are doing here, the steps, and so on.

I. Introduction
Overview of the project you are doing. This is a kind of introduction and literature review of the report. Someone who reads only this part should understand what you are going to do in the next section, your motivation, and why they should spend time reading your report. You should also talk about the data and the steps in brief. I expect a page of writing for this summary. You can do some literature review from the internet and use it here. I expect at least the following sections after this: 1) Discussion, 2) Conclusion, 3) References and 4) an Appendix for your code. The text below is only a template. Remove it and type your own discussion and graphs. The reference section should list your references.

II. Discussion and Analytics
All your discussion and graphs should be here. Graphs need to be labeled, e.g. Fig 1, and discussed in context. Put the label below the figure and discuss the figure in the text. Include at least 4-6 plots. Plots should be the results of your Python code.

III. Conclusion
Conclude your results and give recommendations.

Acknowledgment
If someone helped you in preparing and implementing the Python code, you should acknowledge him/her here.

References
[1] Reference 1.
[2] Reference 2.
[3] Reference 3.
[4] Reference 4.
[5] Reference 5.

Appendix: Python Implementation
Copy or type your Python code here. Make sure it is clear enough to follow. When I run your code, I should get the same results.

Student Guidelines
Assessment 1: Research Study & Presentation
Due: 22 December 2019 - 11:59 pm
Total Weightage: 20%
Individual assignment

Python is one of the most frequently used programming languages in many fields, particularly in data science. It is also one of the best data science tools for big data jobs.
Task
The assignment has two phases: 1) writing a report and 2) presenting your findings using Python code.

1. Report (Weightage: 10%)
Choose data: Choose a dataset from the Kaggle website, https://www.kaggle.com/datasets, or a government open data source. You can also use Twitter data, which you can download using the Python Tweepy package.
Analytics: Find out what you can do with the data or what kind of decision making it supports. First (Step 1), do an exploratory data analysis on the data you have gathered. Exploratory data analysis is an approach to analysing datasets to summarize their main characteristics, often with visual methods. Then (Step 2), build a machine learning model on top of your data and make the necessary recommendations.
Python implementation: For consistency across all students, the implementation must be done in Google Colab: https://colab.research.google.com/notebooks/welcome.ipynb. Colab is a free notebook environment that requires no setup and runs entirely in the cloud. You need to log in to Google Colab and write your Python code for analysing the data there. Add a screenshot of your Google Colab account showing your name to your report; you can see it by clicking the orange button in the top-right corner.
Your report should have 1500-2000 words addressing the following: information on the data and why it is important, a literature review on the data and the methodology you are going to use, what you are going to solve and how, plots, and recommendations. The report should have at least 4-6 plots (screenshots) of your findings, with explanations.

2. Presentation (Weightage: 10%)
The presentation should be a maximum of 10 minutes. It must cover the research report, the research findings and visualisations, and a step-by-step discussion of how you did this project.
Submission Guidelines
All submissions must be made through Turnitin. Drop-boxes linked to Turnitin will be set up in the unit's Moodle account, and assignments not submitted through these drop-boxes will not be considered. Submissions must be made by the due date and time detailed above, as determined by your unit coordinator. Submissions made after the due date and time will be penalized at a rate of 10% per day (including weekend days).

The Turnitin similarity score will be used to determine the level of plagiarism, if any. Turnitin checks conference websites, journal articles, the web and your classmates' submissions for plagiarism. You can see your similarity score when you submit your assignment to the appropriate drop-box. If the score is a concern, you will have a chance to change your assignment and resubmit. However, resubmission is only allowed before the submission due date and time. After the due date and time have passed, you cannot resubmit and will have to live with the similarity score. Thus, plan early and submit early to take advantage of this feature. You can make multiple submissions, but remember that only the last submission is seen, and the submission date and time will be taken from that submission.

Your report should be a single Word or PDF document. Your presentation file should be in a standard video format and must not exceed 200 MB. Your slides and your face should be clearly visible in the video. Submit the presentation file itself, not a link to your video, through the provided video submission link; links to videos will not be considered for marking.

report-format-for-assignment-34yzfnje.doc assignment-1-big-data-feyvbyq5.pdf

Neha · Accepted Answer

Madrid Data Analysis
Your name, your id
Course name, Assignment … 
VIT address
Supervisor/Lecturer name
Abstract
This report analyzes a dataset of Madrid hotels. The dataset describes the hotels located in the different regions of Madrid and their features, including columns for room price and room type. Exploratory data analysis (EDA) is used to analyze the data and explore its possible outcomes, presenting the results in pictorial form, which is easier to understand. Python is used to implement the EDA because it provides libraries for loading the data into an object and for drawing graphs and charts based on specified conditions.
I. Introduction
This report presents an analysis of a dataset of Madrid hotels. Exploratory data analysis (EDA) is an approach, sometimes described as a philosophy, for analyzing and visualizing the main characteristics of a dataset. A dataset can contain thousands of rows, far more than a person can inspect and analyze by hand. EDA summarizes the data accurately and presents it visually, which reduces manual effort and answers questions about specific conditions. EDA also helps identify missing values in the dataset and analyze the relationships between variables. Supervised learning is a type of machine learning in which a model is trained on labeled examples so that it can predict labels for new data. I used Python to analyze the dataset. Python provides a long list of libraries that can read data from a file and plot graphs; we only need to import a library to use its functions. This makes Python one of the easiest languages for implementing EDA.
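The two EDA checks mentioned above, finding missing values and looking at relationships between variables, can be sketched in pandas as follows. The small DataFrame is made up for illustration; with the real data, `df` would be the DataFrame loaded from the Kaggle CSV file. Correlation is only one way to look at relationships: it measures linear association between numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; with the real data, df would come from pd.read_csv(...)
df = pd.DataFrame({
    "price": [70, 45, np.nan, 120, 3500],
    "minimum_nights": [2, 1, 3, 1, 5],
    "number_of_reviews": [10, 0, 25, 4, 1],
    "reviews_per_month": [0.5, np.nan, 1.2, 0.3, np.nan],
})

# 1. Missing values: number of NaN entries in each column
missing = df.isna().sum()
print(missing)

# 2. Relationships: pairwise Pearson correlation between the numeric columns
#    (rows with a missing value are skipped pair by pair)
corr = df.corr()
print(corr.round(2))
```

In the real dataset, columns such as `reviews_per_month` are typical candidates for missing values, so checking them first shows whether rows need to be dropped or filled before plotting.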
I chose a dataset from Kaggle for this task. The dataset describes guests' hotel stays and can help new visitors choose a hotel. It contains multiple columns and thousands of rows, which makes the analysis more realistic.
The following columns of the dataset are included in the analysis:
· id – unique ID for each row
· name – hotel name
· host_id – unique ID of the host
· host_name – name of the host
· neighbourhood_group – name of the neighbourhood group the hotel belongs to
· neighbourhood – name of the neighbourhood where the hotel is located
· latitude – latitude of the hotel location
· longitude – longitude of the hotel location
· room_type – type of room available in the hotel
· price – price of the room
· minimum_nights – minimum number of nights a guest must book
· number_of_reviews – total number of reviews submitted by guests
· last_review – date of the most recent review
· reviews_per_month – average number of reviews per month
· calculated_host_listings_count – number of listings the host has
· availability_365 – number of days in the year the room is available
II. Discussion and Analytics
Before performing EDA on any dataset, we should make sure that the data does not contain errors, so that the analysis results are not corrupted. The first step in the code is to import the pandas and NumPy libraries. pandas reads the CSV file from the system and stores the dataset in a DataFrame object, and the analysis then works on this DataFrame rather than on the raw file. The names of all the columns can be retrieved with the DataFrame's .columns attribute. We can use .shape to get the dimensions of the data, .head() to get the first few rows, .describe() to get summary statistics of the numeric columns, and many other functions that help us understand the data. To analyze a dataset, we should understand it first. This understanding helps us formulate questions, and these questions guide which results we extract from the dataset.
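A minimal sketch of the loading and inspection steps described above. The CSV excerpt, its rows and the file name `madrid_listings.csv` are hypothetical, and only a subset of the listed columns is used. Parsing `last_review` as a date is an assumption about how that column should be handled.

```python
import io

import pandas as pd

# Hypothetical CSV excerpt using some of the column names listed above.
# With the real file you would call pd.read_csv("madrid_listings.csv", ...)
# (the file name is assumed).
csv_text = """id,name,neighbourhood_group,room_type,price,last_review
1,Hotel A,Centro,Private room,60,2019-07-01
2,Hotel B,Salamanca,Entire home/apt,150,2019-06-15
3,Hotel C,Centro,Shared room,25,
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["last_review"])

print(df.columns.tolist())  # names of all columns
print(df.shape)             # (number of rows, number of columns)
print(df.head())            # first rows (5 by default)
print(df.describe())        # summary statistics of the numeric columns

# Basic error checks before analysis: missing values and duplicate rows
print(df.isna().sum())
print(df.duplicated().sum())
```

Note that `.columns` and `.shape` are attributes, while `.head()` and `.describe()` are methods and need parentheses.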
To plot graphs based on these conditions, we import the seaborn and matplotlib Python libraries. matplotlib supports many types of graphs and styles and allows fully customized plots, while seaborn builds on matplotlib to provide higher-level statistical plots.
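To illustrate the kind of customization matplotlib allows, the sketch below draws a bar chart of the average price per room type with a custom title, axis labels, colours and grid. The data is hypothetical, and grouping by `room_type` is only an example of a condition-based plot, not one of the report's figures.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in Colab to see plots inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the real listings DataFrame
df = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Shared room", "Entire home/apt"],
    "price": [120, 45, 55, 25, 180],
})

# Average price per room type, most expensive first
avg_price = df.groupby("room_type")["price"].mean().sort_values(ascending=False)

# Customize the figure: size, bar colours, title, axis labels and grid
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(avg_price.index, avg_price.values, color="steelblue", edgecolor="black")
ax.set_title("Average price by room type")
ax.set_xlabel("room_type")
ax.set_ylabel("Average price")
ax.grid(axis="y", linestyle="--", alpha=0.5)
fig.tight_layout()
```

Calling `fig.savefig("some_name.png")` afterwards would save the chart as an image for the report.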
This analysis concerns the price and neighbourhood_group columns. For this graph, I filtered the price column to keep only rooms priced above 3000. The figure size is (16, 6), the x-axis shows neighbourhood_group, and the y-axis shows the prices greater than 3000. The graph therefore shows which neighbourhood groups contain hotels with rooms priced above 3000.
Fig 1. Rooms priced above 3000 by neighbourhood group.
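Fig 1 could be reproduced along these lines. This is a sketch under assumptions: the rows are made up, and because the text does not say which chart type was used, a scatter plot is assumed. With seaborn, `sns.stripplot(data=expensive, x="neighbourhood_group", y="price", ax=ax)` would draw a similar chart.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in Colab to see the plot inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listings; replace with the DataFrame loaded from the Kaggle CSV
df = pd.DataFrame({
    "neighbourhood_group": ["Centro", "Salamanca", "Centro",
                            "Arganzuela", "Retiro", "Salamanca"],
    "price": [60, 4000, 3200, 150, 9999, 2900],
})

PRICE_THRESHOLD = 3000  # threshold stated in the text

# Keep only rooms priced strictly above the threshold
expensive = df[df["price"] > PRICE_THRESHOLD]

# Figure size (16, 6), x-axis: neighbourhood_group, y-axis: price
fig, ax = plt.subplots(figsize=(16, 6))
ax.scatter(expensive["neighbourhood_group"], expensive["price"])
ax.set_xlabel("neighbourhood_group")
ax.set_ylabel("price")
ax.set_title("Rooms priced above 3000 by neighbourhood group")
```

Only the filter (`price > 3000`), the figure size and the axis assignment come from the text; the rest is illustrative.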
