School of Computing and Information Systems The University of Melbourne COMP30027, Machine Learning, 2021 Project 2: How long will it take to cook this? Task: Build a classifier to predict the cooking...

Need just the coding part


School of Computing and Information Systems
The University of Melbourne
COMP30027, Machine Learning, 2021

Project 2: How long will it take to cook this?

Task: Build a classifier to predict the cooking time of recipes

Due:
Stage I: Wednesday 19 May, 5pm UTC+10 (Australian Eastern Standard Time)
Stage II: Wednesday 26 May, 5pm UTC+10 (Australian Eastern Standard Time)
Stage I: Friday 21 May, 5pm UTC+10 (Australian Eastern Standard Time)
Stage II: Friday 28 May, 5pm UTC+10 (Australian Eastern Standard Time)

Submission:
Stage I: Report (PDF) and code to Canvas; test outputs to Kaggle in-class competition
Stage II: Peer reviews and reflection to Canvas

Marks: The Project will be marked out of 20, and will contribute 20% of your total mark.

Groups: Groups of 1 or 2, with commensurate expectations for each (see Sections 2 and 6).

1 Overview

The goal of this Project is to build and critically analyse supervised Machine Learning methods, to predict the cooking time for recipes based on their steps, ingredients and other features. The cooking time of a recipe has been categorised into three classes, corresponding to quick, medium and slow.

This assignment aims to reinforce the largely theoretical lecture concepts surrounding data representation, classifier construction, and evaluation, by applying them to an open-ended problem. You will also have an opportunity to practice your general problem-solving skills, written communication skills, and creativity.

This project has two stages. The main focus of these stages will be the written report, where you will demonstrate the knowledge that you have gained and the critical analysis you have conducted in a manner that is accessible to a reasonably informed reader.

2 Deliverables

More details about deliverables are given in the Submission (Section 6).

Stage I:
1. Report: an anonymous written report, of 1000-1500 words (for a group of one person) or 2000-2500 words (for a group of two people)
2. Output: the output of your classifiers, comprising predictions of labels for the test instances, submitted to the Kaggle (https://www.kaggle.com/) in-class competition described below.
3. Code: one or more programs, written in Python, which implement machine learning models, make predictions, and evaluate the results.

Stage II:
1. Peer review: reviews of two reports written by other students, of 200-300 words each (for a group of one person) or 300-400 words each (for a group of two people).
2. Reflection: a written reflection piece of 400-600 words. This deliverable is individual work.

3 Terms of Use

The data has been collected from Food.com (formerly GeniusKitchen), under the provision that any resulting work should cite this resource:

Generating Personalized Recipes from Historical User Preferences. Bodhisattwa Prasad Majumder, Shuyang Li, Jianmo Ni, Julian McAuley, in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019.

This reference must be cited in the bibliography. We reserve the right to mark any submission lacking this reference with a 0, due to violation of the Terms of Use.

Please note that the dataset is a sample of actual data posted to the World Wide Web. As such, it may contain information that is in poor taste, or that could be considered offensive. We would ask you, as much as possible, to look beyond this to the task at hand. For example, it is generally not necessary to read individual records.

The opinions expressed within the data are those of the anonymised authors, and in no way express the official views of the University of Melbourne or any of its employees; using the data in an educative capacity does not constitute endorsement of the content contained therein.

If you object to these terms, please contact us ([email protected] or [email protected]) as soon as possible.

4 Data

The data files are available via Canvas, and are described in a corresponding README. The recipes are collected from Food.com (https://www.food.com/), which is a platform that allows users to publish recipes and comment on others’ recipes. In our dataset, each recipe contains:

• recipe features: name, ingredients, steps, number of steps, and number of ingredients
• text features: produced by various text encoding methods for name, ingredients, and steps. Each feature is provided as a single file with rows corresponding to the file of recipe features.
• class label: the preparation time of a recipe (duration), with 3 possible levels: 1, 2 or 3

You will be provided with a training set and a test set. The training set contains the recipe features, text features, and the duration, which is the “class label” of our task. The test set only contains the recipe and text features without the label.

The files provided are:

• recipe train.csv: recipe features and class label of training instances.
• recipe test.csv: recipe features of test instances.
• recipe text features *.zip: preprocessed text features for training and test sets, 1 zipped file for each text encoding method. Details about using these text features are provided in README.

5 Task

You are expected to develop Machine Learning models to predict the preparation time of a recipe based on its features (e.g. name, ingredients, steps etc.). You will implement and compare different machine learning models and explore the effective features for this task.

• The training-evaluation phase: The holdout or cross-validation approaches can be applied on the training data provided.
• The test phase: the trained classifiers will be evaluated on the unlabelled test data. The predicted labels of test cases should be submitted as part of the Stage I deliverable.
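
The training-evaluation phase can be sketched in plain Python as k-fold cross-validation of a 0R (majority-class) baseline. This is only an illustration under stated assumptions: the helper names and the toy labels are made up for the example, the folds are contiguous rather than shuffled or stratified, and a library routine such as sklearn's `cross_val_score` would normally replace the hand-written loop.

```python
from collections import Counter


def zero_r_fit(labels):
    """0R: remember the most frequent class in the training labels."""
    return Counter(labels).most_common(1)[0][0]


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_zero_r(labels, k=5):
    """Return the accuracy of a 0R baseline on each held-out fold."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        # Fit only on the instances that are NOT in the held-out fold.
        train = [y for i, y in enumerate(labels) if i not in held]
        majority = zero_r_fit(train)
        correct = sum(1 for i in held_out if labels[i] == majority)
        accuracies.append(correct / len(held_out))
    return accuracies


# Toy duration labels (1, 2 or 3); invented for illustration, not taken from the dataset.
toy_labels = [1, 2, 2, 3, 2, 1, 2, 2, 1, 2]
fold_accuracies = cross_validate_zero_r(toy_labels, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(fold_accuracies, mean_accuracy)
```

A baseline like this gives a floor against which the accuracy of any real classifier can be compared; swapping `zero_r_fit` for another learner keeps the same fold logic.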

Various machine learning techniques have been (or will be) discussed in this subject (0R, Naive Bayes, Decision Trees, kNN, SVM, neural network, etc.); many more exist. You may use any machine learning method you consider suitable for this problem. You are strongly encouraged to make use of machine learning software and/or existing libraries (such as sklearn) in your attempts at this project.

In addition to different learning algorithms, there are many different ways to encode text for these algorithms. The files in recipe text features *.zip are some possible representations of the name, ingredients and steps of recipes we have provided. For example, one of the encoding methods is CountVectorizer in sklearn, which converts text documents into a “Bag of Words”: the documents are described by word occurrences while ignoring the relative position information of the words. You can use these representations to develop your classifiers, but you should also feel free to extract your own features from the raw recipe features, according to your needs. Just keep in mind that any data representation you use for the text in the training set will need to be able to generalise to the test set.

6 Submission

The report, code, peer reviews and reflections should be submitted via Canvas; the predictions on test data should be submitted to Kaggle. Stage I submissions will be open one week before the due date. Stage II submissions will be open as soon as the reports are available (24 hours following the Stage I submission deadline).

6.1 Individual vs. Team Participation

You have the option of participating as a “group” of one individual, or in a group of two. In the case that you opt to participate individually, you will be required to implement at least 1 and up to 4 distinct Machine Learning models. Groups of two will be required to implement at least 3 and up to 5 distinct Machine Learning models, of which one is to be an ensemble model: stacking based on the other models.
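
Returning to the “Bag of Words” encoding described in Section 5, the requirement that a text representation built on the training set must generalise to the test set can be sketched in plain Python. This is not sklearn's CountVectorizer, only a minimal stand-in for the same idea; the function names, the whitespace tokenizer and the example texts are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def fit_vocabulary(train_texts):
    """Map each word seen in the TRAINING texts to a column index."""
    words = sorted({w for text in train_texts for w in tokenize(text)})
    return {w: i for i, w in enumerate(words)}


def transform(texts, vocabulary):
    """Turn texts into count vectors; words outside the vocabulary are ignored."""
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        row = [0] * len(vocabulary)
        for word, count in counts.items():
            if word in vocabulary:
                row[vocabulary[word]] = count
        rows.append(row)
    return rows


# Invented example texts, standing in for the "steps" field of a recipe.
train_steps = ["boil water add pasta", "chop onion fry onion"]
test_steps = ["fry pasta with garlic"]

vocab = fit_vocabulary(train_steps)      # learned from training data only
X_train = transform(train_steps, vocab)
X_test = transform(test_steps, vocab)    # same columns; unseen words are dropped
print(vocab)
print(X_train)
print(X_test)
```

The design point is that the vocabulary is fitted once on the training texts and then reused unchanged for the test texts, so both matrices share the same columns. With sklearn the same pattern is `fit_transform` on the training text followed by `transform` on the test text.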

The report length requirement also differs, as detailed below:

Group size    Distinct models required    Report length
1             1–4                         1,000–1,500 words
2             3–5                         2,000–2,500 words

If you wish to form a group of 2, only one of the members needs to register on Canvas by Wednesday 5 May, via the survey “Assignment 2 Group Registration”. For a group of 2, only one of the members needs to submit deliverables. Note that once you have signed up for a given group, you will not be allowed to change groups. If you do not register before the deadline above, we will assume that you will be completing the assignment as an individual, even if you were in a two-person group for Assignment 1.

6.2 Stage I: Report

The report should be 1,000-1,500 words (groups of one person) or 2,000-2,500 words (groups of two people) in length and provide a basic description of:

1. the task, and a short summary of some related work
2. what you have done, including any learners that you have used, or features that you have engineered. This should be at a conceptual level; a detailed description of the code is not appropriate for the report. The description should be similar to what you would see in a machine learning conference paper.
3. evaluation of your classifiers.

You should also aim to have a more detailed discussion, which:

4. Contextualises the behaviour of the method(s), in terms of the theoretical properties we have identified in the lectures
5. Attempts some error analysis of the method(s)

And don’t forget:

6. A bibliography, which includes the paper listed in Terms of Use, and other related work

Note that we are more interested in seeing evidence that you have thought about the task and investigated the reasons for the relative performance of different methods, rather than in the raw scores of the different methods. This is not to say that you should ignore the relative performance of different runs over the data, but rather that you should think beyond simple numbers to the reasons that underlie them, and connect these to the theory that we have discussed in this class.

Reports must be submitted in the form of a single PDF file. If a report is submitted in any format other than PDF, we reserve the right to return the report with a mark of 0.

To facilitate anonymous peer-review, your name and student ID should not appear anywhere in the report, including the metadata (filename, etc.).

6.3 Stage I: Predictions of test data

To give you the possibility of evaluating your models on the test set, we will be setting up this project as a Kaggle in-class competition. You can submit results on the test set there, and get immediate feedback on your model’s performance. There is a Leaderboard, that will allow you to see how well you are doing as compared to other classmates participating on-line. The Kaggle in-class competition URL and instructions will be announced on Canvas shortly.

You will receive marks for submitting (at least) one set of predictions for the unlabelled test dataset into the competition; and get basically reasonable accuracy, e.g. better than
May 15, 2021