Page 1 of 8 COSC 1114 Operating Systems Principles Semester 2, 2022 Assignment 1 Assessment Type Individual assignment. See Canvas for submission details Marks awarded for meeting requirements as...

Operating Systems Assignment 1


Page 1 of 8 COSC 1114 Operating Systems Principles Semester 2, 2022 Assignment 1 Assessment Type Individual assignment. See Canvas for submission details Marks awarded for meeting requirements as closely as possible. Clarifications/updates may be made via announcements/relevant discussion forums. Submission Due Date Start of week 7. Marks 40 (40% of semester) 1. Research Question We can use Load Balancing as a technique of ensuring that all processes/threads in a particular [problem are treated in such a fashion that they all finish at approximately the same time. We frame the problem in the form of a map/reduce problem, and then use Static Load Balancing (S-LBAN) to adjust the process load for optimal net performance. The question is how accurate can we do this? Is it worth the effort? Clearly Amazon’s Elastic Computing (EC2) tells us that this is so on a large scale, which typically includes a lot of data movement and networking. But what about in a real-time system, with no significant networking – the data arriving in a batch before the program begins, and having repeatable statistical properties? In other words, we measure, adjust, and deliver optimally. The central task of this assignment is to produce an evidence-based report answering the above questions. Methodology We propose to answer the RQ by creating a series of tasks. First some simple ones in order to establish a baseline of result expectations, then to rewrite the tasks using the map/reduce software pattern to produce a set of concurrent threads of different complexity. We can then adjust the scheduling in order to answer the research question. 2. Overview of Methodology In this assignment, you must write C/C++ code to implement a map / reduce problem. Map() is a process or function that maps the input problem to the available resources which may mean subdividing the problem into components and solving each of those component problems using prioritized job scheduling. Then reduce() gathers the component solutions and combines them to form the global solution. You will be provided with “dirty data” and the assignment is then divided into a number of tasks, each of which purports to solve the problem a different way. In each case, there will be a performance measuring phase, and for the later methods, an adjustment phase based on the performance metrics obtained. Throughout, you will be measuring and properly documenting performance (and hopefully improving it) In the last task, where streaming data is used, it will be important to reduce idle time by ensuring that all subtasks created by map() finish at the same time, so that reduce() is not kept waiting. Page 2 of 8 3. Learning Outcomes This assessment relates to the following course learning outcomes: • Summarise the full range of considerations in the design of file systems, summarise techniques for achieving synchronisation in an operation system, • Compare and contrast the common algorithms used for both pre-emptive and non-pre-emptive scheduling of tasks in operating systems, such as priority, performance comparison, and fair- share schemes. Contrast kernel and user mode in an operating system • Evaluate and report appropriate design choices when solving real-world problems • Analyse the key trade-offs between multiple approaches to operating system design. 4. Assessment details General Your solution to the problem you choose must not use busy waiting, which is where a program sits in a tight loop until some condition is met as this waste’s CPU time. Instead, you must have each thread sleep until a condition is met (use a condition variable), wake up and do some processing and then perhaps signal other threads that the job has been done. Please ensure sufficient output so that it is clear when your program performs any actions such as adding to an array, locking a lock, etc. You should ensure in your design of the program that you only share between threads the minimum state (variables) possible as the more information that is shared the more likely there is to be a conflict (deadlocks, race conditions, etc). Please see the courseware material for more detail on these problems. If your algorithm requires randomness, you should ensure you seed the random number generator correctly. To ease marking of your assignment, you must call your executables "taskN", where N is the task number as described below. All materials necessary for the markers to build your tasks must be included in the submitted ZIP file. The markers will use the CS servers (titan, saturn, jupiter) for the marking, and the code is expected to run there. Makefile Your solution must be written in C / c++ and you must supply a Makefile that builds your solution incrementally and then links them together. If you are feeling a big rusty about make files or have not used them before, we recommend going through the following tutorial: https://www.tutorialspoint.com/makefile/index.htm Compiler Settings Please note that as a minimum you must use the ‘-Wall -g’ flags on gcc or equivalent on other compilers to generate warnings. You may use any supported c++ compiler on the server - if you wish to use a standard above c++ on the server with g++ you will need to use the scl command, e.g.: scl enable devtoolset-9 bash This should get you gcc version 9.3.1 (2020) instead of gcc 4.8.5 (2015). Use “gcc –version” to confirm. Graceful Exit In order to account for the possibility of thread starvation, you will need to gracefully exit your simulation at the end of around 10 seconds. We would normally expect even the slowest task – task1() – to not take longer than 10 seconds to execute. Use the following method to do this: once you have launched all the required threads from main, call the sleep function, to specify sleep for ten seconds. Once the sleep function finishes, change the value of a global variable to indicate that all the threads should exit, which they should be checking regularly. A less graceful exit might be to call exit() after 10 seconds, but that risks leaving zombies behind with unfinished business – potentially leading to data loss. https://www.tutorialspoint.com/makefile/index.htm Page 3 of 8 Valgrind The solution you submit will ideally be as bug-free as possible including free of memory bugs. Your markers will mark your submission on the titan/jupiter/saturn servers provided by the university and we will use the tool "valgrind" to test that your program is memory correct (no invalid access to memory and all memory freed) and as such you should check your program on titan before submission with this tool. This is for debugging and memory testing only. When gathering performance data, you should do this on a dedicated machine or VM, since on titan, being a shared resource, the timings will obviously not be repeatable. The following command: valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all ./simulation Will run your program with valgrind. Please note that in this case the name of the executable produced by compilation of your program is 'executable'. Allowed Concurrency Functions For the concurrency part of the assignment, you must limit yourself to the following functions: • pthread_create • pthread_join • pthread_detach • pthread_mutex_init • pthread_mutex_destroy • pthread_mutex_trylock • pthread_mutex_lock • pthread_cond_wait • pthread_cond_signal • pthread_cond_init • you may also need the pthread_mutexattr_* functions • you may also use the scheduling priority function. This list is not exhaustive and so if you are unsure, please ask on the discussion board about the function you wish to use. Please note that in practice beyond this course, you would use higher-level c++ functions that call these functions, but part of what you are learning here is the underlying calls that are made as part of managing concurrency. That is, part of what you are learning is to apply the theory that you learn in the workshops to practical problems. The Source Data To start off, you may use a words list conventionally stored in ‘/usr/share/dict/linux.words’ as a clean data file. You may need to install this in your distro. Titan does not have it. The actual data to be used is at https://www.keithv.com/software/wlist/. You will find a number of ZIP files containing text files containing (ideally) 1 word per line which you read into an array for subsequent processing. In fact, the data is ‘dirty’ and you will need to clean It up first. In your performance measurement using Amdahl’s Law, this is the serial part. Performance Measurement and Reporting You will need to devise a way of measuring improvement using primarily execution time, but also resource usage. Consider some of the metrics discussed in class. You then need to describe this using graphs and other ways to describe how what you did improved performance, and why. See reporting details below. https://www.keithv.com/software/wlist/ Page 4 of 8 5. The Tasks in detail. The fundamental task is to receive a set of words, clean it up, then use map/reduce ultimately to sort it. Words of length 3-15 characters are to be sorted on the third character onwards. Words of other lengths are to be ignored (filtered out). The tasks below achieve this in different ways. Below you will find a selection of five tasks called task1 … task5 that must all be done in sequence. In each case, the simulations should not exceed about 10 seconds and terminate cleanly - no crashes, no deadlocks, no race conditions. This should be enough time to gather event statistics. You may need to replicate the data in order to make it run long enough. If so
Aug 10, 2022
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here