Question 1 (20pts):


Briefly comment on the following:


a) The number of bytes required to save these data types in C: int *, int, char *, char. Clarify any assumptions.


b) At least two factors that determine the stack size associated with a TI RTOS task.


c) At least one advantage and one disadvantage of pinned memory.


d) How to guarantee real time deadlines when using the Direct Memory Access (DMA) unit to transfer data between two memory locations in a single bus system?


e) The context switching overhead. What factors contribute to this overhead?


f) Thread concurrency. Give an example when it is beneficial.


g) Two issues to take into consideration when profiling an embedded software package. Describe a workaround for every issue.


h) The need for atomic operations to implement semaphores. Explain an alternative method if the micro architecture does not provide atomic operations.


i) Three reasons why a program might not scale in terms of reducing its runtime when it moves from a single core to a multicore platform?


j) Consider a system with three tasks T1, T2 and T3 and three resources R1, R2, and R3. Task T1 can acquire R1 and R2, task T2 can acquire R2 and R3, task T3 can acquire R3. There is no restriction on when a task can ask for a resource. Can this system deadlock? Explain.







Question 2 (20pts):


In the following SDF graph, A, B, C, and D are actors. Adjacent to each port is the number of tokens consumed or produced by a firing of the actor on that port. Assume the variable x represents the number of initial tokens (delay values) on the edge from C to B; x is a non-negative integer.


a) Write and solve the balance equations for this SDF graph. What is the minimum value for x for this graph to have a valid SDF schedule?


b) Given the value of x from step ‘a’, derive a schedule that requires the minimum buffer sizes for the shown data flow edges. What is the total buffer space required for this schedule?






Define the function T: V → R as the number of time units that an actor V takes to complete one firing. For the shown graph, T(A) = 1, T(B) = 4, T(C) = 4, and T(D) = 5. Define the latency of one iteration of the schedule as the time from when A starts firing until D finishes firing within that iteration. The average latency of the system is the average latency of the graph across multiple iterations. The throughput of the system is defined as the number of completed iterations per unit time. Answer the following:


c) Given a platform with one processor, what is the average latency and the throughput of the system?


d) Given a platform with two processors, can you reduce the average latency of the system? Draw a time diagram that shows how actor instances are assigned to processors to validate your solution. What is the minimum value of x in this case? What are the minimum buffer sizes for all edges? What are the throughput and latency of the system?


e) Assume x=2, and given the throughput value you calculated in step d, can you increase the throughput further? If so, what is the latency? Explain and support your answer with a timing diagram as you did in step d. (Hint: the back edge from C to B means that consecutive graph iterations are data dependent).







Question 3 (20pts):


Consider two tasks to be executed periodically on a single processor, where task 1 has period p1 = 4 and task 2 has period p2 = 10. Assume task 1 has execution time e1 = 1, and task 2 has execution time e2 = 7.


a) Sketch a rate-monotonic schedule (for 20 time units, the least common multiple of 4 and 10). Is the schedule feasible?


b) Now suppose task 1 and 2 contend for a mutex lock, assuming that the lock is acquired at the beginning of each execution and released at the end of each execution. Also, suppose that acquiring or releasing locks takes zero time and the priority inheritance protocol is used. Is the rate-monotonic schedule feasible?


c) Assume still that tasks 1 and 2 contend for a mutex lock, as in part (b). Suppose that task 2 is running an anytime algorithm, which is an algorithm that can be terminated early and still deliver useful results. For example, it might be an image processing algorithm that will deliver a lower quality image when terminated early. Find the maximum value for the execution time e2 of task 2 such that the rate-monotonic schedule is feasible. Construct the resulting schedule, with the reduced execution time for task 2, and sketch the schedule for 20 time units. You may assume that execution times are always positive integers.


d) For the original problem, where e1 = 1 and e2 = 7, and there is no mutex lock, sketch an EDF schedule for 20 time units. For tie-breaking among task executions with the same deadline, assume the execution of task 1 has higher priority than the execution of task 2. Is the schedule feasible?


e) Now consider adding a third task, task 3, which has period p3 = 5 and execution time e3 = 2. In addition, assume as in part (c) that we can adjust the execution time of task 2, that tasks 1 and 2 contend for a mutex lock, and that the priority inheritance protocol is used. Find the maximum value for the execution time e2 of task 2 such that the EDF schedule is feasible and sketch the schedule for 20 time units. Again, you may assume that the execution times are always positive integers. For tie-breaking among task executions with the same deadline, assume task i has higher priority than task j if i < j.







Question 4 (15pts):


Given the TI RTOS task threading modules and a system that consists of three independent tasks T1, T2, and T3, explain how you can use the provided APIs to schedule these three tasks using:


a) Rate Monotonic Strategy (RMS).


b) Earliest Deadline First strategy. In this case, you can assume that when the inputs of task T1, T2, or T3 are available, an interrupt service routine called “system_task” is triggered. “system_task” has the highest thread priority. Inputs to “system_task” are the “ID” of the ready task (e.g., “Task1”, “Task2”, or “Task3”) and the deadline of the task.


c) Round robin strategy where every task should execute for a fixed number of clock cycles.






Notes:


- Describe how you will architect the system. You can use pseudo code to highlight your design.


- You can use any other modules provided by TI RTOS.


- Make and document any reasonable assumptions.


- Limit your answer for this question to one page maximum.







Question 5 (10pts):


The following CUDA kernel is configured with a grid of 100 blocks, where every block consists of 1 thread. All elements of the array “sum” are 0 before launching the kernel.


a) (3 pts) Trace the kernel for the first thread in block 25. For your trace, indicate the values of threadIdx.x, blockIdx.x, blockDim.x, and index.


b) (7 pts) What can be the final value of the elements of “sum” after the kernel returns? Explain your answer. You can assume that the size of the arrays “a” and “sum” is large enough that no segmentation fault takes place.

    __global__ void sum_kernel(int *a, int *sum)
    {
        int index = threadIdx.x + blockIdx.x * blockDim.x;
        sum[threadIdx.x] += a[index];
        __syncthreads();
    }
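For reference, a minimal host-side launch matching the stated configuration is sketched below; the array size N and the device pointer names d_a and d_sum are assumptions, not part of the original question.

    #include <cuda_runtime.h>
    #define N 100   /* assumed array length */

    int main(void)
    {
        int *d_a, *d_sum;
        cudaMalloc((void **)&d_a,   N * sizeof(int));   /* d_a left uninitialized here */
        cudaMalloc((void **)&d_sum, N * sizeof(int));
        cudaMemset(d_sum, 0, N * sizeof(int));          /* "sum" starts at 0            */
        sum_kernel<<<100, 1>>>(d_a, d_sum);             /* gridDim.x = 100, blockDim.x = 1 */
        cudaDeviceSynchronize();
        return 0;
    }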







Question 6 (10pts)


Section 4.3 in the TI RTOS kernel documentation explains various ways of implementing gates that prevent concurrent access to critical regions. Given a system that consists of three tasks T1, T2, and T3 and two shared resources R1 and R2, where the priority of T1 is greater than that of T2, which is greater than that of T3, and where both T1 and T3 access the resources R1 and R2: is it possible, using the provided gates API, to implement the priority ceiling algorithm? Explain your answer. You can use pseudo code to clarify your point. You can consider both the preemption-based gate and the semaphore-based gate implementations. Limit your answer to one page.

Answer:
Question 1:
Briefly comment on the following:
a) The number of bytes required to save these data types in C:
i. int*: This stores a memory address at which the actual integer is stored, so its size depends on the architecture: 4 bytes on a 32-bit architecture and 8 bytes on a 64-bit architecture.
ii. int: This typically requires 4 bytes of memory on common platforms.
iii. char*: Same as int*: 4 or 8 bytes, since all object pointers have the same size on common architectures.
iv. char: 1 byte only.
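A minimal C sketch to verify these sizes on a given platform (the expected values in the comment assume a typical 64-bit build):

    #include <stdio.h>

    /* Prints the storage size of each type on the platform this is compiled for.
     * On a typical 64-bit build: pointers are 8 bytes, int is 4, char is 1.     */
    int main(void)
    {
        printf("sizeof(int *)  = %zu\n", sizeof(int *));
        printf("sizeof(int)    = %zu\n", sizeof(int));
        printf("sizeof(char *) = %zu\n", sizeof(char *));
        printf("sizeof(char)   = %zu\n", sizeof(char));   /* 1 by definition */
        return 0;
    }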
b) At least two factors that determine the stack size associated with a TI RTOS task:
i. The number and size of the local variables used in the task function.
ii. The depth of nested function calls made by the task, since each call adds a stack frame.
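As a sketch of where this stack size is set, a SYS/BIOS-style task creation is shown below; the header paths, stack size, and priority values are assumptions and should be checked against the TI-RTOS version in use.

    #include <xdc/std.h>
    #include <xdc/runtime/Error.h>
    #include <ti/sysbios/knl/Task.h>

    Void myTaskFxn(UArg arg0, UArg arg1)
    {
        /* Task body: locals here and in any nested calls consume this task's stack. */
    }

    void create_my_task(void)
    {
        Task_Params params;
        Error_Block eb;

        Error_init(&eb);
        Task_Params_init(&params);
        params.stackSize = 1024;   /* bytes: sized for locals plus the deepest call chain */
        params.priority  = 2;      /* assumed priority */
        Task_create(myTaskFxn, &params, &eb);   /* returns NULL on failure */
    }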
c) At least one advantage and one disadvantage of pinned memory:
i. Advantage: Higher transfer throughput for large memory transfers, since the DMA engine can access the buffer directly.
ii. Disadvantage: Expensive to allocate and deallocate, because it corresponds to physical memory that the OS cannot page out (rather than ordinary pageable virtual memory), so the allocation involves extra overhead and reduces the memory available to the rest of the system.
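A short CUDA host-code sketch of this trade-off (the buffer size and variable names are assumptions): the pinned allocation itself is slower than a pageable malloc, but the subsequent copies are faster and can also be issued asynchronously.

    #include <cuda_runtime.h>

    void pinned_example(void)
    {
        size_t bytes = 1u << 20;        /* assumed 1 MiB buffer */
        float *h_pinned = NULL;
        float *d_buf    = NULL;

        cudaMallocHost((void **)&h_pinned, bytes);   /* pinned: costly to allocate      */
        cudaMalloc((void **)&d_buf, bytes);

        /* ... fill h_pinned ... */
        cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);  /* faster transfer */

        cudaFree(d_buf);
        cudaFreeHost(h_pinned);                      /* freeing is also more expensive  */
    }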
d) How to guarantee real-time deadlines when using the Direct Memory Access (DMA) unit to transfer data between two memory locations in a single bus system:
i. The DMA unit can perform data operations such as reading, writing, and incrementing/decrementing the address and count within one clock cycle, but in a single bus system it competes with the CPU for the bus.
ii. To guarantee a real-time deadline, the DMA can be operated in 'burst mode', where a bounded block of data is transferred before control of the bus is released. Because the bus-hold time per burst is bounded, the worst-case delay seen by the CPU and by the transfer itself can be accounted for in the deadline analysis.
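A hedged pseudo-C sketch of this idea is shown below; every register and bit-field name (dma_regs_t, DMA_MODE_BURST, DMA_BURST_LEN) is hypothetical, since the real register map is device specific. The point is only that a fixed, small burst length bounds the bus-hold time per grant.

    #include <stdint.h>

    /* Hypothetical DMA controller layout - names and bit positions are illustrative only. */
    typedef struct {
        volatile uint32_t src, dst, count, ctrl;
    } dma_regs_t;

    #define DMA_MODE_BURST   (1u << 0)             /* move one block per bus grant, then release */
    #define DMA_BURST_LEN(n) ((uint32_t)(n) << 8)  /* words moved per bus grant                   */

    void start_bounded_dma(dma_regs_t *dma, uint32_t src, uint32_t dst, uint32_t words)
    {
        dma->src   = src;
        dma->dst   = dst;
        dma->count = words;
        /* A small, fixed burst length bounds how long the DMA engine holds the single
         * bus, so the CPU's worst-case blocking time and the transfer's completion
         * time can both be included in the schedulability / deadline analysis.       */
        dma->ctrl  = DMA_MODE_BURST | DMA_BURST_LEN(16);
    }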
e) The context switching overhead. What factors contribute to this overhead?
i. The size of the hardware context: the number of registers that must be saved and restored per task/process/thread, which is determined by the processor and the mode it is running in.
ii. The size of the software context: the per-task data structures defined by the OS (e.g., the task control block). Every time the OS switches context, it must save and restore the attributes held in these structures.
iii. How much of the context switch happens in hardware versus software. Switching in hardware is very fast, since essentially only the program counter and a few other registers change; switching in software adds the overhead of the instructions that save and restore the remaining state.
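As an illustration, a hypothetical saved-context structure is sketched below (the register layout is roughly ARM-like and purely illustrative); the larger this structure, and the more of it that must be saved by software, the higher the context-switch overhead.

    #include <stdint.h>

    /* Hypothetical saved task context - layout is illustrative only. */
    typedef struct {
        uint32_t r[13];       /* general-purpose registers (hardware context)  */
        uint32_t sp, lr, pc;  /* stack pointer, link register, program counter */
        uint32_t psr;         /* program status register                       */
        void    *tcb_extra;   /* OS-defined software context (TCB attributes)  */
    } task_context_t;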
f) Thread concurrency. Give an example when it is beneficial:
i. Graphics processing in game software that uses a GPU (Graphics Processing Unit): the rendering work can proceed concurrently with the game-logic thread.
g) Two issues to take into consideration when profiling an embedded software package, with a workaround for each:
i. All the peripherals need to be faithfully simulated on the host system.
Workaround: Use in-system programming/debugging so that the code is profiled against the real peripherals and the real environment.
ii. The profiling overhead (due to code instrumentation) needs to be taken into account, since it perturbs the measured timing.
Workaround: Use highly platform-dependent, customized mechanisms (e.g., hardware timers or trace support) so that the instrumentation does not add enough overhead to render the data useless.
h) The need for atomic operations to implement semaphores. Explain an alternative method if the micro-architecture does not provide atomic operations:
i. If the check-and-update of the semaphore is not atomic, another thread or process can be scheduled between requesting the semaphore and acquiring it and ask for the same semaphore; both may then believe they hold it, which leads to race conditions.
ii. Alternative: implement the synchronization with the help of inter-process communication, for instance a request-response model in which tasks send requests for resources to a single manager task that grants them one at a time, so no shared count needs to be updated atomically.
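To make the need for atomicity concrete, a binary-semaphore sketch using C11 atomics is shown below (all names are illustrative). Without an atomic test-and-set, the read of the flag and the write that follows could be interleaved between two tasks, and both would enter the critical section.

    #include <stdatomic.h>

    static atomic_flag in_use = ATOMIC_FLAG_INIT;

    void sem_wait(void)
    {
        /* Atomically sets the flag and returns its previous value in one step. */
        while (atomic_flag_test_and_set(&in_use)) {
            /* busy-wait; a real RTOS would block or yield here instead */
        }
    }

    void sem_post(void)
    {
        atomic_flag_clear(&in_use);
    }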
i) Three reasons why a program might not scale in terms of reducing its runtime when it moves from a single core to a multicore platform:
i. The overhead of distributing tasks across processor cores (scheduling and synchronization) might be higher than expected.
ii. To use the processor caches efficiently, the same task should keep running on the same core; otherwise, migrating its working set between cores causes large delays.
iii. On the other hand, if tasks are pinned to specific cores, the load may be imbalanced and some cores may sit idle for a while.
iv. The overhead of managing shared resources (e.g., memory) might be higher than expected.
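A minimal pthreads sketch of reason iv (the thread count and iteration count are arbitrary assumptions): every thread serializes on a single lock, and the cache line holding the shared counter bounces between cores, so adding cores does not proportionally reduce the runtime.

    #include <pthread.h>
    #include <stdio.h>

    static long counter;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        (void)arg;
        for (long i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                      /* shared-resource bottleneck */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);
        return 0;
    }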
j) Consider a system with three tasks T1, T2 and T3 and three resources R1, R2, and R3. Task T1 can acquire R1 and R2, task T2 can acquire R2 and R3, task T3 can acquire R3. There is no restriction on...