COSC 1P03 Tutorial Exercise 10 Apr. 4-7, 2022 Sorting — QuickSort pivot selection Background: In lecture, we had a brief introduction to QuickSort. Your lab covered (or will cover) how to code the...

1 answer below »


COSC 1P03 Tutorial Exercise 10
Apr. 4-7, 2022
Sorting — QuickSort pivot selection

Background:
In lecture, we had a
ief introduction to QuickSort. Your lab covered (or will cover) how
to code the algorithm, but it glosses over one point: how to select the pivot.
• Recall that the QuickSort algorithm relies on taking a list of unsorted values,
selecting some value as the 'pivot', and then partitioning the records into three groups:
the records that precede the pivot, the pivot, and the records following it For example,
without even considering indices at all:
Suppose we were to select OCP as our pivot, we'd end up with:
Recursing first on the left side, and choosing CAB; and then on the right side, and
choosing TCP; would yield:
And, of course, recursing into the remaining slices (e.g. ABA) would identify them as
'trivial cases', so they're sorted as well.
For these particular records, and choices of pivots, everything was very 'balanced'.
• i.e. the problem was
oken up into two equal-sized 'sub-problems' at each step,
meaning the execution tree had a maximum depth bounded at log(N) But what if we'd
chosen poorly?
COSC 1P03 Tutorial 10
Pivot selection — picking the 'last' record:
Recall that, for each call, we'll have some 'lower-bound' and some 'upper-bound', marking
the left and right boundaries of that slice to sort.
Before getting into specific strategies for pivot selection, it's common to go with the
easiest: picking the record in that upper-bound position. e.g.:
But what would happen if we were to try the exact same algorithm, and sort the same
(already-sorted) records?

That is… far less impressive.
COSC 1P03 Tutorial 10
But it makes sense, right? We're picking a record, and then
eaking up the slice into
everything before that record, that record, and what comes after it. If the a
ay (and by
extension, that slice) is already sorted, and we pick the right-most record within the slice,
then clearly nothing could come after it, right?
That means, at each step, we're decreasing the length of the problem by one.
This means two things:
• The maximum depth of recursion will be O(N)
• Since each depth has a cost of O(N), the total algorithm is O(N×N), i.e. O(N²)
Whether or not this is 'a problem' ostensibly depends on the expected usage. Sorting an
already-sorted list can be quite common: for some applications, minor operations might
have only slightly disorganized the a
ay. For those cases, this would be catastrophic.
To answer the question of how to mitigate the risk here, first consider why the initial
example worked out so well: because it always picked a pivot that evenly-divided the
emaining records within the slice.
Can we force that to happen?
Picking the median record:
By coincidence, each 'rightmost record' happened to be the median record of that slice:
• OCP was the median (or middle) record of the whole a
ay
• CAB was the median of FDD, ABC, and CAB
• TCP was the median of XYZ, RNA, and TCP
So what if we were to simply manually search for the median record within each slice?
• This is the first question for the exercise at the end
• Consider what would be involved with finding that median value
◦ Definitely give this some thought, before the TA explains it For
now, we'll assume this is too expensive, and look for alternatives.
Picking the median position:
What if we were to simply pick the 'middle record' in the most literal sense possible?
i.e. pivot_position = ( lower_bound + upper_bound ) ÷ 2
This is a more interesting question to answer, because its expected efficacy also depends
on the use case:
• By definition, it must solve the problem of 'sorting a sorted list' ◦ Since it's sorted,
the 'middle record' must also be the 'median record'
• But what about for unsorted values?
Can we think of, say, a possible sequence of values where picking the 'middle values' will
e far worse than a depth of O(log(N))? This is the second question for the exercise.
COSC 1P03 Tutorial 10
Assuming you fiddled with the numbers long enough, you probably made some decent
progress towards recognizing that there are some sequences that don't 'play as nicely' as
others. And that's the real catch: there are an infinite number of sorting tasks wherein
picking the median position would work te
ibly.
Of course, there's also an infinite number where it'd work very well. But the point is even
this approach leads to a worst-case scenario of O(N²) overall.
And that's not really going to change. Our immediate goal isn't to guarantee a worst
possible bound of O(N·log(N)). Rather, in the absence of additional domain knowledge,
we'll commonly target the average case. This is counterintuitive to how we learned
asymptotic complexity, where we normally focused on the upper-bounds, but sometimes
performance in practice is what matters.
Median of three:
We've agreed that it isn't computationally feasible to pick the median record of a slice.
But what if we simply tried to minimize the damage?
The median of three algorithm is a very simple, but powerful, pivot selection scheme. For
a given slice, you compare exactly three records: the leftmost, the rightmost, and the
middle record ('middle' refe
ing to the position).
You choose as the pivot the median record of those three.
As an exercise, and the third question at the end, re-attempt your selection of integers
from the second question, but use median of three instead.
COSC 1P03 Tutorial 10
Name: _____________________________________________
St. #: _____________________________________________
Picking the median value
Is searching each slice for the 'median record' a good idea? Why or why not?
Picking the middle value
Can you think of a sequence of integers, such that picking the 'middle value' as the pivot
would be a bad idea? You don't have to wo
y about getting to O(N) for the recursion
depth. Just a few lines is enough to show how things can go wrong.
Using the median of three
Retry your numbers from the previous question, but use median of three this time.
Did it help?
Answered Same DayApr 03, 2022

Solution

Gaurav answered on Apr 04 2022
13 Votes
1:
It depends on the algorithm which will be used to find the median element, suppose if we able to get the median element in O(n) time the during the recursion process, then the time complexity of whole sorting process would be: t(n/2) + t(n/2) + O(n) + O(n)
And whole process depends on the time...
SOLUTION.PDF

Answer To This Question Is Available To Download

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here