Difficulty: Medium, Asked-in: Google, Microsoft, Adobe, Goldman Sachs, Qualcomm.
The quick sort algorithm was developed by computer scientist Tony Hoare in 1959.
Quick sort is one of the fastest sorting algorithms, based on the idea of divide and conquer, and it is worth learning both for its excellent average-case performance and for the problem-solving ideas it demonstrates.
Suppose we partition an unsorted array around some input value (pivot) into two parts such that values in the left part are less than the pivot and values in the right part are greater than the pivot.
If we observe, the pivot gets placed at its correct position in the final sorted output after the partition. The array now looks almost sorted, with one major difference: the left and right halves are still unsorted! If we observe further, both unsorted halves are smaller subproblems of the original sorting problem.
So, if we sort both halves recursively, using the same function, the entire array will get sorted! The idea is simple: All values on the left side are less than the pivot, the pivot is at the correct position, and all values on the right side are greater than the pivot.
The above idea of quicksort is the divide-and-conquer approach to problem-solving. Here we divide the sorting problem into two smaller subproblems using the partition algorithm and solve each subproblem recursively to get the final sorted array. We repeat the same process for smaller subarrays until we reach the base case.
Divide Part: Divide the problem into two smaller subproblems by partitioning the array X[l...r]. The partition process returns the pivotIndex, i.e., the index of the pivot after partitioning.
Conquer Part: Sort both subarrays recursively.
Combine Part: This part is trivial: once both smaller subarrays are sorted, the entire array is sorted. In other words, there is no need for a combine step.
Suppose we use the function quickSort(int X[], int l, int r) to sort the entire array, where l and r are the left and right ends. The initial function call will be quickSort(X, 0, n - 1).
We define the partition(X, l, r) function to divide the array around the pivot and return the pivotIndex. We will select the pivot value inside the partition function. The critical question is: Why are we choosing the pivot value inside the partition function? Think and explore!
pivotIndex = partition(X, l, r)
The base case is the smallest version of the problem, where recursion terminates. In the case of quick sort, the base case occurs when the subarray size is either 0 (empty) or 1. In other words, l >= r is the condition for the base case. The critical question is: When do we reach the empty subarray scenario? Think and explore!
void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        // Divide: partition X[l...r] and get the final pivot position.
        int pivotIndex = partition(X, l, r)
        // Conquer: recursively sort the left and right subarrays.
        quickSort(X, l, pivotIndex - 1)
        quickSort(X, pivotIndex + 1, r)
    }
}
How can we divide an array around the pivot so that the values in the left part are less than the pivot, and values in the right part are greater than the pivot? Can we do it in place? Let’s think!
Suppose we choose the rightmost value as the pivot, i.e., pivot = X[r], and scan the array from left to right. Now we know that the starting part of the array should contain values less than the pivot. So whenever an element is less than the pivot, we will place it at the starting part of the array and move forward. Otherwise, we will ignore that element and move forward. How can we implement it? Let’s think!
int partition(int X[], int l, int r)
{
    // Choose the rightmost value as the pivot.
    int pivot = X[r]
    // i tracks the right boundary of the "less than pivot" region.
    int i = l - 1
    for (int j = l; j < r; j = j + 1)
    {
        if (X[j] < pivot)
        {
            // Grow the "less than pivot" region and move X[j] into it.
            i = i + 1
            swap(X[i], X[j])
        }
    }
    // Place the pivot just after the "less than pivot" region.
    swap(X[i + 1], X[r])
    return i + 1
}
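For reference, here is how the above pseudocode might translate into a complete, runnable C program. The swap helper, the sample array, and the main function are our own additions for demonstration; the core logic mirrors the pseudocode above.

#include <stdio.h>

// Exchange two array elements via a temporary variable.
void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Lomuto partition: pivot = X[r]; returns the final pivot index.
int partition(int X[], int l, int r)
{
    int pivot = X[r];
    int i = l - 1;
    for (int j = l; j < r; j = j + 1)
    {
        if (X[j] < pivot)
        {
            i = i + 1;
            swap(&X[i], &X[j]);
        }
    }
    swap(&X[i + 1], &X[r]);
    return i + 1;
}

void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        int pivotIndex = partition(X, l, r);
        quickSort(X, l, pivotIndex - 1);
        quickSort(X, pivotIndex + 1, r);
    }
}

int main(void)
{
    int X[] = {5, 3, 8, 1, 4, 2};
    int n = sizeof(X) / sizeof(X[0]);
    quickSort(X, 0, n - 1);
    for (int i = 0; i < n; i = i + 1)
        printf("%d ", X[i]);   // prints: 1 2 3 4 5 8
    printf("\n");
    return 0;
}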
Analysis of the partition algorithm
We run a single loop and do constant work at each iteration: one comparison and at most one swap. So the time complexity of partition = O(n). We use constant extra space, so space complexity = O(1). Note: Whether a swap happens at an iteration depends on the comparison X[j] < pivot, but the comparison count is the same in the best and worst case.
Suppose T(n) is the worst-case time complexity of quick sort for input size n.
Now let's break down the time complexity of each part. The divide part (partition) takes O(n) time. The conquer part recursively sorts two subarrays: if the pivot lands at index i, their sizes are i and n - i - 1, costing T(i) + T(n - i - 1). The combine part takes O(1) time.
For calculating the time complexity T(n), we add the time complexities of the divide, conquer, and combine parts:
T(n) = O(n) + T(i) + T(n - i - 1) + O(1)
= T(i) + T(n - i - 1) + O(n)
= T(i) + T(n - i - 1) + cn
Recurrence relation of quick sort
T(n) = c, if n = 1
T(n) = T(i) + T(n - i - 1) + cn, if n > 1
The worst case occurs when partitioning is maximally unbalanced, i.e., the pivot is always the largest or smallest value, so one subarray has n - 1 elements and the other is empty. To calculate the worst-case time complexity, we put i = n - 1 in the above formula of T(n):
T(n) = T(n - 1) + T(0) + cn
Since T(0) is a constant, T(n) = T(n - 1) + cn.
Worst-case analysis of quick sort using Substitution Method
We simply expand the recurrence relation by substituting the intermediate values of T(i) (from i = n - 1 down to 1). By the end of this process, we end up with the sum of an arithmetic series.
T(n)
= T(n - 1) + cn
= T(n - 2) + c(n - 1) + cn
= T(n - 3) + c(n - 2) + c(n - 1) + cn
... and so on
= T(1) + 2c + 3c + ... + c(n - 2) + c(n - 1) + cn
= c + 2c + 3c + ... + c(n - 2) + c(n - 1) + cn
= c(1 + 2 + 3 + ... + n - 2 + n - 1 + n)
= c(n(n + 1)/2)
= O(n^2)
Worst case time complexity of quick sort = O(n^2).
Worst-case analysis of quick sort using Recursion Tree Method
The recursion tree method is one of the popular techniques for recursion time complexity analysis. Here we draw a tree structure of recursive calls and highlight the extra cost associated with each recursive call. To get the overall time complexity, we add the cost level by level.
We sum the total partitioning cost level by level: cn + c(n - 1) + c(n - 2) + ... + 2c + c = c(n + (n - 1) + (n - 2) + ... + 2 + 1) = c[n(n + 1)/2] = O(n^2).
The best-case scenario of quick sort occurs when the partition process always picks the median element as the pivot. In other words, this is the case of balanced partitioning, where both subarrays are approximately n/2 in size.
Suppose this balanced partitioning arises in every recursive call. To calculate the best-case time complexity, we put i = n/2 in the above formula of T(n):
T(n) = T(n/2) + T(n - 1 - n/2) + cn
= T(n/2) + T(n/2 - 1) + cn
~ 2T(n/2) + cn
So T(n) = 2T(n/2) + cn.
This recurrence is similar to the recurrence for merge sort, for which the solution is O(n log n). So the best-case time complexity of quick sort = O(n log n).
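We can verify this with the same substitution method as before. Here is a quick sketch, assuming n is a power of 2 for simplicity:

T(n) = 2T(n/2) + cn
= 2[2T(n/4) + c(n/2)] + cn = 4T(n/4) + 2cn
= 8T(n/8) + 3cn
... and so on
= 2^k T(n/2^k) + kcn

Putting k = log2(n), so that n/2^k = 1:
= nT(1) + cn log2(n) = cn + cn log2(n) = O(n log n)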
Suppose all permutations of input are equally likely. Now, when we run the quick sort algorithm on random input, partitioning is highly unlikely to happen in the same way at each level of recursion. The idea is simple: The behaviour of quick sort will depend on the relative order of values in the input.
Here some of the splits will be reasonably well balanced and some of the splits will be pretty unbalanced. So the partition process will generate a mix of good (balanced partition) and bad (unbalanced partition) splits in the average case. These good and bad splits will be distributed randomly throughout the recursion tree. For a better understanding of the average scenario, suppose good and bad splits appear at alternate levels.
Suppose at the root, partition generates a good split, and at the next level, partition generates a bad split. The cost of the partition will be O(n) at both levels, so the combined cost of the good split followed by the bad split is O(n). Overall, these two levels together behave like a single level of balanced partitioning.
As a result, the height of recursion tree will be O(log n), and the combined cost of each level will be O(n). This is similar to the situation of the best case. So the average-case time complexity of quick sort will be O(n log n).
Note: Average case time complexity is the same as the best case, but the value of the constant factor in O(n log n) will be slightly higher in the average case.
For better visualization, let's assume that the partition always produces a somewhat unbalanced split in the ratio of 9 to 1. The recurrence relation for this: T(n) = T(9n/10) + T(n/10) + cn. (Recursion tree image source: CLRS book.)
We can notice the following things from this recursion tree: Every level contributes a partitioning cost of at most cn, and the longest root-to-leaf path follows the 9/10 branch, so its length is log10/9(n) = O(log n). Therefore, even with a consistently 9-to-1 split, the total cost is bounded by cn * log10/9(n) = O(n log n).
Quick sort is an in-place sorting algorithm because partitioning needs only constant extra space. However, every recursive program uses a call stack in the background. So the space complexity of quick sort depends on the size of the recursion call stack, which equals the height of the recursion tree. As mentioned in the above analysis, the height of the recursion tree depends on the partition process.
In the worst-case scenario, the partition will be unbalanced. So there will be only one recursive call at each level of recursion. In other words, the recursion tree will be skewed in nature, and its height will be O(n). So recursion will require a call stack of size O(n) and the worst-case space complexity of quick sort = O(n).
In the best-case scenario, the partition will be balanced. So there will be two recursive calls at each level of recursion, and the recursion tree will be balanced in nature. So the height of the recursion tree = O(log n), and recursion will require a call stack of size O(log n). In other words, the best-case space complexity of quick sort = O(log n).
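As a side note, a standard way to cap the stack depth at O(log n) even when splits are unbalanced is to recurse only into the smaller subarray and handle the larger one iteratively. Here is a sketch of that idea; the function name quickSortOptimized is our own choice, and it reuses the partition function defined earlier:

// Recurse into the smaller side only, loop over the larger side.
// Each recursive call handles at most half the elements, so the
// recursion depth is bounded by O(log n).
void quickSortOptimized(int X[], int l, int r)
{
    while (l < r)
    {
        int pivotIndex = partition(X, l, r);
        if (pivotIndex - l < r - pivotIndex)
        {
            quickSortOptimized(X, l, pivotIndex - 1);  // smaller left side
            l = pivotIndex + 1;                        // iterate on the right
        }
        else
        {
            quickSortOptimized(X, pivotIndex + 1, r);  // smaller right side
            r = pivotIndex - 1;                        // iterate on the left
        }
    }
}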
In the above algorithm, we assume that all input values are distinct. How do we modify quick sort algorithm when input values are repeated? Let’s think!
We can modify the partition algorithm and separate input values into three groups: values less than the pivot, values equal to the pivot, and values greater than the pivot. Values equal to the pivot are already sorted, so only the less-than and greater-than partitions need to be recursively sorted. Because of this, the modified partition algorithm will return two indices, leftIndex and rightIndex, which mark the boundaries of the equal-to-pivot region.
void quickSort(int X[], int l, int r)
{
    if (l < r)
    {
        // Three-way partition: X[leftIndex...rightIndex] holds all
        // values equal to the pivot, already in their final positions.
        [leftIndex, rightIndex] = partition(X, l, r)
        quickSort(X, l, leftIndex - 1)
        quickSort(X, rightIndex + 1, r)
    }
}
Critical question: How can we implement the partition algorithm for this idea?
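One well-known option is the Dutch national flag scheme. Here is a minimal sketch in C; the function name partitionThreeWay, the out-parameters, and the swap helper are our own choices for illustration, not part of the original article:

void swap(int *a, int *b) { int temp = *a; *a = *b; *b = temp; }

// Dutch national flag partition around pivot = X[r].
// On return: X[l ... *leftIndex - 1] < pivot,
// X[*leftIndex ... *rightIndex] == pivot,
// X[*rightIndex + 1 ... r] > pivot.
void partitionThreeWay(int X[], int l, int r, int *leftIndex, int *rightIndex)
{
    int pivot = X[r];
    int lt = l;       // boundary of the "less than pivot" region
    int gt = r;       // boundary of the "greater than pivot" region
    int i = l;        // current element under inspection
    while (i <= gt)
    {
        if (X[i] < pivot)
            swap(&X[lt++], &X[i++]);
        else if (X[i] > pivot)
            swap(&X[i], &X[gt--]);   // do not advance i: new X[i] is unexplored
        else
            i++;                     // X[i] == pivot: extend the middle block
    }
    *leftIndex = lt;
    *rightIndex = gt;
}

With this helper, the modified quickSort above would call partitionThreeWay(X, l, r, &leftIndex, &rightIndex) and recurse on X[l ... leftIndex - 1] and X[rightIndex + 1 ... r].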
As we know from the analysis, the efficiency of quick sort depends on the smart selection of the pivot. There can be many ways to choose the pivot.
In the above implementation, we have chosen the rightmost element as the pivot. This can result in a worst-case situation when the input array is already sorted. The best idea is to choose a random pivot, which minimizes the chance of hitting the worst case at every level of recursion. Selecting the median element as the pivot is also acceptable in the majority of cases; a popular approximation is the median-of-three strategy shown below, which moves the median of the first, middle, and last elements to X[r]:
// Median-of-three: move the median of X[l], X[mid], X[r] to X[r].
mid = l + (r - l) / 2
if (X[mid] < X[l])
    swap(X[l], X[mid])
if (X[r] < X[l])
    swap(X[l], X[r])
if (X[mid] < X[r])
    swap(X[mid], X[r])
// Now X[r] holds the median of the three values.
pivot = X[r]
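Similarly, for the random pivot strategy mentioned above, a common trick is to swap a randomly chosen element with X[r] and then reuse the same partition function. A minimal sketch, assuming the random generator has been seeded once with srand() and reusing the swap and partition helpers defined earlier:

#include <stdlib.h>

// Randomized partition: pick a random index in [l, r], move that
// element to X[r], then run the usual Lomuto partition.
int randomizedPartition(int X[], int l, int r)
{
    int randomIndex = l + rand() % (r - l + 1);
    swap(&X[randomIndex], &X[r]);
    return partition(X, l, r);
}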
If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy coding, Enjoy algorithms!