Difficulty: Medium, Asked-in: Google, Microsoft, Amazon, Cisco, SAP Labs, VMWare
Given an array X[] and a positive integer k, write a program to find the kth smallest element in the array.
Important note: Before moving on to the solutions, we recommend trying this problem on paper for at least 15–30 minutes. Enjoy problem-solving!
Input: X[] = [4, 3, 13, 2, 12, 7, 23], k = 4
Output: 7, i.e., 7 is the 4th smallest element in the array.
Input: X[] = [-12, -8, 16, 23], k = 2
Output: -8, i.e., -8 is the 2nd smallest element in the array.
Assume that all elements in the array are distinct. One basic idea would be to sort the array in increasing order and directly return the kth element from the start, i.e., return X[k - 1].
int KthSmallestArray(int X[], int n, int k)
{
    sort(X, X + n);     // sort in increasing order: O(n log n)
    return X[k - 1];    // kth smallest sits at index k - 1
}
Suppose we use heap sort, which is an efficient O(n log n) sorting algorithm. So the time complexity is the time complexity of heap sort plus the time complexity of accessing the kth smallest element, i.e., O(n log n) + O(1) = O(n log n).
The space complexity is O(1) because heap sort is an in-place sorting algorithm.
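As a concrete cross-check, here is a minimal sketch of the same idea using the C++ standard library's heap utilities (the function name kthSmallestBySorting is ours, introduced only for illustration):

#include <algorithm>
#include <iostream>

// Sketch: kth smallest via in-place heap sort using std::make_heap/std::sort_heap.
int kthSmallestBySorting(int X[], int n, int k)
{
    std::make_heap(X, X + n);    // build a max-heap over X[0...n-1] in O(n)
    std::sort_heap(X, X + n);    // repeatedly pop the max: O(n log n), ascending order
    return X[k - 1];             // kth smallest sits at index k - 1
}

int main()
{
    int X[] = {4, 3, 13, 2, 12, 7, 23};
    std::cout << kthSmallestBySorting(X, 7, 4);  // prints 7
    return 0;
}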
The time complexity of the above solution is dominated by the sorting algorithm. Now the critical question is: Can we improve the time complexity further? Can we solve this problem without using sorting? Can we think of using some efficient mechanism to find min or max elements like the min-priority queue or heap data structure? Think!
A min-heap is an array-based complete binary tree structure where the value in each node is smaller than or equal to the values in the children of that node. So the minimum element is always present at the root, i.e., X[0].
We also use the min-heap for the efficient implementation of a min-priority queue. Here are some critical min-heap operations (all implemented in the MinHeap class below):

- getMin(): returns the minimum element, which is present at the root, in O(1).
- deleteMin(): removes the root, moves the last element to the root, and restores the heap property using minHeapify(), in O(log n).
- minHeapify(i): pushes the value at index i down to its correct position, in O(log n).
- Building a min-heap of n elements: O(n).
So how do we optimize time complexity and find the kth smallest element using the above min-heap operations? Here is an idea: we first build a min-heap of all n array elements and remove k - 1 elements by continuously performing the deleteMin() operation. After this, the kth smallest element will be present at the root of the min-heap. So we can easily get this in O(1) time by calling the getMin() operation.
Finding kth smallest element using min-heap
int KthSmallestArray(int X[], int n, int k)
{
    MinHeap heap(X, n)
    for (int i = 0; i < k - 1; i = i + 1)
        heap.deleteMin()
    return heap.getMin()
}
class MinHeap
{
private:
    int *heapArray;
    int heapCapacity;
    int heapSize;

    // Restore the min-heap property at index i, assuming both subtrees are heaps.
    void minHeapify(int i)
    {
        int l = leftChild(i);
        int r = rightChild(i);
        int smallest = i;
        if (l < heapSize && heapArray[l] < heapArray[i])
            smallest = l;
        if (r < heapSize && heapArray[r] < heapArray[smallest])
            smallest = r;
        if (smallest != i)
        {
            swap(heapArray[i], heapArray[smallest]);
            minHeapify(smallest);
        }
    }

public:
    // Build a min-heap in place over X[0...size-1] in O(n) time
    // by heapifying all internal nodes bottom-up.
    MinHeap(int X[], int size)
    {
        heapSize = size;
        heapArray = X;
        int i = (heapSize - 1) / 2;
        while (i >= 0)
        {
            minHeapify(i);
            i = i - 1;
        }
    }

    // Remove and return the minimum element in O(log n) time.
    int deleteMin()
    {
        if (heapSize == 0)
            return INT_MAX;    // sentinel for an empty heap
        int min = heapArray[0];
        if (heapSize > 1)
        {
            heapArray[0] = heapArray[heapSize - 1];
            minHeapify(0);
        }
        heapSize = heapSize - 1;
        return min;
    }

    int parent(int i)
    {
        return (i - 1) / 2;
    }

    int leftChild(int i)
    {
        return 2 * i + 1;
    }

    int rightChild(int i)
    {
        return 2 * i + 2;
    }

    // Return the minimum element (the root) in O(1) time.
    int getMin()
    {
        return heapArray[0];
    }
};
int KthSmallestArray(int X[], int n, int k)
{
    MinHeap heap(X, n);
    for (int i = 0; i < k - 1; i = i + 1)
        heap.deleteMin();
    return heap.getMin();
}
Time complexity = Time complexity of building the min-heap of size n + Time complexity of deleting k - 1 elements from the min-heap + Time complexity of accessing the kth element from the min-heap.
So overall time complexity = O(n) + O(k log n) + O(1) = O(n + k log n).
The space complexity is O(1) because we can build the min-heap in place using the same array. Therefore, we are using constant extra memory. Now a critical question would be: can we optimize the solution further?
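For comparison, here is a minimal sketch of the same approach using std::priority_queue (the helper name kthSmallestPQ is ours; note that, unlike the in-place heap above, this copies the array into the queue's container and so uses O(n) extra space):

#include <queue>
#include <vector>
#include <functional>

// Cross-check sketch: a min-heap via std::priority_queue with std::greater.
int kthSmallestPQ(const int X[], int n, int k)
{
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(X, X + n);
    for (int i = 0; i < k - 1; i = i + 1)
        heap.pop();        // delete the minimum: O(log n) per operation
    return heap.top();     // kth smallest is now at the root
}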
Similar to a min-heap, the max-heap is an array-based complete binary tree structure where the value in each node is larger than or equal to the values in the children of that node. So the maximum element is always present at the root, i.e., X[0].
We also use a max-heap for the efficient implementation of a max-priority queue. Here are some critical max-heap operations (all implemented in the MaxHeap class below):

- getMax(): returns the maximum element, which is present at the root, in O(1).
- replaceMax(value): overwrites the root with a new value and restores the heap property using maxHeapify(), in O(log n).
- maxHeapify(i): pushes the value at index i down to its correct position, in O(log n).
- Building a max-heap of k elements: O(k).
How do we solve this problem using a max-heap? A solution insight would be: If we maintain a max-heap of the k smallest elements of the array, then the kth smallest element will be present at the root of the max-heap, and we can get the root value in O(1) time. But the critical question is: how do we generate the max-heap of the k smallest elements? Here is the idea: build a max-heap of the first k elements, then traverse the remaining n - k elements. Whenever the current element is smaller than the root, it must belong among the k smallest seen so far, so we replace the root with it and heapify; otherwise, we ignore it. After the traversal, the root is the kth smallest element.
Finding the kth smallest element using max-heap
int KthSmallestArray(int X[], int n, int k)
{
    MaxHeap heap(X, k)
    for (int i = k; i < n; i = i + 1)
    {
        if (X[i] < heap.getMax())
            heap.replaceMax(X[i])
    }
    return heap.getMax()
}
class MaxHeap
{
private:
    int *heapArray;
    int heapCapacity;
    int heapSize;

    // Restore the max-heap property at index i, assuming both subtrees are heaps.
    void maxHeapify(int i)
    {
        int l = leftChild(i);
        int r = rightChild(i);
        int largest = i;
        if (l < heapSize && heapArray[l] > heapArray[i])
            largest = l;
        if (r < heapSize && heapArray[r] > heapArray[largest])
            largest = r;
        if (largest != i)
        {
            swap(heapArray[i], heapArray[largest]);
            maxHeapify(largest);
        }
    }

public:
    // Build a max-heap in place over X[0...size-1] in O(size) time
    // by heapifying all internal nodes bottom-up.
    MaxHeap(int X[], int size)
    {
        heapSize = size;
        heapArray = X;
        int i = (heapSize - 1) / 2;
        while (i >= 0)
        {
            maxHeapify(i);
            i = i - 1;
        }
    }

    // Remove and return the maximum element in O(log n) time.
    int deleteMax()
    {
        if (heapSize == 0)
            return INT_MIN;    // sentinel for an empty heap
        int max = heapArray[0];
        if (heapSize > 1)
        {
            heapArray[0] = heapArray[heapSize - 1];
            maxHeapify(0);
        }
        heapSize = heapSize - 1;
        return max;
    }

    int parent(int i)
    {
        return (i - 1) / 2;
    }

    int leftChild(int i)
    {
        return 2 * i + 1;
    }

    int rightChild(int i)
    {
        return 2 * i + 2;
    }

    // Return the maximum element (the root) in O(1) time.
    int getMax()
    {
        return heapArray[0];
    }

    // Overwrite the root with a new value and restore the heap property: O(log n).
    void replaceMax(int value)
    {
        heapArray[0] = value;
        maxHeapify(0);
    }
};
int KthSmallestArray(int X[], int n, int k)
{
    MaxHeap heap(X, k);
    for (int i = k; i < n; i = i + 1)
    {
        if (X[i] < heap.getMax())
            heap.replaceMax(X[i]);
    }
    return heap.getMax();
}
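A quick usage sketch (a hypothetical main driver, assuming the MaxHeap class above is in scope; note that the heap works in place, so X gets reordered):

#include <iostream>

int main()
{
    int X[] = {4, 3, 13, 2, 12, 7, 23};
    std::cout << KthSmallestArray(X, 7, 4);  // prints 7
    return 0;
}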
Time complexity = Time complexity of building the max-heap of size k + Time complexity of processing the remaining n - k elements = O(k) + O((n - k) log k) = O(k + (n - k) log k). The space complexity is O(1) because we build the max-heap in place over the first k elements of the same array. This approach is especially efficient when k is much smaller than n.
Now, we will discuss an interesting quick-select approach that solves the problem efficiently by using a divide-and-conquer idea similar to the quick-sort algorithm.
The solution intuition comes from the quick-sort partition process: divide the array into two parts around a pivot and return the index pos where the pivot ends up in sorted order. After the partition, elements look like this: X[l...pos-1] < pivot < X[pos+1...r]. The pivot element is at index pos, and pos - l elements of X[l...r] are smaller than it. So the pivot is the (pos - l + 1)th smallest element of the subarray X[l...r].
From this insight, we can use the partition process to find the kth smallest element recursively. But unlike quick-sort, which recursively processes both subarrays, we process only one subarray: we recur on either the left or the right side based on how k compares with the pivot's position.
int partition(int X[], int l, int r)
{
    int pivot = X[r];    // choose the last element as the pivot
    int i = l - 1;       // boundary of the "less than or equal to pivot" region
    for (int j = l; j < r; j = j + 1)
    {
        if (X[j] <= pivot)
        {
            i = i + 1;
            swap(X[i], X[j]);
        }
    }
    swap(X[i + 1], X[r]);    // place the pivot at its sorted position
    return i + 1;
}
int KthSmallestArray(int X[], int l, int r, int k)
{
    if (l == r)
        return X[l];
    int pos = partition(X, l, r);
    int i = pos - l + 1;    // the pivot is the ith smallest in X[l...r]
    if (i == k)
        return X[pos];
    else if (i > k)
        return KthSmallestArray(X, l, pos - 1, k);        // answer lies on the left side
    else
        return KthSmallestArray(X, pos + 1, r, k - i);    // answer lies on the right side
}

int getKthSmallest(int X[], int n, int k)
{
    return KthSmallestArray(X, 0, n - 1, k);
}
def partition(X, l, r):
    pivot = X[r]
    i = l - 1
    for j in range(l, r):
        if X[j] <= pivot:
            i = i + 1
            X[i], X[j] = X[j], X[i]
    X[i + 1], X[r] = X[r], X[i + 1]
    return i + 1

def KthSmallestArray(X, l, r, k):
    if l == r:
        return X[l]
    pos = partition(X, l, r)
    i = pos - l + 1
    if i == k:
        return X[pos]
    elif i > k:
        return KthSmallestArray(X, l, pos - 1, k)
    else:
        return KthSmallestArray(X, pos + 1, r, k - i)

def getKthSmallest(X, n, k):
    return KthSmallestArray(X, 0, n - 1, k)
This is a divide and conquer algorithm, where we are solving only one subproblem.
Worst-case analysis: The worst case occurs when the partition is bad and highly unbalanced. There are two such scenarios: either 0 elements in the left subarray and n - 1 elements in the right subarray, or n - 1 elements in the left subarray and 0 elements in the right. So in the worst case, the algorithm always explores the larger subproblem of size n - 1, i.e., T(n) = T(n - 1) + O(n), which solves to O(n^2).
The quick-select algorithm looks highly inefficient in the worst case. But the real magic would be average case analysis because the algorithm works in O(n) time complexity on average. How? Let's think.
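One note before the analysis: the partition code above always picks X[r] as the pivot. The average-case argument below assumes a uniformly random pivot, which we can get with a small wrapper (the name randomizedPartition is ours; a minimal sketch reusing the partition function above):

#include <cstdlib>    // rand()

// Sketch: swap a uniformly random element into position r, then reuse partition.
// This makes every pivot rank equally likely, matching the analysis below.
int randomizedPartition(int X[], int l, int r)
{
    int idx = l + rand() % (r - l + 1);    // uniform random index in [l, r]
    swap(X[idx], X[r]);
    return partition(X, l, r);
}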
Average case analysis: We assume that the partition process chooses the pivot uniformly at random. In other words, all n possible pivot positions are equally likely, each occurring with probability 1/n.
T(n) = 1/n [ ∑(i = 1 to n) T(max(i - 1, n - i)) ] + O(n)
In the above sum, each term T(i) with i >= n/2 appears at most twice, once as T(i - 1) and once as T(n - i) (think!), and terms with smaller i do not appear at all, since max(i - 1, n - i) is always at least (n - 1)/2. We can also replace O(n) with cn for some constant c. So we can simplify the above formula to an upper bound:
T(n) <= 2/n [ ∑(i = n/2 to n - 1) T(i) ] + cn
We can solve this using the “guess and check” or substitution method based on our intuition. Let's assume that T(n) is a linear function, i.e., T(n) ≤ pn, where p is a constant.
T(n) <= 2/n [ ∑(i = n/2 to n - 1) pi ] + cn
Let's further simplify the expression on the right-hand side.
2/n [ ∑(i = n/2 to n - 1) pi ] + cn
= 2p/n [ ∑(i = n/2 to n - 1) i ] + cn
= 2p/n [ ( ∑(i = 0 to n - 1) i ) - ( ∑(i = 0 to n/2 - 1) i ) ] + cn
<= 2p/n [ n(n - 1)/2 - (n/2 - 1)(n/2 - 2)/2 ] + cn
= p/n [ n(n - 1) - (n/2 - 1)(n/2 - 2) ] + cn
= p/n [ n^2 - n - n^2/4 + 3n/2 - 2 ] + cn
= p/n [ 3n^2/4 + n/2 - 2 ] + cn
= p [ 3n/4 + 1/2 - 2/n ] + cn
= 3pn/4 + p/2 - 2p/n + cn
<= 3pn/4 + p/2 + cn
= pn - (pn/4 - cn - p/2)
To complete the proof, we need to show that for sufficiently large n, pn - (pn/4 - cn - p/2) <= pn, or equivalently pn/4 - cn - p/2 >= 0. Rearranging: n(p/4 - c) >= p/2, i.e., n >= (p/2)/(p/4 - c), which is a valid condition as long as we choose the constant p > 4c. For example, choosing p = 8c gives the condition n >= 4. So our guess T(n) <= pn is correct for sufficiently large n, and the average-case time complexity of the quick-select algorithm is T(n) = O(pn) = O(n).
Regarding space complexity: the worst case occurs when the partition is bad and the height of the recursion tree is O(n). In that scenario, the subproblem size decreases by only 1 at each level, and the recursion uses O(n) stack space. In the average case, the space complexity is O(log n). (Think!)
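As a closing practical note, the C++ standard library already exposes this selection idea as std::nth_element (typically implemented with an introselect-style algorithm that avoids the quadratic worst case); a minimal sketch, with the helper name kthSmallestLib being ours:

#include <algorithm>

// Library counterpart sketch: after the call, the element that belongs at
// index k - 1 in sorted order is at that position, in O(n) average time.
int kthSmallestLib(int X[], int n, int k)
{
    std::nth_element(X, X + k - 1, X + n);
    return X[k - 1];
}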
If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!