Difficulty: Hard, Asked-in: Google, Microsoft, Amazon.
Key takeaway: An excellent problem to learn problem-solving using two pointers approach similar to merging, and a divide-and-conquer approach similar to binary search.
There are two sorted arrays A[] and B[] of size n each. Write a program to find the median of the larger sorted array (array of size 2n) obtained after merging both sorted arrays.
Input: A[] = [1, 3, 6], B[] = [2, 8, 12], Output: 4.5.
Explanation: After merging the sorted arrays, we obtain the larger sorted array [1, 2, 3, 6, 8, 12]. The total number of elements is 6, so the median is the average of the two middle elements, 3 and 6.
Input: A[] = [1, 3, 4, 6, 9], B[] = [2, 5, 7, 8, 10], Output: 5.5.
Explanation: After merging the sorted arrays, we obtain a larger sorted array of size 10: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. The median is the average of the two middle elements, 5 and 6.
One basic idea is to merge both sorted arrays using extra space and find the median by taking the average of the two middle elements in the merged array. If we follow 0-based indexing, the median will be the average of the values at the (n-1)th and nth indices. Note: For merging sorted arrays, we can use an approach similar to the merging procedure of merge sort.
double medianSortedArrays(int A[], int B[], int n)
{
int mergedArray[2 * n]
int i = 0, j = 0, k = 0
while(i < n && j < n)
{
if(A[i] <= B[j])
{
mergedArray[k] = A[i]
i = i + 1
k = k + 1
}
else
{
mergedArray[k] = B[j]
j = j + 1
k = k + 1
}
}
while(i < n)
{
mergedArray[k] = A[i]
i = i + 1
k = k + 1
}
while(j < n)
{
mergedArray[k] = B[j]
j = j + 1
k = k + 1
}
return (double)(mergedArray[n - 1] + mergedArray[n]) / 2
}
We need to traverse each element to merge both sorted arrays. In other words, at each iteration of the while loop, we place one value to MergedArr[]. So the time complexity of merging loop = O(n).
The overall time complexity = Time complexity of merging + Time complexity of finding median = O(n) + O(1) = O(n). We are using 2n size extra space for merging, so space complexity = O(n).
Now, the critical questions are: Do we need to perform a complete merge of sorted arrays to find the median? Can we reduce the extra space used in the merging process? Let's think!
We are traversing both arrays from the start in incremental order. At each iteration step, we are either increasing pointer i or pointer j and placing one value from either of the smaller arrays into the larger sorted array. At any general step of iteration, k = i + j - 1.
Here is an optimized solution idea: We need to track two middle elements to find the median, i.e., the nth and (n + 1)th elements of the final merged array. While comparing elements of the two sorted arrays, we track the count of elements in the merged array and update the two middle elements using two variables. We stop when the count reaches n + 1. At this point, we calculate the median by taking the average of the two middle elements stored in the two variables. This will eliminate the need to use extra space or perform a complete merge.
double medianSortedArrays(int A[], int B[], int n)
{
int i = 0, j = 0, count = 0
int middle1 = 0, middle2 = 0
while (count <= n)
{
if (A[i] <= B[j])
{
middle1 = middle2
middle2 = A[i]
i = i + 1
}
else
{
middle1 = middle2
middle2 = B[j]
j = j + 1
}
if (i == n)
{
middle1 = middle2
middle2 = B[0]
break
}
else if (j == n)
{
middle1 = middle2
middle2 = A[0]
break
}
count = count + 1
}
return (double)(middle1 + middle2) / 2
}
We are running while loop n + 1 times and performing O(1) operation at each iteration. So the time complexity = (n + 1)* O(1) = O(n). Since we are using a constant number of extra variables, space complexity = O(1).
Now the critical questions are: How can we improve the time complexity? Since this is a searching problem, can we take advantage of the sorted array property? Let's think!
The idea is to compare the medians of both sorted arrays and recursively reduce the search space by half. Suppose the median of the first array is m1, and the median of the second array is m2. We can get these values in O(1) time using this formula: m1 = A[n/2], m2 = B[n/2] (we assume that n is odd).
Case 1: if (m1 == m2): In this case, there are n - 1 elements less than m1 and n - 1 elements greater than m2. In other words, both m1 and m2 will be the middle elements in the merged array, i.e., element at (n - 1)th and nth index. The average of both values will be equal to m1 or m2, so we return either of these values as the output.
Case 2: if (m1 < m2): In the merged array, both middle elements of the median will lie between the range [m1, m2] i.e. m1 <= (middle1, middle2) <= m2.
Case 3 if (m1 > m2): In the merged array, both middle elements of the median will lie between the range [m2, m1], i.e., m2 <= (middle1, middle2) <= m1.
The above idea is quite similar to binary search: We reject half of the elements in both sorted arrays after each comparison. So we can think of implementing a solution using an idea similar to recursive binary search.
Suppose we use the function medianSortedArrays(int A[], int B[], int n):
If (m1 > m2): Both middle elements of the median must be present in one of these two subarrays: 1) Subarray starting from the first element of A[] and ending at m1, 2) Subarray starting from m2 and ending at the last element of B[]. In other words, we are rejecting n/2 elements from both subarrays. So, we recursively call the same function with input size n - n/2, i.e. medianSortedArrays(A, B + n/2, n - n/2).
For example, if n = 7, then n/2 = 3, m1 = A[3], and m2 = B[3]. So we reject 3 elements in A[] from index 4 to 6 and 3 elements in B[] from index 0 to 2. The remaining size of arrays A and B is n - n/2 = 7 - 3 = 4.
If (m2 > m1): Both middle elements of the median must be present in one of these two subarrays: 1) Subarray starting from m1 and ending at the last element of A[], 2) Subarray starting from the first element of B[] and ending at m2. In other words, we are rejecting n/2 elements from both subarrays. So, we recursively call the same function with input size n - n/2, i.e. medianSortedArrays(A + n/2, B, n - n/2).
For example, if n = 9, then n/2 = 4, m1 = A[4], and m2 = B[4]. So we reject 4 elements in A[] from index 0 to 3 and 4 elements in B[] from index 5 to 8. The remaining size of arrays A and B is n - n/2 = 9 - 4 = 5.
Base case: n is odd, so the base case will occur when n <= 2.
if(n == 0)
return 0
if(n == 1)
return (double)(A[0] + B[0]) / 2
if(n == 2)
return (max(A[0], B[0]) + min(A[1], B[1])) / 2.0
int getMedian(int C[], int n)
{
if(n % 2 == 0)
return (C[n / 2 - 1] + C[n / 2]) / 2
else
return C[n / 2]
}
double medianSortedArrays(int A[], int B[], int n)
{
int m1, m2
if(n == 0)
return 0
if(n == 1)
return (double)(A[0] + B[0]) / 2
if(n == 2)
return (max(A[0], B[0]) + min(A[1], B[1])) / 2.0
m1 = getMedian(A, n)
m2 = getMedian(B, n)
if(m1 == m2)
return m1
else if(m1 < m2)
return medianSortedArrays(A + n / 2, B, n - n / 2)
else
return medianSortedArrays(A, B + n / 2, n - n / 2)
}
Input size given in the problem = Size of A[] + Size of B[] = 2n. At each step of recursion, we are reducing input size by 1/2 i.e. we are rejecting one-half of both smaller arrays.
The recurrence relation of the time complexity: T(2n) = T(n) + O(1), or we can write T(x) = T(x/2) + O(1), where x = 2n. This looks similar to the recurrence relation of binary search. So time complexity = O(log x) = O(log 2n) = O(log n). Space complexity = O(log n), which is equal to the size of recursion call stack.
If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!