Difficulty: Easy, Asked-in: Google, Amazon, Adobe, Oracle, Qualcomm, SAP Labs
Given an array X of n integers sorted in ascending order and an integer key, write a program to search for the key in X. If the key exists, then return its index. Otherwise, return -1.
Example 1
Input: X[] = [-4, 2, 4, 5, 9, 12], key = 5, Output: 3
Explanation: 5 exists in X[] at index 3. So we return 3.
Example 2
Input: X[] = [-4, 2, 4, 5, 9, 12], key = 6, Output: -1
Explanation: 6 does not exist in X[]. So we return -1.
Imagine a game where the computer selects a number between 1 and 16, and we need to find this number with a minimum number of guesses. For each guess, the computer will tell us whether the guessed number is equal to, greater than, or less than the actual number.
A linear guess of all the values 1, 2, ..., 16 would be inefficient because, with each question, we eliminate only one number. In the worst-case scenario, we need 16 guesses! Can we find a better solution and discover the number with a minimum number of guesses?
Now, the critical question is: What would be the best guess at the start? Since the computer provides us with comparative insights about the guessed number and the actual number, the best first guess would be to choose the middle number (8).
After every guess, we are rejecting half of the given numbers in one go. In the worst case, we need four comparisons to identify the actual number. This is a significant improvement!
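To make the idea concrete, here is a minimal sketch of the halving strategy in C. The function name guessNumber and the range [1, 16] are our illustrative choices, not part of the original game description.

#include <stdio.h>

// Returns the number of guesses needed to find 'secret' in the range
// [lo, hi] when we always guess the middle of the remaining range.
int guessNumber(int lo, int hi, int secret)
{
    int guesses = 0;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        guesses = guesses + 1;
        if (mid == secret)
            return guesses;
        else if (mid < secret)
            lo = mid + 1; // actual number is larger: discard the lower half
        else
            hi = mid - 1; // actual number is smaller: discard the upper half
    }
    return guesses;
}

int main(void)
{
    // The remaining range shrinks 16 -> 8 -> 4 -> 2 -> 1 with each guess.
    printf("%d\n", guessNumber(1, 16, 13)); // prints 4
    return 0;
}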
Now, based on the above idea, can we take advantage of the sorted input order to find the target value efficiently? One insight is simple: if we pick any number "x" in the sorted sequence, all the numbers on the left will be less than "x", and all the numbers on the right will be greater than "x".
So based on this idea, we can design a simple divide and conquer strategy where we compare the target value key with the value at the mid index. If the middle element is equal to the key, then we are done. Otherwise, based on the comparison, we search for the key in the left half or right half.
Similarly, we keep searching until the key is found (successful search) or the subarray size is reduced to 0 (unsuccessful search). Here, we are solving the searching problem of input size n by using the searching problem of input size n/2 (either the left half or the right half). The core idea is simple: at each stage of the process, we continuously reduce the search interval by half. Think!
Suppose we use a function binarySearch(X[], l, r, key) to search for the given key in the sorted array. Here, l and r are the indices of the left and right ends of the subarray. We start with l = 0 and r = n - 1.
Note: Why did we not use the equation (l + r)/2 to calculate the mid-index? Here is the reason: For large values of the left and right indices, the sum (l + r) may exceed the range of the integer type, even if l and r individually are within the range. This can result in an integer overflow for very large arrays. To solve this problem, we use the equation mid = l + (r - l)/2, which avoids the overflow.
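As a quick illustration, here is a contrived sketch of the overflow issue (indices this large would require an impractically huge array; the values are chosen only to show the arithmetic):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    // Hypothetical indices near the top of the int range.
    int l = INT_MAX - 10;
    int r = INT_MAX - 2;

    // (l + r) / 2 would overflow here: l + r exceeds INT_MAX,
    // and signed integer overflow is undefined behavior in C.
    // int mid = (l + r) / 2;

    // l + (r - l) / 2 is safe: r - l is small and non-negative.
    int mid = l + (r - l) / 2;
    printf("mid = %d\n", mid);
    return 0;
}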
This is a trivial step: after comparing the middle element, the solution to one sub-problem (either the left or the right half) will return the index or -1. There is no need to combine the solutions to the sub-problems.
The base case is the scenario when the left index crosses the right index, i.e., the subarray size shrinks to zero: if (l > r), we return -1. This is the case of an unsuccessful search. In other words, this is the last stage of the recursion, or the smallest version of the sub-problem.
int binarySearch(int X[], int l, int r, int key)
{
    if (l > r)
        return -1; // base case: unsuccessful search
    else
    {
        int mid = l + (r - l) / 2; // overflow-safe mid-index
        if (X[mid] == key)
            return mid; // key found at the mid-index
        if (X[mid] > key)
            return binarySearch(X, l, mid - 1, key); // search the left half
        else
            return binarySearch(X, mid + 1, r, key); // search the right half
    }
}
def binarySearch(X, l, r, key):
    if l > r:
        return -1
    else:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] > key:
            return binarySearch(X, l, mid - 1, key)
        else:
            return binarySearch(X, mid + 1, r, key)
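Here is a quick driver for the C version above, using Example 1 and Example 2 from the problem statement (the main function is just an illustrative test harness):

#include <stdio.h>

int binarySearch(int X[], int l, int r, int key); // defined above

int main(void)
{
    int X[] = {-4, 2, 4, 5, 9, 12};
    int n = sizeof(X) / sizeof(X[0]);
    printf("%d\n", binarySearch(X, 0, n - 1, 5)); // prints 3
    printf("%d\n", binarySearch(X, 0, n - 1, 6)); // prints -1
    return 0;
}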
Binary search is easy to visualize using recursion. The critical question is: Can we implement this using iteration or a loop? Let’s think! If we observe closely, only two parameters get updated during every recursive call: the left and right ends of the search subarray.
So we need to find a way to update the left and right ends of the current subarray using a loop. Here's an idea:
int binarySearch(int X[], int l, int r, int key)
{
    while (l <= r)
    {
        int mid = l + (r - l) / 2; // overflow-safe mid-index
        if (X[mid] > key)
            r = mid - 1; // continue in the left half
        else if (X[mid] < key)
            l = mid + 1; // continue in the right half
        else
            return mid; // key found at the mid-index
    }
    return -1; // unsuccessful search
}
def binarySearch(X, l, r, key):
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] > key:
            r = mid - 1
        elif X[mid] < key:
            l = mid + 1
        else:
            return mid
    return -1
After each comparison, the input size decreases by half. Initially, we have n elements, then we have n/2 elements after the 1st comparison, n/4 elements after the 2nd comparison, and so on. The worst-case situation will occur when we reach the base case (unsuccessful search), i.e., n -> n/2 -> n/4 -> ... 1 -> unsuccessful search.
Suppose we reach the base case after k number of steps => n/2^k = 1 => n = 2^k => k = log2n. In simple words, after log2n number of steps, the algorithm will reach its base case.
At each step of the recursion, we perform O(1) operations. So the worst-case time complexity of the binary search is log2n * O(1) = O(logn).
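For example, for n = 16, the search interval shrinks as 16 -> 8 -> 4 -> 2 -> 1, which is k = log2(16) = 4 halving steps. Even for n = 1,000,000, binary search needs only about log2(1000000) ≈ 20 comparisons.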
For a clear picture, let's assume for the moment that the size of the array is a power of 2, i.e., n = 2^k => k = log2n. Now, when we compare the middle element each time, we cut the size of the subarrays by half.
Initially, the subarray size is 2^k. After the 1st step, the subarray size will be 2^(k-1) and after the second step, the subarray size will be 2^(k-2), and so on. After the k or log2n number of steps, we will reach a subarray of size 1. Now, in the next step, we will reach the base case.
So altogether, we can have at most k+1 or log2n+1 number of steps in the worst case. Within each step, we perform a constant amount of work: calculating the mid-point and a comparison operation.
Overall, when given the array size n, we perform c(log2n + 1) operations in the worst case. So the worst-case time complexity of the binary search is O(logn).
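To verify this empirically, here is a small instrumented sketch. The helper countSteps is hypothetical, written only for this measurement: it runs the iterative search over the implicit sorted array X[i] = 2*i and counts the loop iterations in a worst-case (unsuccessful) search.

#include <stdio.h>
#include <math.h>

// Hypothetical helper: runs the iterative binary search over the
// implicit sorted array X[i] = 2*i and counts loop iterations.
int countSteps(int n, int key)
{
    int l = 0, r = n - 1, steps = 0;
    while (l <= r)
    {
        int mid = l + (r - l) / 2;
        steps = steps + 1;
        if (2 * mid > key)
            r = mid - 1;
        else if (2 * mid < key)
            l = mid + 1;
        else
            return steps;
    }
    return steps;
}

int main(void)
{
    // A key larger than every element forces the worst case: an
    // unsuccessful search that always continues in the right half.
    for (int n = 16; n <= 1024; n = n * 2)
        printf("n = %4d, steps = %2d, log2(n) + 1 = %2d\n",
               n, countSteps(n, 2 * n), (int)log2(n) + 1);
    return 0;
}

For each power of 2, the printed step count matches log2(n) + 1 exactly, as argued above.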
Let's assume that T(n) is the worst-case time complexity of binary search for n elements. When n > 0, we can break down the time complexities as follows:
Divide: Calculating the mid-index takes constant time, i.e., O(1).
Conquer: Recursively searching for the key in one half of the array (of size n/2) takes T(n/2) time.
Combine: We directly return the result of the sub-problem, which takes O(1) time.
To calculate T(n), we need to add the time complexities of the divide, conquer, and combine parts: T(n) = O(1) + T(n/2) + O(1) = T(n/2) + c. Here is the recurrence relation for the worst-case time complexity:
T(n) = T(n/2) + c, if n > 0
T(n) = c, if n = 0 (base case)
This recurrence relation is in the form of T(n) = aT(n/b) + O(n^k), where a ≥ 1 and b > 1. We can apply the master theorem! There are three cases for the solution via the master theorem:
Case 1: If a > b^k, then T(n) = O(n^(logb(a))).
Case 2: If a = b^k, then T(n) = O(n^k * logn).
Case 3: If a < b^k, then T(n) = O(n^k).
If we compare the recurrence relation of binary search with the general form, we get a = 1, b = 2, and k = 0. Here, b^k = 2^0 = 1 = a.
The above recurrence satisfies the second case of the master theorem. So, the time complexity is T(n) = O(n^k * logn) = O(n^0 * logn) = O(logn). Note: You can explore the analysis of recursion blog to learn more about analysis using the master theorem.
The space complexity of the binary search algorithm depends on its implementation. In the iterative approach, we use constant extra space, so the space complexity is O(1). However, in the recursive approach, the space complexity depends on the size of the recursion call stack, which is determined by the height of the recursion tree.
The height of the recursion tree is logn + 1 because the input size decreases by a factor of 1/2 at every level. So the space complexity of the recursive binary search algorithm is O(logn).
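As an illustration, here is a hedged sketch of the recursive version with an extra depth parameter (added by us only for this demonstration, not part of the standard interface) that records the deepest recursive frame that performs a comparison:

#include <stdio.h>

// Recursive binary search with an extra 'depth' parameter, added only
// to report how deep the recursion goes before terminating.
int binarySearchDepth(int X[], int l, int r, int key, int depth, int *maxDepth)
{
    if (l > r)
        return -1; // base case: performs no comparison with the key
    if (depth > *maxDepth)
        *maxDepth = depth; // record the deepest comparing frame
    int mid = l + (r - l) / 2;
    if (X[mid] == key)
        return mid;
    if (X[mid] > key)
        return binarySearchDepth(X, l, mid - 1, key, depth + 1, maxDepth);
    return binarySearchDepth(X, mid + 1, r, key, depth + 1, maxDepth);
}

int main(void)
{
    int X[16];
    for (int i = 0; i < 16; i = i + 1)
        X[i] = i; // sorted input 0, 1, ..., 15

    int maxDepth = 0;
    // An unsuccessful search drives the recursion to its full height.
    binarySearchDepth(X, 0, 15, 100, 1, &maxDepth);
    printf("max recursion depth = %d\n", maxDepth); // prints 5 = log2(16) + 1
    return 0;
}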
Here is a note by Jon Bentley from the book Programming Pearls:
I’ve assigned binary search in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert its description into a program in the language of their choice; high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had the correct code for the task. We would then take thirty minutes to examine their code, which the programmers did with test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn’t always convinced of the correctness of the code in which no bugs were found). I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right.
If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!