Difficulty: Easy, Asked-in: Google, Amazon, Adobe, Oracle, Qualcomm, SAP Labs
Given an array X of n integers sorted in ascending order and an integer key, write a program to search for the key in X. If the key exists, then return its index. Otherwise, return -1.
Example 1
Input: X[] = [-4, 2, 4, 5, 9, 12], key = 5, Output: 3
Explanation: 5 exists in X[] at index 3. So we return 3.
Example 2
Input: X[] = [-4, 2, 4, 5, 9, 12], key = 6, Output: -1
Explanation: 6 does not exist in X[]. So we return -1.
Imagine a game where the computer selects a number between 1 and 16, and we need to find this number with a minimum number of guesses. For each guess, the computer will tell us whether the guessed number is equal to, greater than, or less than the actual number.
A linear guess of all the values 1, 2, ..., 16 would be inefficient because, with each question, we eliminate only one number. In the worst-case scenario, we need 16 guesses! Can we find a better solution and discover the number with a minimum number of guesses?
Now, the critical question is: What would be the best guess at the start? Since the computer provides us with comparative insights about the guessed number and the actual number, the best first guess would be to choose the middle number (8).
After every guess, we are rejecting half of the given numbers in one go. In the worst case, we need four comparisons to identify the actual number. This is a significant improvement!
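To make the idea concrete, here is a minimal sketch of the halving strategy in C. The function name guessNumber and the range [1, 16] are our illustrative choices, not part of the original game description.

#include <stdio.h>

// Returns the number of guesses needed to find 'secret' in the range
// [lo, hi] when we always guess the middle of the remaining range.
int guessNumber(int lo, int hi, int secret)
{
    int guesses = 0;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        guesses = guesses + 1;
        if (mid == secret)
            return guesses;
        else if (mid < secret)
            lo = mid + 1; // actual number is larger: discard the lower half
        else
            hi = mid - 1; // actual number is smaller: discard the upper half
    }
    return guesses;
}

int main(void)
{
    // The remaining range shrinks 16 -> 8 -> 4 -> 2 -> 1 with each guess.
    printf("%d\n", guessNumber(1, 16, 13)); // prints 4
    return 0;
}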
Now, based on the above idea, can we take advantage of the sorted input order to find the target value efficiently? One insight is simple: if we pick any number "x" in the sorted sequence, all the numbers on the left will be less than "x", and all the numbers on the right will be greater than "x".
So based on this idea, we can design a simple divide and conquer strategy where we compare the target value key with the value at the mid index. If the middle element is equal to the key, then we are done. Otherwise, based on the comparison, we search for the key in the left half or right half.
Similarly, we keep searching until the key is found (successful search) or the subarray size is reduced to 0 (unsuccessful search). Here, we are solving the searching problem of input size n by using the searching problem of input size n/2 (either the left half or the right half). The core idea is simple: at each stage of the process, we continuously reduce the search interval by half. Think!
Suppose we use a function binarySearch(X[], l, r, key) to search for the given key in the sorted array. Here, l and r are the indices of the left and right ends of the subarray. We start with l = 0 and r = n - 1.
Note: Why did we not use the equation (l + r)/2 to calculate the mid-index? Here is the reason: For large values of the left and right indices, the sum (l + r) may exceed the range of the integer type, even if l and r individually are within the range. This can result in an integer overflow for very large arrays. To solve this problem, we use the equation mid = l + (r - l)/2, which avoids the overflow.
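As a quick illustration, here is a contrived sketch of the overflow issue (indices this large would require an impractically huge array; the values are chosen only to show the arithmetic):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    // Hypothetical indices near the top of the int range.
    int l = INT_MAX - 10;
    int r = INT_MAX - 2;

    // (l + r) / 2 would overflow here: l + r exceeds INT_MAX,
    // and signed integer overflow is undefined behavior in C.
    // int mid = (l + r) / 2;

    // l + (r - l) / 2 is safe: r - l is small and non-negative.
    int mid = l + (r - l) / 2;
    printf("mid = %d\n", mid);
    return 0;
}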
This is a trivial step: after comparing the middle element, the solution to one sub-problem (either the left or the right half) will return the index or -1. There is no need to combine the solutions to the sub-problems.
The base case is the scenario when the left index crosses the right index, i.e., the subarray size shrinks to zero: if (l > r), we return -1. This is the case of an unsuccessful search. In other words, this is the last stage of the recursion, or the smallest version of the sub-problem.
int binarySearch(int X[], int l, int r, int key)
{
    if (l > r)
        return -1; // base case: unsuccessful search
    else
    {
        int mid = l + (r - l) / 2; // overflow-safe mid-index
        if (X[mid] == key)
            return mid; // key found at the mid-index
        if (X[mid] > key)
            return binarySearch(X, l, mid - 1, key); // search the left half
        else
            return binarySearch(X, mid + 1, r, key); // search the right half
    }
}
def binarySearch(X, l, r, key):
    if l > r:
        return -1
    else:
        mid = l + (r - l) // 2
        if X[mid] == key:
            return mid
        elif X[mid] > key:
            return binarySearch(X, l, mid - 1, key)
        else:
            return binarySearch(X, mid + 1, r, key)
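Here is a quick driver for the C version above, using Example 1 and Example 2 from the problem statement (the main function is just an illustrative test harness):

#include <stdio.h>

int binarySearch(int X[], int l, int r, int key); // defined above

int main(void)
{
    int X[] = {-4, 2, 4, 5, 9, 12};
    int n = sizeof(X) / sizeof(X[0]);
    printf("%d\n", binarySearch(X, 0, n - 1, 5)); // prints 3
    printf("%d\n", binarySearch(X, 0, n - 1, 6)); // prints -1
    return 0;
}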
Binary search is easy to visualize using recursion. The critical question is: Can we implement this using iteration or a loop? Let’s think! If we observe closely, only two parameters get updated during every recursive call: the left and right ends of the search subarray.
So we need to find a way to update the left and right ends of the current subarray using a loop. Here's an idea:
int binarySearch(int X[], int l, int r, int key)
{
    while (l <= r)
    {
        int mid = l + (r - l) / 2; // overflow-safe mid-index
        if (X[mid] > key)
            r = mid - 1; // continue in the left half
        else if (X[mid] < key)
            l = mid + 1; // continue in the right half
        else
            return mid; // key found at the mid-index
    }
    return -1; // unsuccessful search
}
def binarySearch(X, l, r, key):
    while l <= r:
        mid = l + (r - l) // 2
        if X[mid] > key:
            r = mid - 1
        elif X[mid] < key:
            l = mid + 1
        else:
            return mid
    return -1
After each comparison, the input size decreases by half. Initially, we have n elements, then we have n/2 elements after the 1st comparison, n/4 elements after the 2nd comparison, and so on. The worst-case situation will occur when we reach the base case (unsuccessful search), i.e., n -> n/2 -> n/4 -> ... 1 -> unsuccessful search.
Suppose we reach the base case after k number of steps => n/2^k = 1 => n = 2^k => k = log2n. In simple words, after log2n number of steps, the algorithm will reach its base case.
At each step of the recursion, we perform O(1) operations. So the worst-case time complexity of the binary search is log2n * O(1) = O(logn).
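For example, for n = 16, the search interval shrinks as 16 -> 8 -> 4 -> 2 -> 1, which is k = log2(16) = 4 halving steps. Even for n = 1,000,000, binary search needs only about log2(1000000) ≈ 20 comparisons.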
For a clear picture, let's assume for the moment that the size of the array is a power of 2, i.e., n = 2^k => k = log2n. Now, when we compare the middle element each time, we cut the size of the subarrays by half.
Initially, the subarray size is 2^k. After the 1st step, the subarray size will be 2^(k-1) and after the second step, the subarray size will be 2^(k-2), and so on. After the k or log2n number of steps, we will reach a subarray of size 1. Now, in the next step, we will reach the base case.
So altogether, we can have at most k+1 or log2n+1 number of steps in the worst case. Within each step, we perform a constant amount of work: calculating the mid-point and a comparison operation.
Overall, when given the array size n, we perform c(log2n + 1) operations in the worst case. So the worst-case time complexity of the binary search is O(logn).
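To verify this empirically, here is a small instrumented sketch. The helper countSteps is hypothetical, written only for this measurement: it runs the iterative search over the implicit sorted array X[i] = 2*i and counts the loop iterations in a worst-case (unsuccessful) search.

#include <stdio.h>
#include <math.h>

// Hypothetical helper: runs the iterative binary search over the
// implicit sorted array X[i] = 2*i and counts loop iterations.
int countSteps(int n, int key)
{
    int l = 0, r = n - 1, steps = 0;
    while (l <= r)
    {
        int mid = l + (r - l) / 2;
        steps = steps + 1;
        if (2 * mid > key)
            r = mid - 1;
        else if (2 * mid < key)
            l = mid + 1;
        else
            return steps;
    }
    return steps;
}

int main(void)
{
    // A key larger than every element forces the worst case: an
    // unsuccessful search that always continues in the right half.
    for (int n = 16; n <= 1024; n = n * 2)
        printf("n = %4d, steps = %2d, log2(n) + 1 = %2d\n",
               n, countSteps(n, 2 * n), (int)log2(n) + 1);
    return 0;
}

For each power of 2, the printed step count matches log2(n) + 1 exactly, as argued above.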
Let's assume that T(n) is the worst-case time complexity of binary search for n elements. When n > 0, we can break down the time complexities as follows:
Divide: Calculating the mid-index takes constant time, i.e., O(1).
Conquer: Recursively searching for the key in one half of the array (of size n/2) takes T(n/2) time.
Combine: We directly return the result of the sub-problem, which takes O(1) time.
To calculate T(n), we need to add the time complexities of the divide, conquer, and combine parts: T(n) = O(1) + T(n/2) + O(1) = T(n/2) + c. Here is the recurrence relation for the worst-case time complexity:
T(n) = T(n/2) + c, if n > 0
T(n) = c, if n = 0 (base case)
This recurrence relation is in the form of T(n) = aT(n/b) + O(n^k), where a ≥ 1 and b > 1. We can apply the master theorem! There are three cases for the solution via the master theorem:
Case 1: If a > b^k, then T(n) = O(n^(logb(a))).
Case 2: If a = b^k, then T(n) = O(n^k * logn).
Case 3: If a < b^k, then T(n) = O(n^k).
If we compare the recurrence relation of binary search with the general form, we get a = 1, b = 2, and k = 0. Here, b^k = 2^0 = 1 = a.
The above recurrence satisfies the second case of the master theorem. So, the time complexity is T(n) = O(n^k * logn) = O(n^0 * logn) = O(logn). Note: You can explore the analysis of recursion blog to learn more about analysis using the master theorem.
The space complexity of the binary search algorithm depends on its implementation. In the iterative approach, we use constant extra space, so the space complexity is O(1). However, in the recursive approach, the space complexity depends on the size of the recursion call stack, which is determined by the height of the recursion tree.
The height of the recursion tree is logn + 1 because the input size decreases by a factor of 1/2 at every level. So the space complexity of the recursive binary search algorithm is O(logn).
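As an illustration, here is a hedged sketch of the recursive version with an extra depth parameter (added by us only for this demonstration, not part of the standard interface) that records the deepest recursive frame that performs a comparison:

#include <stdio.h>

// Recursive binary search with an extra 'depth' parameter, added only
// to report how deep the recursion goes before terminating.
int binarySearchDepth(int X[], int l, int r, int key, int depth, int *maxDepth)
{
    if (l > r)
        return -1; // base case: performs no comparison with the key
    if (depth > *maxDepth)
        *maxDepth = depth; // record the deepest comparing frame
    int mid = l + (r - l) / 2;
    if (X[mid] == key)
        return mid;
    if (X[mid] > key)
        return binarySearchDepth(X, l, mid - 1, key, depth + 1, maxDepth);
    return binarySearchDepth(X, mid + 1, r, key, depth + 1, maxDepth);
}

int main(void)
{
    int X[16];
    for (int i = 0; i < 16; i = i + 1)
        X[i] = i; // sorted input 0, 1, ..., 15

    int maxDepth = 0;
    // An unsuccessful search drives the recursion to its full height.
    binarySearchDepth(X, 0, 15, 100, 1, &maxDepth);
    printf("max recursion depth = %d\n", maxDepth); // prints 5 = log2(16) + 1
    return 0;
}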
Here is a note by Jon Bentley from the book Programming Pearls:
I’ve assigned binary search in courses at Bell Labs and IBM. Professional programmers had a couple of hours to convert its description into a program in the language of their choice; high-level pseudocode was fine. At the end of the specified time, almost all the programmers reported that they had the correct code for the task. We would then take thirty minutes to examine their code, which the programmers did with test cases. In several classes and with over a hundred programmers, the results varied little: ninety percent of the programmers found bugs in their programs (and I wasn’t always convinced of the correctness of the code in which no bugs were found). I was amazed: given ample time, only about ten percent of professional programmers were able to get this small program right.
If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!