Difficulty: Medium, Asked-in: Microsoft, Google, Amazon, Yahoo.
You are given an array X[] consisting of n elements. Write a program to find the majority element of the array, i.e., return the element that appears more than n/2 times.
Input: X[] = [2, 12, 2, 2, 2, 4, 2], Output: 2
Explanation: 2 is the majority element which appears 5 times.
Input: X[] = [3, 3, 4, 2, 4, 4, 2, 4, 4], Output: 4
Explanation: The frequency of 4 is 5 which is greater than half of the size of the array.
Input: X[] = [4, 3, 4], Output: 4
Input: X[] = [1, 1, 1, 1, 1, 1], Output: 1
A basic approach would be to check whether each element is the majority element or not. We can use two nested loops, where the outer loop iterates over the array, and the inner loop counts the occurrences of each element. If we find an element with a frequency greater than n/2, that is our majority element.
int majorityElement(int X[], int n)
{
    for (int i = 0; i < n; i = i + 1)
    {
        int majorityCount = 0
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == X[j])
                majorityCount = majorityCount + 1
        }
        if (majorityCount > n/2)
            return X[i]
    }
}
We are running two nested loops and doing constant operations at each iteration. So time complexity = O(n²)*O(1) = O(n²). We are using constant extra space, so space complexity = O(1). Can we further improve the time complexity? Let's think!
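For reference, here is how the brute force idea might look in runnable C++. This is a sketch: the function name findMajorityBrute, the std::vector interface, and the -1 fallback for "no majority" (a case the pseudocode leaves undefined) are our own additions.

```cpp
#include <vector>

// Brute force: for each element, count its occurrences with an inner loop.
// Returns -1 if no element appears more than n/2 times.
int findMajorityBrute(const std::vector<int>& X)
{
    int n = (int)X.size();
    for (int i = 0; i < n; i = i + 1)
    {
        int majorityCount = 0;
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == X[j])
                majorityCount = majorityCount + 1;
        }
        if (majorityCount > n / 2)
            return X[i];
    }
    return -1;
}
```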
Given that the majority element always exists in the array, we can sort the array in increasing order. By doing so, the majority element will be grouped in adjacent indices. Can we directly access the majority element in the sorted array? Let's think!
If we divide the sorted array into two halves, each part contains n/2 elements. Since the majority element appears more than n/2 times, it must be present at the middle index. This holds true for both odd and even values of n.
int findMajorityElement(int X[], int n)
{
    sort(X, n)
    return X[n/2]
}
Suppose we are using an efficient O(nlogn) sorting algorithm like heap sort or merge sort. Time complexity = Time complexity of sorting + Time complexity of accessing the middle element = O(nlogn) + O(1) = O(nlogn).
Heap sort is an in-place sorting algorithm. So space complexity is O(1) if we use heap sort. In the case of merge sort, space complexity = O(n).
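Assuming a majority element is guaranteed to exist, the sorting approach can be sketched in C++ using std::sort (the function name findMajorityBySorting is our own):

```cpp
#include <algorithm>
#include <vector>

// Sort the array; the majority element must then occupy the middle index,
// because it fills more than half of all positions.
// Pass by value: sorting mutates the local copy, not the caller's array.
int findMajorityBySorting(std::vector<int> X)
{
    std::sort(X.begin(), X.end());
    return X[X.size() / 2];
}
```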
Can we solve this problem using recursion or a divide-and-conquer approach? Here is an insight: If we divide the array into two equal halves and recursively find the majority element of the left and right halves, we can easily determine the overall majority element in linear time. Note: Even if a majority element exists globally, a recursive subarray may have no majority element of its own, so the recursion must handle the "no majority" case. Here, if a subarray has no majority element, we return -1.
We define a function majorityElement(int X[], int l, int r) which takes the left and right ends of the array as parameters. The initial value of l is 0 and r is n - 1.
Divide step: We divide the array into two equal halves by calculating the mid-value, i.e., mid = l + (r - l)/2.
Conquer step: We recursively calculate the majority element of the left and right halves and store them in variables leftMajority and rightMajority.
Combine step: If the majority elements of the left and right halves are equal, then it must be the global majority element, and we return this value. Otherwise, we need to determine which one is the majority. To do this, we count the frequency of leftMajority and rightMajority in the input array.
if (leftMajority == rightMajority)
    return leftMajority
int leftCount = countFrequency(X, l, r, leftMajority)
int rightCount = countFrequency(X, l, r, rightMajority)
if (leftCount > (r - l + 1) / 2)
    return leftMajority
else if (rightCount > (r - l + 1) / 2)
    return rightMajority
else
    return -1  // no majority element in this subarray
Base case: This is the trivial case of a single-element array when both left and right ends are equal. We return this single element as the majority element, i.e., if (l == r), return X[l] or X[r].
int countFrequency(int X[], int l, int r, int value)
{
    int count = 0
    for (int i = l; i <= r; i = i + 1)
    {
        if (X[i] == value)
            count = count + 1
    }
    return count
}
int majorityElement(int X[], int l, int r)
{
    if (l == r)
        return X[l]
    int mid = l + (r - l) / 2
    int leftMajority = majorityElement(X, l, mid)
    int rightMajority = majorityElement(X, mid + 1, r)
    if (leftMajority == rightMajority)
        return leftMajority
    int leftCount = countFrequency(X, l, r, leftMajority)
    int rightCount = countFrequency(X, l, r, rightMajority)
    if (leftCount > (r - l + 1) / 2)
        return leftMajority
    else if (rightCount > (r - l + 1) / 2)
        return rightMajority
    else
        return -1
}

int findMajorityElement(int X[], int n)
{
    return majorityElement(X, 0, n - 1)
}
We solve a problem of size n by recursively solving two smaller sub-problems of size n/2 and combining their solutions by counting the frequencies of leftMajority and rightMajority.
So time complexity can be represented by the following recurrence relation: T(n) = 2T(n/2) + O(n), with T(1) = O(1).
We can use the master theorem or recursion tree method to solve this recurrence relation. If we observe closely, this is similar to merge sort recurrence. So time complexity = O(nlogn). To better understand this analysis, explore the analysis of recursion blog.
The divide and conquer method does not explicitly allocate additional memory, but it uses extra memory for the call stack. The call stack size depends on the max depth of the recursion tree, which is O(logn). The idea is simple: Input size is decreasing by a factor of 2 at each recursive call. So space complexity = O(logn).
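The divide and conquer steps above can be sketched as runnable C++ (the wrapper name findMajorityDC and the std::vector interface are our own; -1 signals "no majority", as in the pseudocode):

```cpp
#include <vector>

// Count how many times value appears in X[l..r].
static int countFrequency(const std::vector<int>& X, int l, int r, int value)
{
    int count = 0;
    for (int i = l; i <= r; i = i + 1)
        if (X[i] == value)
            count = count + 1;
    return count;
}

// Returns the majority element of X[l..r], or -1 if none exists.
static int majorityElementRec(const std::vector<int>& X, int l, int r)
{
    if (l == r)                       // base case: single element
        return X[l];
    int mid = l + (r - l) / 2;        // divide step
    int leftMajority = majorityElementRec(X, l, mid);
    int rightMajority = majorityElementRec(X, mid + 1, r);
    if (leftMajority == rightMajority)
        return leftMajority;
    // Combine step: count both candidates over the whole range X[l..r].
    int leftCount = countFrequency(X, l, r, leftMajority);
    int rightCount = countFrequency(X, l, r, rightMajority);
    if (leftCount > (r - l + 1) / 2)
        return leftMajority;
    if (rightCount > (r - l + 1) / 2)
        return rightMajority;
    return -1;                        // no majority in this subarray
}

int findMajorityDC(const std::vector<int>& X)
{
    return majorityElementRec(X, 0, (int)X.size() - 1);
}
```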
In the brute force approach, we count the frequency of each element using an inner loop. Can we remove this inner loop and count the frequency in O(1)? Yes, we can do that! The idea would be to scan the array and use a hash table to store the frequency of each distinct element. The hash table helps us efficiently search, insert, and delete elements in O(1) on average.
int majorityElement(int X[], int n)
{
    HashMap<int, int> H
    for (int i = 0; i < n; i = i + 1)
    {
        H[X[i]] = H[X[i]] + 1
        if (H[X[i]] > n/2)
            return X[i]
    }
}
We are traversing the array once and performing constant time hash table operations on each iteration. So, time complexity = n*O(1) = O(n). Space complexity = O(n) for storing elements in the hash table.
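In C++, the hash table idea maps directly onto std::unordered_map, whose operator[] default-initializes a missing key's count to 0 (the function name findMajorityHash and the -1 fallback are our own):

```cpp
#include <unordered_map>
#include <vector>

// Single pass with a hash table: increment each element's count and
// return as soon as some count exceeds n/2.
int findMajorityHash(const std::vector<int>& X)
{
    int n = (int)X.size();
    std::unordered_map<int, int> freq;
    for (int i = 0; i < n; i = i + 1)
    {
        freq[X[i]] = freq[X[i]] + 1;   // missing keys start at 0
        if (freq[X[i]] > n / 2)
            return X[i];
    }
    return -1;   // no majority element
}
```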
As given in the problem, we know that there is a majority element in the array. Can we use mathematical insights from the binary representation of the numbers to find the majority element? Let's think. Suppose each element is a 32-bit integer. Now, for all numbers, we count the frequency of set bits (1's) at each bit position in the bitwise representation (It's like polling for each bit position).
At any position in the bitwise representation, the majority element's bit agrees with the majority bit: if the majority element has a 1 at that position, then more than n/2 elements have a 1 there; if it has a 0, the 1s at that position can only come from non-majority elements, which number fewer than n/2.
For example, X[ ] = [6, 13, 13, 2, 13], n = 5.
6 -> 0 1 1 0
13 -> 1 1 0 1
13 -> 1 1 0 1
2 -> 0 0 1 0
13 -> 1 1 0 1
-----------------------
Majority = 1 1 0 1 = 13
In the above example, 1's are in the majority at the 1st, 2nd, and 4th positions, so the set bits at the same positions in the majority element will be 1. At the 3rd position, the number of 1's across all array elements is only 2, which is not a majority. So, the bit value at the 3rd position in the majority element will be 0. The idea is simple: By identifying the majority bits at every bit position for all the numbers, we can construct the majority element bitwise. How can we implement it? Let's think!
Step 1: We initialize a variable to store the majority element, i.e., majority = 0. Initially, all bits in majority are 0.
Step 2: Now we run a loop from currBit = 0 to 31, where the loop variable currBit represents the current bit position of a 32-bit integer.
int countOnes = 0
for (int i = 0; i < n; i = i + 1)
{
    if ((X[i] & (1 << currBit)) != 0)
        countOnes = countOnes + 1
}
if (countOnes > n/2)
    majority = majority + (1 << currBit)
Step 3: At the end of the loop, we return the value stored in majority.
int majorityElement(int X[], int n)
{
    int majority = 0
    for (int currBit = 0; currBit < 32; currBit = currBit + 1)
    {
        int countOnes = 0
        for (int i = 0; i < n; i = i + 1)
        {
            if ((X[i] & (1 << currBit)) != 0)
                countOnes = countOnes + 1
        }
        if (countOnes > n / 2)
            majority = majority + (1 << currBit)
    }
    return majority
}
We are running two nested loops where the inner loop is running n times and the outer loop is running 32 times. At each iteration of the nested loops, we perform a constant number of operations. So time complexity = 32*O(n)*O(1) = O(n). Space complexity = O(1), as we are using a constant number of variables only.
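A runnable C++ sketch of the bit-counting approach (the function name findMajorityBitwise is our own; we use unsigned shifts to avoid undefined behavior at bit 31 and assume non-negative values):

```cpp
#include <vector>

// Build the majority element bit by bit: for each of the 32 bit positions,
// if 1s are in the majority at that position, set that bit in the answer.
// Assumes a majority element exists and all values are non-negative.
int findMajorityBitwise(const std::vector<int>& X)
{
    int n = (int)X.size();
    unsigned majority = 0;
    for (int currBit = 0; currBit < 32; currBit = currBit + 1)
    {
        int countOnes = 0;
        for (int i = 0; i < n; i = i + 1)
            if (((unsigned)X[i] & (1u << currBit)) != 0)
                countOnes = countOnes + 1;
        if (countOnes > n / 2)
            majority = majority | (1u << currBit);
    }
    return (int)majority;
}
```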
In the given problem, more than n/2 elements are equal to the majority element. So, if we choose some random element from the input, then there is more than a 50% probability that the chosen element will be the majority element. If we perform this trial more and more, the chances of success will be higher.
So, one basic idea would be to select a random element from the array and check whether it is the majority element or not. If it is, we return that element. Otherwise, we repeat the process. There are some critical questions: What is the exact probability that the algorithm will find the majority element? What is the time complexity of the algorithm? We will explore the answers to these questions in the analysis section.
int countFrequency(int X[], int n, int value)
{
    int count = 0
    for (int i = 0; i < n; i = i + 1)
    {
        if (X[i] == value)
            count = count + 1
    }
    return count
}

int majorityElement(int X[], int n)
{
    while (true)
    {
        int randomElement = X[randRange(0, n - 1)]
        int k = countFrequency(X, n, randomElement)
        if (k > n / 2)
            return randomElement
    }
}
There are n equally likely choices for the random element, and at least ⌊n/2⌋ + 1 of them are occurrences of the majority element. So the probability of success in a single trial is at least (n/2 + 1)/n = 1/2 + 1/n > 1/2 (for n > 0); in other words, the probability of failure is less than 1/2. Since each trial takes O(n) time and succeeds with probability more than 1/2, the expected number of trials is less than 2, so the expected time complexity is O(n).
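The randomized approach in runnable C++ (a sketch: the function name findMajorityRandomized is our own, and std::rand stands in for the pseudocode's randRange):

```cpp
#include <cstdlib>
#include <vector>

// Count occurrences of value in X.
static int countFrequency(const std::vector<int>& X, int value)
{
    int count = 0;
    for (int i = 0; i < (int)X.size(); i = i + 1)
        if (X[i] == value)
            count = count + 1;
    return count;
}

// Randomized: pick a random element and verify it. Each trial succeeds with
// probability greater than 1/2, so the expected number of trials is below 2.
// Assumes a majority element exists (otherwise this loops forever).
int findMajorityRandomized(const std::vector<int>& X)
{
    int n = (int)X.size();
    while (true)
    {
        int candidate = X[std::rand() % n];
        if (countFrequency(X, candidate) > n / 2)
            return candidate;
    }
}
```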
The intuition behind Boyer-Moore Voting algorithm: Since the majority element appears more than n/2 times, its frequency is greater than the combined frequency of all the other elements. So, if we mark the occurrence of the majority element as +1 and the occurrence of any other element as -1, then the overall sum of these markings would definitely be greater than zero.
Here is another interesting analogy to understand this algorithm: Suppose there are n people, each holding one element of the array. Whenever two people meet who do not hold the same array element as the other, they sit down. Eventually, if anyone is left standing, that person is holding the majority element. Note: As long as the majority element occurs with a frequency more than n/2, we can guarantee that this approach will always find the majority element.
There are many ways to think about why this algorithm works. One good intuition is to think of the algorithm as breaking the input into chunks of consecutive copies of particular values.
int majorityElement(int X[], int n)
{
    int count = 0
    int majorityCandidate = 0
    for (int i = 0; i < n; i = i + 1)
    {
        if (count == 0)
            majorityCandidate = X[i]
        if (X[i] == majorityCandidate)
            count = count + 1
        else
            count = count - 1
    }
    return majorityCandidate
}
We are running a single loop n times and performing constant operations at each iteration. Time complexity = O(n). Space complexity = O(1), because we are using constant extra space.
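Here is a runnable C++ sketch of the Boyer-Moore voting algorithm (the function name findMajorityVoting is our own). If the existence of a majority element were not guaranteed, a second pass counting the final candidate's frequency would be needed to confirm it.

```cpp
#include <vector>

// Boyer-Moore voting: maintain a candidate and a counter. A matching element
// votes +1, any other element votes -1; whenever the counter drops to 0,
// adopt the current element as the new candidate. If a majority element
// exists, it survives as the final candidate.
int findMajorityVoting(const std::vector<int>& X)
{
    int count = 0;
    int majorityCandidate = 0;
    for (int i = 0; i < (int)X.size(); i = i + 1)
    {
        if (count == 0)
            majorityCandidate = X[i];
        if (X[i] == majorityCandidate)
            count = count + 1;
        else
            count = count - 1;
    }
    return majorityCandidate;
}
```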
If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!