Difficulty: Medium, Asked-in: Microsoft, Google, Amazon, Yahoo.
You are given an array X[] consisting of n elements. Write a program to find the majority element of the array, i.e., return the element that appears more than n/2 times.
Input: X[] = [2, 12, 2, 2, 2, 4, 2], Output: 2
Explanation: 2 is the majority element which appears 5 times.
Input: X[] = [3, 3, 4, 2, 4, 4, 2, 4, 4], Output: 4
Explanation: The frequency of 4 is 5 which is greater than half of the size of the array.
Input: X[] = [4, 3, 4], Output: 4
Input: X[] = [1, 1, 1, 1, 1, 1], Output: 1
A basic approach would be to check whether each element is the majority element or not. We can use two nested loops, where the outer loop iterates over the array, and the inner loop counts the occurrences of each element. If we find an element with a frequency greater than n/2, that is our majority element.
int majorityElement(int X[], int n)
{
    for (int i = 0; i < n; i = i + 1)
    {
        int majorityCount = 0
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == X[j])
                majorityCount = majorityCount + 1
        }
        if (majorityCount > n/2)
            return X[i]
    }
}
We are running two nested loops and doing constant operations at each iteration. So time complexity = O(n²)*O(1) = O(n²). We are using constant extra space, so space complexity = O(1). Can we further improve the time complexity? Let's think!
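For reference, here is how the brute force idea might look in runnable C++. This is a sketch: the function name findMajorityBrute, the std::vector interface, and the -1 fallback for "no majority" (a case the pseudocode leaves undefined) are our own additions.

```cpp
#include <vector>

// Brute force: for each element, count its occurrences with an inner loop.
// Returns -1 if no element appears more than n/2 times.
int findMajorityBrute(const std::vector<int>& X)
{
    int n = (int)X.size();
    for (int i = 0; i < n; i = i + 1)
    {
        int majorityCount = 0;
        for (int j = 0; j < n; j = j + 1)
        {
            if (X[i] == X[j])
                majorityCount = majorityCount + 1;
        }
        if (majorityCount > n / 2)
            return X[i];
    }
    return -1;
}
```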
Given that the majority element always exists in the array, we can sort the array in increasing order. By doing so, the majority element will be grouped in adjacent indices. Can we directly access the majority element in the sorted array? Let's think!
If we divide the sorted array into two halves, each part contains n/2 elements. Since the majority element appears more than n/2 times, it must be present at the middle index. This holds true for both odd and even values of n.
int findMajorityElement(int X[], int n)
{
    sort(X, n)
    return X[n/2]
}
Suppose we are using an efficient O(nlogn) sorting algorithm like heap sort or merge sort. Time complexity = Time complexity of sorting + Time complexity of accessing the middle element = O(nlogn) + O(1) = O(nlogn).
Heap sort is an in-place sorting algorithm. So space complexity is O(1) if we use heap sort. In the case of merge sort, space complexity = O(n).
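Assuming a majority element is guaranteed to exist, the sorting approach can be sketched in C++ using std::sort (the function name findMajorityBySorting is our own):

```cpp
#include <algorithm>
#include <vector>

// Sort the array; the majority element must then occupy the middle index,
// because it fills more than half of all positions.
// Pass by value: sorting mutates the local copy, not the caller's array.
int findMajorityBySorting(std::vector<int> X)
{
    std::sort(X.begin(), X.end());
    return X[X.size() / 2];
}
```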
Can we solve this problem using recursion or a divide-and-conquer approach? Here is an insight: If we divide the array into two equal halves and recursively find the majority element of the left and right halves, we can easily determine the overall majority element in linear time. Note: Even if a majority element exists globally, a recursive subarray may have no majority element of its own, so the recursion must handle the "no majority" case. Here, if a subarray has no majority element, we return -1.
We define a function majorityElement(int X[], int l, int r) which takes the left and right ends of the array as parameters. The initial value of l is 0 and r is n - 1.
Divide step: We divide the array into two equal halves by calculating the mid-value, i.e., mid = l + (r - l)/2.
Conquer step: We recursively calculate the majority element of the left and right halves and store them in variables leftMajority and rightMajority.
Combine step: If the majority elements of the left and right halves are equal, then it must be the global majority element, and we return this value. Otherwise, we need to determine which one is the majority. To do this, we count the frequency of leftMajority and rightMajority in the input array.
if (leftMajority == rightMajority)
    return leftMajority
int leftCount = countFrequency(X, l, r, leftMajority)
int rightCount = countFrequency(X, l, r, rightMajority)
if (leftCount > (r - l + 1) / 2)
    return leftMajority
else if (rightCount > (r - l + 1) / 2)
    return rightMajority
else
    return -1  // no majority element in this subarray
Base case: This is the trivial case of a single-element array when both left and right ends are equal. We return this single element as the majority element, i.e., if (l == r), return X[l] or X[r].
int countFrequency(int X[], int l, int r, int value)
{
    int count = 0
    for (int i = l; i <= r; i = i + 1)
    {
        if (X[i] == value)
            count = count + 1
    }
    return count
}
int majorityElement(int X[], int l, int r)
{
    if (l == r)
        return X[l]
    int mid = l + (r - l) / 2
    int leftMajority = majorityElement(X, l, mid)
    int rightMajority = majorityElement(X, mid + 1, r)
    if (leftMajority == rightMajority)
        return leftMajority
    int leftCount = countFrequency(X, l, r, leftMajority)
    int rightCount = countFrequency(X, l, r, rightMajority)
    if (leftCount > (r - l + 1) / 2)
        return leftMajority
    else if (rightCount > (r - l + 1) / 2)
        return rightMajority
    else
        return -1
}

int findMajorityElement(int X[], int n)
{
    return majorityElement(X, 0, n - 1)
}
We solve a problem of size n by recursively solving two smaller sub-problems of size n/2 and combining their solutions by counting the frequencies of leftMajority and rightMajority.
So time complexity can be represented by the following recurrence relation: T(n) = 2T(n/2) + O(n), with T(1) = O(1).
We can use the master theorem or recursion tree method to solve this recurrence relation. If we observe closely, this is similar to merge sort recurrence. So time complexity = O(nlogn). To better understand this analysis, explore the analysis of recursion blog.
The divide and conquer method does not explicitly allocate additional memory, but it uses extra memory for the call stack. The call stack size depends on the max depth of the recursion tree, which is O(logn). The idea is simple: Input size is decreasing by a factor of 2 at each recursive call. So space complexity = O(logn).
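The divide and conquer steps above can be sketched as runnable C++ (the wrapper name findMajorityDC and the std::vector interface are our own; -1 signals "no majority", as in the pseudocode):

```cpp
#include <vector>

// Count how many times value appears in X[l..r].
static int countFrequency(const std::vector<int>& X, int l, int r, int value)
{
    int count = 0;
    for (int i = l; i <= r; i = i + 1)
        if (X[i] == value)
            count = count + 1;
    return count;
}

// Returns the majority element of X[l..r], or -1 if none exists.
static int majorityElementRec(const std::vector<int>& X, int l, int r)
{
    if (l == r)                       // base case: single element
        return X[l];
    int mid = l + (r - l) / 2;        // divide step
    int leftMajority = majorityElementRec(X, l, mid);
    int rightMajority = majorityElementRec(X, mid + 1, r);
    if (leftMajority == rightMajority)
        return leftMajority;
    // Combine step: count both candidates over the whole range X[l..r].
    int leftCount = countFrequency(X, l, r, leftMajority);
    int rightCount = countFrequency(X, l, r, rightMajority);
    if (leftCount > (r - l + 1) / 2)
        return leftMajority;
    if (rightCount > (r - l + 1) / 2)
        return rightMajority;
    return -1;                        // no majority in this subarray
}

int findMajorityDC(const std::vector<int>& X)
{
    return majorityElementRec(X, 0, (int)X.size() - 1);
}
```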
In the brute force approach, we count the frequency of each element using an inner loop. Can we remove this inner loop and count the frequency in O(1)? Yes, we can do that! The idea would be to scan the array and use a hash table to store the frequency of each distinct element. The hash table helps us efficiently search, insert, and delete elements in O(1) on average.
int majorityElement(int X[], int n)
{
    HashMap<int, int> H
    for (int i = 0; i < n; i = i + 1)
    {
        H[X[i]] = H[X[i]] + 1
        if (H[X[i]] > n/2)
            return X[i]
    }
}
We are traversing the array once and performing constant time hash table operations on each iteration. So, time complexity = n*O(1) = O(n). Space complexity = O(n) for storing elements in the hash table.
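In C++, the hash table idea maps directly onto std::unordered_map, whose operator[] default-initializes a missing key's count to 0 (the function name findMajorityHash and the -1 fallback are our own):

```cpp
#include <unordered_map>
#include <vector>

// Single pass with a hash table: increment each element's count and
// return as soon as some count exceeds n/2.
int findMajorityHash(const std::vector<int>& X)
{
    int n = (int)X.size();
    std::unordered_map<int, int> freq;
    for (int i = 0; i < n; i = i + 1)
    {
        freq[X[i]] = freq[X[i]] + 1;   // missing keys start at 0
        if (freq[X[i]] > n / 2)
            return X[i];
    }
    return -1;   // no majority element
}
```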
As given in the problem, we know that there is a majority element in the array. Can we use mathematical insights from the binary representation of the numbers to find the majority element? Let's think. Suppose each element is a 32-bit integer. Now, for all numbers, we count the frequency of set bits (1's) at each bit position in the bitwise representation (It's like polling for each bit position).
At any position in the bitwise representation, the majority element's bit agrees with the majority bit: if the majority element has a 1 at that position, then more than n/2 elements have a 1 there; if it has a 0, the 1s at that position can only come from non-majority elements, which number fewer than n/2.
For example, X[ ] = [6, 13, 13, 2, 13], n = 5.
6 -> 0 1 1 0
13 -> 1 1 0 1
13 -> 1 1 0 1
2 -> 0 0 1 0
13 -> 1 1 0 1
-----------------------
Majority = 1 1 0 1 = 13
In the above example, 1's are in the majority at the 1st, 2nd, and 4th positions, so the set bits at the same positions in the majority element will be 1. At the 3rd position, the number of 1's across all array elements is only 2, which is not a majority. So, the bit value at the 3rd position in the majority element will be 0. The idea is simple: By identifying the majority bits at every bit position for all the numbers, we can construct the majority element bitwise. How can we implement it? Let's think!
Step 1: We initialize a variable to store the majority element, i.e., majority = 0. Initially, all bits in majority are 0.
Step 2: Now we run a loop from currBit = 0 to 31, where the loop variable currBit represents the current bit position of a 32-bit integer.
int countOnes = 0
for (int i = 0; i < n; i = i + 1)
{
    if ((X[i] & (1 << currBit)) != 0)
        countOnes = countOnes + 1
}
if (countOnes > n/2)
    majority = majority + (1 << currBit)
Step 3: At the end of the loop, we return the value stored in majority.
int majorityElement(int X[], int n)
{
    int majority = 0
    for (int currBit = 0; currBit < 32; currBit = currBit + 1)
    {
        int countOnes = 0
        for (int i = 0; i < n; i = i + 1)
        {
            if ((X[i] & (1 << currBit)) != 0)
                countOnes = countOnes + 1
        }
        if (countOnes > n / 2)
            majority = majority + (1 << currBit)
    }
    return majority
}
We are running two nested loops where the inner loop is running n times and the outer loop is running 32 times. At each iteration of the nested loops, we perform a constant number of operations. So time complexity = 32*O(n)*O(1) = O(n). Space complexity = O(1), as we are using a constant number of variables only.
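A runnable C++ sketch of the bit-counting approach (the function name findMajorityBitwise is our own; we use unsigned shifts to avoid undefined behavior at bit 31 and assume non-negative values):

```cpp
#include <vector>

// Build the majority element bit by bit: for each of the 32 bit positions,
// if 1s are in the majority at that position, set that bit in the answer.
// Assumes a majority element exists and all values are non-negative.
int findMajorityBitwise(const std::vector<int>& X)
{
    int n = (int)X.size();
    unsigned majority = 0;
    for (int currBit = 0; currBit < 32; currBit = currBit + 1)
    {
        int countOnes = 0;
        for (int i = 0; i < n; i = i + 1)
            if (((unsigned)X[i] & (1u << currBit)) != 0)
                countOnes = countOnes + 1;
        if (countOnes > n / 2)
            majority = majority | (1u << currBit);
    }
    return (int)majority;
}
```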
In the given problem, more than n/2 elements are equal to the majority element. So, if we choose some random element from the input, then there is more than a 50% probability that the chosen element will be the majority element. If we perform this trial more and more, the chances of success will be higher.
So, one basic idea would be to select a random element from the array and check whether it is the majority element or not. If it is, we return that element. Otherwise, we repeat the process. There are some critical questions: What is the exact probability that the algorithm will find the majority element? What is the time complexity of the algorithm? We will explore the answers to these questions in the analysis section.
int countFrequency(int X[], int n, int value)
{
    int count = 0
    for (int i = 0; i < n; i = i + 1)
    {
        if (X[i] == value)
            count = count + 1
    }
    return count
}

int majorityElement(int X[], int n)
{
    while (true)
    {
        int randomElement = X[randRange(0, n - 1)]
        int k = countFrequency(X, n, randomElement)
        if (k > n / 2)
            return randomElement
    }
}
There are n equally likely choices for the random element, and at least ⌊n/2⌋ + 1 of them are occurrences of the majority element. So the probability of success in a single trial is at least (n/2 + 1)/n = 1/2 + 1/n > 1/2 (for n > 0); in other words, the probability of failure is less than 1/2. Since each trial takes O(n) time and succeeds with probability more than 1/2, the expected number of trials is less than 2, so the expected time complexity is O(n).
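The randomized approach in runnable C++ (a sketch: the function name findMajorityRandomized is our own, and std::rand stands in for the pseudocode's randRange):

```cpp
#include <cstdlib>
#include <vector>

// Count occurrences of value in X.
static int countFrequency(const std::vector<int>& X, int value)
{
    int count = 0;
    for (int i = 0; i < (int)X.size(); i = i + 1)
        if (X[i] == value)
            count = count + 1;
    return count;
}

// Randomized: pick a random element and verify it. Each trial succeeds with
// probability greater than 1/2, so the expected number of trials is below 2.
// Assumes a majority element exists (otherwise this loops forever).
int findMajorityRandomized(const std::vector<int>& X)
{
    int n = (int)X.size();
    while (true)
    {
        int candidate = X[std::rand() % n];
        if (countFrequency(X, candidate) > n / 2)
            return candidate;
    }
}
```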
The intuition behind Boyer-Moore Voting algorithm: Since the majority element appears more than n/2 times, its frequency is greater than the combined frequency of all the other elements. So, if we mark the occurrence of the majority element as +1 and the occurrence of any other element as -1, then the overall sum of these markings would definitely be greater than zero.
Here is another interesting analogy to understand this algorithm: Suppose there are n people, each holding one element of the array. Whenever two people meet who do not hold the same array element as the other, they sit down. Eventually, if anyone is left standing, that person is holding the majority element. Note: As long as the majority element occurs with a frequency more than n/2, we can guarantee that this approach will always find the majority element.
There are many ways to think about why this algorithm works. One good intuition is to think of the algorithm as breaking the input into chunks of consecutive copies of particular values.
int majorityElement(int X[], int n)
{
    int count = 0
    int majorityCandidate = 0
    for (int i = 0; i < n; i = i + 1)
    {
        if (count == 0)
            majorityCandidate = X[i]
        if (X[i] == majorityCandidate)
            count = count + 1
        else
            count = count - 1
    }
    return majorityCandidate
}
We are running a single loop n times and performing constant operations at each iteration. Time complexity = O(n). Space complexity = O(1), because we are using constant extra space.
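Here is a runnable C++ sketch of the Boyer-Moore voting algorithm (the function name findMajorityVoting is our own). If the existence of a majority element were not guaranteed, a second pass counting the final candidate's frequency would be needed to confirm it.

```cpp
#include <vector>

// Boyer-Moore voting: maintain a candidate and a counter. A matching element
// votes +1, any other element votes -1; whenever the counter drops to 0,
// adopt the current element as the new candidate. If a majority element
// exists, it survives as the final candidate.
int findMajorityVoting(const std::vector<int>& X)
{
    int count = 0;
    int majorityCandidate = 0;
    for (int i = 0; i < (int)X.size(); i = i + 1)
    {
        if (count == 0)
            majorityCandidate = X[i];
        if (X[i] == majorityCandidate)
            count = count + 1;
        else
            count = count - 1;
    }
    return majorityCandidate;
}
```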
If you have any queries or feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!