Longest Substring Without Repeating Characters

Difficulty: Medium, Asked-in: Microsoft, Amazon, Morgan Stanley.

Key takeaway: An excellent problem to learn problem solving using sliding window approach. We can solve a lot of problems using a similar idea.

Let’s understand the problem

You are given a string of size n, write a program to find the length of the longest substring without repeating characters. Here substring is a continuous subpart of a string.

  • Our goal is to find the longest substring that has all unique characters.
  • There can be more than one longest substring without repeating characters of equal length. So we only need to return the length of the longest substring.

Example 1

Input: str = "abcbbcab", Output: 3

Explanation: There are three substrings of unique characters with the longest length of 3: "abc", "bca", and "cab". So the output is 3.

Example 2

Input: str = "bbbbb", Output: 1

Explanation: There is only one unique character, so the output is 1.

Example 3

Input: str = "enjoyalgorithms", Output: 11

Explanation: The only repeating character is "o", and the answer is "yalgorithms", which has a length of 11.

Discussed solution approaches

  • Brute force using nested loops
  • Optimized brute force solution
  • Sliding window approach

Brute force approach  nested loops

Solution idea

One basic idea is to explore all substrings, and for each substring, we check whether it contains unique characters or not. For a string with n characters, there will be n*(n + 1)/2 substrings.

Step 1: We initialize a variable maxLength to track longest substring with unique characters.

Step 2: Now, we explore all substrings using two nested loops: an outer loop from i = 0 to n - 1, and an inner loop from j = i to n - 1. The idea is to pick a starting character via the outer loop and explore all substrings starting from that character using the inner loop.

  • For each substring str[i, j], we use function isUnique(str, i, j) to check if all characters in the substring are unique or not. It will return true if all characters are unique. Otherwise, it will return false.
  • If function isUnique(str, i, j) returns true, we compare maxLength with the length of the substring. If maxLength < j - i + 1, we update the variable maxLength with the value j - i + 1. Note: The length of the substring from i to j is j - i + 1.

Step 3: By the end of the nested loops, we return the value stored in maxLength.

Implementation steps of isUnique(str, i, j)

  • To check whether a substring has duplicate characters or not, we use a set visited[256] to store the visited status of each possible character in the substring.
  • We go through each character from index i to j using a loop. For each character, we check if it is already present in the set visited[] or not. If it is present, we return false. Otherwise, we update the status of that character as true in the visited[].
  • By the end of the loop, we return true, i.e., the characters are unique in the substring.

Solution pseudocode

bool isUnique(string str, int i, int j)
{
    bool visited[256] = {false}
    for (int k = i; k <= j; k =  k + 1)
    {
        if (visited[str[k]] == true)
            return false
        
        visited[str[k]] = true
    }
    return true
}

int longestSubstring(string str, int n)
{
    int maxLength = 0
    for (int i = 0; i < n; i = i + 1)
    {
        for (int j = i; j < n; j = j + 1)
        {
            if (isUnique(str, i, j) == true)
                maxLength = max(maxLength, j - i + 1)
        }
    }
    return maxLength
}

Time and space complexity analysis

We are exploring all n(n + 1)/2 substrings using nested loops. For each substring, function isUnique(str, i, j) takes O(j - i + 1) time to check whether the substring str[i, j] has unique characters or not.

Time complexity = n(n + 1)/2 * O( j - i + 1) = O(n^2) * O( j - i + 1). For the upper bound of the worst-case analysis, we consider O(j - i +1) as O(n). So, overall time complexity in the worst case = O(n^2) * O(n) = O(n^3). We are using a few extra variables and a constant-size set visited[]. So space complexity is O(1).

Optimized brute force solution

Solution idea

Now, a critical question is: How can we optimize the time complexity of the brute-force approach? In the brute-force, we repeatedly check a substring starting from the character str[i] to see if it has duplicate characters. In other words, we use O(n) time for every substring. Can we reduce this computation?

Here’s an insight: If we know that the substring str[i, j-1] has no duplicate characters, then for the substring str[i, j], we only need to check if str[j] is already present in the substring str[i, j-1]. To do this, we can use a visited array to check whether the character str[j] is in the substring str[i, j-1], which can be done in O(1) time.

Solution steps

Step 1: Initialize the variable maxLength to keep track of the longest substring.

Step 2: Now we run an outer loop from i = 0 to n - 1 to explore the substring starting from the character str[i]. Inside the loop, we use visited[256] to store the status of characters in the current substring.

Step 3: Now, we run an inner loop from j = i to n - 1 to find the longest substring with unique characters starting from the character str[i]. At each iteration of the inner loop, we check if visited[str[j]] is false. If it is, we mark the status of str[j] in visited[] and update the maxLength. In other words, we are sliding the current window one position to the right by incrementing j until str[j] is present in the substring [i, j - 1].

Step 4: The inner loop will stop when the character str[j] is already present in the current window. Then, we move to the next iteration of the outer loop to check the new window of the substring starting from index i + 1.

Step 5: At the end of the outer loop, we return the value stored in the variable maxLength.

Solution pseudocode

int longestSubstring(string str, int n) 
{
    int maxLength = 0
    
    for (int i = 0; i < n; i++) 
    {
        bool visited[256] = {false}
        
        for (int j = i; j < n; j++) 
        {
            if (visited[str[j]] == false) 
            {
                maxLength = max(maxLength, j - i + 1)
                visited[str[j]] = true
            } 
            else
                break  // Stop further checking if a duplicate character is found
        }
    }
    
    return maxLength
}

Time and space complexity analysis

We are using nested loops and performing an O(1) operation at each iteration. The worst case occurs when the inner loop runs until the end every time. In other words, the worst case occurs when all characters in the string are unique. In such a situation, a nested loop will explore all possible substrings i.e. n(n + 1)/2. Time complexity = n(n + 1)/2 * O(1) = O(n^2). We are using a constant size set visited[] of size 256, so space complexity = O(1).

Sliding window solution

Solution idea

Now, the critical questions are: Can we optimize the above approach further and solve this problem in O(n) time? Can we eliminate the inner loop? Here is an insight from the previous approach!

In the ith iteration of the outer loop, we run the inner loop from j = i to find the longest substring with unique characters starting from index i. Now in the (i+1)th iteration, we ignore the value of j from the previous iteration and reset it to j = i+1 to run the inner loop. But there’s no need to do this! If the characters in the substring [i, j] are unique, then the characters in the substring [i+1, j] will also be unique. So, in the next iteration of the outer loop, there’s no need to start from j = i+1.

Note: Sliding window is an excellent technique for solving array or string problems. A window is a range of elements defined by the start and end indices, where the window "slides" forward by incrementing either the left or right end.

Solution steps

  • We scan the string from left to right using two window pointers, i and j. We also keep track of the longest length of the substring using the variable maxLength and maintain a visited[256] table to keep track of visited characters. Note: All values in the visited[] table are initially false.
  • If visited[str[j]] is false, we set visited[str[j]] = true and slide the current window to the right by incrementing the pointer j. We also update the longest length of the substring with unique characters, i.e., maxLength = max(maxLength, j - i).
  • If visited[str[j]] is true, we have found a repeated character in the current substring starting from index i. So we set visited[str[i]] = false and slide the window to the right by incrementing the pointer i.
  • This process will end when either pointer reaches the end of the string, i.e., i = n or j = n. At this stage, we return the value stored in the variable maxLength.

Solution pseudocode

int longestSubstring(string str, int n)
{
    bool visited[256] = {false}
    int maxLength = 0
    
    int i = 0, j = 0
    while (i < n && j < n)
    {
        if (visited[str[j]] == false)
        {
            visited[str[j]] = true
            j = j + 1
            maxLength = max(maxLength, j - i)
        }
        else
        {
            visited[str[i]] = false
            i = i + 1
        }
    }
    
    return maxLength
}

Time and space complexity analysis

At each iteration of the while loop, we either increment pointer i or pointer j. So, the loop will run n times and perform a constant number of operations at each iteration. The time complexity is O(n). We are using a constant-size set visited[] of size 256, so space complexity is O(1).

Critical ideas to think!

  • Can we optimize the efficient approach further?
  • Instead of using the visited[256] table, can we use a hash table to store the status of each character in a substring window?
  • Can we think of solving the problem using some other approach?
  • How can we modify the above approaches to print all the longest substrings in lexicographic order?
  • Explore patterns of problem-solving using the sliding window technique.

Suggested coding problems to practice

If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!

More from EnjoyAlgorithms

Self-paced Courses and Blogs