Find Missing Positive - Interview Problem

First Missing Positive

Difficulty: Hard

Asked in: Amazon, Google

Understanding the Problem

Problem Description: Given an array, find the smallest missing positive integer.

For example

Input: A[] = [2, 3, -7, 6, 8, 1, -10, 15]
Output: 4

Input: A[] = [3, 2, -1, 1]
Output: 4

Input: A[] = [7, 8, 9, 11, 12]
Output: 1

Possible follow-up questions to ask the interviewer:-

Are there negative integers present in the array? ( Ans : Yes)
Can we update the array? ( Ans : Yes)
Are there any repeating values in the array? ( Ans : Yes)

Brute force and Efficient Solutions

We are discussing four ways to solve this problem :

Brute force approach
Using Sorting and searching
Using a Hash Table
Using In-place Hash

1. Brute force approach

The simple solution is to search all positive integers starting from 1 ( Think! ). At worst case, we have to search for n+1 numbers.

Pseudo Code

int smallestPositive(int A[], int n)
{
    for(i = 1 to n+1) // search for 1 to n+1 elements
    {
        bool flag = false
        for(j = 0 to n-1) // iterating through the array
        {
            if(A[j] == i)
            {
                flag = true
                break
            }
        }
        if(flag == false)
            return i
    }
}

Complexity Analysis

Time Complexity: Search positive integers from 1 to n+1 linearly in the array = O(n²)

Space Complexity: O(1)

Critical ideas to think!

Can we optimize the search if the array is in a sorted order? Think about the complexities of searching?

2.Using Sorting and Searching

We can sort the array and linearly scan the array to find the smallest positive element.

Pseudo Code

int smallestPositive(int A[], int n)
{
    sort(A)
    small = 1
    for(i = 0 to n-1)
    {
        if(small > A[i]) // to handle repetition and non positive numbers
            continue
        if(small == A[i])
            small = small + 1
        if(A[i] > small)
            return small
    }
    return small
}

Complexity Analysis

Suppose we are using heap sort to solve this problem.

Time Complexity: Sorting + Linear Scan for finding smallest positive

= O(nlogn) + O(n) = O(nlogn)

Space Complexity: O(1)

Critical ideas to think!

What if all the elements that are present in the array are negative? Does the above algorithm cover this condition? Analyze this with an example.
Compare different sorting algorithms that can be used to solve the above problem.

3. Using Hash Table

We can build a hash table to store the integers present in the given array. Now we can look out for positive integers starting from 1 to n+1 in the Hash Table.

Solution Steps

Create a hash table H.
Insert the elements of the array in the hash table.
Iterate i from 1 to n+1 and look for i in H.
If i is not present in H , return i.
Else, continue the iteration.

Pseudo Code

int smallestPositive(int A[], int n)
{
    Create a hash table H
    for(i = 0 to n-1)
        H[A[i]] = true
    // search for first n+1 positive integer
    for(i = 1 to n+1)
    {
    // integer not present in the hash table
        if(i not in H)
            return i
    }
}

Complexity Analysis

Time Complexity: Building hash table + Searching elements in Hash Table = O(n) + O(n) = O(n)

Space Complexity: O(n), To store the hash table.

Critical ideas to think!

Do we need to store non-positive integers in the hash map?

4. In-place Hash

The idea is to store the occurrences of the numbers itself in the array. Since the range for missing elements for an array of size n is [1,n+1], so we can use the value of the index in the array to mark the presence of a number in the array (keeping in mind to retrieve the original elements after updating it).How should we mark the presence of an element? ( Think! )

When we come across an element k, where 1 ≤ k ≤ N, we will update the value at index (k-1) to its negative value, i.e. A[k-1] = -A[k-1]. (Why index k-1 and not k?) But this approach can still fail for some types of cases, can you guess which cases?

We need to take care of not accidentally modifying the value at the same index more than once when we encounter duplicates. Also, this approach doesn’t work if there are non-positive numbers. So we first need to segregate positive numbers from negative numbers and then apply the above discussed in-place hashing approach.

Solution Steps

Segregate the positive numbers from others, i.e. to move all negative integers to the right side of the array and return the size of the sub-array containing the positive integers (which is N here). We will use a helper function segregate() for this purpose. This function will use the two-pointer approach similar to the partition function in the Quick sort.
Now iterate through this sub-array and mark the presence of an element A[i], if abs(A[i])-1 < N .
To ensure that we don't do so more than once on the same index, we assign A[abs(A[i])-1] = - abs( A[abs(A[i])-1] ) .
Now scan through the array till N and find the first positive element and return its index i.e i+1 . If we didn't find any positive value after the loop then return N+1.

Pseudo Code

int smallestPositive(int A[], int n)
{
    N = segregate(A, n)
    for (k = 0 to N - 1)
    {
        if(abs(A[k]) - 1 < N)
            A[abs(A[k]) - 1] = - abs(A[abs(A[k]) - 1])
    }
    for(k = 0 to N - 1)
        if(A[k] > 0)
            return k + 1
    return N + 1
}


// segregates non-negative numbers to right side of array
int segregate(int A[], int n)
{
    int j = n-1, i = 0
    while( i <= j )
    {
        if(A[i] <= 0)
        {
            swap(A[i], A[j]) // swap values at A[i] and A[j]
            j = j - 1
        }
        else
            i = i + 1
    }
    return i
}

Complexity Analysis

Time Complexity: Segregate the array and then Linear Scan.

= O(n) + O(n) = O(n)

Space Complexity: O(1)

Critical ideas to think!

Why do we need to take abs(A[i]) during each operation?
Why we are not storing the occurrences of numbers greater than N?
What if we need to find the smallest missing number in an array from a given set of numbers?
Is it possible to list all the numbers that are missing in the array in the range [1,n], where n is the size of the array?