LeetCode 28: String Matching using KMP Algorithm

November 7, 2021 · 3 min read

Senior Software Engineer, ML Infra @ Ford Pro

Understanding the Problem

String matching is a fundamental problem in computer science. Given two strings - a "haystack" and a "needle" - we need to find if and where the needle appears in the haystack.

Naive Approach

The naive solution uses two pointers:

One for the "needle" string
One for the "haystack" string

Starting from each character in the haystack, we compare every character in the needle with the corresponding character in the haystack. If all characters match, we've found our solution. If any character doesn't match, we move to the next character in the haystack and restart from the beginning of the needle string.

This results in an O(mn) time complexity, where:

m = length of haystack string
n = length of needle string

The KMP Algorithm Solution

We can do better using the Knuth-Morris-Pratt (KMP) algorithm. This algorithm uses a clever technique called the prefix-suffix array, which helps skip parts of the haystack string that we already know won't match based on the pattern in the needle.

Key Benefits

Achieves linear O(m+n) time complexity
Uses extra memory for the prefix array
Significantly faster for large strings with repeating patterns

For a visual understanding of the algorithm, check out this excellent tutorial with animations on YouTube.

Java Implementation

Here's a complete implementation of the KMP algorithm:

public int strStr(String haystack, String needle) {
    if (haystack == null || needle == null) return 0;
    char[] text = haystack.toCharArray();
    char[] pattern = needle.toCharArray();

    if (pattern.length == 0) return 0;
    if (text.length == 0) return -1;

    int[] prefixSuffix = prefixSuffix(pattern);
    int i = 0;
    int j = 0;
    
    while (i < text.length && j < pattern.length) {
        if (text[i] == text[j]) {
            i++;
            j++;
        } else {
            if (j == 0) {
                i++;
            } else {
                j = prefixSuffix[j-1];
            }
        }
    }
    
    if (j == pattern.length) {
        return true;
    }
    return false;
}

int[] prefixSuffix(char[] pattern) {
    int[] pre = new int[pattern.length];
    pre[0] = 0;
    
    int j = 0;
    for (int i = 1; i < pattern.length; i++) {
        while (j > 0 && pattern[i] != pattern[j]) {
            j = pre[j-1];
        }
        if (pattern[i] == pattern[j]) {
            j++;
        }
        pre[i] = j;
    }
    return pre;
}

Understanding the Implementation

Main Algorithm (`strStr` method)

Initial null checks and edge cases
Convert strings to char arrays for easier manipulation
Generate the prefix-suffix array for the pattern
Use two pointers to traverse the text and pattern
Return true if the entire pattern is found, false otherwise

Prefix-Suffix Array Generation (`prefixSuffix` method)

Creates an array to store prefix-suffix values
Uses two pointers to track matching prefixes and suffixes
Builds the array incrementally based on pattern matches
Returns the completed prefix-suffix array

Time and Space Complexity

Time Complexity: O(m + n)
- m = length of haystack string
- n = length of needle string
Space Complexity: O(n)
- Additional space needed for the prefix-suffix array

Conclusion

The KMP algorithm provides an efficient solution to the string matching problem by utilizing pattern information to avoid unnecessary comparisons. While it requires additional space and is more complex to implement than the naive approach, its linear time complexity makes it significantly more efficient for large strings, especially when dealing with patterns that have repeating elements.

Understanding the Problem​

Naive Approach​

The KMP Algorithm Solution​

Key Benefits​

Java Implementation​

Understanding the Implementation​

Main Algorithm (strStr method)​

Prefix-Suffix Array Generation (prefixSuffix method)​

Time and Space Complexity​

Conclusion​