Skip to main content

One post tagged with "kmp"

View All Tags

LeetCode 28: String Matching using KMP Algorithm

· 3 min read
Asif Rehan
Senior Software Engineer, ML Infra @ Ford Pro

Understanding the Problem

String matching is a fundamental problem in computer science. Given two strings - a "haystack" and a "needle" - we need to find if and where the needle appears in the haystack.

Naive Approach

The naive solution uses two pointers:

  • One for the "needle" string
  • One for the "haystack" string

Starting from each character in the haystack, we compare every character in the needle with the corresponding character in the haystack. If all characters match, we've found our solution. If any character doesn't match, we move to the next character in the haystack and restart from the beginning of the needle string.

This results in an O(mn) time complexity, where:

  • m = length of haystack string
  • n = length of needle string

The KMP Algorithm Solution

We can do better using the Knuth-Morris-Pratt (KMP) algorithm. This algorithm uses a clever technique called the prefix-suffix array, which helps skip parts of the haystack string that we already know won't match based on the pattern in the needle.

Key Benefits

  • Achieves linear O(m+n) time complexity
  • Uses extra memory for the prefix array
  • Significantly faster for large strings with repeating patterns

For a visual understanding of the algorithm, check out this excellent tutorial with animations on YouTube.

Java Implementation

Here's a complete implementation of the KMP algorithm:

public int strStr(String haystack, String needle) {
if (haystack == null || needle == null) return 0;
char[] text = haystack.toCharArray();
char[] pattern = needle.toCharArray();

if (pattern.length == 0) return 0;
if (text.length == 0) return -1;

int[] prefixSuffix = prefixSuffix(pattern);
int i = 0;
int j = 0;

while (i < text.length && j < pattern.length) {
if (text[i] == text[j]) {
i++;
j++;
} else {
if (j == 0) {
i++;
} else {
j = prefixSuffix[j-1];
}
}
}

if (j == pattern.length) {
return true;
}
return false;
}

int[] prefixSuffix(char[] pattern) {
int[] pre = new int[pattern.length];
pre[0] = 0;

int j = 0;
for (int i = 1; i < pattern.length; i++) {
while (j > 0 && pattern[i] != pattern[j]) {
j = pre[j-1];
}
if (pattern[i] == pattern[j]) {
j++;
}
pre[i] = j;
}
return pre;
}

Understanding the Implementation

Main Algorithm (strStr method)

  1. Initial null checks and edge cases
  2. Convert strings to char arrays for easier manipulation
  3. Generate the prefix-suffix array for the pattern
  4. Use two pointers to traverse the text and pattern
  5. Return true if the entire pattern is found, false otherwise

Prefix-Suffix Array Generation (prefixSuffix method)

  1. Creates an array to store prefix-suffix values
  2. Uses two pointers to track matching prefixes and suffixes
  3. Builds the array incrementally based on pattern matches
  4. Returns the completed prefix-suffix array

Time and Space Complexity

  • Time Complexity: O(m + n)
    • m = length of haystack string
    • n = length of needle string
  • Space Complexity: O(n)
    • Additional space needed for the prefix-suffix array

Conclusion

The KMP algorithm provides an efficient solution to the string matching problem by utilizing pattern information to avoid unnecessary comparisons. While it requires additional space and is more complex to implement than the naive approach, its linear time complexity makes it significantly more efficient for large strings, especially when dealing with patterns that have repeating elements.