LeetCode 28: String Matching using KMP Algorithm
Understanding the Problem
String matching is a fundamental problem in computer science. Given two strings - a "haystack" and a "needle" - we need to find if and where the needle appears in the haystack.
Naive Approach
The naive solution uses two pointers:
- One for the "needle" string
- One for the "haystack" string
Starting from each character in the haystack, we compare every character in the needle with the corresponding character in the haystack. If all characters match, we've found our solution. If any character doesn't match, we move to the next character in the haystack and restart from the beginning of the needle string.
This results in an O(mn) time complexity, where:
- m = length of haystack string
- n = length of needle string
The KMP Algorithm Solution
We can do better using the Knuth-Morris-Pratt (KMP) algorithm. This algorithm uses a clever technique called the prefix-suffix array, which helps skip parts of the haystack string that we already know won't match based on the pattern in the needle.
Key Benefits
- Achieves linear O(m+n) time complexity
- Uses extra memory for the prefix array
- Significantly faster for large strings with repeating patterns
For a visual understanding of the algorithm, check out this excellent tutorial with animations on YouTube.
Java Implementation
Here's a complete implementation of the KMP algorithm:
public int strStr(String haystack, String needle) {
if (haystack == null || needle == null) return 0;
char[] text = haystack.toCharArray();
char[] pattern = needle.toCharArray();
if (pattern.length == 0) return 0;
if (text.length == 0) return -1;
int[] prefixSuffix = prefixSuffix(pattern);
int i = 0;
int j = 0;
while (i < text.length && j < pattern.length) {
if (text[i] == text[j]) {
i++;
j++;
} else {
if (j == 0) {
i++;
} else {
j = prefixSuffix[j-1];
}
}
}
if (j == pattern.length) {
return true;
}
return false;
}
int[] prefixSuffix(char[] pattern) {
int[] pre = new int[pattern.length];
pre[0] = 0;
int j = 0;
for (int i = 1; i < pattern.length; i++) {
while (j > 0 && pattern[i] != pattern[j]) {
j = pre[j-1];
}
if (pattern[i] == pattern[j]) {
j++;
}
pre[i] = j;
}
return pre;
}
Understanding the Implementation
Main Algorithm (strStr
method)
- Initial null checks and edge cases
- Convert strings to char arrays for easier manipulation
- Generate the prefix-suffix array for the pattern
- Use two pointers to traverse the text and pattern
- Return true if the entire pattern is found, false otherwise
Prefix-Suffix Array Generation (prefixSuffix
method)
- Creates an array to store prefix-suffix values
- Uses two pointers to track matching prefixes and suffixes
- Builds the array incrementally based on pattern matches
- Returns the completed prefix-suffix array
Time and Space Complexity
- Time Complexity: O(m + n)
- m = length of haystack string
- n = length of needle string
- Space Complexity: O(n)
- Additional space needed for the prefix-suffix array
Conclusion
The KMP algorithm provides an efficient solution to the string matching problem by utilizing pattern information to avoid unnecessary comparisons. While it requires additional space and is more complex to implement than the naive approach, its linear time complexity makes it significantly more efficient for large strings, especially when dealing with patterns that have repeating elements.