The quick brown fox jumps over the lazy dog

What if you had to implement a text matching logic in Java that would result in an affirmation for search strings such as Th*ick*b*ox*over the*dog, *fox*dog and a negative for ones such as *fox*brown and *abcd*?

I faced a similar challenge few days back as part of another tool I was working on. I wanted a simple solution that not only worked but worked faster. Using a regular expression wasn't feasible as the search pattern was dynamic and hence, precompiling the expression for faster performance was ruled out.

I thought for a while and arrived at this solution. Hope it helps some hapless programmer like me slogging the night out in some corner of the globe.

Algorithm

Lets denote the search string/pattern as pattern and the text to be searched as text. The pattern can contain wildcards (*) in it.

  • Split the pattern by * to get several string fragments. Lets denote these fragments cards. We shall denote each individual fragment of cards as card.
  • For each of these cards, test for an occurrence in the text. If any of these cards is not contained within the text, the result is a no match and the iteration is ended.
  • If an occurrence is detected, this process is repeated for other cards but the text string now excludes the contents of the previous card. This means that the text is scanned from left to right, eliminating portions of it starting from left whenever a match with a card is detected.

The following shows the states of the card, text and the result of matching as the process is repeated over the list (or array) of cards.

text : "java/lang/StringBuffer.class"

pattern : "*Str*Bu*er*class"

cards : {"Str", "Bu", "er", "class"}

Card             Text                           Result
-------------------------------------------------------
Str              java/lang/StringBuffer.class   true
Bu               ingBuffer.class                true
er               ffer.class                     true
class            .class                         true
-------------------------------------------------------
                 Final Result                   true

Implementation

The following listing demonstrates the way this algorithm can be implemented.

/**
 * Performs a wildcard matching for the text and pattern 
 * provided.
 * 
 * @param text the text to be tested for matches.
 * 
 * @param pattern the pattern to be matched for.
 * This can contain the wildcard character '*' (asterisk).
 * 
 * @return <tt>true</tt> if a match is found, <tt>false</tt> 
 * otherwise.
 */
public static boolean wildCardMatch(String text, String pattern) {
    // Create the cards by splitting using a RegEx. If more speed 
    // is desired, a simpler character based splitting can be done.
    String [] cards = pattern.split("\\*");

    // Iterate over the cards.
    for (String card : cards) {
        int idx = text.indexOf(card);

        // Card not detected in the text.
        if(idx == -1) {
            return false;
        }

        // Move ahead, towards the right of the text.
        text = text.substring(idx + card.length());
    }

    return true;
}

Listing 1: Java code showing the way this algorithm can be implemented.

Comments


comments powered by Disqus