Simple Implementation of Wildcard (*) Text Matching using Java

13th May 2008


"The quick brown fox jumps over the lazy dog"

What if you had to implement a text matching logic in Java that would result in an affirmation for search strings such as "Th*ick*b*ox*over the*dog", "*fox*dog" and a negative for ones such as "*fox*brown" and "*abcd*"?

I faced a similar challenge few days back as part of another tool I was working on. I wanted a simple solution that not only worked but worked faster. Using a regular expression wasn't feasible as the search pattern was dynamic and hence, precompiling the expression for faster performance was ruled out.

I thought for a while and arrived at this solution. Hope it helps some hapless programmer like me slogging the night out in some corner of the globe.



Algorithm

Lets denote the search string/pattern as pattern and the text to be searched as text. The pattern can contain wildcards (*) in it.

  1. Split the pattern by * to get several string fragments. Lets denote these fragments cards. We shall denote each individual fragment of cards as card.
  2. For each of these cards, test for an occurrence in the text. If any of these cards is not contained within the text, the result is a no match and the iteration is ended.
  3. If an occurrence is detected, this process is repeated for other cards but the text string now excludes the contents of the previous card. This means that the text is scanned from left to right, eliminating portions of it starting from left whenever a match with a card is detected.

The following shows the states of the card, text and the result of matching as the process is repeated over the list (or array) of cards.

text : "java/lang/StringBuffer.class"
pattern : "*Str*Bu*er*class"
cards : {"Str", "Bu", "er", "class"}



Card Text Result
Str java/lang/StringBuffer.class true
Bu ingBuffer.class true
er ffer.class true
class .class true
Final Result true


Implementation

The following listing demonstrates the way this algorithm can be implemented.

    /**
     * Performs a wildcard matching for the text and pattern 
     * provided.
     * 
     * @param text the text to be tested for matches.
     * 
     * @param pattern the pattern to be matched for.
     * This can contain the wildcard character '*' (asterisk).
     * 
     * @return <tt>true</tt> if a match is found, <tt>false</tt> 
     * otherwise.
     */
    public static boolean wildCardMatch(String text, String pattern)
    {
        // Create the cards by splitting using a RegEx. If more speed 
        // is desired, a simpler character based splitting can be done.
        String [] cards = pattern.split("\\*");

        // Iterate over the cards.
        for (String card : cards)
        {
            int idx = text.indexOf(card);
            
            // Card not detected in the text.
            if(idx == -1)
            {
                return false;
            }
            
            // Move ahead, towards the right of the text.
            text = text.substring(idx + card.length());
        }
        
        return true;
    }
    

Listing 1: Java code showing the way this algorithm can be implemented.



What others say

You keep rocking maga!

Seems to be yet another master piece. Gimme sometime, i shall get back a little later about its usage and comments!:)

Cheers!

Pat Williams 7th October 2008
nice site.

priya V 13th October 2008
excellent it is!!keep bloging like this..!!!!!:-))

Adarsh R 13th October 2008
Thanks everyone :-)

shadkam 9th April 2009
u rock man ... :)

newoutlaw 27th May 2009
Is the line for(String card : cards) syntactically correct?

Adarsh R 27th May 2009
Yes, this is enhanced for loop for you

newoutlaw 28th May 2009
The line for(String card : cards) never worked for me. Gives me syntax error.

I had to do this:

for(int i=0; i<cards.length; i++)
{
    String card = cards[i];
    // the rest is the same
}

I dont know how u made that line work!

Adarsh R 29th May 2009
You need to set the Java source compliance to 1.5 for this to work. This is true for other features of Java like generics, etd.

newoutlaw 2nd Jun 2009
How can I make wildcard search so that all values are retrieved when I just put in *? Also, how can I make it work so that if I do this *ed, the search only returns everthing that ends in ed. If I do this St* the search only returns everything that starts with St, and if I do this, *bb*, the search returns everything that contains bb?

Thanks in advance.



Let me know what you think

Thank you. Please note that comments are moderated and will take some time to show up here.
Name (required)
Email (required but never published)
Website
How much is two + three? (kill spam)
Your Comments (2000 characters max) Characters left: 2000