com.limegroup.gnutella.util
Class ApproximateMatcher

java.lang.Object
  extended bycom.limegroup.gnutella.util.ApproximateMatcher

public final class ApproximateMatcher
extends java.lang.Object

An approximate string matcher. Two strings are considered "approximately equal" if one can be transformed into the other through some series of inserts, deletes, and substitutions.

The approximate matcher has options to ignore case and whitespace. It also has switches to make it perform better by comparing strings backwards and reusing a buffer. However, these do not affect the match methods directly; they only affect the results of the process(String) method. This method is used to preprocess strings before passing to match(..). Typical use:

       String s1, s2;
       ApproximateMatcher matcher=new ApproximateMatcher();
       matcher.setIgnoreCase(true);
       matcher.setCompareBackwards(true);
       String s1p=matcher.process(s1);         //pre-process s1
       String s2p=matcher.process(s2);         //pre-process s2
       int matches=matcher.match(s1p, s2p);    //compare processed strings
       ...
 
The reason for this design is to reduce the pre-processing overhead when a string is matched against many other strings. Preprocessing really is required to support the ignoreWhitespace option; it is simply not possible to do the k-difference dynamic programming algorithm effienctly in one pass. Note that this class is not thread-safe if the buffering constructor is used.


Constructor Summary
ApproximateMatcher()
           
ApproximateMatcher(int size)
          Like ApproximateMatcher() except that the new matcher can compare strings of the given size without any significant allocations.
 
Method Summary
 int match(java.lang.String s1, java.lang.String s2)
           
 boolean matches(java.lang.String s1, java.lang.String s2, float precision)
          Returns true if s1 can be transformed into s2 without changing more than the given fraction of s1's letters.
 boolean matches(java.lang.String s1, java.lang.String s2, int maxOps)
          Returns true if the edit distance between s1 and s2 is less than or equal to maxOps.
 java.lang.String process(java.lang.String s)
          Returns a version of s suitable for passing to match(..).
 void setCompareBackwards(boolean compareBackwards)
           
 void setIgnoreCase(boolean ignoreCase)
           
 void setIgnoreWhitespace(boolean ignoreWhitespace)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ApproximateMatcher

public ApproximateMatcher()

ApproximateMatcher

public ApproximateMatcher(int size)
Like ApproximateMatcher() except that the new matcher can compare strings of the given size without any significant allocations. This is a useful optimization if you need to make many comparisons with one matcher. The matcher will still be able to compare larger strings, but it will require an allocation. The buffer is not released until this is garbage collected. This method breaks thread safety; only one match(..) call can be done at a time with a matcher created by this constructor.

Method Detail

setIgnoreCase

public void setIgnoreCase(boolean ignoreCase)

setIgnoreWhitespace

public void setIgnoreWhitespace(boolean ignoreWhitespace)

setCompareBackwards

public void setCompareBackwards(boolean compareBackwards)

process

public java.lang.String process(java.lang.String s)
Returns a version of s suitable for passing to match(..). This means that s could be stripped of whitespace, lower-cased, or reversed depending on the calls to setIgnoreWhitespace, setIgnoreWhitespace, and setCompareBackwards. The returned value may be == to s.


match

public final int match(java.lang.String s1,
                       java.lang.String s2)

matches

public final boolean matches(java.lang.String s1,
                             java.lang.String s2,
                             int maxOps)
Returns true if the edit distance between s1 and s2 is less than or equal to maxOps. That is, returns true if s1 can be transformed into s2 through no more than maxOps insertions, deletions, or replacements. This method is generally more efficient than match(..) if you only care whether two strings approximately match.

If you want to ignore case or whitespace, or compare backwards, s1 and s2 should be the return values of a call to process(..).


matches

public final boolean matches(java.lang.String s1,
                             java.lang.String s2,
                             float precision)
Returns true if s1 can be transformed into s2 without changing more than the given fraction of s1's letters. For example, matches(1.) is the same as an exact comparison, while matches(0.) always returns true as long as |s1|>=|s2|. matches(0.9) means "s1 and s2 match pretty darn closely".

If you want to ignore case or whitespace, or compare backwards, s1 and s2 should be the return values of a call to process(..).