org.apache.lucene.analysis
Class StopFilter

java.lang.Object
  extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.TokenFilter
          extended by org.apache.lucene.analysis.StopFilter

public final class StopFilter
extends TokenFilter

Removes stop words from a token stream.


Field Summary
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
StopFilter(TokenStream in, Set stopWords)
          Constructs a filter which removes words from the input TokenStream that are named in the Set.
StopFilter(TokenStream input, Set stopWords, boolean ignoreCase)
          Constructs a filter which removes words from the input TokenStream that are named in the Set, optionally ignoring case.
StopFilter(TokenStream input, String[] stopWords)
          Constructs a filter which removes words from the input TokenStream that are named in the array of words.
StopFilter(TokenStream in, String[] stopWords, boolean ignoreCase)
          Constructs a filter which removes words from the input TokenStream that are named in the array of words.
 
Method Summary
static Set makeStopSet(String[] stopWords)
          Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
static Set makeStopSet(String[] stopWords, boolean ignoreCase)
          Builds a Set from an array of stop words, lower-casing each word first if ignoreCase is true.
 Token next()
          Returns the next input Token whose termText() is not a stop word.
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

StopFilter

public StopFilter(TokenStream input,
                  String[] stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the array of words.


StopFilter

public StopFilter(TokenStream in,
                  String[] stopWords,
                  boolean ignoreCase)
Constructs a filter which removes words from the input TokenStream that are named in the array of words.


StopFilter

public StopFilter(TokenStream input,
                  Set stopWords,
                  boolean ignoreCase)
Constructs a filter which removes words from the input TokenStream that are named in the Set, optionally ignoring case.

Parameters:
input - The TokenStream to filter.
stopWords - The set of stop words, as Strings. If ignoreCase is true, all strings must be lower case.
ignoreCase - Ignore case when stopping. When true, the stopWords set must be set up to contain only lower-case words.
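
The interaction between ignoreCase and the contents of the set can be illustrated with a small stand-alone sketch. It uses only java.util and is not Lucene's implementation; the class StopWordLookupSketch and the method isStopWord are hypothetical names chosen for illustration, and the sketch assumes the filter lower-cases each term before the lookup when ignoreCase is true.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the lookup described above; not Lucene's code.
class StopWordLookupSketch {

    // Returns true if the term would be dropped. When ignoreCase is true the
    // term is lower-cased before the lookup (an assumption of this sketch),
    // so the set must already hold lower-case entries only.
    static boolean isStopWord(String term, Set<String> stopWords, boolean ignoreCase) {
        String key = ignoreCase ? term.toLowerCase(Locale.ROOT) : term;
        return stopWords.contains(key);
    }

    public static void main(String[] args) {
        Set<String> lower = new HashSet<>(Arrays.asList("the", "and"));
        Set<String> mixed = new HashSet<>(Arrays.asList("The", "And"));

        System.out.println(isStopWord("The", lower, true));  // true
        System.out.println(isStopWord("The", lower, false)); // false
        System.out.println(isStopWord("The", mixed, true));  // false: "the" is not in the set
    }
}
```

The last call shows why a mixed-case set combined with ignoreCase set to true silently lets stop words through under this model.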

StopFilter

public StopFilter(TokenStream in,
                  Set stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set. Because the Set is consulted for every token, use an implementation with fast lookups (for example, a hash-based set) for maximum performance.

See Also:
makeStopSet(java.lang.String[])

Method Detail

makeStopSet

public static final Set makeStopSet(String[] stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This allows the stop word set to be built once, when an Analyzer is constructed, and then reused.

See Also:
makeStopSet(java.lang.String[], boolean), passing false to ignoreCase
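
The caching pattern mentioned above might look roughly like the following. This is a stand-alone sketch using only java.util, not Lucene code: CachedStopSetSketch plays the role of an Analyzer-like holder, and buildStopSet is a hypothetical stand-in for makeStopSet(String[]) that preserves case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical analyzer-like holder; all names here are illustrative only.
class CachedStopSetSketch {
    private static final String[] STOP_WORDS = {"a", "an", "the"};

    // Stand-in for makeStopSet(String[]): copies the array into a hash-based
    // set, leaving the case of each word unchanged.
    static Set<String> buildStopSet(String[] words) {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(words)));
    }

    // Built once at construction time and shared by every stream filtered later.
    private final Set<String> stopSet = buildStopSet(STOP_WORDS);

    Set<String> stopSet() {
        return stopSet;
    }

    public static void main(String[] args) {
        CachedStopSetSketch holder = new CachedStopSetSketch();
        // Both calls return the same cached instance; nothing is rebuilt.
        System.out.println(holder.stopSet() == holder.stopSet()); // true
        System.out.println(holder.stopSet().contains("the"));     // true
    }
}
```

Building the set per stream would repeat the array-to-set conversion for every document, which is the cost the cached form avoids.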

makeStopSet

public static final Set makeStopSet(String[] stopWords,
                                    boolean ignoreCase)
Parameters:
stopWords - The array of stop words.
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words
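
As an illustration of the ignoreCase parameter, the following stand-alone sketch lower-cases each word before adding it to the set. It uses only java.util; LowerCasingStopSetSketch and buildStopSet are hypothetical names, and the choice of HashSet and Locale.ROOT is an assumption of this sketch, not a statement about what Lucene uses internally.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for makeStopSet(String[], boolean); not Lucene's code.
class LowerCasingStopSetSketch {

    // Copies the words into a set, lower-casing each one first when
    // ignoreCase is true and leaving it unchanged otherwise.
    static Set<String> buildStopSet(String[] words, boolean ignoreCase) {
        Set<String> set = new HashSet<>();
        for (String word : words) {
            set.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return set;
    }

    public static void main(String[] args) {
        String[] words = {"The", "AND", "of"};
        System.out.println(buildStopSet(words, true).contains("and"));  // true
        System.out.println(buildStopSet(words, false).contains("and")); // false
    }
}
```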

next

public final Token next()
                 throws IOException
Returns the next input Token whose termText() is not a stop word.

Specified by:
next in class TokenStream
Throws:
IOException
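
The skipping behaviour of next() can be modelled with a short stand-alone sketch. It is not Lucene's implementation: terms are plain Strings instead of Token objects, SkippingStreamSketch is a hypothetical class, and returning null at the end of the input is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical model of a stop-word-skipping stream; not Lucene's code.
class SkippingStreamSketch {
    private final Iterator<String> input;
    private final Set<String> stopWords;

    SkippingStreamSketch(Iterator<String> input, Set<String> stopWords) {
        this.input = input;
        this.stopWords = stopWords;
    }

    // Pulls terms from the input until one is not a stop word.
    // Returns null once the input is exhausted (assumption of this sketch).
    String next() {
        while (input.hasNext()) {
            String term = input.next();
            if (!stopWords.contains(term)) {
                return term;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "and"));
        SkippingStreamSketch stream = new SkippingStreamSketch(
                Arrays.asList("the", "quick", "and", "fox", "the").iterator(), stops);
        // Prints "quick" and then "fox"; the stop words never reach the caller.
        for (String term = stream.next(); term != null; term = stream.next()) {
            System.out.println(term);
        }
    }
}
```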


Copyright © 2000-2008 Apache Software Foundation. All Rights Reserved.