org.apache.lucene.analysis.standard
Class StandardTokenizer
All Implemented Interfaces: StandardTokenizerConstants
public class StandardTokenizer
A grammar-based tokenizer constructed with JavaCC.
This should be a good tokenizer for most European-language documents.
Many applications have specific tokenizer needs. If this tokenizer does
not suit your application, please consider copying this source code
directory to your project and maintaining your own grammar-based tokenizer.
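As a rough illustration of what maintaining your own tokenizer can involve, the sketch below is a deliberately tiny hand-written tokenizer that reads from a Reader. It is not part of Lucene and is not generated by JavaCC. The class name SimpleTokenizer, its tokenize methods, and the "text:TYPE" output format are all hypothetical. The type labels merely borrow the names ALPHANUM and NUM from the constants listed for this class, and the splitting rule (break on any character that is not a letter or digit) is an assumption far simpler than the real grammar.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not a Lucene class: a minimal tokenizer that
// splits its input on every character that is not a letter or a digit.
class SimpleTokenizer {

    // Reads the Reader to the end and returns each token as "text:TYPE".
    // TYPE is "NUM" when the token consists only of digits, otherwise "ALPHANUM".
    static List<String> tokenize(Reader reader) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        boolean digitsOnly = true;
        while (true) {
            int c = reader.read();
            if (c != -1 && Character.isLetterOrDigit((char) c)) {
                // Still inside a token: accumulate the character.
                text.append((char) c);
                digitsOnly = digitsOnly && Character.isDigit((char) c);
            } else {
                // Separator or end of input: emit the pending token, if any.
                if (text.length() > 0) {
                    tokens.add(text + ":" + (digitsOnly ? "NUM" : "ALPHANUM"));
                    text.setLength(0);
                    digitsOnly = true;
                }
                if (c == -1) {
                    break;
                }
            }
        }
        return tokens;
    }

    // Convenience overload for in-memory strings; wraps the checked exception.
    static List<String> tokenize(String input) {
        try {
            return tokenize(new StringReader(input));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling SimpleTokenizer.tokenize("Hello, World 42") yields the three entries Hello:ALPHANUM, World:ALPHANUM and 42:NUM. A grammar-based tokenizer would replace the single splitting rule with token definitions for cases such as e-mail addresses, host names and acronyms.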
Fields inherited from interface StandardTokenizerConstants:
ACRONYM, ALPHA, ALPHANUM, APOSTROPHE, CJK, COMPANY, DEFAULT, DIGIT, EMAIL, EOF, HAS_DIGIT, HOST, LETTER, NOISE, NUM, P, tokenImage
StandardTokenizer
public StandardTokenizer(Reader reader)
Constructs a tokenizer for this Reader.
StandardTokenizer
public StandardTokenizer(CharStream stream)
disable_tracing
public final void disable_tracing()
enable_tracing
public final void enable_tracing()
getNextToken
public final Token getNextToken()
getToken
public final Token getToken(int index)
Copyright © 2000-2006 Apache Software Foundation. All Rights Reserved.