org.exist.storage
Class TextSearchEngine

java.lang.Object
  extended by java.util.Observable
      extended by org.exist.storage.TextSearchEngine
Direct Known Subclasses:
NativeTextEngine

public abstract class TextSearchEngine
extends Observable

Abstract base class for all classes that provide access to the fulltext index. It declares methods to add text and attribute nodes to the index and to search for nodes matching the given search terms.

Author:
wolf

Field Summary
static String CONFIGURATION_STOPWORDS_ELEMENT_NAME
           
static String INDEX_NUMBERS_ATTRIBUTE
           
static String PROPERTY_INDEX_NUMBERS
           
static String PROPERTY_STEM
           
static String PROPERTY_STOPWORD_FILE
           
static String PROPERTY_STORE_TERM_FREQUENCY
           
static String PROPERTY_TOKENIZER
           
static String STEM_ATTRIBUTE
           
static String STOPWORD_FILE_ATTRIBUTE
           
static String STORE_TERM_FREQUENCY_ATTRIBUTE
           
static String TOKENIZER_ATTRIBUTE
           
 
Constructor Summary
TextSearchEngine(DBBroker broker, Configuration conf)
          Construct a new instance and configure it.
 
Method Summary
abstract  boolean close()
           
abstract  void dropIndex(Collection collection)
          Remove index entries for an entire collection.
abstract  void dropIndex(DocumentImpl doc)
          Remove all index entries for the given document.
abstract  void flush()
           
abstract  String[] getIndexTerms(DocumentSet docs, TermMatcher matcher)
           
abstract  NodeSet getNodes(XQueryContext context, DocumentSet docs, NodeSet contextSet, int axis, QName qname, TermMatcher matcher, CharSequence startTerm)
           
 NodeSet getNodesContaining(XQueryContext context, DocumentSet docs, NodeSet contextSet, int axis, QName qname, String expr, int type)
           
abstract  NodeSet getNodesContaining(XQueryContext context, DocumentSet docs, NodeSet contextSet, int axis, QName qname, String expr, int type, boolean matchAll)
          For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes.
 Tokenizer getTokenizer()
          Returns the Tokenizer used for tokenizing strings into words.
 int getTrackMatches()
           
abstract  Occurrences[] scanIndexTerms(DocumentSet docs, NodeSet contextSet, QName[] qnames, String start, String end)
           
abstract  Occurrences[] scanIndexTerms(DocumentSet docs, NodeSet contextSet, String start, String end)
          Queries the fulltext index for information on the words indexed for the current collection.
 void setTrackMatches(int flags)
           
abstract  void storeText(StoredNode parent, ElementContent text, int indexingHint, FulltextIndexSpec indexSpec, boolean remove)
           
abstract  void storeText(TextImpl node, int indexingHint, FulltextIndexSpec indexSpec, boolean remove)
          Tokenize and index the given text node.
 
Methods inherited from class java.util.Observable
addObserver, countObservers, deleteObserver, deleteObservers, hasChanged, notifyObservers, notifyObservers
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INDEX_NUMBERS_ATTRIBUTE

public static final String INDEX_NUMBERS_ATTRIBUTE
See Also:
Constant Field Values

STEM_ATTRIBUTE

public static final String STEM_ATTRIBUTE
See Also:
Constant Field Values

STORE_TERM_FREQUENCY_ATTRIBUTE

public static final String STORE_TERM_FREQUENCY_ATTRIBUTE
See Also:
Constant Field Values

TOKENIZER_ATTRIBUTE

public static final String TOKENIZER_ATTRIBUTE
See Also:
Constant Field Values

CONFIGURATION_STOPWORDS_ELEMENT_NAME

public static final String CONFIGURATION_STOPWORDS_ELEMENT_NAME
See Also:
Constant Field Values

STOPWORD_FILE_ATTRIBUTE

public static final String STOPWORD_FILE_ATTRIBUTE
See Also:
Constant Field Values

PROPERTY_INDEX_NUMBERS

public static final String PROPERTY_INDEX_NUMBERS
See Also:
Constant Field Values

PROPERTY_STEM

public static final String PROPERTY_STEM
See Also:
Constant Field Values

PROPERTY_STORE_TERM_FREQUENCY

public static final String PROPERTY_STORE_TERM_FREQUENCY
See Also:
Constant Field Values

PROPERTY_TOKENIZER

public static final String PROPERTY_TOKENIZER
See Also:
Constant Field Values

PROPERTY_STOPWORD_FILE

public static final String PROPERTY_STOPWORD_FILE
See Also:
Constant Field Values

Constructor Detail

TextSearchEngine

public TextSearchEngine(DBBroker broker,
                        Configuration conf)
Construct a new instance and configure it.

Parameters:
broker - the DBBroker instance supplied to this engine
conf - the Configuration used to configure the new instance

Method Detail

getTokenizer

public Tokenizer getTokenizer()
Returns the Tokenizer used for tokenizing strings into words.

Returns:
the Tokenizer used to split strings into words

storeText

public abstract void storeText(TextImpl node,
                               int indexingHint,
                               FulltextIndexSpec indexSpec,
                               boolean remove)
Tokenize and index the given text node.

Parameters:
node - the text node to tokenize and index
indexSpec - the fulltext index specification passed along with the node
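
The idea of tokenizing a text node and indexing its words can be illustrated with a minimal, self-contained sketch. It does not use any eXist classes: the class InvertedIndexSketch, the long node id, the tokenizing rule (split on anything that is not a letter or digit, then lowercase) and the reading of the remove flag as "delete the entries for this node" are all assumptions made for illustration, not the behaviour of NativeTextEngine.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a fulltext index; not part of org.exist.storage.
public class InvertedIndexSketch {

    // term -> ids of the nodes whose text contains the term
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Tokenize the text and add (or, if remove is true, delete) one entry per word.
    public void storeText(long nodeId, String text, boolean remove) {
        // Assumed tokenizer: lowercase, split on runs of non-letter, non-digit characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with a delimiter
            }
            if (remove) {
                Set<Long> ids = postings.get(token);
                if (ids != null) {
                    ids.remove(nodeId);
                    if (ids.isEmpty()) {
                        postings.remove(token);
                    }
                }
            } else {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(nodeId);
            }
        }
    }

    // Returns a copy of the node ids recorded for the term (empty if none).
    public Set<Long> nodesFor(String term) {
        Set<Long> ids = postings.get(term);
        return ids == null ? new TreeSet<>() : new TreeSet<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.storeText(1L, "Hello, world", false);
        index.storeText(2L, "hello again", false);
        System.out.println(index.nodesFor("hello"));
        index.storeText(1L, "Hello, world", true);
        System.out.println(index.nodesFor("hello"));
    }
}
```

In this sketch, calling storeText a second time with the same text and remove set to true undoes the first call for that node, which is one plausible reading of the flag.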

storeText

public abstract void storeText(StoredNode parent,
                               ElementContent text,
                               int indexingHint,
                               FulltextIndexSpec indexSpec,
                               boolean remove)

flush

public abstract void flush()

close

public abstract boolean close()
                       throws DBException
Throws:
DBException

getTrackMatches

public int getTrackMatches()

setTrackMatches

public void setTrackMatches(int flags)

getNodesContaining

public NodeSet getNodesContaining(XQueryContext context,
                                  DocumentSet docs,
                                  NodeSet contextSet,
                                  int axis,
                                  QName qname,
                                  String expr,
                                  int type)
                           throws TerminatedException
Throws:
TerminatedException

getNodesContaining

public abstract NodeSet getNodesContaining(XQueryContext context,
                                           DocumentSet docs,
                                           NodeSet contextSet,
                                           int axis,
                                           QName qname,
                                           String expr,
                                           int type,
                                           boolean matchAll)
                                    throws TerminatedException
For each of the given search terms and each of the documents in the document set, return a node-set of matching nodes. The type argument indicates whether search terms should be compared using a regular expression. Valid values are DBBroker.MATCH_EXACT and DBBroker.MATCH_REGEXP.

Throws:
TerminatedException
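
The difference between the two match types described above can be sketched on a plain list of indexed terms. This is an illustration only: TermMatchSketch and matchingTerms are hypothetical, the two constants are local stand-ins whose values need not equal those in DBBroker, the sketch assumes a regular expression must match the whole term, and it returns terms rather than a NodeSet. The matchAll flag is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of org.exist.storage.
public class TermMatchSketch {

    // Local stand-ins for DBBroker.MATCH_EXACT and DBBroker.MATCH_REGEXP.
    // The real constant values may differ.
    public static final int MATCH_EXACT = 0;
    public static final int MATCH_REGEXP = 1;

    // Returns the indexed terms that expr selects under the given match type.
    public static List<String> matchingTerms(List<String> indexedTerms, String expr, int type) {
        List<String> hits = new ArrayList<>();
        // Compile the pattern once, and only when regular-expression matching is requested.
        Pattern pattern = (type == MATCH_REGEXP) ? Pattern.compile(expr) : null;
        for (String term : indexedTerms) {
            boolean matches = (pattern != null)
                    ? pattern.matcher(term).matches() // assumption: whole-term match
                    : term.equals(expr);
            if (matches) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("index", "indexer", "node");
        System.out.println(matchingTerms(terms, "index", MATCH_EXACT));
        System.out.println(matchingTerms(terms, "index.*", MATCH_REGEXP));
    }
}
```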

getNodes

public abstract NodeSet getNodes(XQueryContext context,
                                 DocumentSet docs,
                                 NodeSet contextSet,
                                 int axis,
                                 QName qname,
                                 TermMatcher matcher,
                                 CharSequence startTerm)
                          throws TerminatedException
Throws:
TerminatedException

scanIndexTerms

public abstract Occurrences[] scanIndexTerms(DocumentSet docs,
                                             NodeSet contextSet,
                                             String start,
                                             String end)
                                      throws PermissionDeniedException
Queries the fulltext index for information on the words indexed for the current collection. Returns an array of Occurrences for the matching words. If end is null, all words starting with the string start are returned. Otherwise, the method returns all words that come after start and before end in lexical order.

Throws:
PermissionDeniedException
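
The start/end contract described above can be mimicked on an ordinary sorted set of words. The following is a minimal sketch, not the index implementation: TermRangeSketch and scan are hypothetical names, the result is a list of strings rather than Occurrences, and the range mode assumes start is inclusive and end is exclusive, which the description does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration; not part of org.exist.storage.
public class TermRangeSketch {

    // Mimics the start/end semantics on a sorted set of indexed words.
    public static List<String> scan(SortedSet<String> terms, String start, String end) {
        List<String> result = new ArrayList<>();
        if (end == null) {
            // Prefix mode: every word that begins with start.
            for (String term : terms.tailSet(start)) {
                if (!term.startsWith(start)) {
                    break; // in sorted order, no later word can share the prefix
                }
                result.add(term);
            }
        } else {
            // Range mode. Assumption: start inclusive, end exclusive.
            result.addAll(terms.subSet(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("index", "indexer", "node", "nodeset", "query"));
        System.out.println(scan(terms, "node", null));    // prefix mode
        System.out.println(scan(terms, "index", "node")); // range mode
    }
}
```

With the sample words, the prefix call selects "node" and "nodeset", and the range call selects "index" and "indexer".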

scanIndexTerms

public abstract Occurrences[] scanIndexTerms(DocumentSet docs,
                                             NodeSet contextSet,
                                             QName[] qnames,
                                             String start,
                                             String end)
                                      throws PermissionDeniedException
Throws:
PermissionDeniedException

getIndexTerms

public abstract String[] getIndexTerms(DocumentSet docs,
                                       TermMatcher matcher)

dropIndex

public abstract void dropIndex(Collection collection)
Remove index entries for an entire collection.

Parameters:
collection - the collection whose index entries are removed

dropIndex

public abstract void dropIndex(DocumentImpl doc)
Remove all index entries for the given document.

Parameters:
doc - the document whose index entries are removed


Copyright (C) Wolfgang Meier. All rights reserved.