it.unimi.dsi.mg4j.search.visitor
Class TermCollectionVisitor

java.lang.Object
  extended by it.unimi.dsi.mg4j.search.visitor.AbstractDocumentIteratorVisitor
      extended by it.unimi.dsi.mg4j.search.visitor.TermCollectionVisitor
All Implemented Interfaces:
DocumentIteratorVisitor

public class TermCollectionVisitor
extends AbstractDocumentIteratorVisitor

A visitor collecting information about terms appearing in a DocumentIterator.

The purpose of this visitor is to explore, before iteration, the structure of a DocumentIterator, so as to count how many terms are actually used and to set up some preliminary access data. More precisely, it counts the distinct index/term pairs appearing in all leaves of nonzero frequency (the frequency condition is used to skip empty iterators). For this visitor to work, all leaves of nonzero frequency must return a non-null value on a call to IndexIterator.term().

During the visit, we keep track of which index/term pairs have already been seen. Each pair is assigned a distinct offset (a number from zero up to, but excluding, the overall number of distinct pairs), which is stored in the id of each index iterator and is used afterwards to access data about the pair quickly. Note that duplicate index/term pairs get the same offset. The overall number of distinct pairs is returned by numberOfPairs() after a visit.
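The offset assignment just described can be modelled with a small, self-contained sketch. It is a hypothetical illustration, not MG4J code: Leaf stands in for an IndexIterator (its mutable id plays the role of the iterator id), a List of index name and term serves as the pair key, and the sequential numbering is an assumption consistent with the description above. It assumes Java 9 or later for List.of.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the offset assignment; nothing here is MG4J API.
public class PairOffsetSketch {
    /** Hypothetical leaf: index name, term, frequency and a mutable id (like an iterator id). */
    static final class Leaf {
        final String index;
        final String term;
        final long frequency;
        int id = -1; // -1 means "no offset assigned"

        Leaf(String index, String term, long frequency) {
            this.index = index;
            this.term = term;
            this.frequency = frequency;
        }
    }

    // Each distinct index/term pair, used as a key, maps to its offset.
    private final Map<List<String>, Integer> offsets = new HashMap<>();

    /** Visits a leaf and returns true so that the visit continues. */
    boolean visit(Leaf leaf) {
        if (leaf.frequency == 0) return true; // empty iterators are skipped
        if (leaf.term == null) {
            throw new IllegalStateException("nonzero-frequency leaf with a null term");
        }
        List<String> key = List.of(leaf.index, leaf.term);
        Integer offset = offsets.get(key);
        if (offset == null) {       // first occurrence: take the next free offset
            offset = offsets.size();
            offsets.put(key, offset);
        }
        leaf.id = offset;           // duplicate pairs reuse the existing offset
        return true;
    }

    int numberOfPairs() {
        return offsets.size();
    }

    public static void main(String[] args) {
        PairOffsetSketch visitor = new PairOffsetSketch();
        List<Leaf> leaves = List.of(
                new Leaf("text", "foo", 3),
                new Leaf("title", "foo", 1),
                new Leaf("text", "foo", 3),  // duplicate pair
                new Leaf("text", "bar", 0)); // zero frequency: skipped
        for (Leaf leaf : leaves) visitor.visit(leaf);
        for (Leaf leaf : leaves) {
            System.out.println(leaf.index + "/" + leaf.term + " -> " + leaf.id);
        }
        System.out.println("pairs: " + visitor.numberOfPairs());
    }
}
```

Running main assigns offset 0 to both text/foo leaves, offset 1 to title/foo, leaves the zero-frequency text/bar leaf untouched, and reports two pairs.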

During the visit, the indices actually appearing in some nonzero-frequency leaf are gathered; they are accessible as an array returned by indices(), and indexMap() inverts this array, mapping each index to its position in it.
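A sketch of how the gathered indices and their inverse map relate. This is hypothetical code, not MG4J: FakeIndex stands in for Index, and java.util.IdentityHashMap is used as a stand-in for fastutil's Reference2IntMap, since both compare keys by reference (the JDK map boxes its values).

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of indices() and indexMap(); not MG4J code.
public class IndexMapSketch {
    /** Hypothetical stand-in for an Index; compared by reference, as in a Reference2IntMap. */
    static final class FakeIndex {
        final String name;
        FakeIndex(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    private final List<FakeIndex> indices = new ArrayList<>();
    // Inverse of the indices array: index -> position, keyed by reference identity.
    private final Map<FakeIndex, Integer> indexMap = new IdentityHashMap<>();

    /** Records an index met in a leaf; zero-frequency leaves and repeated indices are ignored. */
    void record(FakeIndex index, long frequency) {
        if (frequency == 0 || indexMap.containsKey(index)) return;
        indexMap.put(index, indices.size());
        indices.add(index);
    }

    FakeIndex[] indices() { return indices.toArray(new FakeIndex[0]); }

    Map<FakeIndex, Integer> indexMap() { return indexMap; }

    public static void main(String[] args) {
        FakeIndex text = new FakeIndex("text");
        FakeIndex title = new FakeIndex("title");
        FakeIndex anchor = new FakeIndex("anchor");
        IndexMapSketch s = new IndexMapSketch();
        s.record(text, 5);
        s.record(anchor, 0); // only a zero-frequency leaf: never gathered
        s.record(title, 2);
        s.record(text, 1);   // already gathered
        FakeIndex[] a = s.indices();
        for (int i = 0; i < a.length; i++) {
            // indexMap() inverts indices(): indexMap().get(a[i]) equals i
            System.out.println(i + " " + a[i] + " " + s.indexMap().get(a[i]));
        }
        System.out.println(s.indexMap().containsKey(anchor)); // false
    }
}
```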

The offset assigned to each index/term pair is returned by offset(Index, String). Should you need to know the terms associated with each index, they are returned by terms(Index).
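The behaviour of terms(Index) and offset(Index, String) can be illustrated with a hypothetical sketch in which indices are plain strings. A LinkedHashMap per index keeps terms in order of first appearance while skipping duplicates, and offsets are numbered globally across all indices. This is an assumed model of the documented contract, not MG4J's implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of terms(Index) and offset(Index, String); indices are plain strings here.
public class TermsSketch {
    // For each index, its distinct terms in order of first appearance, each with its pair offset.
    private final Map<String, LinkedHashMap<String, Integer>> termsByIndex = new HashMap<>();
    private int numberOfPairs;

    /** Records a leaf; zero-frequency leaves are skipped, duplicate pairs keep their first offset. */
    void visit(String index, String term, long frequency) {
        if (frequency == 0) return;
        LinkedHashMap<String, Integer> terms =
                termsByIndex.computeIfAbsent(index, k -> new LinkedHashMap<>());
        if (!terms.containsKey(term)) terms.put(term, numberOfPairs++);
    }

    /** Terms of index in order of first appearance, or null if no nonzero-frequency leaf used it. */
    String[] terms(String index) {
        LinkedHashMap<String, Integer> terms = termsByIndex.get(index);
        return terms == null ? null : terms.keySet().toArray(new String[0]);
    }

    /** Precondition: index was met and term is among terms(index); otherwise this throws a NullPointerException. */
    int offset(String index, String term) {
        return termsByIndex.get(index).get(term);
    }

    public static void main(String[] args) {
        TermsSketch s = new TermsSketch();
        s.visit("text", "foo", 3);
        s.visit("title", "foo", 1);
        s.visit("text", "bar", 2);
        s.visit("text", "foo", 3);   // duplicate pair
        s.visit("anchor", "baz", 0); // zero frequency
        System.out.println(String.join(",", s.terms("text"))); // foo,bar
        System.out.println(s.terms("anchor"));                 // null
        System.out.println(s.offset("text", "bar"));           // 2
    }
}
```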

After term collection, counters are usually set up by a visit with a CounterSetupVisitor.


Constructor Summary
TermCollectionVisitor()
          Creates a new term-collection visitor.
 
Method Summary
 Reference2IntMap<Index> indexMap()
          Returns a map from indices met during term collection to their positions in indices().
 Index[] indices()
          Returns the indices met during pair collection.
 int numberOfPairs()
          Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit.
 int offset(Index index, String term)
          Returns the offset associated with a given index/term pair.
 TermCollectionVisitor prepare()
          Prepares the internal state of this visitor for a(nother) visit.
 String[] terms(Index index)
          Returns the terms associated with the given index.
 String toString()
           
 boolean visit(IndexIterator indexIterator)
          Visits a leaf.
 
Methods inherited from class it.unimi.dsi.mg4j.search.visitor.AbstractDocumentIteratorVisitor
visitPost, visitPre
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TermCollectionVisitor

public TermCollectionVisitor()
Creates a new term-collection visitor.

Method Detail

prepare

public TermCollectionVisitor prepare()
Description copied from interface: DocumentIteratorVisitor
Prepares the internal state of this visitor for a(nother) visit.

By specification, it must be safe to call this method any number of times.

Specified by:
prepare in interface DocumentIteratorVisitor
Overrides:
prepare in class AbstractDocumentIteratorVisitor
Returns:
this visitor.
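The prepare() contract above (safe to call any number of times, returns this visitor) can be shown with a minimal hypothetical sketch: clearing already-empty state is harmless, so repeated calls are idempotent, and returning this allows chaining. The record method and string pair keys are illustrative assumptions, not MG4J API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an idempotent prepare(); not MG4J code.
public class PrepareSketch {
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Clears the state of a previous visit and returns this visitor, enabling chaining. */
    PrepareSketch prepare() {
        offsets.clear(); // clearing an empty map is harmless, so repeated calls are safe
        return this;
    }

    /** Hypothetical helper: assigns the next free offset to a new pair key. */
    void record(String pair) {
        offsets.putIfAbsent(pair, offsets.size());
    }

    int numberOfPairs() {
        return offsets.size();
    }

    public static void main(String[] args) {
        PrepareSketch v = new PrepareSketch();
        v.prepare().record("text/foo");
        v.record("text/bar");
        System.out.println(v.numberOfPairs()); // 2
        v.prepare().prepare();                  // calling prepare() twice is fine
        System.out.println(v.numberOfPairs()); // 0
    }
}
```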

visit

public boolean visit(IndexIterator indexIterator)
              throws IOException
Description copied from interface: DocumentIteratorVisitor
Visits a leaf.

Parameters:
indexIterator - the leaf to be visited.
Returns:
true if the visit should continue.
Throws:
IOException

numberOfPairs

public int numberOfPairs()
Returns the number of distinct index/term pairs corresponding to nonzero-frequency index iterators in the last visit.

Returns:
the number of distinct index/term pairs corresponding to nonzero-frequency index iterators.

indices

public Index[] indices()
Returns the indices met during pair collection.

Note that the returned array does not include indices associated only with index iterators of zero frequency.

Returns:
the indices met during term collection.

indexMap

public Reference2IntMap<Index> indexMap()
Returns a map from indices met during term collection to their positions in indices().

Note that the returned map does not include indices associated only with index iterators of zero frequency.

Returns:
a map from indices met during term collection to their positions in indices().

terms

public String[] terms(Index index)
Returns the terms associated with the given index.

Parameters:
index - an index.
Returns:
the terms associated with index, in the order in which they appeared during the visit, skipping duplicates, if some nonzero-frequency iterator based on index was found; null otherwise.

offset

public int offset(Index index,
                  String term)
Returns the offset associated with a given index/term pair.

Parameters:
index - an index appearing in indices().
term - a term appearing in the array returned by terms(Index) with argument index.
Returns:
the offset associated with the index/term pair.

toString

public String toString()
Overrides:
toString in class Object