it.unimi.dsi.mg4j.index
Class BitStreamIndexReader.BitStreamIndexReaderIndexIterator

java.lang.Object
  extended by it.unimi.dsi.fastutil.ints.AbstractIntIterator
      extended by it.unimi.dsi.mg4j.index.AbstractIndexIterator
          extended by it.unimi.dsi.mg4j.index.BitStreamIndexReader.BitStreamIndexReaderIndexIterator
All Implemented Interfaces:
IntIterator, IndexIterator, DocumentIterator, Iterable<Interval>, Iterator<Integer>
Enclosing class:
BitStreamIndexReader

protected static final class BitStreamIndexReader.BitStreamIndexReaderIndexIterator
extends AbstractIndexIterator
implements IndexIterator


Field Summary
protected  int b
          The parameter b for Golomb coding of pointers.
protected  int count
          The current count (if this index contains counts).
protected  CompressionFlags.Coding countCoding
          The cached copy of index.countCoding.
protected  int currentDocument
          The last document pointer read from the current list; -1 if we just read the frequency, Integer.MAX_VALUE if we are beyond the end of the list.
protected  int currentTerm
          The current term.
protected  int frequency
          The current frequency.
protected  boolean hasCounts
          The cached copy of index.hasCounts.
protected  boolean hasPayloads
          The cached copy of index.hasPayloads.
protected  boolean hasPointers
          Whether the current term has pointers at all (this happens when the frequency is smaller than the number of documents).
protected  boolean hasPositions
          The cached copy of index.hasPositions.
protected  boolean hasSkips
          Whether the underlying index has skips.
 int height
          The parameter h (the maximum height of a skip tower).
protected  InputBitStream ibs
          The underlying input bit stream.
protected  BitStreamIndex index
          The reference index.
protected  int log2b
          The parameter log2b for Golomb coding of pointers; it is the most significant bit of b.
protected  int numberOfDocumentRecord
          The number of the document record we are going to read inside the current inverted list.
protected  Payload payload
          The payload, in case the index of this reader has payloads, or null.
protected  CompressionFlags.Coding pointerCoding
          The cached copy of index.pointerCoding.
protected  int[] positionCache
          The cached position array.
protected  CompressionFlags.Coding positionCoding
          The cached copy of index.positionCoding.
 int quantum
          The quantum.
 int quantumDivisionShift
          The shift giving the result of the division by quantum.
 int quantumModuloMask
          The bit mask giving the remainder of the division by quantum.
protected  int state
          This variable tracks the current state of the reader.
 
Fields inherited from class it.unimi.dsi.mg4j.index.AbstractIndexIterator
id, term, weight
 
Constructor Summary
BitStreamIndexReader.BitStreamIndexReaderIndexIterator(BitStreamIndexReader parent, InputBitStream ibs)
           
 
Method Summary
protected  IndexIterator advance()
           
 int count()
          Returns the count, that is, the number of occurrences of the term in the current document.
 void dispose()
          Disposes this document iterator, releasing all resources.
 int document()
          Returns the last document returned by DocumentIterator.nextDocument().
 int frequency()
          Returns the frequency, that is, the number of documents that will be returned by this iterator.
 boolean hasNext()
           
 Index index()
          Returns the index over which this iterator is built.
 ReferenceSet<Index> indices()
          Returns the set of indices over which this iterator is built.
 IntervalIterator intervalIterator()
          Returns the interval iterator of this document iterator for single-index queries.
 IntervalIterator intervalIterator(Index index)
          Returns the interval iterator of this document iterator for the given index.
 Reference2ReferenceMap<Index,IntervalIterator> intervalIterators()
          Returns an unmodifiable map from indices to interval iterators.
 int nextDocument()
          Returns the next document provided by this document iterator, or -1 if no more documents are available.
 int nextInt()
          Returns the next document.
 Payload payload()
          Returns the payload, if any, associated with the current document.
protected  void position(int term)
          Positions the index on the inverted list of a given term.
 int[] positionArray()
          Returns the positions at which the term appears in the current document in an array.
 IntIterator positions()
          Returns the positions at which the term appears in the current document.
 int positions(int[] position)
          Stores the positions at which the term appears in the current document in a given array.
 int skipTo(int p)
          Skips all documents smaller than n.
 int termNumber()
          Returns the number of the term whose inverted list is returned by this index iterator.
 String toString()
           
protected  void updatePositionCache()
          Reads positions, assuming that state <= BEFORE_POSITIONS.
 
Methods inherited from class it.unimi.dsi.mg4j.index.AbstractIndexIterator
accept, acceptOnTruePaths, id, id, iterator, term, term, weight, weight
 
Methods inherited from class it.unimi.dsi.fastutil.ints.AbstractIntIterator
next, remove, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface it.unimi.dsi.mg4j.index.IndexIterator
id, id, term, term, weight
 
Methods inherited from interface it.unimi.dsi.mg4j.search.DocumentIterator
accept, acceptOnTruePaths, iterator, weight
 
Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator
skip
 
Methods inherited from interface java.util.Iterator
next, remove
 

Field Detail

index

protected final BitStreamIndex index
The reference index.


ibs

protected final InputBitStream ibs
The underlying input bit stream.


hasPositions

protected final boolean hasPositions
The cached copy of index.hasPositions.


hasCounts

protected final boolean hasCounts
The cached copy of index.hasCounts.


hasPayloads

protected final boolean hasPayloads
The cached copy of index.hasPayloads.


hasSkips

protected final boolean hasSkips
Whether the underlying index has skips.


pointerCoding

protected final CompressionFlags.Coding pointerCoding
The cached copy of index.pointerCoding.


countCoding

protected final CompressionFlags.Coding countCoding
The cached copy of index.countCoding.


positionCoding

protected final CompressionFlags.Coding positionCoding
The cached copy of index.positionCoding.


payload

protected final Payload payload
The payload, in case the index of this reader has payloads, or null.


b

protected int b
The parameter b for Golomb coding of pointers.


log2b

protected int log2b
The parameter log2b for Golomb coding of pointers; it is the most significant bit of b.
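As an illustration of how b and log2b relate, the following is a minimal sketch of textbook Golomb coding, not the MG4J implementation. The class GolombSketch and its methods are hypothetical names, the output is a string of bits rather than a bit stream, and the unary convention (zeros terminated by a one) is an assumption.

```java
// Hypothetical illustration class; not part of MG4J.
final class GolombSketch {
    /** Index of the most significant set bit of b (b > 0); plays the role of log2b. */
    static int mostSignificantBit(int b) {
        return 31 - Integer.numberOfLeadingZeros(b);
    }

    /** Writes value using exactly width bits, most significant bit first. */
    static String bits(int value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = width - 1; i >= 0; i--) sb.append((value >>> i) & 1);
        return sb.toString();
    }

    /** Golomb code of x >= 0 with modulus b > 0, as a string of '0' and '1'. */
    static String encode(int x, int b) {
        int log2b = mostSignificantBit(b);
        StringBuilder sb = new StringBuilder();
        // Quotient in unary: q zeros followed by a one (assumed convention).
        for (int q = x / b; q > 0; q--) sb.append('0');
        sb.append('1');
        // Remainder in minimal (truncated) binary: short codewords take log2b bits,
        // the remaining ones take log2b + 1 bits.
        int r = x % b;
        int shortCodewords = (1 << (log2b + 1)) - b;
        if (r < shortCodewords) sb.append(bits(r, log2b));
        else sb.append(bits(r + shortCodewords, log2b + 1));
        return sb.toString();
    }
}
```

Caching log2b next to b avoids recomputing the most significant bit for every pointer of the same inverted list.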


currentTerm

protected int currentTerm
The current term.


frequency

protected int frequency
The current frequency.


hasPointers

protected boolean hasPointers
Whether the current term has pointers at all (this happens when the frequency is smaller than the number of documents).


count

protected int count
The current count (if this index contains counts).


currentDocument

protected int currentDocument
The last document pointer read from the current list; -1 if we just read the frequency, Integer.MAX_VALUE if we are beyond the end of the list.


numberOfDocumentRecord

protected int numberOfDocumentRecord
The number of the document record we are going to read inside the current inverted list.


state

protected int state
This variable tracks the current state of the reader.


height

public final int height
The parameter h (the maximum height of a skip tower).


quantum

public int quantum
The quantum.


quantumModuloMask

public int quantumModuloMask
The bit mask giving the remainder of the division by quantum.


quantumDivisionShift

public int quantumDivisionShift
The shift giving the result of the division by quantum.
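A mask and a shift can stand in for the remainder and the division only when the quantum is a power of two. The sketch below assumes that this is the case here. The class QuantumArithmeticSketch and its methods are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical illustration class; assumes quantum is a power of two.
final class QuantumArithmeticSketch {
    /** Mask such that (x & mask) == x % quantum for non-negative x. */
    static int moduloMask(int quantum) {
        return quantum - 1;
    }

    /** Shift such that (x >> shift) == x / quantum for non-negative x. */
    static int divisionShift(int quantum) {
        return Integer.numberOfTrailingZeros(quantum);
    }

    /** Offset of a document record inside its quantum. */
    static int offsetInQuantum(int record, int quantum) {
        return record & moduloMask(quantum);
    }

    /** Number of whole quanta that precede a document record. */
    static int quantumIndex(int record, int quantum) {
        return record >> divisionShift(quantum);
    }
}
```

For instance, with a quantum of 16, record number 37 lies at offset 5 of quantum number 2.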


positionCache

protected int[] positionCache
The cached position array.

Constructor Detail

BitStreamIndexReader.BitStreamIndexReaderIndexIterator

public BitStreamIndexReader.BitStreamIndexReaderIndexIterator(BitStreamIndexReader parent,
                                                              InputBitStream ibs)
Method Detail

position

protected void position(int term)
                 throws IOException
Positions the index on the inverted list of a given term.

This method can be called at any time. Note that it is always possible to call this method with argument 0, even if offsets have not been loaded.

Parameters:
term - a term.
Throws:
IOException

termNumber

public int termNumber()
Description copied from interface: IndexIterator
Returns the number of the term whose inverted list is returned by this index iterator.

Usually, the term number is automatically set by IndexReader.documents(CharSequence) or IndexReader.documents(int).

Specified by:
termNumber in interface IndexIterator
Returns:
the number of the term over which this iterator is built.
See Also:
IndexIterator.term()

advance

protected IndexIterator advance()
                         throws IOException
Throws:
IOException

index

public Index index()
Description copied from interface: IndexIterator
Returns the index over which this iterator is built.

Specified by:
index in interface IndexIterator
Returns:
the index over which this iterator is built.

frequency

public int frequency()
Description copied from interface: IndexIterator
Returns the frequency, that is, the number of documents that will be returned by this iterator.

Specified by:
frequency in interface IndexIterator
Returns:
the number of documents that will be returned by this iterator.

document

public int document()
Description copied from interface: DocumentIterator
Returns the last document returned by DocumentIterator.nextDocument().

Specified by:
document in interface DocumentIterator
Returns:
the last document returned by DocumentIterator.nextDocument(), or -1 if no document has been returned yet.

payload

public Payload payload()
                throws IOException
Description copied from interface: IndexIterator
Returns the payload, if any, associated with the current document.

Specified by:
payload in interface IndexIterator
Returns:
the payload associated with the current document.
Throws:
IOException

count

public int count()
          throws IOException
Description copied from interface: IndexIterator
Returns the count, that is, the number of occurrences of the term in the current document.

Specified by:
count in interface IndexIterator
Returns:
the count (number of occurrences) of the term in the current document.
Throws:
IOException

updatePositionCache

protected void updatePositionCache()
                            throws IOException
Reads positions, assuming that state <= BEFORE_POSITIONS.

Throws:
IOException

positions

public IntIterator positions()
                      throws IOException
Description copied from interface: IndexIterator
Returns the positions at which the term appears in the current document.

Specified by:
positions in interface IndexIterator
Returns:
the positions of the current document in which the current term appears.
Throws:
IOException

positionArray

public int[] positionArray()
                    throws IOException
Description copied from interface: IndexIterator
Returns the positions at which the term appears in the current document in an array.

Implementations are allowed to return the same array across different calls to this method.
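Because the same array may be returned across calls, a caller that needs to keep the positions should copy the first count() elements. The following is a minimal sketch of that pattern. ToyReader is a hypothetical stand-in that reuses one internal array and is not an MG4J class.

```java
// Hypothetical illustration classes; not part of MG4J.
final class PositionArraySketch {
    /** Toy reader that, like a caching implementation, reuses one internal array. */
    static final class ToyReader {
        private int[] cache = new int[4];
        private int count;

        /** Simulates moving to a new document with the given positions. */
        void load(int... positions) {
            if (positions.length > cache.length) cache = new int[positions.length];
            System.arraycopy(positions, 0, cache, 0, positions.length);
            count = positions.length;
        }

        int count() { return count; }

        /** Only the first count() elements of the returned array are meaningful. */
        int[] positionArray() { return cache; }
    }

    /** Returns a private copy holding exactly the valid positions. */
    static int[] snapshot(ToyReader reader) {
        return java.util.Arrays.copyOf(reader.positionArray(), reader.count());
    }
}
```

A reference obtained from positionArray() before the reader moves on may later show the positions of a different document, while the snapshot stays unchanged.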

Specified by:
positionArray in interface IndexIterator
Returns:
an array whose first IndexIterator.count() elements contain the document positions.
Throws:
IOException

positions

public int positions(int[] position)
              throws IOException
Description copied from interface: IndexIterator
Stores the positions at which the term appears in the current document in a given array.

If the array is not large enough (i.e., it does not contain IndexIterator.count() elements), this method will return a negative number (the opposite of the count).
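The negative return value lets a caller detect a buffer that is too small and learn the required size in one call. A minimal sketch of this convention follows. The class PositionsBufferSketch and its methods are hypothetical and work on plain arrays instead of an index.

```java
// Hypothetical illustration class; not part of MG4J.
final class PositionsBufferSketch {
    /** Copies count positions into dest if they fit; returns count, or -count if dest is too small. */
    static int positions(int[] source, int count, int[] dest) {
        if (dest.length < count) return -count;
        System.arraycopy(source, 0, dest, 0, count);
        return count;
    }

    /** Caller-side pattern: retry with a buffer of the right size when the result is negative. */
    static int[] readAll(int[] source, int count, int[] buffer) {
        int result = positions(source, count, buffer);
        if (result < 0) {
            buffer = new int[-result];
            result = positions(source, count, buffer);
        }
        return java.util.Arrays.copyOf(buffer, result);
    }
}
```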

Specified by:
positions in interface IndexIterator
Parameters:
position - an array that will be used to store positions.
Returns:
the count; it will have the sign changed if positions cannot hold all positions.
Throws:
IOException

nextDocument

public int nextDocument()
                 throws IOException
Description copied from interface: DocumentIterator
Returns the next document provided by this document iterator, or -1 if no more documents are available.

Warning: the specification of this method has significantly changed as of MG4J 1.2. The special return value -1 is used to mark the end of iteration (a NoSuchElementException would have been thrown before in that case, so no harm should be caused by this change). The reason for this change is to provide fully lazy iteration over documents. Fully lazy iteration does not provide a hasNext() method: you have to actually ask for the next element and check the return value. Fully lazy iteration is much lighter on method calls (half as many) and in most (if not all) MG4J classes leads to much simpler logic. Moreover, DocumentIterator.nextDocument() can be specified as throwing an IOException, which avoids the pernicious proliferation of try/catch blocks in very short, low-level methods (it was having a detectable impact on performance).
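The fully lazy pattern described above can be sketched as a loop that asks for the next document and stops on -1. ToyDocumentIterator below is a hypothetical stand-in over an in-memory array. It is not an MG4J class and, unlike DocumentIterator.nextDocument(), it does not declare IOException.

```java
// Hypothetical illustration classes; not part of MG4J.
final class LazyIterationSketch {
    /** Toy iterator over an increasing sequence of document pointers. */
    static final class ToyDocumentIterator {
        private final int[] pointers;
        private int next = 0;

        ToyDocumentIterator(int... pointers) { this.pointers = pointers; }

        /** Returns the next document pointer, or -1 when the list is exhausted. */
        int nextDocument() {
            return next < pointers.length ? pointers[next++] : -1;
        }
    }

    /** Drains the iterator with the "ask, then check for -1" idiom; no hasNext() involved. */
    static java.util.List<Integer> drain(ToyDocumentIterator iterator) {
        java.util.List<Integer> result = new java.util.ArrayList<Integer>();
        int document;
        while ((document = iterator.nextDocument()) != -1) result.add(document);
        return result;
    }
}
```

Each document costs a single method call here, whereas the hasNext()/next() idiom needs two.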

Specified by:
nextDocument in interface DocumentIterator
Returns:
the next document, or -1 if no more documents are available.
Throws:
IOException

skipTo

public int skipTo(int p)
           throws IOException
Description copied from interface: DocumentIterator
Skips all documents smaller than n.

Define the current document k associated with this document iterator as follows:

If k is larger than or equal to n, then this method does nothing and returns k. Otherwise, a call to this method is equivalent to

 while( ( k = nextDocument() ) < n && k != -1 );
 return k == -1 ? Integer.MAX_VALUE : k;
 

Thus, when a result k ≠ Integer.MAX_VALUE is returned, the state of this iterator will be exactly the same as after a call to DocumentIterator.nextDocument() that returned k. In particular, the first document larger than or equal to n (when returned by this method) will not be returned by the next call to DocumentIterator.nextDocument().
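The contract can be exercised on a toy iterator that implements skipTo(n) literally as the loop given above. This is only a sketch. ToyIterator is hypothetical, scans linearly without using skip towers, and assumes that the current document becomes Integer.MAX_VALUE once the list is exhausted.

```java
// Hypothetical illustration classes; not part of MG4J.
final class SkipToSketch {
    static final class ToyIterator {
        private final int[] pointers; // strictly increasing document pointers
        private int next = 0;         // index of the next pointer to return
        private int current = -1;     // -1 before the first call, MAX_VALUE past the end (assumption)

        ToyIterator(int... pointers) { this.pointers = pointers; }

        /** Returns the next document pointer, or -1 when the list is exhausted. */
        int nextDocument() {
            if (next < pointers.length) return current = pointers[next++];
            current = Integer.MAX_VALUE;
            return -1;
        }

        /** Literal transcription of the documented equivalence. */
        int skipTo(int n) {
            if (current >= n) return current; // already at or beyond n: do nothing
            int k;
            while ((k = nextDocument()) < n && k != -1) { /* keep scanning */ }
            return k == -1 ? Integer.MAX_VALUE : k;
        }
    }
}
```

On the list 1, 4, 7, 10, calling skipTo(5) returns 7, and the following nextDocument() returns 10 rather than 7.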

Specified by:
skipTo in interface DocumentIterator
Parameters:
p - a document pointer (called n in the description above).
Returns:
a document pointer larger than or equal to n if available, Integer.MAX_VALUE otherwise.
Throws:
IOException

dispose

public void dispose()
             throws IOException
Description copied from interface: DocumentIterator
Disposes this document iterator, releasing all resources.

This method should propagate down to the underlying index iterators, where it should release resources such as open files and network connections. If you're doing your own resource tracking and pooling, then you do not need to call this method.

Specified by:
dispose in interface DocumentIterator
Throws:
IOException

hasNext

public boolean hasNext()
Specified by:
hasNext in interface Iterator<Integer>

nextInt

public int nextInt()
Description copied from interface: DocumentIterator
Returns the next document.

Specified by:
nextInt in interface IntIterator
Specified by:
nextInt in interface DocumentIterator
Overrides:
nextInt in class AbstractIntIterator
See Also:
DocumentIterator.nextDocument()

toString

public String toString()
Overrides:
toString in class Object

intervalIterators

public Reference2ReferenceMap<Index,IntervalIterator> intervalIterators()
                                                                 throws IOException
Description copied from interface: DocumentIterator
Returns an unmodifiable map from indices to interval iterators.

After a call to DocumentIterator.nextDocument(), this map can be used to retrieve the intervals in the current document. An invocation of Map.get(java.lang.Object) on this map with argument index yields the same result as intervalIterator(index).

Specified by:
intervalIterators in interface DocumentIterator
Returns:
a map from indices to interval iterators over the current document.
Throws:
IOException
See Also:
DocumentIterator.intervalIterator(Index)

intervalIterator

public IntervalIterator intervalIterator()
                                  throws IOException
Description copied from interface: DocumentIterator
Returns the interval iterator of this document iterator for single-index queries.

This is a convenience method that can be used only for queries built over a single index.

Specified by:
intervalIterator in interface DocumentIterator
Returns:
an interval iterator.
Throws:
IOException
See Also:
DocumentIterator.intervalIterator(Index)

intervalIterator

public IntervalIterator intervalIterator(Index index)
                                  throws IOException
Description copied from interface: DocumentIterator
Returns the interval iterator of this document iterator for the given index.

After a call to DocumentIterator.nextDocument(), this iterator can be used to retrieve the intervals in the current document (the one returned by DocumentIterator.nextDocument()) for the index index.

Note that if all indices have positions, it is guaranteed that at least one index will return an interval. However, for disjunctive queries it cannot be guaranteed that all indices will return an interval.

Indices without positions always return IntervalIterators.TRUE. Thus, in the presence of indices without positions it is possible that no intervals at all are available.

Specified by:
intervalIterator in interface DocumentIterator
Parameters:
index - an index (must be one over which the query was built).
Returns:
an interval iterator over the current document in index.
Throws:
IOException

indices

public ReferenceSet<Index> indices()
Description copied from interface: DocumentIterator
Returns the set of indices over which this iterator is built.

Specified by:
indices in interface DocumentIterator
Returns:
the set of indices over which this iterator is built.