it.unimi.dsi.mg4j.search
Interface IntervalIterator

All Superinterfaces:
Iterator<Interval>
All Known Implementing Classes:
AbstractCompositeDocumentIterator.AbstractCompositeIndexIntervalIterator, AbstractCompositeDocumentIterator.AbstractCompositeIntervalIterator, AbstractDocumentIterator.AbstractIntervalIterator, AbstractOrderedIntervalDocumentIterator.AbstractOrderedIndexIntervalIterator, AbstractOrderedIntervalDocumentIterator.AbstractOrderedIntervalIterator, IntervalIterators.FakeIterator, OrDocumentIterator.OrIndexIntervalIterator

public interface IntervalIterator
extends Iterator<Interval>

An iterator over intervals. Apart from the usual methods of a (type-specific) iterator, it has a special (optional) reset() method that resets the iterator; the exact meaning of this operation is decided by the implementing classes. Typically, after a reset(), one can iterate over a new sequence.

Warning: from MG4J 1.2, most methods throw an IOException (such exceptions used to be caught and wrapped in a RuntimeException).

This interface also specifies a method extent() returning a positive integer that is supposed to approximate the minimum possible length of an interval returned by this iterator. This method returns -1 if this extent cannot be computed.


Method Summary
 int extent()
          Returns an approximation of a lower bound for the length of an interval returned by this iterator.
 void intervalTerms(IntSet terms)
          Provides the set of terms that span the current interval.
 Interval next()
          Deprecated. As of MG4J 1.2, the suggested way of iterating over interval iterators is nextInterval(), which has been reintroduced with fully lazy semantics. After a couple of releases, however, this annotation will be removed, as it is very practical to have interval iterators implementing Iterator<Interval>. The annotation's main purpose is to warn people about performance issues solved by nextInterval().
 Interval nextInterval()
          Returns the next interval provided by this interval iterator, or null if no more intervals are available.
 void reset()
          Resets the internal state of this iterator for a new document.
 
Methods inherited from interface java.util.Iterator
hasNext, remove
 

Method Detail

reset

void reset()
           throws IOException
Resets the internal state of this iterator for a new document.

To reduce object creation, interval iterators are usually created lazily by document iterators when they are needed. However, this implies that every time the document iterator is moved, some internal state of the interval iterator must be reset (e.g., because on the new document some of the component interval iterators are now IntervalIterators.TRUE).

Throws:
IOException
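
As a rough illustration of the reuse pattern described above, the following sketch keeps a single iterator instance alive while its owner moves from document to document, calling reset() after each move. The class ReusableIntervalSketch, its moveTo() method, and the int[]{left, right} representation of intervals are hypothetical stand-ins invented for this example; they are not MG4J types.

```java
import java.io.IOException;

// Hypothetical stand-in, not an MG4J class: a reusable iterator whose
// reset() discards the state left over from the previous document.
final class ReusableIntervalSketch {
    // Toy data, indexed as [document][interval]{left, right}.
    private final int[][][] intervalsByDocument;
    private int document = 0;
    private int pos = 0;

    ReusableIntervalSketch(int[][][] intervalsByDocument) {
        this.intervalsByDocument = intervalsByDocument;
    }

    // Plays the role of the owning document iterator being moved.
    void moveTo(int document) {
        this.document = document;
    }

    // Resets the internal state for the new document.
    void reset() throws IOException {
        pos = 0;
    }

    // Returns the next interval of the current document, or null at the end.
    int[] nextInterval() throws IOException {
        int[][] current = intervalsByDocument[document];
        return pos < current.length ? current[pos++] : null;
    }

    // Counts the intervals of every document using one iterator instance.
    static int[] countPerDocument(int[][][] data) throws IOException {
        ReusableIntervalSketch iterator = new ReusableIntervalSketch(data);
        int[] counts = new int[data.length];
        for (int d = 0; d < data.length; d++) {
            iterator.moveTo(d);
            iterator.reset(); // without this, pos would carry over
            while (iterator.nextInterval() != null) counts[d]++;
        }
        return counts;
    }
}
```

Omitting the reset() call in the loop would leave the position from the previous document in place, which is the kind of stale state the method is meant to clear.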

extent

int extent()
Returns an approximation of a lower bound for the length of an interval returned by this iterator.

Returns:
an approximation of a lower bound for the length of an interval.
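
One conceivable use of this value, shown purely as an assumption and not as documented MG4J behaviour, is as a cheap hint before iterating: if the extent is known and already exceeds a window of interest, the caller may decide not to look at the intervals at all. The class ExtentHintSketch and the notion of a "window" are hypothetical; since the value is only an approximation, the check is a heuristic rather than a guarantee.

```java
// Hypothetical helper, not part of MG4J: interprets an extent() value as a hint.
final class ExtentHintSketch {
    // Value documented for extent() when no estimate can be computed.
    static final int UNKNOWN = -1;

    // Returns false only when the hint suggests that every interval is
    // longer than the window; an unknown extent never rules anything out.
    static boolean mayFitWindow(int extent, int window) {
        return extent == UNKNOWN || extent <= window;
    }
}
```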

nextInterval

Interval nextInterval()
                      throws IOException
Returns the next interval provided by this interval iterator, or null if no more intervals are available.

This method has been reintroduced in MG4J 1.2 with different semantics. The special return value null is used to mark the end of iteration. The reason for this change is to provide fully lazy iteration over intervals. Fully lazy iteration does not provide a hasNext() method: you have to actually ask for the next element and check the return value. Fully lazy iteration needs half as many method calls and, in most (if not all) MG4J classes, leads to much simpler logic. Moreover, nextInterval() can be specified as throwing an IOException, which avoids the pernicious proliferation of try/catch blocks in very short, low-level methods (these blocks had a detectable impact on performance).

Returns:
the next interval, or null if no more intervals are available.
Throws:
IOException
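
The fully lazy idiom described above can be sketched as a single loop that asks for the next element and stops on null. The interface LazyIntervalSource, the class LazyIterationSketch, and the int[]{left, right} representation of intervals are hypothetical stand-ins used so that the example compiles without MG4J; only the nextInterval() contract (null at the end, IOException allowed) is taken from the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the nextInterval() part of the interface;
// an interval is modelled here as int[]{left, right}.
interface LazyIntervalSource {
    int[] nextInterval() throws IOException;
}

final class LazyIterationSketch {
    // Drains the source with one call per element; null marks the end,
    // so no hasNext() call is needed.
    static List<int[]> drain(LazyIntervalSource source) throws IOException {
        List<int[]> result = new ArrayList<>();
        int[] interval;
        while ((interval = source.nextInterval()) != null) {
            result.add(interval);
        }
        return result;
    }

    // Builds a toy source over a fixed array of intervals.
    static LazyIntervalSource over(int[][] intervals) {
        return new LazyIntervalSource() {
            private int pos = 0;

            @Override
            public int[] nextInterval() {
                return pos < intervals.length ? intervals[pos++] : null;
            }
        };
    }
}
```

Because drain() itself declares IOException, the loop body needs no try/catch block; the exception simply propagates to a caller that is in a position to handle it.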

next

@Deprecated
Interval next()
Deprecated. As of MG4J 1.2, the suggested way of iterating over interval iterators is nextInterval(), which has been reintroduced with fully lazy semantics. After a couple of releases, however, this annotation will be removed, as it is very practical to have interval iterators implementing Iterator<Interval>. The annotation's main purpose is to warn people about performance issues solved by nextInterval().

Returns the next interval.

Specified by:
next in interface Iterator<Interval>
See Also:
nextInterval()

intervalTerms

void intervalTerms(IntSet terms)
Provides the set of terms that span the current interval.

For each interval returned by MG4J, there is a set of terms that caused the interval to be returned. The terms appear inside the interval, and certainly at its extremes.

Note that the results of this method must be taken with a grain of salt: there might be different sets of terms causing the current interval, and only one will be returned.

Parameters:
terms - a set of integers that will be filled with the terms spanning the current interval.
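
The output-parameter style of this method (the caller supplies the set, the iterator fills it) might look as follows. This is a minimal sketch under stated assumptions: IntervalTermsSketch is a hypothetical stand-in, java.util.Set<Integer> replaces the IntSet type so that the example has no external dependencies, and whether a real implementation clears the set before filling it is not specified here.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in, not an MG4J class; Set<Integer> replaces IntSet.
final class IntervalTermsSketch {
    // Toy data: the terms assumed to span the current interval.
    private final int[] termsOfCurrentInterval;

    IntervalTermsSketch(int... termsOfCurrentInterval) {
        this.termsOfCurrentInterval = termsOfCurrentInterval;
    }

    // Adds the terms spanning the current interval to the caller's set.
    void intervalTerms(Set<Integer> terms) {
        for (int term : termsOfCurrentInterval) terms.add(term);
    }

    // Caller side: allocate the set, then pass it in to be filled.
    static Set<Integer> collect(IntervalTermsSketch iterator) {
        Set<Integer> terms = new TreeSet<>();
        iterator.intervalTerms(terms);
        return terms;
    }
}
```

Passing the set in, rather than returning a new one, lets a caller reuse one set across many intervals; in this sketch the caller would be responsible for clearing it between calls.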