public interface DocumentIterator
An iterator over documents (pointers) and their intervals.
Each call to nextDocument()
will return a document pointer, or -1 if no more documents are available. Just
after the call to nextDocument()
, intervalIterator(Index)
will return an interval iterator
enumerating intervals in the last returned document for the specified index. The latter method may return the
special TRUE
value: this means that,
although the current document satisfies the query, there is only a generic
empty witness to prove it (see TRUE
for some elaboration).
Note that this interface extends IntIterator
. Nonetheless, for performance reasons,
the preferred access to the document pointers is nextDocument()
.
The iterator()
method must be an alias for intervalIterator()
, and shares
the same limitations.
A document iterator is usually structured as a composite,
with operators as internal nodes and IndexIterator
s
as leaves. The methods accept(DocumentIteratorVisitor)
and acceptOnTruePaths(DocumentIteratorVisitor)
implement the visitor pattern.
The dispose()
method is intended to recursively release all resources associated
with a composite document iterator. Note that this is not always what you want, as you might
be, say, pooling index readers to reduce the number
of file open/close operations. For this reason, we intentionally avoid calling the method “close”.
Warning: the interval enumeration can be carried out only immediately after a call
to nextDocument()
. Subsequent calls to nextDocument()
or even to Iterator.hasNext()
will reset the internal state of the iterator. In particular, trying to enumerate intervals after a call
to Iterator.hasNext()
will usually throw an IllegalStateException
.
Method Summary | |
---|---|
<T> T | accept(DocumentIteratorVisitor<T> visitor): Accepts a visitor. |
<T> T | acceptOnTruePaths(DocumentIteratorVisitor<T> visitor): Accepts a visitor after a call to nextDocument(), limiting recursion to true paths. |
void | dispose(): Disposes this document iterator, releasing all resources. |
int | document(): Returns the last document returned by nextDocument(). |
ReferenceSet<Index> | indices(): Returns the set of indices over which this iterator is built. |
IntervalIterator | intervalIterator(): Returns the interval iterator of this document iterator for single-index queries. |
IntervalIterator | intervalIterator(Index index): Returns the interval iterator of this document iterator for the given index. |
Reference2ReferenceMap<Index,IntervalIterator> | intervalIterators(): Returns an unmodifiable map from indices to interval iterators. |
IntervalIterator | iterator(): An alias for intervalIterator() that has the same limitations (i.e., it will work only if there is just one index) and that catches IOExceptions. |
int | nextDocument(): Returns the next document provided by this document iterator, or -1 if no more documents are available. |
int | nextInt(): Deprecated. As of MG4J 1.2, the suggested way of iterating over document iterators is nextDocument(), which has been modified so as to provide fully lazy iteration. After a couple of releases, however, this annotation will be removed, as it is very practical to have document iterators implementing IntIterator. Its main purpose is to warn people about performance issues solved by nextDocument(). |
int | skipTo(int n): Skips all documents smaller than n. |
double | weight(): Returns the weight associated with this iterator. |
DocumentIterator | weight(double weight): Sets the weight of this index iterator. |
Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator |
---|
skip |
Methods inherited from interface java.util.Iterator |
---|
hasNext, next, remove |
Method Detail |
---|
IntervalIterator intervalIterator() throws IOException
Returns the interval iterator of this document iterator for single-index queries.
This is a convenience method that can be used only for queries built over a single index.
Throws:
IllegalStateException
- if this document iterator is not built on a single index.
IOException
See Also:
intervalIterator(Index)
IntervalIterator intervalIterator(Index index) throws IOException
After a call to nextDocument()
, this iterator
can be used to retrieve the intervals in the current document (the
one returned by nextDocument()
) for
the index index
.
Note that if all indices have positions, it is guaranteed that at least one index will return an interval. However, for disjunctive queries it cannot be guaranteed that all indices will return an interval.
Indices without positions always return IntervalIterators.TRUE
.
Thus, in the presence of indices without positions it is possible that no
intervals at all are available.
Parameters:
index
- an index (must be one over which the query was built).
Returns:
an interval iterator for the current document in index
index
.
Throws:
IOException
Reference2ReferenceMap<Index,IntervalIterator> intervalIterators() throws IOException
After a call to nextDocument()
, this map
can be used to retrieve the intervals in the current document. An invocation of Map.get(java.lang.Object)
on this map with argument index
yields the same result as
intervalIterator(index)
.
Throws:
UnsupportedOperationException
- if this index does not contain positions.
IOException
See Also:
intervalIterator(Index)
ReferenceSet<Index> indices()
Returns the set of indices over which this iterator is built.
@Deprecated int nextInt()
Deprecated. As of MG4J 1.2, the suggested way of iterating over document iterators is nextDocument()
, which has been modified so as to provide fully lazy
iteration. After a couple of releases, however, this annotation will be removed, as it
is very practical to have document iterators implementing IntIterator
. Its
main purpose is to warn people about performance issues solved by nextDocument()
.
Specified by:
nextInt
in interface IntIterator
See Also:
nextDocument()
int nextDocument() throws IOException
Returns the next document provided by this document iterator, or -1 if no more documents are available.
Warning: the specification of this method has significantly changed as of MG4J 1.2.
The special return value -1 is used to mark the end of iteration (a NoSuchElementException
would have been thrown before in that case, so no harm should be caused by this change). The reason
for this change is to provide fully lazy iteration over documents. Fully lazy iteration
does not provide a hasNext()
method: you have to actually ask for the next
element and check the return value. Fully lazy iteration is much lighter on method calls (half) and
in most (if not all) MG4J classes leads to much simpler logic. Moreover, nextDocument()
can be specified as throwing an IOException
, which avoids the pernicious proliferation
of try/catch blocks in very short, low-level methods (it was having a detectable impact on performance).
Throws:
IOException
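The fully lazy iteration idiom described above can be sketched as follows. This is a minimal illustration, not MG4J code: ArrayDocumentIterator and LazyIterationSketch are hypothetical stand-ins backed by an in-memory array, and only the nextDocument() contract (next pointer, or -1 at the end) is taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a document iterator; NOT the MG4J interface. */
final class ArrayDocumentIterator {
    private final int[] docs; // document pointers in increasing order
    private int pos = -1;     // index of the last pointer returned

    ArrayDocumentIterator(int... docs) {
        this.docs = docs;
    }

    /** Returns the next document pointer, or -1 if no more documents are available. */
    int nextDocument() {
        if (pos + 1 >= docs.length) return -1;
        return docs[++pos];
    }
}

final class LazyIterationSketch {
    /** Drains the iterator with one call per element and no hasNext(). */
    static List<Integer> drain(ArrayDocumentIterator it) {
        List<Integer> result = new ArrayList<>();
        int doc;
        // Ask for the next element and check the return value.
        while ((doc = it.nextDocument()) != -1) {
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(drain(new ArrayDocumentIterator(2, 5, 9)));
    }
}
```

With a real DocumentIterator the loop has the same shape, except that nextDocument() is declared as throwing IOException, so the enclosing method would need to declare or handle it.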
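int document()
Returns the last document returned by nextDocument()
.
Returns:
the last document returned by nextDocument()
, or -1 if no document has been returned yet.
int skipTo(int n) throws IOException
Skips all documents smaller than n
.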
Define the current document k
associated with this document iterator
as follows:
-1, if nextDocument()
and this method have never been called;
Integer.MAX_VALUE
, if a call to this method returned Integer.MAX_VALUE
;
the last value returned by a call to nextDocument()
or this method, otherwise.
If k
is larger than or equal to n
, then
this method does nothing and returns k
. Otherwise, a
call to this method is equivalent to
while( ( k = nextDocument() ) < n && k != -1 ); return k == -1 ? Integer.MAX_VALUE : k;
Thus, when a result k
≠ Integer.MAX_VALUE
is returned, the state of this iterator
will be exactly the same as after a call to nextDocument()
that returned k
.
In particular, the first document larger than or equal to n
(when returned
by this method) will not be returned by the next call to
nextDocument()
.
Parameters:
n
- a document pointer.
Returns:
a document pointer larger than or equal to n
if available, Integer.MAX_VALUE
otherwise.
Throws:
IOException
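The skipTo(int) contract above can be traced on a small example. The following is only a sketch under stated assumptions: SkippableDocumentIterator is a hypothetical array-backed class, not part of MG4J, and it tracks the current document k exactly as defined in the text.

```java
/** Hypothetical array-backed iterator illustrating the skipTo() contract; NOT an MG4J class. */
final class SkippableDocumentIterator {
    private final int[] docs; // document pointers in increasing order
    private int pos = -1;     // index of the last pointer returned
    private int current = -1; // the "current document" k defined in the text

    SkippableDocumentIterator(int... docs) {
        this.docs = docs;
    }

    /** Returns the next document pointer, or -1 if no more documents are available. */
    int nextDocument() {
        current = pos + 1 < docs.length ? docs[++pos] : -1;
        return current;
    }

    /** Skips all documents smaller than n, following the loop given in the text. */
    int skipTo(int n) {
        // If k is already larger than or equal to n, do nothing and return k.
        if (current >= n) return current;
        int k;
        while ((k = nextDocument()) < n && k != -1) {
            // keep advancing
        }
        current = k == -1 ? Integer.MAX_VALUE : k;
        return current;
    }
}
```

For pointers 1, 4, 7, 10, a call to skipTo(5) returns 7, a subsequent skipTo(6) is a no-op that returns 7 again, and the next nextDocument() returns 10 rather than 7.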
<T> T accept(DocumentIteratorVisitor<T> visitor) throws IOException
Accepts a visitor.
A document iterator is usually structured as a composite,
with operators as internal nodes and IndexIterator
s
as leaves. This method implements the visitor pattern.
Parameters:
visitor
- the visitor.
Returns:
null
if the visit was interrupted.
Throws:
IOException
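To make the composite-plus-visitor structure concrete, here is a deliberately simplified sketch. All names in it (SketchVisitor, SketchNode, LeafNode, OperatorNode, VisitorSketch) are hypothetical, and the visitor interface is an assumption made for illustration; the actual DocumentIteratorVisitor interface is not shown on this page and may look different. The only behavior borrowed from the text is that a null result signals an interrupted visit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified visitor; the real DocumentIteratorVisitor may differ. */
interface SketchVisitor<T> {
    T visitLeaf(String term);
    T visitOperator(String name, List<T> childResults);
}

/** Hypothetical node of a composite iterator tree. */
abstract class SketchNode {
    abstract <T> T accept(SketchVisitor<T> visitor);
}

/** Leaf of the composite (plays the role of an index iterator). */
final class LeafNode extends SketchNode {
    private final String term;

    LeafNode(String term) {
        this.term = term;
    }

    @Override
    <T> T accept(SketchVisitor<T> visitor) {
        return visitor.visitLeaf(term);
    }
}

/** Internal node of the composite (plays the role of an operator). */
final class OperatorNode extends SketchNode {
    private final String name;
    private final List<SketchNode> children;

    OperatorNode(String name, SketchNode... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }

    @Override
    <T> T accept(SketchVisitor<T> visitor) {
        List<T> results = new ArrayList<>();
        for (SketchNode child : children) {
            T result = child.accept(visitor);
            if (result == null) return null; // the visit was interrupted
            results.add(result);
        }
        return visitor.visitOperator(name, results);
    }
}

final class VisitorSketch {
    /** Renders the tree as a prefix expression by visiting it. */
    static String render(SketchNode root) {
        return root.accept(new SketchVisitor<String>() {
            @Override
            public String visitLeaf(String term) {
                return term;
            }

            @Override
            public String visitOperator(String name, List<String> childResults) {
                return name + "(" + String.join(", ", childResults) + ")";
            }
        });
    }
}
```

The point of the design is that traversal logic lives in the nodes while the computation lives in the visitor, so new analyses over the tree do not require changes to the node classes.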
<T> T acceptOnTruePaths(DocumentIteratorVisitor<T> visitor) throws IOException
Accepts a visitor after a call to nextDocument()
,
limiting recursion to true paths.
After a call to nextDocument()
, a document iterator
is positioned over a document. This call is equivalent to accept(DocumentIteratorVisitor)
,
but visits only along true paths.
We define a true path as a path from the root of the composite that passes only through
nodes whose associated subtree is positioned on the same document as the root. Note that OrDocumentIterator
s
detach exhausted iterators from the composite tree, so true paths define the subtree that is causing
the current document to satisfy the query represented by this document iterator.
For more elaboration, and the main application of this method, see CounterCollectionVisitor
.
Parameters:
visitor
- the visitor.
Returns:
null
if the visit was interrupted.
Throws:
IOException
See Also:
accept(DocumentIteratorVisitor)
,
CounterCollectionVisitor
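The notion of a true path can be illustrated with a toy tree. This sketch assumes hypothetical PositionedNode and TruePathSketch classes that are not part of MG4J, and it uses plain recursion instead of a visitor to stay short. Each node records the document its subtree is currently positioned on, and the traversal descends only into nodes positioned on the same document as the root.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical tree node positioned on a document; NOT an MG4J class. */
final class PositionedNode {
    final String label;
    final int document; // document this subtree is currently positioned on
    final List<PositionedNode> children;

    PositionedNode(String label, int document, PositionedNode... children) {
        this.label = label;
        this.document = document;
        this.children = Arrays.asList(children);
    }

    /** Adds to out the labels of the leaves reachable along true paths. */
    void collectTrueLeaves(int rootDocument, List<String> out) {
        // A node positioned on another document is not on a true path.
        if (document != rootDocument) return;
        if (children.isEmpty()) {
            out.add(label);
            return;
        }
        for (PositionedNode child : children) {
            child.collectTrueLeaves(rootDocument, out);
        }
    }
}

final class TruePathSketch {
    /** Returns the labels of the leaves that lie on true paths from the root. */
    static List<String> trueLeaves(PositionedNode root) {
        List<String> out = new ArrayList<>();
        root.collectTrueLeaves(root.document, out);
        return out;
    }
}
```

For a disjunction positioned on document 5 whose children are a leaf on document 5, a leaf on document 8, and a conjunction on document 5, only the first leaf and the leaves of the conjunction are collected; the leaf on document 8 does not contribute to the current match.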
double weight()
Returns the weight associated with this iterator.
The number returned by this method has no fixed semantics: different scorers might choose different interpretations, or even ignore it.
DocumentIterator weight(double weight)
Sets the weight of this index iterator.
Parameters:
weight
- the weight of this index iterator.
void dispose() throws IOException
Disposes this document iterator, releasing all resources.
This method should propagate down to the underlying index iterators, where it should release resources such as open files and network connections. If you are doing your own resource tracking and pooling, then you do not need to call this method.
Throws:
IOException
IntervalIterator iterator()
An alias for intervalIterator()
that has the same limitations (i.e., it will work only if
there is just one index) and that catches IOException
s.
Specified by:
iterator
in interface Iterable<Interval>