|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectit.unimi.dsi.fastutil.ints.AbstractIntIterator
it.unimi.dsi.mg4j.search.AbstractDocumentIterator
it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator
it.unimi.dsi.mg4j.search.AbstractIntersectionDocumentIterator
public abstract class AbstractIntersectionDocumentIterator
An abstract iterator on documents, generating the intersection of the documents returned by a number of document iterators.
To be usable, this class must be subclassed so to provide also an iterator on intervals.
Such iterators must be instantiable using getComposedIntervalIterator(Index)
.
The latter is an example of a non-static factory method, that is, a factory method which
depends on the enclosing instance. This pattern allows to easily specialise this class to iterators
that do different things, such as AndDocumentIterator
and
ConsecutiveDocumentIterator
, but that have a similar semantics at the document
level (the semantics may in fact be slightly different: for instance, not all document belonging
to all components will actually appear in a consecutive iterator, as there may be documents filtered
at the interval level).
The important invariant is that only after a call to nextDocument()
, a call
to intervalIterator(Index)
will return an interval iterator over the document
just returned, and that for at least one index in AbstractCompositeDocumentIterator.indices()
the iterator will not be empty
or TRUE
.
Since MG4J 1.1, this class implements a new intersection algorithm that should be significantly faster than the previous one. The main idea is that of letting sparser iterator interact as much as possible to obtain a candidate common document, and then trying to align the others. At construction time, the component iterators are sorted so that index iterators are separated, and sorted by frequency. Then, each time we have to align the iterators we align them greedily starting from the index iterators, in frequency order. This has the effect of skipping very quickly (and usually by large jumps, which are handled nicely by indices with skips), as the main interaction happens between low-frequency index iterators.
Moreover, this class treats in a special way
index iterators coming from payload-based indices. Such
iterators are checked at the end of the alignment process,
after all standard index iterators (and general document iterators)
are aligned. At that point, the special method PayloadPredicateDocumentIterator.skipUnconditionallyTo(int)
is used to position unconditionally such iterators and check whether the payload predicate is satisfied.
If this doesn't happen, the current candidate (obtained by alignment of standard iterators) is increased and the
whole process is restarted. This procedure guarantees that we will never search exhaustively in a
payload-based index a document record satisfying the predicate (unless, of course, we have a query
containing just PayloadPredicateDocumentIterator
s), which is very efficient if the payload-based
index uses skipping.
Nested Class Summary |
---|
Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator |
---|
AbstractCompositeDocumentIterator.AbstractCompositeIndexIntervalIterator, AbstractCompositeDocumentIterator.AbstractCompositeIntervalIterator |
Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator |
---|
AbstractDocumentIterator.AbstractIntervalIterator |
Field Summary | |
---|---|
protected Reference2ReferenceArrayMap<Index,IntervalIterator> |
currentIterators
A map from indices to the iterators returned for the current document. |
protected Reference2ReferenceArrayMap<Index,IntervalIterator> |
intervalIterators
A map from indices to interval iterators. |
protected Reference2ReferenceMap<Index,IntervalIterator> |
unmodifiableCurrentIterators
An unmodifiable wrapper around currentIterators . |
Fields inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator |
---|
documentIterator, indexIterator, indices, n, soleIndex |
Fields inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator |
---|
last, next, weight |
Constructor Summary | |
---|---|
protected |
AbstractIntersectionDocumentIterator(DocumentIterator[] documentIterator)
Creates a new intersection iterator using a given array of iterators. |
protected |
AbstractIntersectionDocumentIterator(Index index,
DocumentIterator[] documentIterator)
Creates a new intersection iterator using a given array of iterators and a given index. |
Method Summary | |
---|---|
protected abstract IntervalIterator |
getComposedIntervalIterator(Index index)
|
IntervalIterator |
intervalIterator(Index index)
Returns the interval iterator of this document iterator for the given index. |
Reference2ReferenceMap<Index,IntervalIterator> |
intervalIterators()
Returns an unmodifiable map from indices to interval iterators. |
int |
nextDocument()
Returns the next document provided by this document iterator, or -1 if no more documents are available. |
int |
skipTo(int n)
Skips all documents smaller than n . |
Methods inherited from class it.unimi.dsi.mg4j.search.AbstractCompositeDocumentIterator |
---|
accept, acceptOnTruePaths, dispose, indices, intervalIterator, toString |
Methods inherited from class it.unimi.dsi.mg4j.search.AbstractDocumentIterator |
---|
document, hasNext, iterator, nextInt, weight, weight |
Methods inherited from class it.unimi.dsi.fastutil.ints.AbstractIntIterator |
---|
next, remove, skip |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface it.unimi.dsi.mg4j.search.DocumentIterator |
---|
document, iterator, nextInt, weight, weight |
Methods inherited from interface it.unimi.dsi.fastutil.ints.IntIterator |
---|
skip |
Methods inherited from interface java.util.Iterator |
---|
hasNext, next, remove |
Field Detail |
---|
protected final Reference2ReferenceArrayMap<Index,IntervalIterator> intervalIterators
protected final Reference2ReferenceArrayMap<Index,IntervalIterator> currentIterators
intervalIterators
because it could be TRUE
(in fact, in that case it may even
happen that intervalIterators
does not contain the index).
protected final Reference2ReferenceMap<Index,IntervalIterator> unmodifiableCurrentIterators
currentIterators
.
Constructor Detail |
---|
protected AbstractIntersectionDocumentIterator(Index index, DocumentIterator[] documentIterator) throws IOException
index
- an index that will be passed to AbstractCompositeDocumentIterator.AbstractCompositeDocumentIterator(Index, DocumentIterator...)
.documentIterator
- the iterators to be intersected (at least one).
IOException
protected AbstractIntersectionDocumentIterator(DocumentIterator[] documentIterator) throws IOException
documentIterator
- the iterators to be insersected (at least one).
IOException
Method Detail |
---|
public int skipTo(int n) throws IOException
DocumentIterator
n
.
Define the current document k
associated with this document iterator
as follows:
DocumentIterator.nextDocument()
and this method have never been called;
Integer.MAX_VALUE
, if a call to this method returned Integer.MAX_VALUE
;
DocumentIterator.nextDocument()
or this method, otherwise.
If k
is larger than or equal to n
, then
this method does nothing and returns k
. Otherwise, a
call to this method is equivalent to
while( ( k = nextDocument() ) < n && k != -1 ); return k == -1 ? Integer.MAX_VALUE : k;
Thus, when a result k
≠ Integer.MAX_VALUE
is returned, the state of this iterator
will be exactly the same as after a call to DocumentIterator.nextDocument()
that returned k
.
In particular, the first document larger than or equal to n
(when returned
by this method) will not be returned by the next call to
DocumentIterator.nextDocument()
.
n
- a document pointer.
n
if available, Integer.MAX_VALUE
otherwise.
IOException
public int nextDocument() throws IOException
DocumentIterator
Warning: the specification of this method has significantly changed as of MG4J 1.2.
The special return value -1 is used to mark the end of iteration (a NoSuchElementException
would have been thrown before in that case, so ho harm should be caused by this change). The reason
for this change is providing fully lazy iteration over documents. Fully lazy iteration
does not provide an hasNext()
method—you have to actually ask for the next
element and check the return value. Fully lazy iteration is much lighter on method calls (half) and
in most (if not all) MG4J classes leads to a much simpler logic. Moreover, DocumentIterator.nextDocument()
can be specified as throwing an IOException
, which avoids the pernicious proliferation
of try/catch blocks in very short, low-level methods (it was having a detectable impact on performance).
IOException
public Reference2ReferenceMap<Index,IntervalIterator> intervalIterators() throws IOException
DocumentIterator
After a call to DocumentIterator.nextDocument()
, this map
can be used to retrieve the intervals in the current document. An invocation of Map.get(java.lang.Object)
on this map with argument index
yields the same result as
intervalIterator(index)
.
IOException
DocumentIterator.intervalIterator(Index)
public IntervalIterator intervalIterator(Index index) throws IOException
DocumentIterator
After a call to DocumentIterator.nextDocument()
, this iterator
can be used to retrieve the intervals in the current document (the
one returned by DocumentIterator.nextDocument()
) for
the index index
.
Note that if all indices have positions, it is guaranteed that at least one index will return an interval. However, for disjunctive queries it cannot be guaranteed that all indices will return an interval.
Indices without positions always return IntervalIterators.TRUE
.
Thus, in presence of indices without positions it is possible that no
intervals at all are available.
index
- an index (must be one over which the query was built).
index
.
IOException
protected abstract IntervalIterator getComposedIntervalIterator(Index index)
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |