it.unimi.dsi.mg4j.document
Interface DocumentSequence

All Superinterfaces:
Closeable
All Known Subinterfaces:
DocumentCollection
All Known Implementing Classes:
AbstractDocumentCollection, AbstractDocumentSequence, CompositeDocumentSequence, CSVDocumentCollection, FileSetDocumentCollection, InputStreamDocumentSequence, JavamailDocumentCollection, JdbcDocumentCollection, TRECDocumentCollection, WikipediaDocumentCollection, ZipDocumentCollection

public interface DocumentSequence
extends Closeable

A sequence of documents.

This is the most basic interface available in MG4J for representing a sequence of documents to be indexed. Its only duty is to return, once, an iterator over the documents in the sequence.

The iterator returned by iterator() must always return the same documents in the same order, given the same external conditions (standard input, file system, etc.).

Document sequences must always return documents of the same type. This is usually accomplished by providing at construction time a DocumentFactory that will be used to build and parse documents. Of course, it is possible to create document sequences with a hardwired factory (see, e.g., ZipDocumentCollection).


Method Summary
 void close()
          Closes this document sequence, releasing all resources.
 DocumentFactory factory()
          Returns the factory used by this sequence.
 DocumentIterator iterator()
          Returns an iterator over the sequence of documents.
 

Method Detail

iterator

DocumentIterator iterator()
                          throws IOException
Returns an iterator over the sequence of documents.

Warning: this method can be safely called only once. For instance, implementations based on standard input will usually throw an exception if this method is called a second time.

Implementations may decide to override this restriction (in particular, if they implement DocumentCollection). Usually, however, it is not possible to obtain two iterators at the same time on a collection.
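
The call-once restriction can be illustrated with a minimal sketch. The types below (TitleIterator, OneShotSequence, OneShotDemo) are hypothetical stand-ins written for this example and are not MG4J classes; the choice of IllegalStateException for the second call is likewise an assumption, since the contract only says that implementations will usually throw an exception.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for illustration only; these are NOT the MG4J types.
interface TitleIterator extends Closeable {
    /** Returns the next title, or null when the sequence is exhausted. */
    String nextTitle() throws IOException;
}

/** A hypothetical one-shot sequence: iterator() succeeds only once. */
class OneShotSequence implements Closeable {
    private final List<String> titles;
    private boolean iteratorRequested;

    OneShotSequence(String... titles) {
        this.titles = Arrays.asList(titles);
    }

    TitleIterator iterator() throws IOException {
        // Assumed behaviour: reject every call after the first one.
        if (iteratorRequested) {
            throw new IllegalStateException("iterator() may be called only once");
        }
        iteratorRequested = true;
        final Iterator<String> it = titles.iterator();
        return new TitleIterator() {
            public String nextTitle() { return it.hasNext() ? it.next() : null; }
            public void close() { /* nothing to release in this sketch */ }
        };
    }

    public void close() { /* nothing to release in this sketch */ }
}

class OneShotDemo {
    /** Drains the sequence once and returns the titles in order. */
    static List<String> drain(OneShotSequence sequence) {
        List<String> seen = new ArrayList<>();
        try (TitleIterator it = sequence.iterator()) {
            for (String t; (t = it.nextTitle()) != null; ) seen.add(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    /** Returns true if a further iterator() call is rejected. */
    static boolean nextCallRejected(OneShotSequence sequence) {
        try {
            sequence.iterator();
            return false;
        } catch (IllegalStateException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        OneShotSequence s = new OneShotSequence("first", "second");
        System.out.println(drain(s));            // the titles, in order
        System.out.println(nextCallRejected(s)); // the second request fails
    }
}
```

Callers that need several passes over the same documents should therefore not rely on a plain sequence, and should look for an implementation that explicitly lifts the restriction.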

Returns:
an iterator over the sequence of documents.
Throws:
IOException
See Also:
DocumentCollection

factory

DocumentFactory factory()
Returns the factory used by this sequence.

Every document sequence is based on a document factory that transforms raw bytes into a sequence of characters. The factory contains useful information such as the number of fields.
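
As a rough sketch of how field information might be read from a factory, the example below enumerates field names through a small stand-in interface. FieldCatalog, TitleAndTextCatalog and FactoryDemo are hypothetical names introduced here, and the method names numberOfFields() and fieldName(int) are assumptions for illustration; check them against the actual DocumentFactory API before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in exposing field metadata; NOT the MG4J DocumentFactory. */
interface FieldCatalog {
    int numberOfFields();
    String fieldName(int field);
}

/** A hardwired two-field catalog, mimicking a factory fixed at construction time. */
class TitleAndTextCatalog implements FieldCatalog {
    private static final String[] NAMES = { "title", "text" };

    public int numberOfFields() { return NAMES.length; }

    public String fieldName(int field) { return NAMES[field]; }
}

class FactoryDemo {
    /** Collects the name of every field, in index order. */
    static List<String> fieldNames(FieldCatalog factory) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < factory.numberOfFields(); i++) {
            names.add(factory.fieldName(i));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fieldNames(new TitleAndTextCatalog()));
    }
}
```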

Returns:
the factory used by this sequence.

close

void close()
           throws IOException
Closes this document sequence, releasing all resources.

You should always call this method once you have finished with this document sequence. Implementations are invited to call this method in a finaliser as a safety net (even better, implement SafelyCloseable), but since there is no guarantee as to when finalisers are invoked, you should not depend on this behaviour.
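
One way to guarantee the explicit close, sketched here under the assumption of Java 7 or later, is a try-with-resources statement, which works for any Closeable; on older JVMs a try/finally block does the same job. TrackedSequence and CloseDemo are hypothetical classes written for this example, not part of MG4J.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical stand-in for a sequence holding a resource; not an MG4J class. */
class TrackedSequence implements Closeable {
    private boolean closed;

    boolean isClosed() { return closed; }

    /** Simulates work that may fail while the sequence is open. */
    void process(boolean fail) throws IOException {
        if (closed) throw new IllegalStateException("sequence already closed");
        if (fail) throw new IOException("simulated read failure");
    }

    @Override
    public void close() { closed = true; }
}

class CloseDemo {
    /** Runs process() and reports whether the sequence ended up closed. */
    static boolean closedAfterUse(boolean fail) {
        TrackedSequence sequence = new TrackedSequence();
        try (TrackedSequence s = sequence) {
            s.process(fail);
        } catch (IOException e) {
            // The failure is swallowed only to keep the sketch short.
        }
        return sequence.isClosed();
    }

    public static void main(String[] args) {
        System.out.println(closedAfterUse(false)); // closed after normal completion
        System.out.println(closedAfterUse(true));  // closed even when process() fails
    }
}
```

In both cases close() runs before control leaves the try statement, so the caller does not depend on a finaliser.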

Specified by:
close in interface Closeable
Throws:
IOException