it.unimi.dsi.mg4j.index
Interface IndexWriter

All Known Implementing Classes:
AbstractBitStreamIndexWriter, BitStreamHPIndexWriter, BitStreamIndexWriter, SkipBitStreamIndexWriter

public interface IndexWriter

An interface for classes that generate indices.

Implementations of this interface are used to write inverted lists in sequential order, as follows:

newDocumentRecord() returns an OutputBitStream that must be used to write the document-record data. There is no guarantee that the returned OutputBitStream coincides with the underlying bit stream, nor is there any guarantee as to when the bits are actually written to the underlying stream. The only guarantee is that starting a new inverted list writes the previous inverted list, if any, to the underlying stream.
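
The call order implied by the method descriptions in this page can be sketched as follows. This is only an illustration under stated assumptions: SketchWriter is a hypothetical stand-in that logs calls, not an MG4J class, and it drops the OutputBitStream arguments and return values entirely. The relative order of writePositionCount and writeDocumentPositions is assumed, since this page does not state it explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that only logs calls; it is NOT the MG4J IndexWriter. */
class SketchWriter {
    final List<String> log = new ArrayList<>();

    void newInvertedList() { log.add("newInvertedList"); }
    void writeFrequency(int frequency) { log.add("writeFrequency(" + frequency + ")"); }
    void newDocumentRecord() { log.add("newDocumentRecord"); }
    void writeDocumentPointer(int pointer) { log.add("writeDocumentPointer(" + pointer + ")"); }
    void writePositionCount(int count) { log.add("writePositionCount(" + count + ")"); }
    void writeDocumentPositions(int[] occ, int offset, int len, int docSize) {
        log.add("writeDocumentPositions(len=" + len + ")");
    }
    void close() { log.add("close"); }
}

class InvertedListSketch {
    /** Writes one inverted list with two document records and returns the call log. */
    static List<String> writeOneList() {
        SketchWriter writer = new SketchWriter();
        int[] pointers = { 3, 7 };                // documents containing the term
        int[][] positions = { { 0, 4, 9 }, { 2 } }; // occurrences in each document

        writer.newInvertedList();
        // The frequency announces how many document records will follow.
        writer.writeFrequency(pointers.length);
        for (int i = 0; i < pointers.length; i++) {
            writer.newDocumentRecord();
            // The pointer comes immediately after newDocumentRecord().
            writer.writeDocumentPointer(pointers[i]);
            // Assumed order: count first, then the positions themselves.
            writer.writePositionCount(positions[i].length);
            // -1 as docSize, which is safe when the coding does not need it.
            writer.writeDocumentPositions(positions[i], 0, positions[i].length, -1);
        }
        writer.close();
        return writer.log;
    }

    public static void main(String[] args) {
        writeOneList().forEach(System.out::println);
    }
}
```

With a real implementation, the stream returned by newDocumentRecord() would be passed to each of the per-document write methods, and a payload, if any, would be written immediately after the document pointer.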

Since:
1.2
Author:
Paolo Boldi, Sebastiano Vigna

Method Summary
 void close()
          Closes this index writer, completing the index creation process and releasing all resources.
 OutputBitStream newDocumentRecord()
          Starts a new document record.
 long newInvertedList()
          Starts a new inverted list.
 void printStats(PrintStream stats)
          Writes to the given print stream statistical information about the index just built.
 Properties properties()
          Returns properties of the index generated by this index writer.
 int writeDocumentPointer(OutputBitStream out, int pointer)
          Writes a document pointer.
 int writeDocumentPositions(OutputBitStream out, int[] occ, int offset, int len, int docSize)
          Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.
 int writeFrequency(int frequency)
          Writes the frequency.
 int writePayload(OutputBitStream out, Payload payload)
          Writes the payload for the current document.
 int writePositionCount(OutputBitStream out, int count)
          Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.
 long writtenBits()
          Returns the overall number of bits written onto the underlying stream(s).
 

Method Detail

newInvertedList

long newInvertedList()
                     throws IOException
Starts a new inverted list. The previous inverted list, if any, is actually written to the underlying bit stream.

Returns:
the position (in bytes) of the underlying bit stream where the new inverted list starts.
Throws:
IllegalStateException - if too few records were written for the previous inverted list.
IOException

writeFrequency

int writeFrequency(int frequency)
                   throws IOException
Writes the frequency.

Parameters:
frequency - the (positive) number of document records that this inverted list will contain.
Returns:
the number of bits written.
Throws:
IOException

newDocumentRecord

OutputBitStream newDocumentRecord()
                                  throws IOException
Starts a new document record.

This method must be called exactly f times, where f is the frequency specified with writeFrequency(int).

Returns:
the output bit stream where the next document record data should be written.
Throws:
IllegalStateException - if too many records were written for the current inverted list, or if there is no current inverted list.
IOException
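
The record-count contract (exactly f calls per inverted list, with IllegalStateException on too few or too many records, or when there is no current list) can be illustrated with a small state tracker. This is a minimal sketch, not MG4J code: RecordCounter and RecordCounterDemo are hypothetical names, and how actual implementations track this state is an assumption.

```java
/** Hypothetical helper, not part of MG4J: tracks document records per inverted list. */
class RecordCounter {
    private boolean listOpen = false; // true once newInvertedList() has been called
    private int frequency = 0;        // records announced for the current list
    private int records = 0;          // records started so far in the current list

    void newInvertedList() {
        checkComplete();              // the previous list must be complete
        listOpen = true;
        frequency = 0;
        records = 0;
    }

    void writeFrequency(int frequency) {
        if (!listOpen) throw new IllegalStateException("no current inverted list");
        if (frequency <= 0) throw new IllegalArgumentException("frequency must be positive");
        this.frequency = frequency;
    }

    void newDocumentRecord() {
        if (!listOpen) throw new IllegalStateException("no current inverted list");
        if (records >= frequency) throw new IllegalStateException("too many records");
        records++;
    }

    void close() {
        checkComplete();              // the last list must be complete as well
        listOpen = false;
    }

    private void checkComplete() {
        if (listOpen && records < frequency) {
            throw new IllegalStateException("too few records: " + records + " of " + frequency);
        }
    }
}

class RecordCounterDemo {
    /** Returns true if the action throws IllegalStateException. */
    static boolean throwsIllegalState(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        RecordCounter counter = new RecordCounter();
        counter.newInvertedList();
        counter.writeFrequency(1);
        counter.newDocumentRecord();
        // A second record exceeds the announced frequency of 1.
        System.out.println(throwsIllegalState(counter::newDocumentRecord));
    }
}
```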

writeDocumentPointer

int writeDocumentPointer(OutputBitStream out,
                         int pointer)
                         throws IOException
Writes a document pointer.

This method must be called immediately after newDocumentRecord().

Parameters:
out - the output bit stream where the pointer will be written.
pointer - the document pointer.
Returns:
the number of bits written.
Throws:
IOException

writePayload

int writePayload(OutputBitStream out,
                 Payload payload)
                 throws IOException
Writes the payload for the current document.

This method must be called immediately after writeDocumentPointer(OutputBitStream, int).

Parameters:
out - the output bit stream where the payload will be written.
payload - the payload.
Returns:
the number of bits written.
Throws:
IOException

writePositionCount

int writePositionCount(OutputBitStream out,
                       int count)
                       throws IOException
Writes the count of the occurrences of the current term in the current document to the given OutputBitStream.

Parameters:
out - the output bit stream where the count will be written.
count - the count.
Returns:
the number of bits written.
Throws:
IOException

writeDocumentPositions

int writeDocumentPositions(OutputBitStream out,
                           int[] occ,
                           int offset,
                           int len,
                           int docSize)
                           throws IOException
Writes the positions of the occurrences of the current term in the current document to the given OutputBitStream.

Parameters:
out - the output stream where the occurrences should be written.
occ - the position vector (a sequence of strictly increasing natural numbers).
offset - the first valid entry in occ.
len - the number of valid entries in occ.
docSize - the size of the current document (only for Golomb and interpolative coding; you can safely pass -1 otherwise).
Returns:
the number of bits written.
Throws:
IllegalStateException - if there is no current inverted list.
IOException
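
The occ, offset and len parameters describe a window over an array of strictly increasing positions. The sketch below shows how such a window might be validated and turned into gaps before coding. It is an illustration only: PositionWindow is a hypothetical helper, gap coding is one common technique and is not confirmed by this page as what any implementation does, and the check that positions are smaller than docSize is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of MG4J: validates a position window and computes gaps. */
class PositionWindow {
    /**
     * Returns, for each position in occ[offset .. offset + len), the number of
     * positions skipped since the previous occurrence (or since the document start).
     */
    static int[] gaps(int[] occ, int offset, int len, int docSize) {
        if (offset < 0 || len < 0 || len > occ.length - offset) {
            throw new IllegalArgumentException("window out of bounds");
        }
        int[] result = new int[len];
        int previous = -1; // so that a first position of 0 yields a gap of 0
        for (int i = 0; i < len; i++) {
            int position = occ[offset + i];
            // Also rejects negative values, since previous starts at -1.
            if (position <= previous) {
                throw new IllegalArgumentException("positions must be strictly increasing natural numbers");
            }
            // Assumption: when a document size is given, positions lie below it.
            if (docSize != -1 && position >= docSize) {
                throw new IllegalArgumentException("position beyond document size");
            }
            result[i] = position - previous - 1;
            previous = position;
        }
        return result;
    }

    public static void main(String[] args) {
        // Only entries 1..3 of the array (0, 4, 9) are valid; the rest is ignored.
        int[] occ = { 7, 0, 4, 9, 1 };
        System.out.println(Arrays.toString(gaps(occ, 1, 3, -1))); // prints [0, 3, 4]
    }
}
```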

writtenBits

long writtenBits()
Returns the overall number of bits written onto the underlying stream(s).

Returns:
the number of bits written, according to the variables keeping statistical records.

properties

Properties properties()
Returns properties of the index generated by this index writer.

This method should only be called after close(). It returns a new property object containing values for (whenever appropriate) Index.PropertyKeys.DOCUMENTS, Index.PropertyKeys.TERMS, Index.PropertyKeys.POSTINGS, Index.PropertyKeys.MAXCOUNT, Index.PropertyKeys.INDEXCLASS, Index.PropertyKeys.CODING, Index.PropertyKeys.PAYLOADCLASS, BitStreamIndex.PropertyKeys.SKIPQUANTUM, and BitStreamIndex.PropertyKeys.SKIPHEIGHT.

Returns:
a new set of properties for the just-created index.

close

void close()
           throws IOException
Closes this index writer, completing the index creation process and releasing all resources.

Throws:
IllegalStateException - if too few records were written for the last inverted list.
IOException

printStats

void printStats(PrintStream stats)
Writes to the given print stream statistical information about the index just built. This method must be called after close().

Parameters:
stats - a print stream where statistical information will be written.