java.lang.Object
  it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter
    it.unimi.dsi.mg4j.index.BitStreamIndexWriter
      it.unimi.dsi.mg4j.index.SkipBitStreamIndexWriter
public class SkipBitStreamIndexWriter
extends BitStreamIndexWriter
Provides facilities to write skip inverted indices, that is, inverted indices with an additional skip structure. A skip inverted index allows one to skip ahead when reading inverted lists. More specifically, when reading the inverted list for a given term, one may want to skip all document records concerning documents whose pointer is smaller than a given integer. In a normal inverted index this is impossible: one has to read all document records sequentially.
The skipping structure used by this class is new: details can be found here.
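To make the idea of skipping ahead concrete, here is a minimal sketch in plain Java. It is not the tower-based structure this class writes, and it does not use any MG4J API: it assumes a single level of skip entries, one per quantum of `q` records, held in memory. The class name `SkipSketch` and its methods are hypothetical.

```java
/** Hypothetical single-level skip sketch; not MG4J code. */
final class SkipSketch {
    private SkipSketch() {}

    /** Records the pointer of the first document record of every quantum of q records (q > 0). */
    static int[] buildSkips(int[] pointers, int q) {
        int[] skips = new int[(pointers.length + q - 1) / q];
        for (int i = 0; i < skips.length; i++) skips[i] = pointers[i * q];
        return skips;
    }

    /**
     * Returns the index of the first record whose pointer is at least target,
     * or pointers.length if there is none. Pointers must be strictly increasing.
     */
    static int skipTo(int[] pointers, int[] skips, int q, int target) {
        int quantum = 0;
        // Jump over whole quanta while the next quantum still starts at or below the target.
        while (quantum + 1 < skips.length && skips[quantum + 1] <= target) quantum++;
        // Read sequentially only inside the one quantum that may contain the target.
        int i = quantum * q;
        while (i < pointers.length && pointers[i] < target) i++;
        return i;
    }
}
```

With the pointers 2, 5, 9, 14, 20, 27, 31, 40, 52 and `q = 3`, `buildSkips` keeps 2, 14 and 31, and `skipTo` for target 30 starts its sequential scan at the record with pointer 14 rather than at the beginning of the list.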
Nested Class Summary | |
---|---|
static class | SkipBitStreamIndexWriter.TowerData: A structure maintaining statistical data about tower construction. |
Field Summary | |
---|---|
long | bitsForEntryBitLengths: The number of bits written for entry lengths. |
long | bitsForQuantumBitLengths: The number of bits written for quantum lengths. |
static int | DEFAULT_TEMP_BUFFER_SIZE: The size of the buffer for the temporary file used to build an inverted list. |
long | numberOfBlocks: The number of written blocks. |
int | prevEntryBitLength: An estimate of the number of bits occupied per tower entry in the last written cache, or -1 if no cache has been written for the current inverted list. |
int | prevQuantumBitLength: An estimate of the number of bits occupied per quantum in the last written cache, or -1 if no cache has been written for the current inverted list. |
SkipBitStreamIndexWriter.TowerData | towerData: The sum of all tower data computed so far. |
Fields inherited from class it.unimi.dsi.mg4j.index.BitStreamIndexWriter |
---|
b, BEFORE_COUNT, BEFORE_DOCUMENT_RECORD, BEFORE_FREQUENCY, BEFORE_INVERTED_LIST, BEFORE_PAYLOAD, BEFORE_POINTER, BEFORE_POSITIONS, currentDocument, FIRST_UNUSED_STATE, frequency, lastDocument, log2b, maxCount, obs, state, writtenDocuments |
Fields inherited from class it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter |
---|
bitsForCounts, bitsForFrequencies, bitsForPayloads, bitsForPointers, bitsForPositions, countCoding, currentTerm, flags, frequencyCoding, hasCounts, hasPayloads, hasPositions, numberOfDocuments, numberOfOccurrences, numberOfPostings, pointerCoding, positionCoding |
Constructor Summary | |
---|---|
SkipBitStreamIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, int tempBufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, int q, int h) | Creates a new skip index writer, with the specified basename. |
SkipBitStreamIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, int q, int h) | Creates a new skip index writer, with the specified basename. |
SkipBitStreamIndexWriter(OutputBitStream obs, OutputBitStream offset, int N, int tempBufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, int q, int h) | Creates a new skip index writer. |
Method Summary | |
---|---|
void | close(): Closes this index writer, completing the index creation process and releasing all resources. |
OutputBitStream | newDocumentRecord(): Starts a new document record. |
long | newInvertedList(): Starts a new inverted list. |
void | printStats(PrintStream stats): Writes to the given print stream statistical information about the index just built. |
Properties | properties(): Returns properties of the index generated by this index writer. |
int | writeDocumentPointer(OutputBitStream out, int pointer): Writes a document pointer. |
int | writeFrequency(int frequency): Writes the frequency. |
long | writtenBits(): Returns the overall number of bits written onto the underlying stream(s). |
Methods inherited from class it.unimi.dsi.mg4j.index.BitStreamIndexWriter |
---|
writeDocumentPositions, writePayload, writePositionCount |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
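public static final int DEFAULT_TEMP_BUFFER_SIZE
The size of the buffer for the temporary file used to build an inverted list.
public final SkipBitStreamIndexWriter.TowerData towerData
The sum of all tower data computed so far.
public long bitsForQuantumBitLengths
The number of bits written for quantum lengths.
public long bitsForEntryBitLengths
The number of bits written for entry lengths.
public long numberOfBlocks
The number of written blocks.
public int prevEntryBitLength
An estimate of the number of bits occupied per tower entry in the last written cache, or -1 if no cache has been written for the current inverted list.
public int prevQuantumBitLength
An estimate of the number of bits occupied per quantum in the last written cache, or -1 if no cache has been written for the current inverted list.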
Constructor Detail |
---|
public SkipBitStreamIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, int q, int h) throws IOException
Creates a new skip index writer, with the specified basename. If writeOffsets is true, an offset file will also be produced (stemmed with .offsets).
The size of the internal temporary buffer will be DEFAULT_TEMP_BUFFER_SIZE.
Parameters:
basename - the basename.
numberOfDocuments - the number of documents in the collection to be indexed.
writeOffsets - if true, the offset file will also be produced.
flags - a flag map setting the coding techniques to be used (see CompressionFlags).
q - the cache contains at most 2^h document records.
h - the maximum height of a skip tower.
Throws:
IOException
public SkipBitStreamIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, int tempBufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, int q, int h) throws IOException
Creates a new skip index writer, with the specified basename. If writeOffsets is true, an offset file will also be produced (stemmed with .offsets).
Parameters:
basename - the basename.
numberOfDocuments - the number of documents in the collection to be indexed.
writeOffsets - if true, the offset file will also be produced.
tempBufferSize - the size in bytes of the internal temporary buffer (inverted lists shorter than this size will never be flushed to disk).
flags - a flag map setting the coding techniques to be used (see CompressionFlags).
q - the cache contains at most 2^h document records.
h - the maximum height of a skip tower.
Throws:
IOException
public SkipBitStreamIndexWriter(OutputBitStream obs, OutputBitStream offset, int N, int tempBufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> flags, int q, int h) throws IOException
Creates a new skip index writer.
Parameters:
obs - the underlying output bit stream.
offset - the offset bit stream.
N - the number of documents in the collection to be indexed.
tempBufferSize - the size in bytes of the internal temporary buffer (inverted lists shorter than this size will never be flushed to disk).
flags - a flag map setting the coding techniques to be used (see CompressionFlags).
q - the cache contains at most 2^h document records.
h - the maximum height of a skip tower.
Throws:
IOException
Method Detail |
---|
public long newInvertedList() throws IOException
Description copied from interface: IndexWriter
Starts a new inverted list.
Specified by: newInvertedList in interface IndexWriter
Overrides: newInvertedList in class BitStreamIndexWriter
Throws:
IOException
public int writeFrequency(int frequency) throws IOException
Description copied from interface: IndexWriter
Writes the frequency.
Specified by: writeFrequency in interface IndexWriter
Overrides: writeFrequency in class BitStreamIndexWriter
Parameters:
frequency - the (positive) number of document records that this inverted list will contain.
Throws:
IOException
public OutputBitStream newDocumentRecord() throws IOException
Description copied from interface: IndexWriter
Starts a new document record.
This method must be called exactly f times, where f is the frequency specified with IndexWriter.writeFrequency(int).
Specified by: newDocumentRecord in interface IndexWriter
Overrides: newDocumentRecord in class BitStreamIndexWriter
Throws:
IOException
public int writeDocumentPointer(OutputBitStream out, int pointer) throws IOException
Description copied from interface: IndexWriter
Writes a document pointer.
This method must be called immediately after IndexWriter.newDocumentRecord().
Specified by: writeDocumentPointer in interface IndexWriter
Overrides: writeDocumentPointer in class BitStreamIndexWriter
Parameters:
out - the output bit stream where the pointer will be written.
pointer - the document pointer.
Throws:
IOException
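The call order required by newInvertedList(), writeFrequency(int), newDocumentRecord() and writeDocumentPointer(OutputBitStream, int) can be pictured as a small state machine. The sketch below is a hypothetical checker, not MG4J code: it writes nothing, takes no output bit stream, and ignores counts, positions and payloads. Its state names echo the inherited BEFORE_* fields, but how the real writer uses those fields is an assumption here.

```java
/** Hypothetical call-order checker; it writes nothing and is not part of MG4J. */
final class CallOrderSketch {
    enum State { BEFORE_INVERTED_LIST, BEFORE_FREQUENCY, BEFORE_DOCUMENT_RECORD, BEFORE_POINTER }

    private State state = State.BEFORE_INVERTED_LIST;
    private int frequency;        // f, as passed to writeFrequency
    private int writtenDocuments; // document records completed in the current list

    void newInvertedList() {
        require(State.BEFORE_INVERTED_LIST);
        state = State.BEFORE_FREQUENCY;
    }

    void writeFrequency(int f) {
        require(State.BEFORE_FREQUENCY);
        if (f <= 0) throw new IllegalArgumentException("frequency must be positive");
        frequency = f;
        writtenDocuments = 0;
        state = State.BEFORE_DOCUMENT_RECORD;
    }

    void newDocumentRecord() {
        require(State.BEFORE_DOCUMENT_RECORD);
        state = State.BEFORE_POINTER;
    }

    void writeDocumentPointer(int pointer) {
        // The pointer must follow newDocumentRecord() immediately.
        require(State.BEFORE_POINTER);
        writtenDocuments++;
        // After exactly f records the list is complete and a new one may start.
        state = writtenDocuments == frequency ? State.BEFORE_INVERTED_LIST : State.BEFORE_DOCUMENT_RECORD;
    }

    State state() {
        return state;
    }

    private void require(State expected) {
        if (state != expected) throw new IllegalStateException("expected " + expected + " but was " + state);
    }
}
```

In this sketch, starting a new inverted list after only one of two announced records raises an IllegalStateException; whether the real writer reports the same condition in the same way is not stated on this page.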
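public void close() throws IOException
Description copied from interface: IndexWriter
Closes this index writer, completing the index creation process and releasing all resources.
Specified by: close in interface IndexWriter
Overrides: close in class BitStreamIndexWriter
Throws:
IOException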
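public long writtenBits()
Description copied from interface: IndexWriter
Returns the overall number of bits written onto the underlying stream(s).
Specified by: writtenBits in interface IndexWriter
Overrides: writtenBits in class BitStreamIndexWriter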
public Properties properties()
Description copied from interface: IndexWriter
Returns properties of the index generated by this index writer.
This method should only be called after IndexWriter.close(). It returns a new property object containing values for (whenever appropriate) Index.PropertyKeys.DOCUMENTS, Index.PropertyKeys.TERMS, Index.PropertyKeys.POSTINGS, Index.PropertyKeys.MAXCOUNT, Index.PropertyKeys.INDEXCLASS, Index.PropertyKeys.CODING, Index.PropertyKeys.PAYLOADCLASS, BitStreamIndex.PropertyKeys.SKIPQUANTUM, and BitStreamIndex.PropertyKeys.SKIPHEIGHT.
Specified by: properties in interface IndexWriter
Overrides: properties in class BitStreamIndexWriter
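public void printStats(PrintStream stats)
Description copied from interface: IndexWriter
Writes to the given print stream statistical information about the index just built.
This method is meant to be called after IndexWriter.close().
Specified by: printStats in interface IndexWriter
Overrides: printStats in class AbstractBitStreamIndexWriter
Parameters:
stats - a print stream where statistical information will be written.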